Description
1 Overview
In Project 2, we’ll be reading files, storing them into data structures, and then processing the data. By computer standards, the files won’t be large – only a few thousand lines, most of the time – but they’ll be large enough that it’s worthwhile to write a program to analyze them. You’ll write three versions of the program. The programs will all do the same things, but they’ll store the data in different ways; this will be a chance to see different ways to organize your data. In my opinion, none of them are really right or wrong; each is optimized for a different sort of operation. (Hopefully, you’ll see what I mean when you write your program.)
1.1 Background: SQL
In these programs, you will be implementing a small number of commands, which were all chosen specifically for this program. But in the Real World, database management systems (DBMSs) support a very flexible language known as SQL (https://en.wikipedia.org/wiki/SQL), the Structured Query Language. As you write your program, think about what you would need to do to make it more general. For instance, if you search for the maximum temperature, what would it take to search for the maximum of any field?
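As one way to picture that generalization, here is a minimal sketch. It assumes each record is a tuple and that a field is identified by its position; the record layout and the name max_of_field are hypothetical and are not part of the assignment.

```python
# Hypothetical record layout, for illustration only: (timestamp, temp, humidity).
TEMP = 1
HUMIDITY = 2

def max_of_field(records, field_index):
    """Return the largest value found in the given field, or None if there are no records."""
    if not records:
        return None
    return max(record[field_index] for record in records)

records = [("2017-01-01 04:53", 55, 67), ("2017-01-01 05:53", 53, 72)]
print(max_of_field(records, TEMP))       # largest temperature
print(max_of_field(records, HUMIDITY))   # largest humidity
```

The same function serves both fields; only the index changes. That is roughly the flexibility SQL gives you when you name a column in a query.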
1.2 Style
Remember to read the class style guide: https://www2.cs.arizona.edu/classes/cs120/summer17/style/pgmstyle.html. In Project 1, I asked you to intentionally violate one rule (in each program); however, that was only for that one project. Please follow the style guide from here on out!
1.3 Short and Long Problems
This week, you will have “short problems” (due on Monday), which you will complete on CloudCoder (https://practice.cs.arizona.edu). These are auto-graded for correctness, and the TAs will give you feedback on your style and design by Wednesday. The short problems will typically be either small parts of – or practice for – the long programs, which are due on Friday and which you will turn in through D2L.
1.4 Checking Your Output
Write the program exactly as described below. Test it using IDLE or something equivalent (make sure that you are using Python 3). Your output should match the required output exactly, because we will be using a script to check it for correctness. We recommend that you use https://www.diffchecker.com to confirm that the output from your program matches the required output exactly!
NOTE: The “required output” includes everything that you print, including the prompts (if any) for user input. Pay attention to the requirements: if a prompt is required, print it – but if none is required, don’t add one!
1.5 main()
In this project, a main() function is not required. (If you don’t know what that is, feel free to ignore it for now.) However, I encourage you to use one, and we will probably require it later in the semester.
2 The Input/Output Format
All of the programs will take the same input, and will produce the same output. (We’ll only be releasing one set of testcases – you should use the same testcases on all of the programs.) In addition to the keyboard input, we will also have a set of data files which your program may open. Each will be in CSV format. Read about CSVs here: https://en.wikipedia.org/wiki/Comma-separated_values (I’ll give a few more details about what to expect inside the CSVs below.)
2.1 csv Library is Banned
Python has many great libraries. I’ve been told that one of them is the csv module (I haven’t used it). However, I’m banning that library – because I want you to parse the file on your own.
2.2 Handling EOF on Input
In this Project, you’ll be using a loop to read a long list of commands (instead of having a fixed number of inputs). If a user closes the input without typing the exit command, your call to input() will throw an exception – and thus crash your program. Just like opening invalid filenames, this is OK (for now). We’ll teach you how to handle exceptions later in the course.
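If you are curious what handling this case might look like, here is a minimal sketch. It is optional and not required for this project. The function name read_commands and its read_line parameter are hypothetical; the parameter exists only so that the loop can be tried out without a keyboard.

```python
def read_commands(read_line=input):
    """Collect stripped command lines until "exit" or end of input."""
    commands = []
    while True:
        try:
            line = read_line()
        except EOFError:
            # The input was closed without an "exit" command; stop quietly.
            break
        line = line.strip()
        if line == "exit":
            break
        if line:                  # skip blank lines
            commands.append(line)
    return commands
```

Without the try/except, the EOFError raised by input() would end the program with a traceback, which is acceptable for now.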
2.3 Input
The first line of input[1] to the program will be a file name. Open this file.[2] Read the file into a data structure; below, we’ll specify the required data structure for each of your three programs. After you read the file, print out the number of records you found in the file, like this:
The file had 317 records.
(As described below, you will be filtering out some of the lines in the file. The number that you print should be the number of records that you kept after all of the initial filtering is done.)
After this, your program will simply read lines of user input, parse them as “queries” about the data, and print the proper response. The user may type zero, one, or many lines of commands; your program should handle each line, one at a time. When the user types “exit”, your program should terminate.
Each command is a single line. (Ignore blank lines, silently.) A command is one (or more) words; some commands also have a parameter – which could be an integer, or another word. You should ignore any leading or trailing whitespace on each line, as well as any extra whitespace between the words or parameters. That is, the lines
max TEMP
    max TEMP
max      TEMP
should all be treated exactly the same.
Hint: Investigate split(), which is a method of Python’s strings. Try Googling for python documentation string split. Then test it out using IDLE. It will definitely be useful here!
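To see why split() helps, here is a small sketch; the helper name normalize is hypothetical. Called with no arguments, split() breaks a string on any run of whitespace and drops leading and trailing whitespace.

```python
def normalize(line):
    """Split a command line into words; returns [] for a blank line."""
    return line.split()

print(normalize("max TEMP"))
print(normalize("   max     TEMP  "))
print(normalize("   "))
```

The first two calls produce the same list of words, and the blank line produces an empty list, which makes blank lines easy to detect.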
2.3.1 Data File Format
The files that we are using reflect weather reports. They are CSVs, and there are LOTS of columns. You will need to discard most of the columns. However, I won’t tell you which ones – instead, read the spec below, and find out which columns are important. Keep the ones that are necessary, and throw all of the rest away.[3] However, a couple of hints and rules:
[1] Do not print any user prompt.
[2] As with Project 1, you do not have to do error checking on the filename; if your program can’t open the file, it’s OK to crash. We’ll add more error checking later in the class, once we’ve learned about exceptions.
[3] Technically, I don’t care how many you throw away. If you keep way too many fields, we won’t mark you down for it. But why waste memory?
• You may assume that all of the files we use for input will have the exact same columns, in the same order, as the example files we publish.
• You may assume that the first line is the list of headers – and that every other line is one record.
• Only keep lines where the REPORTTPYE[4] column is FM-15. Ignore every other line in the file.
• Interpret the column HOURLYDRYBULBTEMPF as the temperature. You may assume that (for FM-15 lines) it is never blank, and always an integer.
• Likewise, you may assume that the humidity is always an integer.
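The rules above can be sketched as follows. This is only an illustration under stated assumptions: the function name parse_fields, the column name STATION, and the three-column header are hypothetical (the real files have many more columns), and the sketch assumes that no field contains a comma inside quotes, since a plain split(",") would mis-split such a field.

```python
def parse_fields(header_line, data_line):
    """Return (report_type, temp) for an FM-15 line, or None for any other line."""
    headers = header_line.rstrip("\n").split(",")
    fields = data_line.rstrip("\n").split(",")
    # Look up column positions by header name (spelled exactly as in the file).
    type_col = headers.index("REPORTTPYE")
    temp_col = headers.index("HOURLYDRYBULBTEMPF")
    if fields[type_col] != "FM-15":
        return None
    return (fields[type_col], int(fields[temp_col]))

# Hypothetical three-column header, for illustration only.
header = "STATION,REPORTTPYE,HOURLYDRYBULBTEMPF\n"
print(parse_fields(header, "X1,FM-15,55\n"))
print(parse_fields(header, "X1,SOD,\n"))
```

Because the columns are always in the same order, hard-coded column positions would work just as well; looking them up from the header line is one alternative.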
2.3.2 Commands to Support
You must support the user commands listed below. If you read a line that doesn’t match any of these commands, then print out the following error message (and then go on to read the next command):
Invalid command.
In addition, you must implement one extra command of your own design. You can decide what it is called, and what it does; it doesn’t have to be terribly complex, but make it (a little) interesting. For an example (far more than I expect you to do), try out the command bar graph on my demo program.
All students must support the following commands (plus one of their own design):
• filter TEMP at least
After filtering, 13 records remain.
[4] Is this a typo? Maybe. It’s the actual column name in the file provided by the NOAA and I decided not to fix it. So your program must expect it.
[5] Feel free to ignore the equivalent SQL commands I provide – they don’t affect how your program should run. But I thought that you might find them interesting.
NOTE: Your command matching should be case-sensitive. So “filter” and “at least” must all be lower-case, and “TEMP” must be all-caps. (Follow this rule in all your commands.)
NOTE: When you convert a string to an integer, Python might throw an exception. You may allow this to crash your program – but, if you know how to catch exceptions, it’s OK to catch the exception.
• average TEMP
SQL: SELECT AVG(temp) FROM table
Calculate the average temperature for the records that remain (that is, those that haven’t been filtered out yet). Print out the average as follows:
Average: 79.5
(Just print out whatever value you get from doing division. You don’t need to round it to any certain number of digits in this project.) If there are no remaining records, then print out the following error message:
ERROR: No records remain.
• max TEMP
  min TEMP
SQL: SELECT * FROM table WHERE temp = (SELECT MIN(temp) FROM table)
Find the record which has the maximum (or minimum) temperature. Print out all of the information about the record, like this:
Max TEMP: 2017-01-01 04:53 55 67%
(Note that this represents a temperature of 55 degrees Fahrenheit, with a relative humidity of 67%.) If multiple records tie for the max/min temperature, then print out all of the records which match. If there are no remaining records, then print out the following error message:
ERROR: No records remain.
• max HUMIDITY
SQL: SELECT * FROM table WHERE humidity = (SELECT MAX(humidity) FROM table)
Do the same as min/max TEMP above, except search for the maximum humidity. Print out the results in the same format.
• minSimple TEMP
SQL: SELECT MIN(temp) FROM table
The “simple” min TEMP command finds the minimum temperature in the current records – but then prints only the value, not the entire record. (If several records share that minimum, print the value only once.) Print it out like this:
MIN TEMP: 117
• printAll
SQL: SELECT * FROM table
Print out all of the records that remain, using the same output format as the min/max commands. However, you of course will not print out the MAX or MIN line at the beginning. Instead, print out the following line after all of the records:
10 records printed.
If there are no records remaining, then simply print out the count of 0 records.
• exit
Terminate the program. Do not print anything else.
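Two of the commands above can be sketched as follows. This is a minimal illustration, not a reference solution: it assumes each record is a tuple with the temperature at index 1 (a hypothetical layout), and the function names are made up. Printing, including the error message for an empty table, is left to the caller.

```python
def average_temp(records):
    """Return the mean temperature, or None when no records remain."""
    if not records:
        return None
    return sum(record[1] for record in records) / len(records)

def records_with_max_temp(records):
    """Return every record tied for the highest temperature (possibly several)."""
    if not records:
        return []
    highest = max(record[1] for record in records)
    return [record for record in records if record[1] == highest]

# Hypothetical records: (timestamp, temp, humidity).
sample = [("04:53", 79, 67), ("05:53", 80, 60), ("06:53", 80, 55)]
print(average_temp(sample[:2]))
print(records_with_max_temp(sample))
```

Finding the extreme value first and then collecting every record that matches it is what makes ties come out correctly.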
3 Version 1: List of Tuples
Name this version of the program tempDB_tuples.py
Probably the simplest way to represent a table of records in Python is a list of tuples. In this version of the program, represent each record with a tuple: that is, store one field (such as temperature, time, etc.) in each element of the tuple. You may decide what fields need to be saved, and in what order; however, all of the tuples must store the fields in the same order. Document the order that you’ve chosen in the comments for the program.
To make your code more organized, you must implement a function named parse_one_line(). It must take one line of input (as a string), and return the appropriate tuple. It must return None if the line is not one that should be stored in the table.
In addition, you should[6] implement each command (except exit, of course) as a function. You may choose the names of these functions, but they should encapsulate all (or nearly all) of the logic of your program.
[6] Some amount of flexibility will be allowed here. But don’t stray too far from this design. I’m giving you this instruction for your sanity! And also, to help teach you good style.
Basically, your input-handling loop in the main body of your code should be little more than an if/elif/else chain:
if it is command X:
    command_x(args)
elif it is command Y:
    command_y(args)
etc…
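A runnable sketch of that chain might look like the following. It covers only two commands; the function names, the tuple layout (temperature at index 1), and the way records are printed are all placeholders rather than requirements.

```python
def command_average(records):
    # Hypothetical layout: the temperature is stored at index 1 of each tuple.
    if not records:
        print("ERROR: No records remain.")
    else:
        print("Average:", sum(r[1] for r in records) / len(records))

def command_print_all(records):
    for record in records:
        print(record)          # placeholder; a real solution uses the required record format
    print(len(records), "records printed.")

def handle_line(line, records):
    """Run one command line. Return False when the program should stop."""
    words = line.split()
    if not words:                       # blank line: ignore silently
        return True
    if words == ["exit"]:
        return False
    elif words == ["average", "TEMP"]:
        command_average(records)
    elif words == ["printAll"]:
        command_print_all(records)
    else:
        print("Invalid command.")
    return True
```

With this shape, the main loop could be as short as `while handle_line(input(), records): pass`. Comparing the list of words, rather than the raw line, takes care of extra whitespace while keeping the match case-sensitive.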
4 Version 2: Objects
IMPORTANT IMPORTANT IMPORTANT
Code reuse is always a good idea in Computer Science, and it’s critical in this program! Version 2 should be basically the same as Version 1, only with small changes. If Version 1 used a list of tuples, and this uses a list of objects, what should you change in the main body of your code? What small changes can you make to your functions to use the class, rather than accessing a tuple? DO NOT rewrite your program from scratch – if you value your sanity!
IMPORTANT IMPORTANT IMPORTANT
Name this version of the program tempDB_objects.py
A more advanced way to represent a record is with a class. Your class will work like a tuple – but with some built-in functionality to make some parts more automatic.
Define a class (you can choose the name) where each object of this class represents one record. The __init__ method of the class must take a tuple (the same tuple as you used for Version 1) as a parameter; it must copy the fields of the tuple into the fields of the object.
Code outside the class must not directly access any of the fields inside it. Instead, you must use the methods (shown below) to ask what the temperature and humidity are for the record. (This principle is known as encapsulation.)
Your class must include the following methods:
• __str__(self)
This method must return a string, which is the exact line (except for the newline) to be printed by the min/max commands.
• getTemp()
Return the temperature for this record, as an integer.
• getHumidity()
Return the relative humidity for this record, as an integer.
(You may add additional methods if you find them useful.)
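A class along these lines could be sketched as below. The class name Record and the tuple order (timestamp, temp, humidity) are assumptions made for illustration, and the spacing in __str__ is a guess at the output format; check it against the required output.

```python
class Record:
    """One weather record. Assumed tuple order: (timestamp, temp, humidity)."""

    def __init__(self, fields):
        # Copy each tuple element into a field of the object.
        self._timestamp = fields[0]
        self._temp = fields[1]
        self._humidity = fields[2]

    def __str__(self):
        # Placeholder layout; a real solution must match the required output exactly.
        return "{} {} {}%".format(self._timestamp, self._temp, self._humidity)

    def getTemp(self):
        return self._temp

    def getHumidity(self):
        return self._humidity

rec = Record(("2017-01-01 04:53", 55, 67))
print(rec)
print(rec.getTemp(), rec.getHumidity())
```

The leading underscore on each field name is a common Python convention for signalling that code outside the class should go through the methods instead.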
5 Version 3: Multiple Lists
(You should reuse your code here as well!)
Name this version of the program tempDB_lists.py
The final version of this program completely re-organizes how we look at the data. Instead of organizing the data into rows (like the previous two), this organizes the data into columns! In this version of the program, you will have one list for each set of data. For instance, one list will store all of the temperatures (and nothing else); another list will store all of the humidities (and nothing else).
This way of organizing data is unusual, and sometimes can be irritating to use – since you have to keep several lists coordinated. However, it can be the ideal data structure for certain operations. In fact, this program can perform some of the commands in a single line of Python – simply by calling a function in the Python standard library. Can you find out what it is?
In this program, I will not tell you what functions to write – but I would encourage you to copy-paste parse_one_line() into this program, and use it again. It will probably be wise to write some other functions, as well – but I will leave that up to you. Just remember that your code must be well organized and easy to read!
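To illustrate what keeping several lists coordinated means in practice, here is a small sketch. The helper name keep_rows and the sample data are hypothetical; the point is only that any change to one column has to be applied to every column, so that index i still refers to the same record everywhere.

```python
def keep_rows(keep, *columns):
    """Return new columns holding only the rows whose index is in keep."""
    return [[column[i] for i in keep] for column in columns]

# Row i of the table is (times[i], temps[i], humidities[i]).
times = ["04:53", "05:53", "06:53"]
temps = [55, 53, 60]
humidities = [67, 72, 40]

# Choose rows by looking at one column, then apply the choice to all of them.
keep = [i for i, h in enumerate(humidities) if h >= 50]
times, temps, humidities = keep_rows(keep, times, temps, humidities)
print(times, temps, humidities)
```

Forgetting to update even one of the lists would silently pair each temperature with the wrong time or humidity, which is the main hazard of this layout.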
6 Testcases
We have provided a few testcases (and example input files) at https://www.cs.arizona.edu/classes/cs120/summer17/projects/proj02/ Each testcase is made up of two files: an .in file – which contains the input; and an .out file – which is the correct output for that input.
6.1 Writing Your Own Testcases
You must write at least 10 testcases of your own. Use the same testcases on all three programs, to make sure that they work correctly. To find the correct output for each testcase you create, feed it into my solutions: https://www2.cs.arizona.edu/classes/cs120/summer17/projects/proj02/demo.html
Save your input as a .in file (one input per line), and the output from my solution as a .out file. Please name them as follows:
test-tempDB-<NetID- (That is, use tempDB as the program name for all testcases.)
7 Turning in Your Solution
You must turn in your code using D2L, using the Assignment folder for this project. Turn in:
• Each required program
• All of the testcases that you have created