HW 1: Exploring Course Data with Python

Due: Sun, Sep 10 2023 at 9:00 PM EST

Released: Mon, Sep 4 2023

The goal of this assignment is to refresh your memory of how to write Python programs, to get practice organizing data for particular computations, and to get practice working with CSV files. Don’t worry if you’re a little rusty – we’ll be reviewing these concepts in the first lab as well!

Setup

On your desktop, create a cs0112 folder. Inside this folder, create a new folder called hw1. Move all the files you downloaded into the hw1 folder.

The Assignment

Tim is preparing for a talk on courses at Brown and he wants YOU to help! He has a list of student names, departments, and the number of courses each student has taken in each department, but he needs your help extracting useful information from this large database.

Part 1: Understanding the Data

You are provided with a CSV file that contains information about the courses students have taken. Each row in the file contains a student’s name, the department of the course they took, and the number of courses they took in that department. For example, a row in this database might look like this:

Ashley,MATH,4

This means that Ashley has taken four courses in the Mathematics Department. Check your courses.csv file for more examples.

What is a CSV file? A CSV file is a comma-separated values file, which allows data to be saved in a tabular format. Each line of the file is a data record. Each record consists of one or more fields, separated by commas. A CSV file typically looks like this:

    name,department,courses
    Ashley,MATH,4
    Ben,MATH,8
    Ben,EAST,1
    David,BIOL,5
    
You can think of this as an Excel spreadsheet, where each row is a record and each column is a field. In this case, the fields are `name`, `department`, and `courses`.
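To make the record/field idea concrete, here is a minimal sketch that reads the sample rows above with Python's built-in `csv` module. The name `read_rows` is made up for this illustration, and the sketch assumes the first line is a header, as in the sample; check your own courses.csv to see whether that holds.

```python
import csv
import io

# The sample rows from above, kept in a string so the example runs without a file.
SAMPLE = """name,department,courses
Ashley,MATH,4
Ben,MATH,8
Ben,EAST,1
David,BIOL,5
"""

def read_rows(text):
    """Return the data records (header skipped) as lists of field strings."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header line: name,department,courses
    return [row for row in reader]

rows = read_rows(SAMPLE)
print(rows[0])  # ['Ashley', 'MATH', '4']
```

Note that every field comes back as a string, including the course count.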


Notice that any student could have taken courses in multiple departments, and any given department could have multiple students taking courses in it.

In addition, there can be two entries with the same student and the same department but different counts. In that case, the counts add up for that student and that department. For example, there can be an Ashley,ANTH,4 entry early in the file and an Ashley,ANTH,12 entry later in the file; in this case, Ashley has taken a total of 16 courses in the Anthropology department.
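The accumulation rule can be sketched as follows. The dictionary keyed by (name, department) is only one possible way to hold the running totals, chosen here for illustration; it is not a required structure.

```python
# Rows as (name, department, count) tuples; the two Ashley/ANTH rows are the ones from the text.
rows = [
    ("Ashley", "ANTH", 4),
    ("Ben", "MATH", 8),
    ("Ashley", "ANTH", 12),
]

totals = {}  # maps (name, department) -> accumulated course count
for name, department, count in rows:
    key = (name, department)
    # Start from 0 the first time a pair is seen, then keep adding.
    totals[key] = totals.get(key, 0) + count

print(totals[("Ashley", "ANTH")])  # 16
```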

Part 2: Storing the Data

Now that we understand the format of our data, the next thing you should do is decide how you want to structure the data you read within your code. There are multiple approaches that will work fine and some that are less good – think about how you will want to access the data in order to write each function!
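As a rough illustration of why the choice matters, the sketch below holds the same sample data in two shapes and asks the same question of each. Both shapes and all variable names are assumptions made for this example; neither is the expected answer.

```python
# Shape 1: a flat list of (name, department, count) tuples.
as_list = [("Ashley", "MATH", 4), ("Ben", "MATH", 8), ("Ben", "EAST", 1), ("David", "BIOL", 5)]

# Shape 2: a dictionary of dictionaries, student -> {department -> count}.
as_nested = {
    "Ashley": {"MATH": 4},
    "Ben": {"MATH": 8, "EAST": 1},
    "David": {"BIOL": 5},
}

# Question: which departments has Ben taken courses in?
# With the list, every row has to be scanned.
ben_from_list = [dept for name, dept, _ in as_list if name == "Ben"]
# With the nested dictionary, it is a single lookup.
ben_from_nested = list(as_nested["Ben"])

print(ben_from_list)    # ['MATH', 'EAST']
print(ben_from_nested)  # ['MATH', 'EAST']
```

A question that starts from a department instead (for example, how many students took MATH) would favor a different shape, so it helps to read all the tasks before deciding.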

Task 1: Complete the load_data function, which takes in the name of a CSV file and should return your structured data.

Note: You can convert a str to an int with the int function; for instance, int("4") == 4.
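A short sketch of why the conversion matters when the counts come out of a CSV file as strings. The variable names are illustrative only.

```python
total = int("4") + int("12")   # 16; "4" + "12" would instead give the string "412"
padded = int(" 8\n")           # 8: leading and trailing whitespace is ignored

try:
    int("courses")             # e.g. a header field passed in by mistake
    header_ok = True
except ValueError:
    header_ok = False          # int raises ValueError for non-numeric text

print(total, padded, header_ok)  # 16 8 False
```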

Part 3: Using the Data

Let’s put our data to use! For these analysis functions, you don’t need to worry about CSV files — each one takes in your structured data.

Note: You may find that once you start writing your analysis functions, you want to change the way your data are structured – this is expected! Keep in mind, though, that every analysis function needs to use the same input data structure.

Task 2: What is the department with the most taken courses? Write a function most_taken that returns the name of the department with the highest total number of courses taken by all students.

Task 3: Which student has the widest-ranging course set (i.e. has taken courses from the biggest number of departments)? Write a function widest_ranging that returns the name of the student who has taken courses from the maximum number of departments.

Task 4: How many total courses did each student take, on average? Write a function average_courses that returns the average number of courses taken per student.

Task 5: Which departments have had only one student take courses in them? Write a function only_once that returns a list of departments in which only one student took a course.
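To check your understanding of what each task asks for, here is a worked pass over the four sample rows from Part 1. This is a sketch for hand-checking expected answers, not the required implementation: it works from a plain list of tuples rather than your structured data, all names are made up for the example, and it does not handle ties or empty input, which the tasks above do not specify.

```python
from collections import defaultdict

rows = [("Ashley", "MATH", 4), ("Ben", "MATH", 8), ("Ben", "EAST", 1), ("David", "BIOL", 5)]

dept_totals = defaultdict(int)     # department -> courses taken by all students
student_totals = defaultdict(int)  # student -> courses taken across all departments
dept_students = defaultdict(set)   # department -> students who took a course there
student_depts = defaultdict(set)   # student -> departments they took courses in

for name, dept, count in rows:
    dept_totals[dept] += count
    student_totals[name] += count
    dept_students[dept].add(name)
    student_depts[name].add(dept)

most_taken_dept = max(dept_totals, key=dept_totals.get)                    # MATH: 4 + 8 = 12
widest_student = max(student_depts, key=lambda s: len(student_depts[s]))   # Ben: 2 departments
average = sum(student_totals.values()) / len(student_totals)               # (4 + 9 + 5) / 3
single_student_depts = sorted(d for d, s in dept_students.items() if len(s) == 1)

print(most_taken_dept, widest_student, average, single_student_depts)
# MATH Ben 6.0 ['BIOL', 'EAST']
```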

Running Your Code

To run your code, you can use the provided courses.py file. If you want to run the average_courses function on the courses.csv file, you would run the following command:

$ python3 courses.py courses.csv average_courses

Note: Things to run in your terminal will be marked with a $ symbol. Only the code that follows the $ should be run in the terminal.

You can check how we run your code by looking at the main block in courses.py. This block of code will run your functions and print the results.
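If command-line arguments are new to you, the sketch below shows one common way a script can turn `courses.csv average_courses` into a function call. It is a hypothetical illustration, not the contents of the provided courses.py: `run`, `ANALYSES`, and the stand-in `average_courses` are all invented here, so read the real main block for the actual behavior.

```python
import sys

def average_courses(data):
    """Stand-in for a real analysis function; returns a fixed value here."""
    return 6.0

# Hypothetical lookup table from the name typed on the command line to a function.
ANALYSES = {"average_courses": average_courses}

def run(argv):
    """Interpret argv laid out as: courses.py <csv file> <function name>."""
    if len(argv) != 3 or argv[2] not in ANALYSES:
        return "usage: python3 courses.py <csv file> <function name>"
    filename, function_name = argv[1], argv[2]
    data = filename  # placeholder: a real script would load the CSV file here
    return ANALYSES[function_name](data)

if __name__ == "__main__":
    # sys.argv[0] is the script name; the remaining items are what you typed after it.
    print(run(sys.argv))
```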

Part 4: Testing Your Code

Task 6: Write tests for each of the functions you implemented in courses.py. You can find the test file in courses_test.py. No need to write tests for the load_data function.

With pytest, tests are written as Python functions in a testing file (test_*.py or *_test.py). For example, to test your only_once function, you’d write something like this:

def test_only_once():
    test_data = { ... }
    assert only_once(test_data) == [ ... ]

Remember that each analysis function should be tested with a variety of inputs, including edge cases. For example, you might want to test what happens when there are no students in the data, or when there are no departments.
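Here is a sketch of what a typical case and two edge cases might look like for only_once. It assumes the data is stored as a dictionary mapping each student to a dictionary of department counts, which is only one possible structure, and it includes a stand-in only_once so the example runs on its own. In your test file you would call your real function from courses.py instead.

```python
# Stand-in implementation for illustration only; your own only_once may differ.
def only_once(data):
    """Assumes data maps student -> {department -> count}."""
    students_per_dept = {}
    for student, departments in data.items():
        for dept in departments:
            students_per_dept.setdefault(dept, set()).add(student)
    return [dept for dept, students in students_per_dept.items() if len(students) == 1]

def test_only_once_typical():
    data = {"Ashley": {"MATH": 4}, "Ben": {"MATH": 8, "EAST": 1}, "David": {"BIOL": 5}}
    # Sort before comparing so the test does not depend on the order of the list.
    assert sorted(only_once(data)) == ["BIOL", "EAST"]

def test_only_once_no_students():
    # Edge case: empty data means there are no departments to report.
    assert only_once({}) == []

def test_only_once_all_shared():
    # Edge case: every department has more than one student.
    data = {"Ashley": {"MATH": 4}, "Ben": {"MATH": 8}}
    assert only_once(data) == []
```

Sorting the result before comparing is a design choice: it keeps the test valid whether or not your function returns departments in a particular order.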

Run your tests with the following command:

$ pytest courses_test.py

Note: Make sure all the files are in the same directory (folder) before running the tests! Otherwise, you may get a scary FileNotFoundError.

Part 5: README

A README is a text file that contains information about your code. For our purposes, instead of writing comments in your code, we will ask you to write a README that answers some questions about your implementation.

You should answer the following questions in your README:

Submission

Please follow the design and clarity guide, since part of your grade will be for code style and clarity. Additionally, you should be adhering to the course design recipe. After completing the homework, you will submit:

Please don’t put your name in your code files, as we grade anonymously. If you have any questions about the assignment, please post on Ed.

You can use a maximum of 3 late days per assignment. If the assignment is late and you do NOT have any more late days, no credit will be given.