## Description

## Objectives:

– Understand how to iterate through a list

– Practice implementing functions that return values

– Explore algorithmic decision-making and representations of people in data

– Understanding outliers, missing information in data, and human judgements in

algorithmic decisions

Turn in:

– hw6_lastname_code.ipynb

You should turn this file in on Canvas under the “Homework 6 – College Admissions Algorithms”

link. Complete the following checklist before turning in your code:

○ all cells in your file run without errors after restarting the kernel

○ you have no global variables (i.e., all your variables are inside functions)

○ your code is nicely commented

○ you have appropriately renamed the starter notebook

○ you have set the variable filename = “superheroes_tiny.csv”

## Introduction:

As you know, the college admissions process involves a lot of types of data from prospective

students to make decisions. With the number of applicants increasing, colleges may begin

relying on algorithms to select which applications should receive more intensive human review.

An algorithm could use quantitative data–such as GPA and SAT score–to provide initial

recommendations. In fact, there is more data available than ever. Many colleges even track data

about prospective student engagement – e.g., whether they open emails, visit the college

website, engage on social media, etc. This creates a “demonstrated interest” value.

Based on a recent survey of college admissions officers, we know some of the weights that

humans tend to give to these different types of data. Your task will be to create a program that

iterates through a list of data points and provides a recommendation for which prospective

students are likely to be the best candidates for admission.

Prospective student data is organized in the superheroes.csv file such that the data for each

student is on one line, with the values separated by tabs. An example of two students’ data

might be:

Student,SAT,GPA,Interest,High School Quality,Sem1,Sem2,Sem3,Sem4

Abbess Horror ,1300,3.61,10,7,95,86,91,94

Adele Hawthorne ,1400,3.67,0,9,97,83,85,86

Adelicia von Krupp ,900,4,5,2,88,92,83,72

The data includes (in order):

Student: a unique identifier for each student (their string name)

SAT score: value between 0 and 1600

GPA: value between 0 and 5

Interest: value between 0 and 10 (from very low interest to very high interest)

High School Quality: value between 0 and 10 (from very low-quality to very high-quality high

school curriculum)

Sem1: average grade for semester 1

Sem2: average grade for semester 2

Sem3: average grade for semester 3

Sem4: average grade for semester 4

## Problem 1: Getting set up with our data (15 points)

First, we need to make sure that we can appropriately read in the data line by line, parsing each

line into a list and converting each element to the appropriate type.

Task 1: read in your data set in main(), looping through its contents line by line.

This data

contains one row of headers at the beginning, so isolate the headers before reading through the

rest of the file so you don’t try to do calculations with them!

Make use of the str.split(delimiter) function to break individual lines into a list of elements. Print

the contents of your list after using the split function. You’ll delete this print statement later but

make sure to double check this before moving on! Once you have each line in a list, save the

student’s name in a variable, then delete this element from your list.

Task 2: Once you have a list of strings for each line, you will write a function convert_row_type

that takes one list of elements (representing the data for one student) as a parameter and

converts it so that all numbers are converted to floats. Make sure not to lose any information

when you do this conversion!

Example:

Input: [“1300″,”3.61″,”10″,”7″,”95″,”86″,”91″,”94”]

Return value: [1300.0,3.61,10.0,7.0,95.0,86.0,91.0,94.0]

Task 3: in main, once you’ve called convert_row_type on the list representing one row, call the

provided check_row_type. If this function returns False, print out an error message. Ensure that

none of the rows in your data return False when passed to this function.

Task 4: separate your data. Use list slicing or while loops to separate your list (which should

contain 8 numbers at this point) into two lists: one that contains the student’s SAT, GPA, Interest,

and High School Quality scores, and one that contains their 4 semester grades.

You’ll do

Problems 2 – 5 with the first list of 4 numbers and Problem 6 with the list of grades.

## Problem 2: Prospective Student Score (10 points)

Task 1: Write a function calculate_score that takes a list as a parameter and calculates an

overall score for each student based on the following weights: 40% GPA, 30% SAT, 20%

strength of curriculum, 10% demonstrated interest. The list parameter will contain all of the

relevant information about the student.

The return value is the student’s calculated score.

To make this work, you will also need to normalize both GPA and SAT so that they are also on a

0 to 10 scale. To do this, multiply the GPA by 2, and divide the SAT score by 160.

Example:

Input: [1300.0,3.61,10.0,7.0]

which represents a student with a 1300 SAT score, a 3.61 GPA, 10 out of 10 for interest and 7

out of 10 for high school quality

((3.61 * 2) * 0.4) + ((1300 / 160) * 0.3) + (10 * 0.1) + (7 * 0.2) = 7.73 out of 10

Output: 7.73

Round each output score to 2 decimal points.

Task 2: In your main() function, modify your loop that reads in and converts your data to call the

calculate_score function for each line (row) of data (after you’ve converted it). Then, write the

student’s name and their calculated score to a new file called student_scores.csv such that

each row contains a student’s name and their score, separated by a comma.

Example:

Abbess Horror ,1300,3.61,10,7,95,86,91,94

Adele Hawthorne ,1400,3.67,0,9,97,83,85,86

Adelicia von Krupp ,900,4,5,2,88,92,83,72

lines written to file:

Abbess Horror ,7.73

Adele Hawthorne ,7.36

Adelicia von Krupp ,5.79

Task 3: Write all the student names for all students who have a score of 6 or higher to a file

called chosen_students.txt. You should do this in your main() function, where you have access

to the returned calculated score for each student and their student name.

Example:

Abbess Horror ,1300,3.61,10,7,95,86,91,94

Adele Hawthorne ,1400,3.67,0,9,97,83,85,86

Adelicia von Krupp ,900,4,5,2,88,92,83,72

lines written to file:

Abbess Horror

Adele Hawthorne

## Problem 3: Looking for Outliers (10 points)

Consider ways that this algorithm might systematically miss certain kinds of edge cases. For

example, what if a student has a 0 for demonstrated interest because they don’t use social

media or have access to a home computer? What if a student has a very high GPA but their

SAT score is low enough to bring their score down; could this mean that they had a single bad

test taking day?

Task 1: Write a function is_outlier that can check for certain kinds of outliers. It should check

for: (1) demonstrated interest scores of 0 and (2) a normalized GPA that is more than 2 points

higher than the normalized SAT score. If either of these conditions is true, it should return True

(because this student is an outlier); otherwise, the function returns False.

Task 2: Call is_outlier for each student from your main() function and write the students’ names

to a file called outliers.txt, one name per line if they are an outlier.

Task 3: Combine the work that you’ve done now to create an improved list of students to admit

to your school. Write the student names, one per line to the file chosen_improved.txt if they

either have a score of 6 or greater or if they are an outlier and their score is 5 or greater. Make

sure to take advantage of the work that you’ve already done by calling your functions from

previous problems to help you out!

## Problem 4: GPA Checker (10 points)

A single GPA score is not a full picture of a student’s academic performance, as it may have

improved over time or included outlier courses or semesters. A more context-sensitive algorithm

could consider a student’s entire transcript and check for, for example, a single class score that

is more than two letter grades (20 points) lower than all other scores. For this task, you will use

the second half of the data for each student in the provided file.

Task 1: Write a function grade_outlier that takes in a list of grades (of any length at least 2) and

returns True if the lowest number is more than 20 points lower than all other numbers;

otherwise, False. This function must not modify the list passed in!

Example:

Input: [99, 94, 87, 89, 56, 78, 89]

Hint: Sort the list from lowest to highest, and check for the difference between the two lowest

grades.

78 – 56 = 22; 22 > 20

Output: True

Next, consider the data that we have: a list of grades for each student, one grade per semester

for four semesters.

Make sure that your grade_outlier function works by calling it for every row, passing in the

grades list you isolated in problem 1, task 4. Print out an informative message about which

students have a single grade outlier. You’ll delete this later but it’s a great way of testing your

function!

Finally, consider the importance of an algorithm being able to flag students who might have a

lower overall GPA but have shown improvement over time.

Task 2: Create a function grade_improvement that returns True if the score of each semester

is higher than each previous semester and False otherwise. Hint: investigate how the ==

operator works between two lists and think about using the sorted() function.

Task 3: Using the grade information that you’ve just learned, create your own conditions based

on the information from the previous problems and grade_outlier and grade_improvement to

chose all students if they either have a score of 6 or greater or if they have a score of 5 or more

and at least one of the following is true: is_outlier returns True, grade_outlier returns True, or

grade_improvement returns true. Write the students who fit this description to

extra_improved_chosen.txt, one name per line.

Note!

We will be testing your program using a file with the same format but different data!

Make sure that you haven’t “hard coded” anything specific to your data. We do not

guarantee that all scenarios are tested by the code that we have provided you.

#### Comments & style (5 points)

Your variable names must be informative.

You must have one main() function that is structured like a “table of contents” and use

parameters and returns appropriately. All code except your call to main must be contained within

a function.

Your code must be commented. You must include a file comment, inline comments, and function

comments. An example function comment is shown below:

def fahrenheit_to_celsius(fahrenheit):

“””

This function converts a temperature in fahrenheit to celsius

and prints the equivalence.

Parameters:

fahrenheit (int or float): degrees fahrenheit

Return:

celsius (float): equivalent degrees celsius

“””

celsius = (fahrenheit – 32) * (5/9)

return celsius