Assignment 1: Python Introduction and Regression Analysis

Due date: Oct 31, 2018

Use Python for all questions. We provide a template file called l1p1.py, which you must use for the assignment. Please keep the function signatures exactly as specified.

Part 1 (15 points)

This part of the assignment is designed to get you up and running with Python. Please place all code for this part in a single file called ‘l1p1.py’. No external libraries, code, or modules may be used except where the assignment specifically allows them.

Q1.1 (5 points)

Write a function, fibonacci(i), to compute the i-th Fibonacci number. It should take a single integer value as argument and return a single integer.

Here are some examples of use:

>>> fibonacci(1)
1

>>> fibonacci(0)
0

>>> fibonacci(100)
354224848179261915075
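A minimal iterative sketch of fibonacci, assuming the standard convention fibonacci(0) = 0 and fibonacci(1) = 1 (the convention under which fibonacci(100) = 354224848179261915075). Looping instead of recursing keeps the cost at O(i) and avoids Python's recursion limit; Python integers have arbitrary precision, so the 21-digit result is exact. Rejecting negative input with ValueError is an illustrative choice, not something the assignment specifies.

```python
def fibonacci(i):
    """Return the i-th Fibonacci number, with fibonacci(0) == 0 and fibonacci(1) == 1."""
    if i < 0:
        raise ValueError("i must be a non-negative integer")
    a, b = 0, 1  # a holds F(k) and b holds F(k + 1), starting at k = 0
    for _ in range(i):
        a, b = b, a + b  # advance k by one
    return a
```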

Q1.2 (5 points)

For this question you may use the csv module from Python's standard library. Write a function, extract(input_file, output_file, columns), that reads a CSV file and writes a new CSV file to disk containing only the listed columns. input_file and output_file are path strings giving the file locations, and columns is a list of column names (strings).

Example:

>>> extract("input.csv", "output.csv", ["c1", "c2", "c3"])
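One way to write extract, sketched with csv.DictReader and csv.DictWriter. It assumes the first row of the input file is a header that contains every requested name, and it writes the columns in the order given in columns. Raising KeyError for a missing column is an illustrative choice, not part of the specification.

```python
import csv


def extract(input_file, output_file, columns):
    """Copy only the named columns of input_file into a new CSV at output_file."""
    with open(input_file, newline="") as fin:
        reader = csv.DictReader(fin)  # first row is treated as the header
        missing = [c for c in columns if c not in (reader.fieldnames or [])]
        if missing:
            raise KeyError(f"columns not found in {input_file}: {missing}")
        with open(output_file, "w", newline="") as fout:
            # extrasaction="ignore" silently drops the columns we were not asked for
            writer = csv.DictWriter(fout, fieldnames=columns, extrasaction="ignore")
            writer.writeheader()
            for row in reader:
                writer.writerow(row)
```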

Q1.3 (5 points)

For this question you may also use the csv module. Write a function, col_stats(input_file, column), that reads a CSV file from disk and returns a tuple of the minimum, maximum, average, sum, and absolute sum (the sum of absolute values) of the values in the specified column. The tuple should be in the form (minvalue, maxvalue, average, sum, abssum).

Examples:

>>> col_stats("input.csv", "c1")
(-2, 2, 0, 0, 6)

>>> col_stats("input2.csv", "x2")
(-4.5, 7.23, 1.345, 3.67, 24.33)
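A sketch of col_stats using csv.DictReader. It converts every entry with float(), so it assumes the column is entirely numeric and has at least one data row. The results are therefore floats: a column that yields (-2, 2, 0, 0, 6) comes back as (-2.0, 2.0, 0.0, 0.0, 6.0), which compares equal to the integer tuple in Python.

```python
import csv


def col_stats(input_file, column):
    """Return (minvalue, maxvalue, average, sum, abssum) for one numeric CSV column."""
    with open(input_file, newline="") as f:
        values = [float(row[column]) for row in csv.DictReader(f)]
    if not values:
        raise ValueError(f"column {column!r} has no data rows")
    total = sum(values)
    abssum = sum(abs(v) for v in values)  # sum of absolute values
    return (min(values), max(values), total / len(values), total, abssum)
```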

Part 2 (85 points)

For this part we’re going to explore the Boston housing dataset (CSV file). The goal is to predict the median home price from 13 factors. Submit all of your Python code for this part as a single Jupyter notebook called l1p2.ipynb, and comment it clearly so we can tell which code belongs to which question. For the write-up, also submit a PDF called l1p2.pdf, generated from the notebook (see the directions below). It is your responsibility to ensure that the notebook runs on the grader's computer, which will use a Python distribution with scikit-learn, numpy, and matplotlib installed.

Q2.1 (35 points)

First we want to get an overview of the data. Write a report that contains the following parts:

  1. Use matplotlib to make a scatterplot of each variable against the median home price.
  2. Describe the trend in each plot: say whether you see any trend and, if so, what kind (linear, quadratic, logarithmic, etc.). One sentence per plot should be sufficient.
  3. Before doing the later parts of the assignment, explain which factors you think are important for predicting the median home price and why (one paragraph).
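A minimal sketch for item 1 using only numpy and matplotlib. load_csv and scatter_against_target are hypothetical helper names, and the sketch assumes the CSV has a header row, contains only numeric values, and names the median-price column MEDV (the attribute name used in the original description of this dataset); change target if your file uses a different name.

```python
import csv

import matplotlib.pyplot as plt
import numpy as np


def load_csv(path):
    """Read a numeric CSV with a header row into (column_names, 2-D float array)."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        data = np.array([[float(x) for x in row] for row in reader])
    return header, data


def scatter_against_target(header, data, target="MEDV"):
    """Draw one scatterplot per feature against the target column; return the Figure."""
    t = header.index(target)
    features = [i for i in range(len(header)) if i != t]
    ncols = 4
    nrows = -(-len(features) // ncols)  # ceiling division
    fig, axes = plt.subplots(nrows, ncols, figsize=(4 * ncols, 3 * nrows), squeeze=False)
    for ax, i in zip(axes.flat, features):
        ax.scatter(data[:, i], data[:, t], s=8, alpha=0.6)
        ax.set_xlabel(header[i])
        ax.set_ylabel(target)
    for ax in list(axes.flat)[len(features):]:
        ax.set_visible(False)  # hide grid cells that have no feature
    fig.tight_layout()
    return fig
```

In the notebook, calling scatter_against_target(*load_csv("path/to/your.csv")) in a cell should display all 13 panels in one figure; the path is a placeholder.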

Q2.2 (50 points)

Now it’s time to build a model of the data. For this, use the KernelRidge class from scikit-learn (sklearn.kernel_ridge.KernelRidge).

For the evaluation, report both the R² value and the 5-fold cross-validation results for each model.
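A minimal sketch of that evaluation, assuming "R² value" means the R² of a model fit on all the data and that the cross-validation results are the per-fold R² scores. evaluate is a hypothetical helper name; shuffling the folds with a fixed seed is a design choice, since the rows of a CSV may be stored in a non-random order. Note that KernelRidge fits no intercept term, so with the linear kernel in particular you may want to center y (subtract its mean) before fitting.

```python
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import KFold, cross_val_score


def evaluate(model, X, y, folds=5, seed=0):
    """Return (in-sample R^2, array of per-fold cross-validated R^2 scores)."""
    model.fit(X, y)
    train_r2 = model.score(X, y)  # R^2 on the same data the model was fit to
    cv = KFold(n_splits=folds, shuffle=True, random_state=seed)
    # cross_val_score clones the model, so the fit above does not leak into the folds
    cv_scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
    return train_r2, cv_scores


# Example usage (X: features, y: median prices):
# r2, scores = evaluate(KernelRidge(kernel="linear", alpha=1.0), X, y)
```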

  1. Build a linear model and evaluate it (HINT: linear kernel)
  2. Build a polynomial model and evaluate it (HINT: polynomial kernel)
  3. Build a Gaussian model and evaluate it (HINT: Gaussian/RBF kernel)
  4. Compare the 3 different models:
    • which degree of polynomial provided the best fit?
    • which values of gamma and alpha for the fitting procedure gave the best fit?
    • write which model is the best fit to the data and why (1 paragraph will be sufficient)
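One way to answer the degree, gamma, and alpha questions is a cross-validated grid search, sketched here with GridSearchCV. search_kernel is a hypothetical helper; standardizing the features and the target is a design choice, not an assignment requirement: scaling X matters for the polynomial and RBF kernels, and scaling y compensates for KernelRidge having no intercept. Because the scalers live inside the estimator, they are refit on each training fold, and R² is still computed on the original price scale.

```python
from sklearn.compose import TransformedTargetRegressor
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def search_kernel(X, y, kernel, param_grid, folds=5, seed=0):
    """Grid-search KernelRidge hyperparameters for one kernel; return the fitted search."""
    pipe = make_pipeline(StandardScaler(), KernelRidge(kernel=kernel))
    # Standardize y too; predictions are mapped back to the original units automatically.
    model = TransformedTargetRegressor(regressor=pipe, transformer=StandardScaler())
    # Parameter names must be prefixed with the nested step names.
    grid = {f"regressor__kernelridge__{name}": values for name, values in param_grid.items()}
    cv = KFold(n_splits=folds, shuffle=True, random_state=seed)
    search = GridSearchCV(model, grid, cv=cv, scoring="r2")
    return search.fit(X, y)  # fit returns the GridSearchCV object itself
```

For example, search_kernel(X, y, "poly", {"alpha": [0.01, 0.1, 1.0], "degree": [2, 3, 4]}) reports the best degree and alpha in best_params_ and their mean cross-validated R² in best_score_; running the same search for "linear" and "rbf" (with a "gamma" list for the latter) gives one basis for comparing the three models. The grid values are placeholders, not recommendations.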

Converting a Jupyter notebook to pdf

Jupyter comes with a tool called nbconvert that can convert Jupyter notebooks to a variety of formats. To convert your notebook to PDF you'll also need LaTeX and pandoc installed. Once they are installed, you can convert the notebook from the command line:

$ jupyter nbconvert --to pdf l1p2.ipynb

Procedure and Submission

Please submit a ZIP file containing your answers to Moodle, named according to the scheme “lastname_matrikelnumber_A1.zip”. File naming matters: if you do not follow the submission instructions, you will receive a grade of 0 for the lab.

Late submission

Late submissions are NOT possible. Any assignment submitted late will receive zero points.

Academic Honesty