(30 points)
You’ve just started as a member of Wien Energie’s data science team. As your first assignment, they give you the following problem. They’d like to send out a survey asking customers which devices they own, in the hope that the survey results can be used to classify customers into high and low power users. The idea is to develop a machine learning algorithm for this classification.
Your manager knows that, of the \(d\) items in the survey, there are \(k\) items that use a lot of power, but not which ones they are. If a customer owns at least one of these items, then they are a high power user. The dataset you are given contains, for each customer \(i\), the set of owned items \(x_i \subseteq \{1,\ldots,d\}\) together with a label \(t_i\), where \(t_i = 1\) iff the customer is a high power user. In addition, each customer can own at most \(s\) items. Help your manager design a learning algorithm whose time complexity is polynomial in \(s\) and \(k\).
What representation would you use for each observation \(x_i\) in memory?
Write down your algorithm given this representation. Hint: think about how you would find the distinguishing features quickly (a sketch of one possible approach follows these questions).
Justify why this algorithm runs in polynomial time by referring to specific steps in your algorithm.
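For intuition only, here is a minimal sketch of an elimination-style learner, assuming each \(x_i\) is stored as a set of at most \(s\) item indices. The function names and the approach itself are illustrative, not necessarily the intended solution.

```python
def fit(examples):
    """Collect items that cannot be power items.

    Any item owned by a low power user (t = 0) cannot be one of the k
    high-power items, since owning even one of those would make the
    customer a high power user. Each negative example contributes at
    most s items, so this pass costs O(s) set insertions per example.
    """
    not_power = set()
    for x, t in examples:
        if t == 0:
            not_power |= x
    return not_power


def predict(not_power, x):
    """Classify as high power iff the customer owns at least one item
    that never appeared in a negative example (at most s membership
    tests, each O(1) expected with a hash set)."""
    return int(any(item not in not_power for item in x))


# Toy usage: customers own at most s = 3 of d = 5 survey items.
data = [({1, 2}, 0), ({2, 3}, 1), ({1, 4}, 0)]
model = fit(data)
print(predict(model, {2, 3}))  # -> 1: item 3 never occurs in a negative example
```

Note that neither training nor prediction ever touches all \(d\) items: the running time depends only on \(s\) and the number of examples, which is what makes the sparse set representation attractive here.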
(20 points)
Recall that we discussed how a feed-forward neural network can be viewed as a generalized linear model with adaptive basis functions \(\phi(\cdot)\) (slide 6 of the Neural network lecture). A feed-forward neural network with a single hidden layer implements the function:
\[ y_k(\mathbf{x},\mathbf{w}) = h \left( \sum_{j=1}^M w_{kj}^{(2)} h \left( \sum_{i=1}^D w_{ji}^{(1)} x_i + w_{j0}^{(1)} \right) + w_{k0}^{(2)} \right) \]
where \(h(\cdot)\) is the activation function (see slides 10 and 11 of the Neural network lecture).
Show that a feed-forward NN is a linear classifier with adaptive basis functions. In other words, show that \(h \left( \sum_{j=1}^M w_{kj}^{(2)} h \left( \sum_{i=1}^D w_{ji}^{(1)} x_i + w_{j0}^{(1)} \right) + w_{k0}^{(2)} \right) = f \left( \sum_{j=1}^M w_j \phi_j(\mathbf{x}) \right)\).
Which parts of the neural network implement the basis functions?
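One way to see the correspondence, sketched here with the output bias absorbed into the sum via a constant basis function \(\phi_0 \equiv 1\) (which the right-hand side above leaves implicit), is to match terms:

\[ \phi_0(\mathbf{x}) = 1, \qquad \phi_j(\mathbf{x}) = h \left( \sum_{i=1}^D w_{ji}^{(1)} x_i + w_{j0}^{(1)} \right), \quad j = 1, \ldots, M, \]
\[ y_k(\mathbf{x},\mathbf{w}) = h \left( \sum_{j=0}^M w_{kj}^{(2)} \phi_j(\mathbf{x}) \right) = f \left( \sum_{j=0}^M w_j \phi_j(\mathbf{x}) \right), \quad \text{with } f = h \text{ and } w_j = w_{kj}^{(2)}. \]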
(20 points)
Let’s say you’re using some code you found to do 6-sided dice rolls. No matter how many times you run it, it always says “3”. This seems suspicious, but you can use Bayes’ rule to help determine the probability that the code is working as it should (i.e., it produces fair die rolls where each side is equally likely). For the sake of argument, you can assume that the fair case and the unfair case are a priori equally likely.
Write down the joint likelihood, after seeing \(n\) rolls, of observing a sequence of “3”s, both for the case where the code always returns a 3 and for the case where it produces “fair” dice rolls.
What is the probability of the code being unfair (always returns 3) given that you have seen \(n\) 3s so far?
How many rolls do you need before you have 99% confidence that the code is indeed unfair?
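As a numeric sanity check, here is a small sketch that computes the posterior from the uniform prior stated above; the function name is illustrative.

```python
from fractions import Fraction

def posterior_unfair(n: int) -> Fraction:
    """Posterior P(unfair | n consecutive 3s), with prior P(unfair) = 1/2.

    Likelihood of n consecutive 3s: 1 under the always-3 code,
    (1/6)**n under a fair die.
    """
    prior = Fraction(1, 2)
    lik_unfair = Fraction(1)
    lik_fair = Fraction(1, 6) ** n
    return (lik_unfair * prior) / (lik_unfair * prior + lik_fair * prior)

if __name__ == "__main__":
    # Print the posterior for the first few roll counts.
    for n in range(1, 5):
        print(n, float(posterior_unfair(n)))
```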
(30 points)
In the case of a logistic sigmoid (as defined on slides 67/68 of our Linear Classification lecture), show that \(a\) simplifies to a linear function under the assumption that the class-conditional densities \(p(\mathbf{x} | C_k)\) are Gaussian with the same covariance matrix \(\Sigma\):
\[ a = \ln \frac{p(\mathbf{x} | C_1)\, p(C_1)}{p(\mathbf{x} | C_2)\, p(C_2)} = \mathbf{w}^T \mathbf{x} + w_0 \]
(see slide 73 of our Linear Classification lecture).
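For orientation, with \(p(\mathbf{x} | C_k) = \mathcal{N}(\mathbf{x} | \boldsymbol{\mu}_k, \Sigma)\), the quadratic terms \(\mathbf{x}^T \Sigma^{-1} \mathbf{x}\) cancel when the log-ratio is expanded, and the derivation should arrive at the standard result (cf. Bishop, PRML, Section 4.2):

\[ \mathbf{w} = \Sigma^{-1} (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2), \qquad w_0 = -\frac{1}{2} \boldsymbol{\mu}_1^T \Sigma^{-1} \boldsymbol{\mu}_1 + \frac{1}{2} \boldsymbol{\mu}_2^T \Sigma^{-1} \boldsymbol{\mu}_2 + \ln \frac{p(C_1)}{p(C_2)}. \]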
Please submit a PDF document with your answers to Moodle. Use the following naming scheme for your submission: “lastname_matrikelnumber_A2.pdf”. The naming of the files is important: if you do not follow the submission instructions, you will receive a grade of 0 for the assignment.