Computer for coding and accessing

Ground rules
For this assignment, you may use any of our EE 559 course materials, and you may use your
computer for coding and accessing the assignment, EE 559 materials, piazza, and D2L. You may
also use the internet for information about Python, Python coding, and Python libraries that you
are allowed to use for the problem you are working on (e.g., looking up information for sklearn
functions that you might want to use in a problem that allows the sklearn library).
You are not allowed to use the internet to look up answers or partial answers to any of this
assignment’s problems, or communicate with other people (e.g., other students, people on
discussion forums, etc.) on topics related to this assignment.
For all coding in this assignment, you are allowed to use built-in Python functions, NumPy,
matplotlib (and pandas only for reading/writing csv files). Other libraries are allowed where
stated so in the problem.
For questions on this assignment, when posting on piazza, please consider whether this is a
question that is appropriate for all students (e.g., clarifying the problem statement, what is
allowed, suspected typos or omissions in the problem statement), or only for the instructors (e.g.,
something that includes your approach or solution) which should be posted as a “private” post
that is visible only to the professor, TAs and you.
Please respect your classmates and follow all of these guidelines, so that everyone can be graded
fairly. Any violations that are detected will result in substantial penalties.
Uploading your solutions. Please upload the following files before the due date and time:

  1. A single pdf file of all your solutions/answers.
  2. A single computer-readable pdf file of all your code.
    Problems and points. This Midterm Assignment has 3 problems, worth a total of 100 points
    possible. There will be partial credit on most problem parts. Good luck!
    p. 2 of 5
  3. Kernel nearest means.
    In this problem you will write code and run a nonlinear version of a nearest means classifier,
    on data that we provide. This problem has � = 2 features.
    Coding: you must use Python, and code the functions yourself. sklearn and other
    supplementary libraries are not allowed for this problem. Similar to Homework 4, you may
    use Python built-in functions, NumPy, and matplotlib. You may also use plotting functions
    provided in our previous homework assignments, and modify them as you like. Pandas may
    be used only for reading and writing csv files.
    Datasets: there are 2 synthetic datasets provided. Pr1_dataset1 is the synthetic dataset from
    HW1, but divided up to include a validation set. Pr1_dataset2 is also synthetic but is new.
    Each of these 2 datasets is given as 3 sets: training set, validation set, and test set.
    Comment: This problem uses model selection with a validation set (covered in Lecture 13);
    the procedure is straightforward, and we will answer any questions you have regarding the
    model selection procedure.
    (a) Write an expression for the discriminant function �%�' for a 2-class nearest-means
    classifier. No need to derive it; but do express it in simplest form.
    (b) From (a), develop and give the dual representation of the discriminant function; call it
    �!%�'.
    Hint: Lecture 12 for the multiclass (MVM) version.
    (c) Use kernel substitution to give an expression for the discriminant function �"#$%�' for
    a nearest-means classifier using an RBF kernel, with parameter γ (as defined in Lecture
    12).
    (d) Code a 2-class kernel nearest means classifier that can use RBF kernel or linear kernel.
    Use model selection with the validation set to find an optimal value of � (call it �∗) in
    the case of RBF kernel, separately for each dataset.
    Tip: not knowing what range is best for � a priori, you might first try a range covering
    a few orders of magnitude, and use a log scale for the increment: e.g., to cover
    0.01 ≤ � ≤ 100, one could use a loop over the index k in the range −2 ≤ � ≤ 2 with
    some increment (step size) specified for k. Then the value of � for each k can be
    computed from � = 10& .
    (e) Plot the validation-set classification error as a function of γ, for each dataset, for RBF
    kernel. (If you used a log scale for your increments as described in the tip of (d), then
    use a log scale for � in your plot; or equivalently, plot vs. k instead of vs. �.) Pick the
    optimal value �∗ for each dataset based on validation-set classification error.
    (f) Compare test-set error using the linear kernel with test-set error using the RBF kernel,
    for each dataset. Comment on the results.
    (g) For the linear kernel, plot the training data, decision regions and boundary, in the
    feature space, for each dataset.
    p. 3 of 5
    (h) For the RBF kernel with optimal γ, plot the training data and decision regions in the
    original feature space, for each dataset.
    Tip: any point in the original feature space can be classified (using nearest-means with
    RBF kernel) from your code.
    (i) For the RBF kernel, repeat part (h) except for
    � = 0.01�∗, 0.1�∗, 0.3�∗, 3�∗, 10�∗, 100�∗. Comment on the results.
  4. Learning algorithm from criterion function.
    In this problem you will derive and interpret a learning algorithm based on the following
    criterion function in a 2-class problem, using augmented notation:
    �%�' = −567�%�'�'' ≤ 09:
    (
    ')*
    �+�' �'
    ∥ � ∥,
    in which 7[… ]9 denotes the indicator function.
    (a) Interpret the meaning of the criterion function; use a figure if it helps explain it.
    Tip: just stating what is different from Perceptron criterion function is not sufficient.
    (b) Derive a gradient descent (GD) algorithm for each GD method given below. Simplify
    your answers algebraically. Give a statement of the entire algorithm.
    Give both:
    (i) Batch GD
    (ii) Stochastic GD (Variant 1)
    (c) In this part use nonaugmented notation. Consider a problem with � = 2 features.
    Draw a plot in 2D weight space (�, ��. �*), for the case �- = 0, showing items (i)-(v)
    below. Label your axes with numbers and tick marks, so the lines and vectors drawn
    are unambiguous.
    Let �(�) = *
    ./ . For the first epoch, the data has already been shuffled so take the data points in the order given (�, �,, �0, �1).
    Tip: This plot can be done by hand. If you prefer to do it by computer, that is also OK.
    (i) The following data points (as lines �2!3!%�' = 0 ), with an arrow pointing to the
    correctly classified side.
    :   � = [−4,1]+,   �, = [−3, −2]+
    �,:   �0 = [−1,3]+,   �1 = [3, −2]+
    (ii) The weight vector at the second iteration (� = 1) :
    �(1) = [−1,4]+.
    (iii) The vector showing the weight update at iteration � = 1 (based on data point �,).
    Also state the vector’s value in numbers.