Statistical Machine Learning GR5241
Spring 2021
Homework 4
Homework submission: Please submit your homework electronically through Canvas by 11:59pm on the due date. You need to submit both the PDF file and your code (either in R or Python).
Problem 1 (Boosting, 50 points)
The objective of this problem is to implement the AdaBoost algorithm. We will test the algorithm on handwritten
digits from the USPS data set.
AdaBoost: Assume we are given a training sample $(x^{(i)}, y_i),\ i = 1, \dots, n$, where the $x^{(i)}$ are data values in $\mathbb{R}^d$ and $y_i \in \{-1, +1\}$ are class labels. Along with the training data, we provide the algorithm with a training routine for some classifier $c$ (the "weak learner"). Here is the AdaBoost algorithm for the two-class problem:
1. Initialize weights: $w_i = 1/n$
2. For $b = 1, \dots, B$:
(a) Train a weak learner $c_b$ on the weighted training data. (See the note below.)
Note: step 2(a) can be completed using two different methods: (1) use the weight vector directly in the training of the weak learner, or (2) use the weight vector to sample data points with replacement from the original data, then train the weak learner on the sampled data. Either way will guarantee full credit.
(b) Compute error: $\epsilon_b := \dfrac{\sum_{i=1}^{n} w_i \mathbb{I}\{y_i \neq c_b(x^{(i)})\}}{\sum_{i=1}^{n} w_i}$
(c) Compute voting weights: $\alpha_b = \log\!\left(\frac{1-\epsilon_b}{\epsilon_b}\right)$ or $\alpha_b = \frac{1}{2}\log\!\left(\frac{1-\epsilon_b}{\epsilon_b}\right)$
(d) Recompute weights: $w_i = w_i \exp\!\left(\alpha_b \,\mathbb{I}\{y_i \neq c_b(x^{(i)})\}\right)$
3. Return classifier $\hat{c}_B(x^{(i)}) = \operatorname{sgn}\!\left(\sum_{b=1}^{B} \alpha_b c_b(x^{(i)})\right)$
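To make the loop above concrete, here is a minimal sketch of the boosting loop in Python (the homework accepts R or Python). The weak-learner interface is an assumption for illustration: `fit_weak_learner(X, y, w)` is taken to return an object with a `predict` method, which the assignment does not prescribe; the decision stump described below is one way to fill that role.

```python
import numpy as np

def adaboost(X, y, fit_weak_learner, B):
    """Sketch of AdaBoost. X: (n, d) array; y: labels in {-1, +1}.
    fit_weak_learner(X, y, w) is an assumed interface returning a
    classifier with a .predict(X) method (e.g. a decision stump)."""
    n = X.shape[0]
    w = np.full(n, 1.0 / n)                        # step 1: uniform weights
    learners, alphas = [], []
    for b in range(B):
        c_b = fit_weak_learner(X, y, w)            # step 2(a)
        miss = c_b.predict(X) != y                 # I{y_i != c_b(x^(i))}
        eps = np.sum(w * miss) / np.sum(w)         # step 2(b): weighted error
        eps = np.clip(eps, 1e-10, 1 - 1e-10)       # guard the log in this sketch
        alpha = np.log((1 - eps) / eps)            # step 2(c): voting weight
        w = w * np.exp(alpha * miss)               # step 2(d): upweight mistakes
        learners.append(c_b)
        alphas.append(alpha)
    def classify(X_new):                           # step 3: sign of weighted vote
        votes = sum(a * c.predict(X_new) for a, c in zip(alphas, learners))
        return np.sign(votes)
    return classify
```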
Decision stumps: Recall that a stump classifier $c$ is defined by
$$c(x \mid j, \theta, m) := \begin{cases} +m & x_j > \theta \\ -m & \text{otherwise} \end{cases} \qquad (1)$$
Since the stump ignores all entries of $x$ except $x_j$, it is equivalent to a linear classifier defined by an affine hyperplane. The plane is orthogonal to the $j$th axis, with which it intersects at $x_j = \theta$. The orientation of the hyperplane is determined by $m \in \{-1, +1\}$. We will employ stumps as weak learners in our boosting algorithm.
To train stumps on weighted data, use the learning rule
$$(j^*, \theta^*) := \arg\min_{j,\,\theta} \frac{\sum_{i=1}^{n} w_i \mathbb{I}\{y_i \neq c(x^{(i)} \mid j, \theta, m)\}}{\sum_{i=1}^{n} w_i}. \qquad (2)$$
In the implementation of your training routine, first determine an optimal parameter $\theta_j^*$ for each dimension $j = 1, \dots, d$, and then select the $j^*$ for which the cost term in (2) is minimal.
Note: If the data is sampled using the weights $w_i$, then the decision stump can be trained using a loss function other than the weighted 0-1 loss. Using other loss functions to train the weak learner is technically not correct; however, the results are similar.
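As an illustration of rule (2), here is a minimal weighted stump trainer in Python; the class name `Stump` and the exhaustive scan over observed values as thresholds are illustrative choices, not requirements of the assignment. It matches the `fit_weak_learner` interface assumed in the earlier sketch.

```python
import numpy as np

class Stump:
    """Decision stump c(x | j, theta, m) from equation (1)."""
    def __init__(self, j, theta, m):
        self.j, self.theta, self.m = j, theta, m
    def predict(self, X):
        return np.where(X[:, self.j] > self.theta, self.m, -self.m)

def train_stump(X, y, w):
    """Exhaustive search over (j, theta, m) minimizing the weighted 0-1 loss (2).
    O(d * n^2) as written; adequate for a first implementation."""
    n, d = X.shape
    w = w / np.sum(w)                      # normalize so errors lie in [0, 1]
    best, best_loss = None, np.inf
    for j in range(d):
        for theta in np.unique(X[:, j]):   # candidate thresholds: observed values
            pred = np.where(X[:, j] > theta, 1, -1)
            err = np.sum(w * (pred != y))  # weighted error for m = +1
            for m, loss in ((1, err), (-1, 1.0 - err)):  # flipping m flips the error
                if loss < best_loss:
                    best, best_loss = Stump(j, theta, m), loss
    return best
```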
Homework problems:
1. Implement the AdaBoost algorithm in R.
2. Run your algorithm on the USPS data (use the training and test data for the 3s and 8s) and evaluate your
results using cross validation.
More precisely: your AdaBoost algorithm returns a classifier that is a combination of $B$ weak learners. Since it is an incremental algorithm, we can evaluate AdaBoost at every iteration $b$ by considering the sum up to the $b$-th weak learner. At each iteration, perform 5-fold cross validation to estimate the training and test error of the current classifier (that is, the errors measured on the cross-validation training and test sets, respectively). A sketch of this staged evaluation appears after this list.
3. Plot the training error and the test error as a function of $b$.
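As referenced in item 2, here is a minimal Python sketch of the staged evaluation, assuming the boosting run exposes its lists of voting weights and trained learners (the earlier `adaboost` sketch would need to return `alphas` and `learners` rather than only the final classifier); wrapping this in a 5-fold split then gives the cross-validated curves:

```python
import numpy as np

def staged_errors(X_train, y_train, X_test, y_test, alphas, learners):
    """Misclassification rate of the partial ensemble sgn(sum_{b'<=b} alpha_b' c_b')
    for every b, on a single train/test split."""
    F_train = np.zeros(len(y_train))
    F_test = np.zeros(len(y_test))
    train_err, test_err = [], []
    for alpha, c in zip(alphas, learners):
        F_train += alpha * c.predict(X_train)   # running weighted vote
        F_test += alpha * c.predict(X_test)
        train_err.append(np.mean(np.sign(F_train) != y_train))
        test_err.append(np.mean(np.sign(F_test) != y_test))
    return train_err, test_err                  # plot these against b
```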
Submission. Please make sure your solution contains the following:
• Your implementation of AdaBoost.
• Plots of your results (training error and cross-validated test error).
Problem 2 (Basic Theory Related to the Lasso [30 points])
2.i Consider the univariate lasso objective function with no bias:
$$Q(\beta) = \frac{1}{2n}\sum_{i=1}^{n} (y_i - x_i\beta)^2 + \lambda|\beta|$$
Also suppose $x$ is scaled using the formula
$$x_i := \frac{x_i}{\sqrt{\tfrac{1}{n}\sum_{i=1}^{n} x_i^2}}, \qquad i = 1, 2, \dots, n,$$
so that $\frac{1}{n}\sum_{i=1}^{n} x_i^2 = 1$ after scaling. Derive a closed-form expression for the lasso solution, i.e., show that $Q(\beta)$ is minimized at
$$\hat{\beta} = \begin{cases} \frac{1}{n}\sum_i x_i y_i - \lambda & \text{if } \frac{1}{n}\sum_i x_i y_i > \lambda \\ 0 & \text{if } -\lambda \le \frac{1}{n}\sum_i x_i y_i \le \lambda \\ \frac{1}{n}\sum_i x_i y_i + \lambda & \text{if } \frac{1}{n}\sum_i x_i y_i < -\lambda \end{cases}$$
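As a numerical sanity check (not part of the derivation the problem asks for), the closed form above can be compared against a brute-force grid minimization of $Q(\beta)$; all names and the simulated data here are illustrative:

```python
import numpy as np

def lasso_univariate(x, y, lam):
    """Closed-form univariate lasso solution (soft thresholding),
    assuming x is scaled so that np.mean(x**2) == 1."""
    z = np.mean(x * y)                         # (1/n) sum_i x_i y_i
    return np.sign(z) * max(abs(z) - lam, 0.0)

rng = np.random.default_rng(0)
x = rng.normal(size=200)
x = x / np.sqrt(np.mean(x ** 2))               # the scaling from the problem
y = 0.7 * x + rng.normal(size=200)
lam = 0.3

betas = np.linspace(-2.0, 2.0, 4001)           # brute-force grid over beta
Q = 0.5 * np.mean((y[:, None] - np.outer(x, betas)) ** 2, axis=0) + lam * np.abs(betas)
print(lasso_univariate(x, y, lam), betas[np.argmin(Q)])   # should nearly agree
```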
2.ii Consider the multivariate lasso objective function with no bias:
$$Q(\beta) = Q(\beta_1, \beta_2, \dots, \beta_p) = \frac{1}{2n}\sum_{i=1}^{n} \Big(y_i - \sum_{j=1}^{p} \beta_j x_{ij}\Big)^2 + \lambda\sum_{j=1}^{p} |\beta_j|$$
Also suppose that the $j$th feature $x_j$ is scaled using the formula
$$x_{ij} := \frac{x_{ij}}{\sqrt{\tfrac{1}{n}\sum_{i=1}^{n} x_{ij}^2}}, \qquad i = 1, 2, \dots, n; \quad j = 1, 2, \dots, p.$$
Solve the equation $\frac{\partial Q}{\partial \beta_j} = 0$ for $\beta_j$. Your final answer should be
$$\beta_j = S_\lambda\!\left(\frac{1}{n}\, x_j^{T} r^{(j)}\right),$$
where $x_j$ is the $j$th feature, $r^{(j)}$ is the partial residual, and $S_\lambda$ is the soft thresholding operator. See the lecture slides for further details.
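The coordinate-wise update above is the building block of coordinate-descent lasso solvers; the following is a minimal sketch of that idea (an illustration, not the derivation the problem asks for), assuming each column of $X$ is scaled as described so that $\frac{1}{n}\sum_i x_{ij}^2 = 1$:

```python
import numpy as np

def soft_threshold(z, lam):
    """S_lambda(z) = sign(z) * max(|z| - lambda, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Cyclic coordinate descent for the lasso, assuming every column
    of X satisfies np.mean(X[:, j]**2) == 1."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_sweeps):
        for j in range(p):
            r_j = y - X @ beta + X[:, j] * beta[j]        # partial residual r^(j)
            beta[j] = soft_threshold(X[:, j] @ r_j / n, lam)
    return beta
```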
Problem 3 (Regression Trees, ISL 8.4 [20 points])
3.i Sketch the tree corresponding to the partition of the predictor space illustrated in the left-hand panel of Figure 1. The numbers inside the boxes indicate the mean of $Y$ within each region.
3.ii Create a diagram similar to the left-hand panel of Figure 1, using the tree illustrated in the right-hand panel of the same figure. You should divide up the predictor space into the correct regions, and indicate the mean for each region.
Figure 1: Left: A partition of the predictor space corresponding to part 3.i. Right: A tree corresponding to part 3.ii.