CS7150 Homework 1
Problem 1
Consider a 2-dimensional Gaussian random vector x = (x1, x2)ᵀ whose components x1 and x2 have correlation coefficient ρ.
- (10’) Derive the joint entropy H(x)
- (10’) Derive the mutual information I(x1; x2)
- (10’) If ρ can be varied, when is the joint entropy maximized? What is the mutual information between x1 and x2 then? (See the sketch after this list.)
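The covariance of x is not spelled out above, so the sketch below assumes the standard bivariate Gaussian x ~ N(μ, Σ) with variances σ1², σ2² and correlation ρ; treat that parameterization as an assumption and adapt the steps to the setup given in class.

```latex
% Assumption: x = (x_1, x_2)^T ~ N(mu, Sigma) with
%   Sigma = [ sigma_1^2, rho sigma_1 sigma_2 ; rho sigma_1 sigma_2, sigma_2^2 ].
% Joint entropy of a Gaussian is determined by det(Sigma):
\[
H(\mathbf{x}) = \frac{1}{2}\log\left((2\pi e)^{2}\det\Sigma\right)
              = \log(2\pi e) + \frac{1}{2}\log\left(\sigma_1^{2}\sigma_2^{2}(1-\rho^{2})\right).
\]
% Mutual information via I = H(x_1) + H(x_2) - H(x):
\[
I(x_1; x_2) = \frac{1}{2}\log\left(2\pi e\,\sigma_1^{2}\right)
            + \frac{1}{2}\log\left(2\pi e\,\sigma_2^{2}\right)
            - H(\mathbf{x})
            = -\frac{1}{2}\log\left(1-\rho^{2}\right).
\]
% With sigma_1, sigma_2 fixed, det(Sigma) (hence H) is maximized at rho = 0,
% where I(x_1; x_2) = 0: the joint entropy peaks exactly when the
% components are independent.
```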
Problem 2
We talked about cross-entropy in class. That is, for a true distribution p and a predicted distribution q over the same discrete set,
\[
H(p, q) = -\sum_{x} p(x) \log q(x).
\]
In a C-class image classification problem, we have N samples. Denote the i-th sample as (xi, yi), where xi is an image and yi ∈ {0, 1, ..., C − 1} is its class label. Suppose our model predicts the class probability as p̂i(y) for y ∈ {0, ..., C − 1}.
- (10’) Show that an empirical estimate of the cross-entropy is
  \[
  -\frac{1}{N} \sum_{i=1}^{N} \log \hat{p}_i(y_i).
  \]
- Let us denote the above cross-entropy loss as ℓ. We often report ℓ as an indicator of model quality: a model with lower ℓ is more accurate. However, in some applications, we also report a related metric, called perplexity (PPL), defined as
  \[
  \mathrm{PPL} = e^{\ell}.
  \]
  Show that (a proof sketch follows this list):
  - (5’) A perfect model has PPL = 1.
  - (5’) Any reasonable model should have PPL < C.
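A sketch of both claims, under the natural-log convention for ℓ and PPL = e^ℓ used above:

```latex
% Perfect model: \hat{p}_i(y_i) = 1 for every i, so each log term vanishes:
\[
\ell = -\frac{1}{N}\sum_{i=1}^{N}\log\hat{p}_i(y_i) = 0
\quad\Longrightarrow\quad
\mathrm{PPL} = e^{0} = 1.
\]
% Reasonable model: it should at least beat uniform guessing
% \hat{p}_i(y) = 1/C, whose loss is log C. Since exp is increasing:
\[
\ell < \log C
\quad\Longrightarrow\quad
\mathrm{PPL} = e^{\ell} < e^{\log C} = C.
\]
```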
Problem 3
Install numpy, matplotlib, and pytorch/tensorflow on your laptop/desktop. Attach your code for all sub-problems below.
- (10’) Plot the following function f(x) on a mesh grid defined over −1 ≤ x1, x2 ≤ 1.
- (10’) Use pytorch/tensorflow’s autograd to compute the gradient of f at x = (1, −1).
- (30’) Write a program (using pytorch/tensorflow’s autograd) to search for the minimizer of f(x); call it x∗. Also report the corresponding f(x∗). (See the sketch below.)
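A minimal sketch for all three parts, in PyTorch. The assigned f(x) is not reproduced above, so the code uses a hypothetical stand-in f(x1, x2) = x1² + x2² purely for illustration; substitute the actual function from the handout. The mesh-grid plot, the autograd gradient at (1, −1), and the gradient-descent search are otherwise the generic patterns the sub-problems ask for.

```python
import numpy as np
import matplotlib.pyplot as plt
import torch

# NOTE: stand-in function, an assumption for this sketch.
# Replace with the actual f(x) from the assignment.
def f(x1, x2):
    return x1 ** 2 + x2 ** 2

# (a) Plot f on a mesh grid over -1 <= x1, x2 <= 1.
xs = np.linspace(-1.0, 1.0, 101)
X1, X2 = np.meshgrid(xs, xs)
Z = f(X1, X2)
fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.plot_surface(X1, X2, Z, cmap="viridis")
ax.set_xlabel("x1"); ax.set_ylabel("x2"); ax.set_zlabel("f")
plt.show()

# (b) Gradient at x = (1, -1) via autograd.
x = torch.tensor([1.0, -1.0], requires_grad=True)
y = f(x[0], x[1])
y.backward()                       # populates x.grad with df/dx
print("grad at (1, -1):", x.grad)

# (c) Search for a minimizer x* with plain gradient descent.
x = torch.tensor([1.0, -1.0], requires_grad=True)
opt = torch.optim.SGD([x], lr=0.1)
for _ in range(200):
    opt.zero_grad()
    loss = f(x[0], x[1])
    loss.backward()
    opt.step()
print("x* ≈", x.detach().numpy(), " f(x*) ≈", f(x[0], x[1]).item())
```

Plain gradient descent can stall in a local minimum if the real f is non-convex; restarting from a few random initial points and keeping the smallest f(x∗) is a cheap safeguard.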