CS7150 Homework 1
Problem 1
Consider a 2-dimensional Gaussian random vector x = (x1, x2)ᵀ whose components x1 and x2 have correlation coefficient ρ.
- (10’) Derive the joint entropy H(x)
- (10’) Derive the mutual information I(x1; x2)
- (10’) If ρ can be varied, when is the joint entropy maximized? What is the mutual information between x1 and x2 then? (See the sketch after this list.)
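The covariance of x is not spelled out above, so the sketch below assumes the standard bivariate Gaussian x ~ N(μ, Σ) with variances σ1², σ2² and correlation ρ; treat that parameterization as an assumption and adapt the steps to the setup given in class.

```latex
% Assumption: x = (x_1, x_2)^T ~ N(mu, Sigma) with
%   Sigma = [ sigma_1^2, rho sigma_1 sigma_2 ; rho sigma_1 sigma_2, sigma_2^2 ].
% Joint entropy of a Gaussian is determined by det(Sigma):
\[
H(\mathbf{x}) = \frac{1}{2}\log\left((2\pi e)^{2}\det\Sigma\right)
              = \log(2\pi e) + \frac{1}{2}\log\left(\sigma_1^{2}\sigma_2^{2}(1-\rho^{2})\right).
\]
% Mutual information via I = H(x_1) + H(x_2) - H(x):
\[
I(x_1; x_2) = \frac{1}{2}\log\left(2\pi e\,\sigma_1^{2}\right)
            + \frac{1}{2}\log\left(2\pi e\,\sigma_2^{2}\right)
            - H(\mathbf{x})
            = -\frac{1}{2}\log\left(1-\rho^{2}\right).
\]
% With sigma_1, sigma_2 fixed, det(Sigma) (hence H) is maximized at rho = 0,
% where I(x_1; x_2) = 0: the joint entropy peaks exactly when the
% components are independent.
```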
Problem 2
We talked about cross-entropy in class. That is, for a true distribution p and a predicted distribution q over the same discrete set,
\[
H(p, q) = -\sum_{x} p(x) \log q(x).
\]
In a C-class image classification problem, we have N samples. Denote the i-th sample as (xi, yi), where xi is an image and yi ∈ {0, 1, ..., C − 1} is its class label. Suppose our model predicts the class probability as p̂i(y) for y ∈ {0, ..., C − 1}.
- (10’) Show that an empirical estimate of the cross-entropy is
  \[
  -\frac{1}{N} \sum_{i=1}^{N} \log \hat{p}_i(y_i).
  \]
- Let us denote the above cross-entropy loss as ℓ. We often report ℓ as an indicator of model quality: a model with lower ℓ is more accurate. However, in some applications, we also report a related metric, called perplexity (PPL), defined as
  \[
  \mathrm{PPL} = e^{\ell}.
  \]
  Show that (a proof sketch follows this list):
  - (5’) A perfect model has PPL = 1.
  - (5’) Any reasonable model should have PPL < C.
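A sketch of both claims, under the natural-log convention for ℓ and PPL = e^ℓ used above:

```latex
% Perfect model: \hat{p}_i(y_i) = 1 for every i, so each log term vanishes:
\[
\ell = -\frac{1}{N}\sum_{i=1}^{N}\log\hat{p}_i(y_i) = 0
\quad\Longrightarrow\quad
\mathrm{PPL} = e^{0} = 1.
\]
% Reasonable model: it should at least beat uniform guessing
% \hat{p}_i(y) = 1/C, whose loss is log C. Since exp is increasing:
\[
\ell < \log C
\quad\Longrightarrow\quad
\mathrm{PPL} = e^{\ell} < e^{\log C} = C.
\]
```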
Problem 3
Install numpy, matplotlib, and pytorch/tensorflow on your laptop/desktop. Attach your code for all sub-problems below.
- (10’) Plot the following function f(x) on a mesh grid defined over −1 ≤ x1, x2 ≤ 1.
- (10’) Use pytorch/tensorflow’s autograd to compute the gradient of f at x = (1, −1).
- (30’) Write a program (using pytorch/tensorflow’s autograd) to search for the minimizer of f(x); call it x∗. Also report the corresponding f(x∗). (See the sketch below.)
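A minimal sketch for all three parts, in PyTorch. The assigned f(x) is not reproduced above, so the code uses a hypothetical stand-in f(x1, x2) = x1² + x2² purely for illustration; substitute the actual function from the handout. The mesh-grid plot, the autograd gradient at (1, −1), and the gradient-descent search are otherwise the generic patterns the sub-problems ask for.

```python
import numpy as np
import matplotlib.pyplot as plt
import torch

# NOTE: stand-in function, an assumption for this sketch.
# Replace with the actual f(x) from the assignment.
def f(x1, x2):
    return x1 ** 2 + x2 ** 2

# (a) Plot f on a mesh grid over -1 <= x1, x2 <= 1.
xs = np.linspace(-1.0, 1.0, 101)
X1, X2 = np.meshgrid(xs, xs)
Z = f(X1, X2)
fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.plot_surface(X1, X2, Z, cmap="viridis")
ax.set_xlabel("x1"); ax.set_ylabel("x2"); ax.set_zlabel("f")
plt.show()

# (b) Gradient at x = (1, -1) via autograd.
x = torch.tensor([1.0, -1.0], requires_grad=True)
y = f(x[0], x[1])
y.backward()                       # populates x.grad with df/dx
print("grad at (1, -1):", x.grad)

# (c) Search for a minimizer x* with plain gradient descent.
x = torch.tensor([1.0, -1.0], requires_grad=True)
opt = torch.optim.SGD([x], lr=0.1)
for _ in range(200):
    opt.zero_grad()
    loss = f(x[0], x[1])
    loss.backward()
    opt.step()
print("x* ≈", x.detach().numpy(), " f(x*) ≈", f(x[0], x[1]).item())
```

Plain gradient descent can stall in a local minimum if the real f is non-convex; restarting from a few random initial points and keeping the smallest f(x∗) is a cheap safeguard.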