Noise constrastive estimation
TLDR The paper proposed a method to estimate the probability density function of a dataset by discriminating observed data and noise drawn from a distribution. The paper setups the problem into a dataset of \(T\) observations \((x_1, … x_T)\) drawn from a true distribution \(p_d(.)\). We then try to approximate \(p_d\) by a parameterized function \(p_m(.;\theta)\). The estimator \(\hat{\theta}_T\) is defined to be the \(\theta\) that maximize function $$ J_T(\theta) = \frac{1}{2T}\sum_t{\log[h(x_t; 0)]} + \log[1-h(y_t; \theta)] $$...