Home

gentlesnow

22 Nov 2018

【NLP】【CS224N】4 word window 分类与神经网络

Overview Today

  1. Classification background
  2. Updating word vectors for classification
  3. Window classification & cross entropy error derivation tips
  4. A single layer neural network
  5. Max-Margin loss and backprop

Details of the softmax

dataset:${X,y}$

softmax分类器预测的概率为

\[p(y\|x)=softmax(W_y,x)=\frac{exp(W_y,x)}{\sum_{c=1}^{C}{exp(W_c,x)}}\]

信息论中,KL散度(KL divergence)是一种用来衡量两个分布的差异的方法,那么也就是说我们想让预测分布和真实分布的KL散度最小。

交叉熵误差

\[H(p, q) = -\sum\limits_{c=1}^{C}{p(c)logq(c)}\]

dataset:

\[{x_i, y_i}_{i=1}^{N}\]

regularization

Til next time,
gentlesnow at 10:08

scribble