22 Nov 2018
【NLP】【CS224N】4 Word Window Classification and Neural Networks
Overview Today
- Classification background
- Updating word vectors for classification
- Window classification & cross entropy error derivation tips
- A single layer neural network
- Max-Margin loss and backprop
Details of the softmax
Dataset: $\{X, y\}$
The probability predicted by the softmax classifier is
\[p(y \mid x) = \mathrm{softmax}(W_y \cdot x) = \frac{\exp(W_y \cdot x)}{\sum_{c=1}^{C}\exp(W_c \cdot x)}\]
In information theory, the KL divergence is a way to measure the difference between two distributions, so what we really want is to minimize the KL divergence between the predicted distribution and the true distribution.
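As a quick sanity check, here is a minimal NumPy sketch of this prediction step; the names `softmax_probs`, `W`, and `x` are illustrative, not from the lecture.

```python
import numpy as np

def softmax_probs(W, x):
    """Softmax classifier prediction p(y = c | x) for every class c."""
    scores = W @ x                         # W_c . x for each class c
    scores -= scores.max()                 # shift scores for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()   # normalize to a probability distribution

# Example: C = 3 classes, 5-dimensional input (e.g. word) vector
W = np.random.randn(3, 5)
x = np.random.randn(5)
p = softmax_probs(W, x)                    # entries are nonnegative and sum to 1
```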
Cross-entropy error
\[H(p, q) = -\sum\limits_{c=1}^{C}{p(c)\log q(c)}\]
Over the full dataset
\[\{x_i, y_i\}_{i=1}^{N}\]
the objective is the average cross-entropy loss, usually with a regularization term added on the parameters.
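A minimal sketch of this regularized objective, assuming a linear softmax classifier; the function name, the $\lambda$ value, and the random data below are illustrative.

```python
import numpy as np

def cross_entropy_loss(W, X, y, reg_lambda=1e-3):
    """Average cross-entropy over the dataset plus an L2 penalty on W."""
    scores = X @ W.T                                    # (N, C) class scores
    scores -= scores.max(axis=1, keepdims=True)         # numerical stability
    exp_scores = np.exp(scores)
    probs = exp_scores / exp_scores.sum(axis=1, keepdims=True)
    # With a one-hot true distribution p, H(p, q) reduces to -log q(y_i)
    nll = -np.log(probs[np.arange(len(y)), y]).mean()
    return nll + reg_lambda * np.sum(W ** 2)            # L2 regularization term

X = np.random.randn(10, 5)                 # N = 10 examples, 5-dim inputs
y = np.random.randint(0, 3, size=10)       # labels for C = 3 classes
W = np.random.randn(3, 5)
loss = cross_entropy_loss(W, X, y)
```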
Til next time,
gentlesnow
at 10:08
