Keshawn_lu's Blog

Andrew Ng Team NLP C2_W4_Assignment

2021/08/08


Task: compute word embeddings and use them for sentiment analysis.

Part 1: The Continuous Bag of Words Model (CBOW)

In this model, we are given the context words and try to predict the center word.

For example, take the sentence: I am happy because I am learning

Setting C = 2, when we want to predict happy, the context is (see the extraction sketch after this list):

  • C words before: [I, am]
  • C words after: [because, I]
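
A minimal sketch of how these (context words, center word) pairs can be extracted from a tokenized sentence; get_windows is an illustrative helper name, not necessarily the one used in the assignment:

def get_windows(words, C):
    # Yield (context_words, center_word) pairs with C words on each side
    for i in range(C, len(words) - C):
        center = words[i]
        context = words[i - C:i] + words[i + 1:i + C + 1]
        yield context, center

sentence = "I am happy because I am learning".split()
for context, center in get_windows(sentence, C=2):
    print(center, '<-', context)
# happy <- ['I', 'am', 'because', 'I']
# ...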

The structure of the model is shown below:

https://pic.imgdb.cn/item/610e38c55132923bf83f26f0.png

where $\bar x$ is the average of the one-hot vectors of the context words:

https://pic.imgdb.cn/item/610e38bd5132923bf83f08ab.png

The formulas are as follows:
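
(The formulas appeared as images in the original post; they are reconstructed here from the code below. $\bar x$ has shape $V \times 1$ and $m$ is the batch size.)

$$h = W_1\,\bar{x} + b_1$$
$$h = \mathrm{ReLU}(h)$$
$$z = W_2\,h + b_2$$
$$\hat{y} = \mathrm{softmax}(z)$$
$$J = -\frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{V} y_j^{(i)} \log \hat{y}_j^{(i)}$$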

1.1 Initializing the model

We initialize the matrices W1, W2 and the vectors b1, b2 with the following shapes:

  • W1: N × V
  • W2: V × N
  • b1: N × 1
  • b2: V × 1

where V is the vocabulary size and N is the dimension of the word vectors.

import numpy as np

def initialize_model(N, V, random_seed=1):
    '''
    Inputs:
        N: dimension of hidden vector
        V: dimension of vocabulary
        random_seed: random seed for consistent results in the unit tests
    Outputs:
        W1, W2, b1, b2: initialized weights and biases
    '''
    np.random.seed(random_seed)

    # W1 has shape (N,V)
    W1 = np.random.rand(N, V)
    # W2 has shape (V,N)
    W2 = np.random.rand(V, N)
    # b1 has shape (N,1)
    b1 = np.random.rand(N, 1)
    # b2 has shape (V,1)
    b2 = np.random.rand(V, 1)

    return W1, W2, b1, b2
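
A quick sanity check of the returned shapes (illustrative values, not part of the assignment's tests):

tmp_W1, tmp_W2, tmp_b1, tmp_b2 = initialize_model(N=4, V=10)
print(tmp_W1.shape)  # (4, 10)
print(tmp_W2.shape)  # (10, 4)
print(tmp_b1.shape)  # (4, 1)
print(tmp_b2.shape)  # (10, 1)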

1.2 softmax

def softmax(z):
    '''
    Inputs:
        z: output scores from the hidden layer
    Outputs:
        yhat: prediction (estimate of y)
    '''
    # Calculate yhat (softmax); axis=0 sums down each column
    yhat = np.exp(z) / np.sum(np.exp(z), axis=0)

    return yhat
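
The version above can overflow in np.exp for large scores. A common, equivalent variant (not required by the assignment) subtracts the per-column maximum first:

def softmax_stable(z):
    # Shifting each column by its max leaves the softmax output unchanged
    # but keeps np.exp from overflowing.
    z = z - np.max(z, axis=0, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=0, keepdims=True)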

1.3 Forward propagation

Forward propagation is implemented with the first three formulas above:

def forward_prop(x, W1, W2, b1, b2):
    '''
    Inputs:
        x: average one hot vector for the context
        W1, W2, b1, b2: matrices and biases to be learned
    Outputs:
        z: output score vector
        h: hidden layer vector (after ReLU)
    '''
    # Hidden layer: affine transform followed by ReLU
    h = np.dot(W1, x) + b1
    h = np.maximum(0, h)

    # Output scores
    z = np.dot(W2, h) + b2

    return z, h
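
The training loop in section 1.5 also calls compute_cost, which the assignment supplies in its starter code. A minimal cross-entropy sketch, assuming y and yhat both have shape (V, batch_size); the provided implementation may differ in detail:

def compute_cost(y, yhat, batch_size):
    # Cross-entropy between the one-hot targets and the softmax output,
    # averaged over the batch.
    cost = -np.sum(y * np.log(yhat)) / batch_size
    return cost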

1.4 Backpropagation

https://pic.imgdb.cn/item/610e3bcb5132923bf84a9014.png
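
Written out to match the code below ($\hat{Y}$, $Y$, $X$, $H$ stack the batch column-wise, $m$ is the batch size, $\mathbf{1}_m$ is the all-ones vector that sums over the batch; applying ReLU to $\ell_1$ follows the assignment's simplification):

$$\ell_1 = \mathrm{ReLU}\!\left(W_2^{\top}(\hat{Y} - Y)\right)$$
$$\frac{\partial J}{\partial W_1} = \frac{1}{m}\,\ell_1 X^{\top} \qquad \frac{\partial J}{\partial W_2} = \frac{1}{m}\,(\hat{Y} - Y)H^{\top}$$
$$\frac{\partial J}{\partial b_1} = \frac{1}{m}\,\ell_1\,\mathbf{1}_m \qquad \frac{\partial J}{\partial b_2} = \frac{1}{m}\,(\hat{Y} - Y)\,\mathbf{1}_m$$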

def back_prop(x, yhat, y, h, W1, W2, b1, b2, batch_size):
    '''
    Inputs:
        x: average one hot vector for the context
        yhat: prediction (estimate of y)
        y: target vector
        h: hidden vector (see eq. 1)
        W1, W2, b1, b2: matrices and biases
        batch_size: batch size
    Outputs:
        grad_W1, grad_W2, grad_b1, grad_b2: gradients of matrices and biases
    '''
    # Compute l1 as W2^T (Yhat - Y), then apply ReLU
    l1 = np.dot(W2.T, yhat - y)
    l1 = np.maximum(0, l1)

    # Gradients, averaged over the batch
    grad_W1 = 1 / batch_size * np.dot(l1, x.T)

    grad_W2 = 1 / batch_size * np.dot(yhat - y, h.T)

    # keepdims=True preserves the (N,1) / (V,1) column shapes
    grad_b1 = 1 / batch_size * np.sum(l1, axis=1, keepdims=True)

    grad_b2 = 1 / batch_size * np.sum(yhat - y, axis=1, keepdims=True)

    return grad_W1, grad_W2, grad_b1, grad_b2

1.5 Gradient descent

def gradient_descent(data, word2Ind, N, V, num_iters, alpha=0.03):
    '''
    This is the gradient_descent function

    Inputs:
        data: text
        word2Ind: words to Indices
        N: dimension of hidden vector
        V: dimension of vocabulary
        num_iters: number of iterations
        alpha: learning rate
    Outputs:
        W1, W2, b1, b2: updated matrices and biases
    '''
    W1, W2, b1, b2 = initialize_model(N, V, random_seed=282)
    batch_size = 128
    iters = 0
    C = 2
    for x, y in get_batches(data, word2Ind, V, C, batch_size):
        z, h = forward_prop(x, W1, W2, b1, b2)

        yhat = softmax(z)

        cost = compute_cost(y, yhat, batch_size)

        if (iters + 1) % 10 == 0:
            print(f"iters: {iters + 1} cost: {cost:.6f}")

        # Get gradients
        grad_W1, grad_W2, grad_b1, grad_b2 = back_prop(x, yhat, y, h, W1, W2, b1, b2, batch_size)

        # Update weights and biases
        W1 = W1 - grad_W1 * alpha
        W2 = W2 - grad_W2 * alpha
        b1 = b1 - grad_b1 * alpha
        b2 = b2 - grad_b2 * alpha

        iters += 1
        if iters == num_iters:
            break
        # Decay the learning rate every 100 iterations
        if iters % 100 == 0:
            alpha *= 0.66

    return W1, W2, b1, b2
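
A hedged example of how training might be launched; data, word2Ind/get_dict, and get_batches come from the assignment's starter code, so the exact names and values below are assumptions:

C = 2
N = 50
word2Ind, Ind2word = get_dict(data)  # vocabulary mappings from the assignment's utilities
V = len(word2Ind)
num_iters = 150
W1, W2, b1, b2 = gradient_descent(data, word2Ind, N, V, num_iters)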

1.6 Visualizing the word vectors (since this was not run on the official site, the resulting plots may differ slightly)

from matplotlib import pyplot
%config InlineBackend.figure_format = 'svg'

words = ['king', 'queen', 'lord', 'man', 'woman', 'dog', 'wolf',
         'rich', 'happy', 'sad']

# Average the input and output embeddings to get the final word vectors
embs = (W1.T + W2) / 2.0

# Given a list of words and the embeddings, build a matrix with all the embeddings
idx = [word2Ind[word] for word in words]
X = embs[idx, :]
print(X.shape, idx)  # X.shape: Number of words of dimension N each
(10, 50) [2744, 3949, 2960, 3022, 5672, 1452, 5671, 4189, 2315, 4276]
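
compute_pca is also provided by the course's utilities; a minimal numpy sketch of what it does, assuming X has shape (num_words, N) and n_components <= N:

def compute_pca(X, n_components=2):
    # Center the data, then project onto the top principal directions
    # obtained from the SVD of the centered matrix.
    X_centered = X - np.mean(X, axis=0)
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return np.dot(X_centered, Vt[:n_components].T)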
result = compute_pca(X, 2)
pyplot.scatter(result[:, 0], result[:, 1])
for i, word in enumerate(words):
    pyplot.annotate(word, xy=(result[i, 0], result[i, 1]))
pyplot.show()

https://pic.imgdb.cn/item/610e3dce5132923bf850a2f1.png

result = compute_pca(X, 4)
pyplot.scatter(result[:, 3], result[:, 1])
for i, word in enumerate(words):
    pyplot.annotate(word, xy=(result[i, 3], result[i, 1]))
pyplot.show()

https://pic.imgdb.cn/item/610e3e085132923bf8513346.png
