Keshawn_lu's Blog

Andrew Ng Team NLP C2_W4_Assignment

2021/08/08


Task: compute word embeddings and use them for sentiment analysis.

Part 1: The Continuous Bag of Words Model (CBOW)

In this model, we are given the context words and try to predict the center word.

For example, take the sentence: I am happy because I am learning

Setting C = 2, when we want to predict happy, the context is (see the extraction sketch after this list):

  • C words before: [I, am]
  • C words after: [because, I]
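
A minimal sketch of how these (context words, center word) pairs can be extracted from a tokenized sentence; get_windows is an illustrative helper name, not necessarily the one used in the assignment:

def get_windows(words, C):
    # Yield (context_words, center_word) pairs with C words on each side
    for i in range(C, len(words) - C):
        center = words[i]
        context = words[i - C:i] + words[i + 1:i + C + 1]
        yield context, center

sentence = "I am happy because I am learning".split()
for context, center in get_windows(sentence, C=2):
    print(center, '<-', context)
# happy <- ['I', 'am', 'because', 'I']
# ...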

The structure of the model is shown below:

https://pic.imgdb.cn/item/610e38c55132923bf83f26f0.png

where $\bar x$ is the average of the one-hot vectors of the context words:

https://pic.imgdb.cn/item/610e38bd5132923bf83f08ab.png

The formulas are as follows:
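
(The formulas appeared as images in the original post; they are reconstructed here from the code below. $\bar x$ has shape $V \times 1$ and $m$ is the batch size.)

$$h = W_1\,\bar{x} + b_1$$
$$h = \mathrm{ReLU}(h)$$
$$z = W_2\,h + b_2$$
$$\hat{y} = \mathrm{softmax}(z)$$
$$J = -\frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{V} y_j^{(i)} \log \hat{y}_j^{(i)}$$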

1.1 Initializing the model

We initialize the matrices W1, W2 and the vectors b1, b2 with the following shapes:

  • W1: N × V
  • W2: V × N
  • b1: N × 1
  • b2: V × 1

where V is the vocabulary size and N is the dimension of the word vectors.

import numpy as np

def initialize_model(N, V, random_seed=1):
    '''
    Inputs:
        N: dimension of hidden vector
        V: dimension of vocabulary
        random_seed: random seed for consistent results in the unit tests
    Outputs:
        W1, W2, b1, b2: initialized weights and biases
    '''
    np.random.seed(random_seed)

    # W1 has shape (N,V)
    W1 = np.random.rand(N, V)
    # W2 has shape (V,N)
    W2 = np.random.rand(V, N)
    # b1 has shape (N,1)
    b1 = np.random.rand(N, 1)
    # b2 has shape (V,1)
    b2 = np.random.rand(V, 1)

    return W1, W2, b1, b2
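
A quick sanity check of the returned shapes (illustrative values, not part of the assignment's tests):

tmp_W1, tmp_W2, tmp_b1, tmp_b2 = initialize_model(N=4, V=10)
print(tmp_W1.shape)  # (4, 10)
print(tmp_W2.shape)  # (10, 4)
print(tmp_b1.shape)  # (4, 1)
print(tmp_b2.shape)  # (10, 1)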

1.2 softmax

def softmax(z):
    '''
    Inputs:
        z: output scores from the hidden layer
    Outputs:
        yhat: prediction (estimate of y)
    '''
    # Calculate yhat (softmax); axis=0 sums down each column
    yhat = np.exp(z) / np.sum(np.exp(z), axis=0)

    return yhat
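
The version above can overflow in np.exp for large scores. A common, equivalent variant (not required by the assignment) subtracts the per-column maximum first:

def softmax_stable(z):
    # Shifting each column by its max leaves the softmax output unchanged
    # but keeps np.exp from overflowing.
    z = z - np.max(z, axis=0, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=0, keepdims=True)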

1.3 Forward propagation

Forward propagation is implemented with the first three formulas above:

def forward_prop(x, W1, W2, b1, b2):
    '''
    Inputs:
        x: average one hot vector for the context
        W1, W2, b1, b2: matrices and biases to be learned
    Outputs:
        z: output score vector
        h: hidden layer vector (after ReLU)
    '''
    # Hidden layer: affine transform followed by ReLU
    h = np.dot(W1, x) + b1
    h = np.maximum(0, h)

    # Output scores
    z = np.dot(W2, h) + b2

    return z, h
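
The training loop in section 1.5 also calls compute_cost, which the assignment supplies in its starter code. A minimal cross-entropy sketch, assuming y and yhat both have shape (V, batch_size); the provided implementation may differ in detail:

def compute_cost(y, yhat, batch_size):
    # Cross-entropy between the one-hot targets and the softmax output,
    # averaged over the batch.
    cost = -np.sum(y * np.log(yhat)) / batch_size
    return cost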

1.4 Backpropagation

https://pic.imgdb.cn/item/610e3bcb5132923bf84a9014.png
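
Written out to match the code below ($\hat{Y}$, $Y$, $X$, $H$ stack the batch column-wise, $m$ is the batch size, $\mathbf{1}_m$ is the all-ones vector that sums over the batch; applying ReLU to $\ell_1$ follows the assignment's simplification):

$$\ell_1 = \mathrm{ReLU}\!\left(W_2^{\top}(\hat{Y} - Y)\right)$$
$$\frac{\partial J}{\partial W_1} = \frac{1}{m}\,\ell_1 X^{\top} \qquad \frac{\partial J}{\partial W_2} = \frac{1}{m}\,(\hat{Y} - Y)H^{\top}$$
$$\frac{\partial J}{\partial b_1} = \frac{1}{m}\,\ell_1\,\mathbf{1}_m \qquad \frac{\partial J}{\partial b_2} = \frac{1}{m}\,(\hat{Y} - Y)\,\mathbf{1}_m$$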

def back_prop(x, yhat, y, h, W1, W2, b1, b2, batch_size):
    '''
    Inputs:
        x: average one hot vector for the context
        yhat: prediction (estimate of y)
        y: target vector
        h: hidden vector (see eq. 1)
        W1, W2, b1, b2: matrices and biases
        batch_size: batch size
    Outputs:
        grad_W1, grad_W2, grad_b1, grad_b2: gradients of matrices and biases
    '''
    # Compute l1 as W2^T (Yhat - Y), then apply ReLU
    l1 = np.dot(W2.T, yhat - y)
    l1 = np.maximum(0, l1)

    # Gradients, averaged over the batch
    grad_W1 = 1 / batch_size * np.dot(l1, x.T)

    grad_W2 = 1 / batch_size * np.dot(yhat - y, h.T)

    # keepdims=True preserves the (N,1) / (V,1) column shapes
    grad_b1 = 1 / batch_size * np.sum(l1, axis=1, keepdims=True)

    grad_b2 = 1 / batch_size * np.sum(yhat - y, axis=1, keepdims=True)

    return grad_W1, grad_W2, grad_b1, grad_b2

1.5 Gradient descent

def gradient_descent(data, word2Ind, N, V, num_iters, alpha=0.03):
    '''
    This is the gradient_descent function

    Inputs:
        data: text
        word2Ind: words to Indices
        N: dimension of hidden vector
        V: dimension of vocabulary
        num_iters: number of iterations
        alpha: learning rate
    Outputs:
        W1, W2, b1, b2: updated matrices and biases
    '''
    W1, W2, b1, b2 = initialize_model(N, V, random_seed=282)
    batch_size = 128
    iters = 0
    C = 2
    for x, y in get_batches(data, word2Ind, V, C, batch_size):
        z, h = forward_prop(x, W1, W2, b1, b2)

        yhat = softmax(z)

        cost = compute_cost(y, yhat, batch_size)

        if (iters + 1) % 10 == 0:
            print(f"iters: {iters + 1} cost: {cost:.6f}")

        # Get gradients
        grad_W1, grad_W2, grad_b1, grad_b2 = back_prop(x, yhat, y, h, W1, W2, b1, b2, batch_size)

        # Update weights and biases
        W1 = W1 - grad_W1 * alpha
        W2 = W2 - grad_W2 * alpha
        b1 = b1 - grad_b1 * alpha
        b2 = b2 - grad_b2 * alpha

        iters += 1
        if iters == num_iters:
            break
        # Decay the learning rate every 100 iterations
        if iters % 100 == 0:
            alpha *= 0.66

    return W1, W2, b1, b2
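
A hedged example of how training might be launched; data, word2Ind/get_dict, and get_batches come from the assignment's starter code, so the exact names and values below are assumptions:

C = 2
N = 50
word2Ind, Ind2word = get_dict(data)  # vocabulary mappings from the assignment's utilities
V = len(word2Ind)
num_iters = 150
W1, W2, b1, b2 = gradient_descent(data, word2Ind, N, V, num_iters)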

1.6 Visualizing the word vectors (since this was not run on the official site, the resulting plots may differ slightly)

from matplotlib import pyplot
%config InlineBackend.figure_format = 'svg'

words = ['king', 'queen', 'lord', 'man', 'woman', 'dog', 'wolf',
         'rich', 'happy', 'sad']

# Average the input and output embeddings to get the final word vectors
embs = (W1.T + W2) / 2.0

# Given a list of words and the embeddings, build a matrix with all the embeddings
idx = [word2Ind[word] for word in words]
X = embs[idx, :]
print(X.shape, idx)  # X.shape: Number of words of dimension N each
(10, 50) [2744, 3949, 2960, 3022, 5672, 1452, 5671, 4189, 2315, 4276]
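
compute_pca is also provided by the course's utilities; a minimal numpy sketch of what it does, assuming X has shape (num_words, N) and n_components <= N:

def compute_pca(X, n_components=2):
    # Center the data, then project onto the top principal directions
    # obtained from the SVD of the centered matrix.
    X_centered = X - np.mean(X, axis=0)
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return np.dot(X_centered, Vt[:n_components].T)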
result = compute_pca(X, 2)
pyplot.scatter(result[:, 0], result[:, 1])
for i, word in enumerate(words):
    pyplot.annotate(word, xy=(result[i, 0], result[i, 1]))
pyplot.show()

https://pic.imgdb.cn/item/610e3dce5132923bf850a2f1.png

result = compute_pca(X, 4)
pyplot.scatter(result[:, 3], result[:, 1])
for i, word in enumerate(words):
    pyplot.annotate(word, xy=(result[i, 3], result[i, 1]))
pyplot.show()

https://pic.imgdb.cn/item/610e3e085132923bf8513346.png
