Andrew Ng's Team NLP: C3_W2_Assignment

2021/08/28

Task: explore recurrent neural networks (RNNs)

Part 1: Convert every character in a line of text to its Unicode integer (code point); the resulting list is called a tensor

def line_to_tensor(line, EOS_int=1):
    """Turns a line of text into a tensor

    Args:
        line (str): A single line of text.
        EOS_int (int, optional): End-of-sentence integer. Defaults to 1.

    Returns:
        list: a list of integers (unicode values) for the characters in the `line`.
    """
    
    tensor = []
    for c in line:
        c_int = ord(c)
        tensor.append(c_int)
    
    tensor.append(EOS_int)  # end-of-sentence marker (1 by default)

    return tensor
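For reference, the same conversion fits in a single list comprehension. This is a minimal sketch: `line_to_tensor_compact` is a hypothetical name, not part of the assignment, and it appends `EOS_int` just like the function above.

```python
def line_to_tensor_compact(line, EOS_int=1):
    # hypothetical compact variant: one Unicode code point per character, then the EOS marker
    return [ord(c) for c in line] + [EOS_int]


print(line_to_tensor_compact("abc"))  # [97, 98, 99, 1]
```

Characters beyond the first 256 code points map to large integers (for example `ord('中')` is 20013), which would fall outside the `vocab_size=256` used by GRULM in Part 2.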

1.2 Implementing a batch data generator

Convert the lines of text into NumPy arrays of integers. So that every row has the same length, pad the shorter ones with the integer 0.

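import random as rnd

import trax.fastmath.numpy as np  # trax's NumPy-compatible API (backed by JAX), as used in the assignment

def data_generator(batch_size, max_length, data_lines, line_to_tensor=line_to_tensor, shuffle=True):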
    """Generator function that yields batches of data

    Args:
        batch_size (int): number of examples (in this case, sentences) per batch.
        max_length (int): maximum length of the output tensor.
        NOTE: max_length includes the end-of-sentence character that will be added
                to the tensor.  
                Keep in mind that the length of the tensor is always 1 + the length
                of the original line of characters.
        data_lines (list): list of the sentences to group into batches.
        line_to_tensor (function, optional): function that converts line to tensor. Defaults to line_to_tensor.
        shuffle (bool, optional): True if the generator should generate random batches of data. Defaults to True.

    Yields:
        tuple: two copies of the batch (jax.interpreters.xla.DeviceArray) and mask (jax.interpreters.xla.DeviceArray).
        NOTE: jax.interpreters.xla.DeviceArray is trax's version of numpy.ndarray
    """
    index = 0
    
    cur_batch = []
    
    # total number of lines of text
    num_lines = len(data_lines)
    
    # create an array with the indexes of data_lines that can be shuffled
    lines_index = [*range(num_lines)]
    
    if shuffle:
        rnd.shuffle(lines_index)
    
    while True:
        if index >= num_lines:
            index = 0
            if shuffle:
                rnd.shuffle(lines_index)
        
        line = data_lines[lines_index[index]]
        
        if len(line) < max_length:
            cur_batch.append(line)
     
        index += 1
        
        if len(cur_batch) == batch_size:
            
            batch = []
            mask = []
            
            for li in cur_batch:
                tensor = line_to_tensor(li)  # convert the line to a list of integers

                # pad with zeros up to max_length
                pad = [0] * (max_length - len(tensor))
                tensor_pad = tensor + pad
                
                batch.append(tensor_pad)

                # A mask for tensor_pad is 1 wherever tensor_pad is not
                # 0 and 0 wherever tensor_pad is 0, i.e. if tensor_pad is
                # [1, 2, 3, 0, 0, 0] then example_mask should be
                # [1, 1, 1, 0, 0, 0]
                # Hint: Use a list comprehension for this
                example_mask = [0 if t == 0 else 1 for t in tensor_pad]
                mask.append(example_mask)
               
            # convert to numpy arrays
            batch_np_arr = np.array(batch)
            mask_np_arr = np.array(mask)
            
            
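            # yield inputs, targets and mask; the targets are identical to the inputs
            # because the model's ShiftRight layer makes it predict each character from the ones before it
            yield batch_np_arr, batch_np_arr, mask_np_arr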
            
            # reset for the next batch
            cur_batch = []
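The padding and masking inside data_generator can also be written with NumPy array operations. A minimal sketch, assuming the lines have already been converted to integer lists by line_to_tensor; `pad_and_mask` is a hypothetical helper. Because the EOS marker is 1 rather than 0, it counts as a real token in the mask.

```python
import numpy as np


def pad_and_mask(tensors, max_length):
    # hypothetical helper: right-pad each integer list with 0 up to max_length
    batch = np.array([t + [0] * (max_length - len(t)) for t in tensors])
    # 1 for real tokens (including EOS = 1), 0 for padding, computed in one vectorized step
    mask = (batch != 0).astype(np.int32)
    return batch, mask


batch, mask = pad_and_mask([[72, 105, 1], [79, 75, 33, 1]], max_length=6)
print(batch.tolist())  # [[72, 105, 1, 0, 0, 0], [79, 75, 33, 1, 0, 0]]
print(mask.tolist())   # [[1, 1, 1, 0, 0, 0], [1, 1, 1, 1, 0, 0]]
```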

1.3 Repeating the batch generator

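During training we loop over the dataset many times, which is done with itertools.cycle. Note that data_generator already restarts from the first line on its own (its `while True` loop resets `index`), so the cycle wrapper does not change which batches are produced; also, itertools.cycle keeps a copy of every element it yields, so wrapping a generator that never ends lets those copies accumulate in memory.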

import itertools

infinite_data_generator = itertools.cycle(
    data_generator(batch_size=2, max_length=10, data_lines=tmp_lines))
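To see what itertools.cycle does on its own, here is a small sketch with a hypothetical finite generator: once the generator is exhausted, cycle replays the elements it has saved.

```python
import itertools


def finite_batches():
    # hypothetical stand-in for a generator that stops after three batches
    for i in range(3):
        yield i


cycled = itertools.cycle(finite_batches())
first_seven = list(itertools.islice(cycled, 7))
print(first_seven)  # [0, 1, 2, 0, 1, 2, 0]
```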

Part 2: Defining the GRU model

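- tl.ShiftRight: shifts the input one position to the right, inserting a padding id at the start, so that the prediction at each position only sees the characters before it
- tl.Embedding: maps each character id to a trainable vector of size d_model
- tl.GRU: a standard GRU layer with n_units=d_model
- tl.Dense: a fully connected (dense) layer with vocab_size outputs
- tl.LogSoftmax: turns the outputs into log-probabilities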
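A NumPy sketch of the shift tl.ShiftRight applies in 'train' and 'eval' mode, assuming zero padding along the length axis (in 'predict' mode the layer passes its input through unchanged). `shift_right` is a hypothetical helper that mimics the effect; it is not trax's source.

```python
import numpy as np


def shift_right(x, n_positions=1, pad_id=0):
    # x: integer ids of shape (batch, length)
    # prepend n_positions padding ids and drop the same number from the end
    pad = np.full((x.shape[0], n_positions), pad_id, dtype=x.dtype)
    return np.concatenate([pad, x[:, :-n_positions]], axis=1)


tokens = np.array([[97, 98, 99, 1]])  # "abc" + EOS, as produced by line_to_tensor
print(shift_right(tokens).tolist())  # [[0, 97, 98, 99]]
```

The model reads [0, 97, 98, 99] and is trained to output [97, 98, 99, 1], so each output position is the character that follows the inputs seen so far. This is why data_generator can yield the same array as both inputs and targets.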

from trax import layers as tl

def GRULM(vocab_size=256, d_model=512, n_layers=2, mode='train'):
    """Returns a GRU language model.

    Args:
        vocab_size (int, optional): Size of the vocabulary. Defaults to 256.
        d_model (int, optional): Depth of embedding (n_units in the GRU cell). Defaults to 512.
        n_layers (int, optional): Number of GRU layers. Defaults to 2.
        mode (str, optional): 'train', 'eval' or 'predict', predict mode is for fast inference. Defaults to "train".

    Returns:
        trax.layers.combinators.Serial: A GRU language model as a layer that maps from a tensor of tokens to activations over a vocab set.
    """

    model = tl.Serial(
      tl.ShiftRight(mode=mode),
      tl.Embedding(vocab_size=vocab_size, d_feature=d_model), 
      [tl.GRU(n_units=d_model) for _ in range(n_layers)], 
      tl.Dense(n_units=vocab_size), 
      tl.LogSoftmax()
    )
    return model
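The last two layers turn the GRU output into a distribution over the 256 possible characters: tl.Dense produces one score per character and tl.LogSoftmax converts the scores into log-probabilities. The sketch below shows the usual numerically stable way to compute a log-softmax in NumPy; it illustrates the math and is not trax's implementation.

```python
import numpy as np


def log_softmax(logits):
    # subtract the per-row maximum first so np.exp cannot overflow
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    # log p_i = z_i - log(sum_j exp(z_j)), computed on the shifted scores
    return shifted - np.log(np.sum(np.exp(shifted), axis=-1, keepdims=True))


logits = np.array([[2.0, 1.0, 0.1], [1000.0, 1000.0, 1000.0]])
log_probs = log_softmax(logits)
print(np.exp(log_probs).sum(axis=-1))  # each row sums to 1
```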

Part 3: Training the model

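trax.supervised.training.TrainTask: bundles the training data, the loss layer and the optimizer into one object
- labeled_data: the labeled data to train on
- loss_layer: the loss layer
- optimizer: the optimizer
trax.supervised.training.EvalTask: bundles the evaluation data and the metrics
- labeled_data: the labeled data to evaluate on
- metrics: the metrics to report
- n_eval_batches: how many batches to evaluate on each time
trax.supervised.training.Loop: ties everything together and runs the training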

import itertools

import trax
from trax import layers as tl
from trax.supervised import training

def train_model(model, data_generator, batch_size=32, max_length=64, lines=lines, eval_lines=eval_lines, n_steps=1, output_dir='model/'): 
    """Function that trains the model

    Args:
        model (trax.layers.combinators.Serial): GRU model.
        data_generator (function): Data generator function.
        batch_size (int, optional): Number of lines per batch. Defaults to 32.
        max_length (int, optional): Maximum length allowed for a line to be processed. Defaults to 64.
        lines (list, optional): List of lines to use for training. Defaults to lines.
        eval_lines (list, optional): List of lines to use for evaluation. Defaults to eval_lines.
        n_steps (int, optional): Number of steps to train. Defaults to 1.
        output_dir (str, optional): Relative path of directory to save model. Defaults to "model/".

    Returns:
        trax.supervised.training.Loop: Training loop for the model.
    """
    
    # build the training data generator
    bare_train_generator = data_generator(batch_size, max_length, data_lines=lines)

    # cycle over the training data for multiple passes
    infinite_train_generator = itertools.cycle(bare_train_generator)

    # evaluation data
    bare_eval_generator = data_generator(batch_size, max_length, data_lines=eval_lines)
    infinite_eval_generator = itertools.cycle(bare_eval_generator)
   
    train_task = training.TrainTask(
        labeled_data=infinite_train_generator, 
        loss_layer=tl.CrossEntropyLoss(), 
        optimizer=trax.optimizers.Adam(0.0005)
    )

    eval_task = training.EvalTask(
        labeled_data=infinite_eval_generator,
        metrics=[tl.CrossEntropyLoss(), tl.Accuracy()], 
        n_eval_batches=3
    )
    
    training_loop = training.Loop(model,
                                  train_task,
                                  eval_tasks=[eval_task],  # in trax==1.3.9 this argument is eval_tasks (a list of EvalTask)
                                  output_dir=output_dir)

    training_loop.run(n_steps=n_steps)
    
    
    # We return this because it contains a handle to the model, which has the weights etc.
    return training_loop
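Besides the loss, the EvalTask above reports tl.Accuracy. A NumPy sketch of the idea, assuming accuracy is the fraction of non-padding positions whose most likely character equals the target; `masked_accuracy` and the toy batch are hypothetical, and this is not the trax implementation.

```python
import numpy as np


def masked_accuracy(log_probs, targets, mask):
    # most likely character id at each position, shape (batch, length)
    preds = np.argmax(log_probs, axis=-1)
    correct = (preds == targets).astype(np.float64)
    # padding positions (mask == 0) count neither as right nor as wrong
    return np.sum(correct * mask) / np.sum(mask)


# toy batch: 1 line, 3 positions, vocabulary of 4; the last position is padding
log_probs = np.log(np.array([[[0.1, 0.1, 0.7, 0.1],
                              [0.1, 0.2, 0.1, 0.6],
                              [0.7, 0.1, 0.1, 0.1]]]))
targets = np.array([[2, 1, 0]])
mask = np.array([[1, 1, 0]])
acc = masked_accuracy(log_probs, targets, mask)
print(acc)  # 0.5
```

The third position also "predicts" id 0 correctly, but since it is padding it is ignored, leaving 1 correct out of 2 real positions.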

Part 4: Evaluation

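Perplexity measures how well a probability model predicts a sample (lower is better). For a sequence $W = w_1, \ldots, w_N$ of $N$ tokens: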

$PP(W) = \sqrt[N]{\prod_{i=1}^{N} \frac{1}{P(w_i \mid w_1,\ldots,w_{i-1})}}$

Taking the logarithm:

$$
\begin{aligned}
\log PP(W) &= \log\Big(\sqrt[N]{\prod_{i=1}^{N} \frac{1}{P(w_i \mid w_1,\ldots,w_{i-1})}}\Big) \\
&= \log\Big(\prod_{i=1}^{N} \frac{1}{P(w_i \mid w_1,\ldots,w_{i-1})}\Big)^{\frac{1}{N}} \\
&= \log\Big(\prod_{i=1}^{N} P(w_i \mid w_1,\ldots,w_{i-1})\Big)^{-\frac{1}{N}} \\
&= -\frac{1}{N}\log\Big(\prod_{i=1}^{N} P(w_i \mid w_1,\ldots,w_{i-1})\Big) \\
&= -\frac{1}{N}\sum_{i=1}^{N} \log P(w_i \mid w_1,\ldots,w_{i-1})
\end{aligned}
$$
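A quick numeric check that the two forms agree, using made-up probabilities 0.5, 0.25 and 0.125 for a three-token sequence: the product of the inverse probabilities is 2 × 4 × 8 = 64 and its cube root is 4, while the negative mean log-probability is 2 ln 2, whose exponential is also 4.

```python
import numpy as np

# made-up conditional probabilities P(w_i | w_1, ..., w_{i-1}) for a 3-token sequence
probs = np.array([0.5, 0.25, 0.125])
N = len(probs)

# definition: N-th root of the product of the inverse probabilities
pp_direct = np.prod(1.0 / probs) ** (1.0 / N)

# log form: negative mean log-probability, then exponentiate back
log_pp = -np.mean(np.log(probs))
pp_from_log = np.exp(log_pp)

print(pp_direct, pp_from_log)  # both approximately 4
```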

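tl.one_hot: converts the targets into one-hot vectors whose last dimension matches that of the prediction tensor (the vocabulary size).
I did not fully understand the code below…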

def test_model(preds, target):
    """Function to test the model.

    Args:
        preds (jax.interpreters.xla.DeviceArray): Predictions of a list of batches of tensors corresponding to lines of text.
        target (jax.interpreters.xla.DeviceArray): Actual list of batches of tensors corresponding to lines of text.

    Returns:
        float: log_perplexity of the model.
    """

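    # preds holds log-probabilities of shape (batch, length, vocab). Multiplying by the
    # one-hot targets and summing over the last axis keeps only the log-probability
    # of the true character at each position.
    total_log_ppx = np.sum(preds * tl.one_hot(target, preds.shape[-1]), axis=-1)

    # 1.0 for real characters, 0.0 for padding (target id 0)
    non_pad = 1.0 - np.equal(target, 0)

    # zero out the padding positions
    ppx = total_log_ppx * non_pad

    # average log-probability over the non-padding positions only
    log_ppx = np.sum(ppx) / np.sum(non_pad)

    # log perplexity is the negative average log-probability
    return -log_ppx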
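A self-contained NumPy version of test_model, for experimenting without trax. Assumptions: `np.eye(vocab)[target]` stands in for tl.one_hot, `preds` already contains log-probabilities (the output of tl.LogSoftmax), and the toy inputs are made up. The last line cross-checks the one-hot trick against indexing the true-character log-probabilities directly with np.take_along_axis.

```python
import numpy as np


def log_perplexity(preds, target):
    # one-hot encode the targets; np.eye(vocab)[target] stands in for tl.one_hot
    one_hot = np.eye(preds.shape[-1])[target]
    # log-probability of the true character at each position
    log_p = np.sum(preds * one_hot, axis=-1)
    # 1.0 for real characters, 0.0 for padding (id 0)
    non_pad = 1.0 - np.equal(target, 0)
    return -np.sum(log_p * non_pad) / np.sum(non_pad)


# toy batch: uniform distribution over 4 symbols; one line "1 2" followed by padding
preds = np.log(np.full((1, 3, 4), 0.25))
target = np.array([[1, 2, 0]])
lp = log_perplexity(preds, target)

# cross-check: pick the true-character log-probabilities directly, drop padding, average
check = -np.mean(np.take_along_axis(preds, target[..., None], axis=-1)[..., 0][target != 0])
print(lp, np.exp(lp))  # ln 4 and a perplexity of about 4
```

With a uniform distribution over 4 symbols the perplexity is 4: the model is exactly as uncertain as a fair four-way guess.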

Part 5: Forward pass of vanilla RNNs and GRUs

5.1 `vanilla RNNs`: see the left side of the figure below. The cell computes $h^{<t>}=\sigma(W_h[h^{<t-1>}, x^{<t>}]+b_h)$:

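import numpy as np

def sigmoid(x):
    # logistic activation used by both cells below
    return 1.0 / (1.0 + np.exp(-x))

def forward_V_RNN(inputs, weights):
    x, h_t = inputs

    # unpack weights: the vanilla RNN only uses wh and bh
    wh, _, _, bh, _, _ = weights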

    # new hidden state
    h_t = np.dot(wh, np.concatenate([h_t, x])) + bh
    h_t = sigmoid(h_t)

    return h_t, h_t
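forward_V_RNN handles a single time step. To process a sequence, the cell is applied step by step and each new hidden state is fed into the next step. A minimal sketch using a plain Python loop; `unroll`, the sizes and the random weights are illustrative assumptions, and the cell is repeated here so the block runs on its own.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def forward_V_RNN(inputs, weights):
    # same cell as above: h_t = sigmoid(W_h [h_{t-1}, x_t] + b_h)
    x, h_t = inputs
    wh, _, _, bh, _, _ = weights
    h_t = sigmoid(np.dot(wh, np.concatenate([h_t, x])) + bh)
    return h_t, h_t


def unroll(cell, xs, h0, weights):
    # hypothetical helper: apply the cell to each time step in order,
    # carrying the hidden state forward from one step to the next
    h = h0
    ys = []
    for x in xs:
        y, h = cell((x, h), weights)
        ys.append(y)
    return np.stack(ys), h


rng = np.random.default_rng(0)
emb, h_dim, T = 4, 3, 5  # illustrative sizes
xs = rng.standard_normal((T, emb, 1))  # T input vectors of shape (emb, 1)
h0 = np.zeros((h_dim, 1))
w = rng.standard_normal((h_dim, h_dim + emb))
b = rng.standard_normal((h_dim, 1))
weights = [w, w, w, b, b, b]  # only slots 0 (wh) and 3 (bh) are used by the vanilla cell

ys, h_last = unroll(forward_V_RNN, xs, h0, weights)
print(ys.shape)  # (5, 3, 1)
```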

5.2 The GRU forward pass follows the right side of the figure above:

$$
\begin{aligned}
\Gamma_r &= \sigma\big(W_r[h^{<t-1>}, x^{<t>}] + b_r\big) \\
\Gamma_u &= \sigma\big(W_u[h^{<t-1>}, x^{<t>}] + b_u\big) \\
c^{<t>} &= \tanh\big(W_h[\Gamma_r * h^{<t-1>}, x^{<t>}] + b_h\big) \\
h^{<t>} &= \Gamma_u * c^{<t>} + (1 - \Gamma_u) * h^{<t-1>}
\end{aligned}
$$

def forward_GRU(inputs, weights):
    x, h_t = inputs

    # unpack weights: update gate (wu, bu), relevance gate (wr, br), candidate (wc, bc)
    wu, wr, wc, bu, br, bc = weights

    # Update gate
    u = np.dot(wu, np.concatenate([h_t, x])) + bu
    u = sigmoid(u)

    # Relevance gate
    r = np.dot(wr, np.concatenate([h_t, x])) + br
    r = sigmoid(r)

    # Candidate hidden state
    c = np.dot(wc, np.concatenate([r * h_t, x])) + bc
    c = np.tanh(c)

    # New hidden state h_t
    h_t = u * c + (1 - u) * h_t
    return h_t, h_t
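To see what the update gate does, the sketch below repeats the GRU cell (with the relevance gate computed from its own pre-activation) and forces $\Gamma_u$ towards 0 or 1 through a hypothetical extreme bias `bu`. With $\Gamma_u \approx 0$ the new state is essentially $h^{<t-1>}$ (the GRU keeps its memory); with $\Gamma_u \approx 1$ it is essentially the candidate $c^{<t>}$. Sizes, weights and biases are made up for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def forward_GRU(inputs, weights):
    # same GRU cell as above
    x, h_t = inputs
    wu, wr, wc, bu, br, bc = weights
    u = sigmoid(np.dot(wu, np.concatenate([h_t, x])) + bu)       # update gate
    r = sigmoid(np.dot(wr, np.concatenate([h_t, x])) + br)       # relevance gate
    c = np.tanh(np.dot(wc, np.concatenate([r * h_t, x])) + bc)   # candidate state
    h_t = u * c + (1 - u) * h_t
    return h_t, h_t


rng = np.random.default_rng(1)
emb, h_dim = 4, 3  # illustrative sizes
x = rng.standard_normal((emb, 1))
h_prev = rng.standard_normal((h_dim, 1))
w = 0.1 * rng.standard_normal((h_dim, h_dim + emb))
b = np.zeros((h_dim, 1))

# hypothetical large negative bias: u is close to 0, so the cell copies h_prev
_, h_closed = forward_GRU((x, h_prev), [w, w, w, np.full((h_dim, 1), -50.0), b, b])

# hypothetical large positive bias: u is close to 1, so the new state is the candidate c
_, h_open = forward_GRU((x, h_prev), [w, w, w, np.full((h_dim, 1), 50.0), b, b])
print(np.allclose(h_closed, h_prev))  # True
```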
