Artificial Intelligence and Natural Language Processing: Feed-Forward Networks

Rakuten品質軒 · 2025-05-05


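I. Experiment Overview

1. Experiment Content

In Experiment 3, we introduced the foundations of neural networks by looking at the perceptron, the simplest neural network in existence. A historic shortcoming of the perceptron is that it cannot learn some fairly important patterns that occur in data. For example, look at the data points plotted in Figure 4-1. This is the exclusive-or (XOR) situation, in which the decision boundary cannot be a single straight line; in other words, the data is not linearly separable. On this example, the perceptron fails.

Figure 4-1. The two classes in the XOR dataset, plotted as circles and stars. Note that no single line can separate the two classes.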
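The claim that no single line separates the XOR classes can be checked numerically. The following is a minimal sketch, not the perceptron from Experiment 3: it tries every linear rule on a small grid of weights and biases (the grid and the four-point dataset are illustrative assumptions) and reports the best accuracy found.

```python
from itertools import product

# The four XOR points: the label is 1 when exactly one coordinate is 1.
XOR_POINTS = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def accuracy(w1, w2, b):
    """Fraction of XOR points classified correctly by the rule w1*x + w2*y + b > 0."""
    correct = sum(int(w1 * x + w2 * y + b > 0) == label
                  for (x, y), label in XOR_POINTS)
    return correct / len(XOR_POINTS)


# Hypothetical search grid: weights and bias from -2.0 to 2.0 in steps of 0.25.
grid = [i / 4 for i in range(-8, 9)]
best = max(accuracy(w1, w2, b) for w1, w2, b in product(grid, repeat=3))
print(best)  # the best linear rule gets 3 of the 4 points right
```

No setting on the grid classifies all four points, which is consistent with the data not being linearly separable.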

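In this experiment, we explore a family of neural network models traditionally called feed-forward networks, and focus on two kinds: the multilayer perceptron (MLP) and the convolutional neural network (CNN). The multilayer perceptron structurally extends the simple perceptron we studied in Experiment 3 by grouping many perceptrons in a single layer and stacking multiple layers together. We cover multilayer perceptrons shortly and show their use in multiclass classification in "Example: Surname Classification with a Multilayer Perceptron."

The second kind of feed-forward network studied in this experiment, the convolutional neural network, is deeply inspired by windowed filters in digital signal processing. Through this windowing property, CNNs are able to learn localized patterns in their inputs, which has made them not only the workhorse of computer vision but also ideal candidates for detecting substructures in sequential data such as words and sentences. We outline CNNs in "Convolutional Neural Networks" and demonstrate their use in "Example: Classifying Surnames by Using a CNN."

In this experiment, MLPs and CNNs are grouped together because both are feed-forward neural networks, in contrast to a different family, recurrent neural networks (RNNs), which allow feedback (or cycles) so that each computation is informed by the previous one. In Experiments 6 and 7, we cover RNNs and why it can be beneficial to allow cycles in the network structure.

As we walk through these different models, one useful way to make sure you understand how things work is to pay attention to the size and shape of the data tensors as they are being computed. Each type of neural network layer has a specific effect on the size and shape of the data tensor it computes on, and understanding that effect goes a long way toward understanding these models.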
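As a small illustration of tracking shapes, the sketch below models only the shape arithmetic of a fully connected layer in plain Python. The helper `linear_output_shape` is hypothetical (it is not a PyTorch function), and the sizes are illustrative.

```python
def linear_output_shape(input_shape, in_features, out_features):
    """Shape produced by a fully connected layer applied to the last dimension."""
    if input_shape[-1] != in_features:
        raise ValueError(
            "expected last dimension {}, got {}".format(in_features, input_shape[-1]))
    return input_shape[:-1] + (out_features,)


shape = (2, 3)                                      # (batch, input_dim)
hidden_shape = linear_output_shape(shape, 3, 100)   # after the first layer
output_shape = linear_output_shape(hidden_shape, 100, 4)  # after the second layer
print(hidden_shape, output_shape)
```

Only the last dimension changes; the batch dimension passes through untouched, and a mismatch between one layer's output size and the next layer's input size is reported as an error.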

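2. Key Points

Through "Example: Surname Classification with a Multilayer Perceptron," learn how MLPs are applied to multiclass classification.

Understand the effect each type of neural network layer has on the size and shape of the data tensor it computes on.

3. Experiment Environment

Python 3.6.7

4. Attachments

Upload the data file required for this experiment (surnames.csv) to the directory /data/surnames/.

Complete example code:

exp4-In-Text-Examples.ipynb

exp4-munging_surname_dataset.ipynb

exp4-2D-Perceptron-MLP.ipynb

exp4_4_Classify_Surnames_CNN.ipynb

exp4_4_Classify_Surnames_MLP.ipynb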

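II. The Multilayer Perceptron

The multilayer perceptron (MLP) is considered one of the most basic neural network building blocks. The simplest MLP is an extension of the perceptron from Experiment 3. A perceptron takes a data vector as input and computes a single output value. In an MLP, many perceptrons are grouped so that the output of a single layer is a new vector instead of a single output value. In PyTorch, as you will see later, this is done simply by setting the number of output features in the Linear layer. An additional aspect of an MLP is that it combines multiple layers with a nonlinearity between each layer.

The simplest MLP, shown in Figure 4-2, is composed of three stages of representation and two Linear layers. The first stage is the input vector, the vector that is given to the model. In "Example: Classifying Sentiment of Restaurant Reviews," the input vector was a collapsed one-hot representation of a Yelp review. Given the input vector, the first Linear layer computes a hidden vector, the second stage of representation. The hidden vector is so called because it is the output of a layer that sits between the input and the output. What do we mean by "output of a layer"? One way to understand it is that the values in the hidden vector are the outputs of the different perceptrons that make up that layer. Using this hidden vector, the second Linear layer computes an output vector. In a binary task such as classifying the sentiment of Yelp reviews, the output vector can still be of size 1. In a multiclass setting, as you will see later in this experiment in "Example: Surname Classification with a Multilayer Perceptron," the size of the output vector equals the number of classes. Although this example shows only one hidden vector, it is possible to have several intermediate stages, each producing its own hidden vector. The final hidden vector is always mapped to the output vector by a combination of a Linear layer and a nonlinearity.

Figure 4-2. A visual representation of an MLP with two Linear layers and three stages of representation (the input vector, the hidden vector, and the output vector).

The power of MLPs comes from adding the second Linear layer and allowing the model to learn an intermediate representation that is linearly separable, a property of representations in which a single straight line (or, more generally, a hyperplane) can be used to distinguish data points by which side of the line (or hyperplane) they fall on. Learning intermediate representations that have specific properties, such as being linearly separable for a classification task, is one of the most profound consequences of using neural networks and is the essence of their modeling capability. In the next section, we look in more depth at what that means.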

2.1 A Simple Example: XOR

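Let's look at the XOR example described earlier and see what happens with a perceptron versus an MLP. In this example, we train both a perceptron and an MLP on a binary classification task: identifying stars and circles. Each data point is a 2D coordinate. Without diving into implementation details, the final model predictions are shown in Figure 4-3. In this plot, incorrectly classified data points are filled in black, whereas correctly classified data points are left unfilled. In the left panel, you can see from the filled shapes that the perceptron has difficulty learning a decision boundary that separates the stars from the circles. The MLP (right panel), however, learns a decision boundary that classifies the stars and circles much more accurately.

Figure 4-3. The learned solutions of the perceptron (left) and the MLP (right) for the XOR problem.

In Figure 4-3, the true class of each data point is the point's shape: star or circle. Incorrect classifications are filled in black and correct classifications are left unfilled. The lines are the decision boundaries of each model. In the left panel, the perceptron learns a decision boundary that cannot correctly separate the circles from the stars. In fact, no single line can. In the right panel, the MLP has learned to separate the stars from the circles.

Although the plot makes it look as though the MLP has two decision boundaries, and that is its advantage, it is actually just one decision boundary! The decision boundary appears this way because the intermediate representation has morphed the space so that a single hyperplane shows up in both of those positions. In Figure 4-4, we can see the intermediate values computed by the MLP. The shapes of the points indicate the class (star or circle). What we see is that the neural network (an MLP in this case) has learned to "warp" the space in which the data lives so that it can divide the points with a single line by the time they have passed through the final layer.

Figure 4-4. The input and intermediate representations of the MLP, visualized. From left to right: (1) the input to the network; (2) the output of the first linear module; (3) the output of the first nonlinearity; and (4) the output of the second linear module. The output of the first linear module groups the circles and the stars, whereas the output of the second linear module reorganizes the data points to be linearly separable.

In contrast, as Figure 4-5 shows, the perceptron has no extra layer with which to massage the shape of the data until it becomes linearly separable.

Figure 4-5. The input and output representations of the perceptron. Because it has no intermediate representation to group and reorganize the points as the MLP does, it cannot separate the circles from the stars.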

2.2 Implementing MLPs in PyTorch

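In the previous section, we outlined the core ideas of the MLP. In this section, we walk through an implementation in PyTorch. As described, the MLP has an additional layer of computation beyond the simpler perceptron of Experiment 3. In the implementation presented in Example 4-1, we instantiate this idea with two of PyTorch's Linear modules. The Linear objects are named fc1 and fc2, following a common convention that refers to a Linear module as a "fully connected layer," or "fc layer" for short. In addition to these two Linear layers, there is a rectified linear unit (ReLU) nonlinearity (introduced in the "Activation Functions" section of Experiment 3), which is applied to the output of the first Linear layer before it is passed to the second Linear layer. Because the layers are sequential, you must make sure that the number of outputs of one layer equals the number of inputs of the next. The nonlinearity between the two Linear layers is essential: without it, the two Linear layers are mathematically equivalent to a single Linear layer and therefore cannot model complex patterns. Our implementation of the MLP implements only the forward pass. This is because PyTorch automatically works out how to do the backward pass and the gradient updates from the definition of the model and the implementation of the forward pass.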

Example 4-1. Multilayer Perceptron

import torch.nn as nn
import torch.nn.functional as F


class MultilayerPerceptron(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        """
        Args:
            input_dim (int): the size of the input vectors
            hidden_dim (int): the output size of the first Linear layer
            output_dim (int): the output size of the second Linear layer
        """
        super(MultilayerPerceptron, self).__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)   # first fully connected layer: input_dim -> hidden_dim
        self.fc2 = nn.Linear(hidden_dim, output_dim)  # second fully connected layer: hidden_dim -> output_dim

    def forward(self, x_in, apply_softmax=False):
        """The forward pass of the MLP

        Args:
            x_in (torch.Tensor): an input data tensor.
                x_in.shape should be (batch, input_dim)
            apply_softmax (bool): a flag for the softmax activation
                should be false if used with the Cross Entropy losses
        Returns:
            the resulting tensor. tensor.shape should be (batch, output_dim)
        """
        intermediate = F.relu(self.fc1(x_in))  # first Linear layer followed by ReLU
        output = self.fc2(intermediate)        # second Linear layer gives the final scores

        if apply_softmax:
            output = F.softmax(output, dim=1)  # optionally turn scores into probabilities
        return output

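In Example 4-2, we instantiate the MLP. Because the implementation is general, it can model inputs of any size. To demonstrate, we use an input dimension of size 3, an output dimension of size 4, and a hidden dimension of size 100. Notice in the output of the print statement how the number of units in each layer lines up to produce an output of dimension 4 from an input of dimension 3.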

Example 4-2. An example instantiation of an MLP

batch_size = 2    # number of samples input at once
input_dim = 3     # size of the input vectors
hidden_dim = 100  # size of the hidden layer (output size of the first Linear layer)
output_dim = 4    # size of the output layer (output size of the model)

# Initialize model
mlp = MultilayerPerceptron(input_dim, hidden_dim, output_dim)
print(mlp)

Output of the code above:

MultilayerPerceptron(
  (fc1): Linear(in_features=3, out_features=100, bias=True)
  (fc2): Linear(in_features=100, out_features=4, bias=True)
)

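We can quickly test the "wiring" of the model by passing some random inputs, as shown in Example 4-3. Because the model has not yet been trained, the outputs are random. Doing this is a useful sanity check before spending time training a model. Notice how PyTorch's interactivity lets us do all of this in real time during development, in a way not much different from using NumPy or pandas: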

Example 4-3. Testing the MLP with random inputs

import torch


def describe(x):
    print("Type: {}".format(x.type()))
    print("Shape/size: {}".format(x.shape))
    print("Values: \n{}".format(x))


x_input = torch.rand(batch_size, input_dim)  # random tensor with values in [0, 1)
describe(x_input)

Sample output:

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[8.4721e-01, 9.5659e-01, 6.0349e-01],
        [2.9170e-01, 9.0718e-04, 1.1416e-01]])

Output of the code above (the values differ from run to run because the input is random):

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[0.6193, 0.7045, 0.7812],
        [0.6345, 0.4476, 0.9909]])

y_output = mlp(x_input, apply_softmax=False)
describe(y_output)

Sample output:

Type: torch.FloatTensor
Shape/size: torch.Size([2, 4])
Values:
tensor([[ 0.4593, 0.2138, -0.0132, 0.0921],
        [ 0.1068, 0.2340, 0.0007, -0.0869]], grad_fn=<...>)

Output of the code above:

Type: torch.FloatTensor
Shape/size: torch.Size([2, 4])
Values:
tensor([[ 0.2356, 0.0983, -0.0111, -0.0156],
        [ 0.1604, 0.1586, -0.0642, 0.0010]], grad_fn=<...>)

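Learning how to read the inputs and outputs of PyTorch models is important. In the preceding example, the output of the MLP model is a tensor with two rows and four columns. The rows of this tensor correspond to the batch dimension, which is the number of data points in the minibatch. The columns are the final feature vectors for each data point. In some cases, such as in a classification setting, the feature vector is a prediction vector. The name "prediction vector" means that it corresponds to a probability distribution. What happens with the prediction vector depends on whether we are currently training or performing inference. During training, the outputs are used as is, together with a loss function and a representation of the target class labels. We cover this in depth in "Example: Surname Classification with a Multilayer Perceptron."

However, if you want to turn the prediction vector into probabilities, an extra step is required. Specifically, you need the softmax function, which transforms a vector of values into probabilities. The softmax function has many roots. In physics, it is known as the Boltzmann or Gibbs distribution; in statistics, it is multinomial logistic regression; and in the natural language processing (NLP) community, it is the maximum entropy (MaxEnt) classifier. Whatever the name, the intuition behind the function is that large positive values result in higher probabilities and lower negative values result in smaller probabilities. The apply_softmax argument seen in Example 4-3 controls this extra step. In Example 4-4, you can see the same output, but this time with the apply_softmax flag set to True: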

Example 4-4. MLP with apply_softmax=True

y_output = mlp(x_input, apply_softmax=True)
describe(y_output)

Sample output:

Type: torch.FloatTensor
Shape/size: torch.Size([2, 4])
Values:
tensor([[0.3227, 0.2525, 0.2012, 0.2236],
        [0.2591, 0.2943, 0.2331, 0.2135]], grad_fn=<...>)

Output of the code above:

Type: torch.FloatTensor
Shape/size: torch.Size([2, 4])
Values:
tensor([[0.2915, 0.2541, 0.2277, 0.2267],
        [0.2740, 0.2735, 0.2189, 0.2336]], grad_fn=<...>)
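To make the softmax step concrete, here is a dependency-free sketch that applies it to the first row of raw scores printed in Example 4-3. The helper name `softmax` is our own; it is a plain-Python stand-in for illustration, not the `F.softmax` implementation.

```python
import math


def softmax(scores):
    """Convert a list of raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# First row of the raw MLP output shown in Example 4-3.
logits = [0.4593, 0.2138, -0.0132, 0.0921]
probs = softmax(logits)
print([round(p, 4) for p in probs])
```

The result agrees, to the printed precision, with the first row of the sample output in Example 4-4, and the largest score receives the largest probability.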

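To conclude, MLPs are stacked Linear layers that map tensors to other tensors. Nonlinearities are used between each pair of Linear layers to break the linear relationship and allow the model to twist the vector space around. In a classification setting, this twisting should result in linear separability between classes. Additionally, you can use the softmax function to interpret MLP outputs as probabilities, but you should not use softmax together with specific loss functions, because the underlying implementations can leverage superior mathematical and computational shortcuts.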
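The role of the nonlinearity can be verified with a few lines of arithmetic. The sketch below uses made-up weights and plain Python lists (all names and values are illustrative assumptions) to show that two stacked linear layers without an activation equal one linear layer, whereas inserting ReLU changes the result.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def matmul(A, B):
    """Multiply matrix A by matrix B."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def relu(v):
    return [max(0.0, a) for a in v]


# Made-up parameters: layer 1 maps 2 -> 3, layer 2 maps 3 -> 1.
W1 = [[1.0, -2.0], [0.5, 1.0], [-1.0, 3.0]]
b1 = [0.0, 1.0, -1.0]
W2 = [[2.0, -1.0, 0.5]]
b2 = [0.25]
x = [1.0, 2.0]

stacked = add(matvec(W2, add(matvec(W1, x), b1)), b2)        # two layers, no activation
W, b = matmul(W2, W1), add(matvec(W2, b1), b2)               # the equivalent single layer
collapsed = add(matvec(W, x), b)
with_relu = add(matvec(W2, relu(add(matvec(W1, x), b1))), b2)  # ReLU in between
print(stacked, collapsed, with_relu)
```

The first two results are identical because W2(W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2); only the version with ReLU computes something a single linear layer cannot.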

III. Experiment Steps

3.1 Example: Surname Classification with a Multilayer Perceptron

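In this section, we apply the MLP to the task of classifying surnames by their country of origin. Inferring demographic information (such as nationality) from publicly observable data has applications ranging from product recommendations to ensuring fair outcomes for users across different demographics. Demographic and other self-identifying information is collectively called "protected attributes." Care must be taken when using such attributes in modeling and in products. We begin by splitting each surname into its characters and treating them the same way we treated words in "Example: Classifying Sentiment of Restaurant Reviews." Aside from the difference in data, character-level models are mostly similar to word-based models in structure and implementation.

An important lesson to take from this example is that the implementation and training of an MLP is a straightforward progression from the implementation and training of the perceptron we saw in Experiment 3. In fact, we point back to that example in Experiment 3 as the place to go for a more thorough overview of these components. Further, we do not include code that you can see in "Example: Classifying Sentiment of Restaurant Reviews."

The rest of this section begins with a description of the surnames dataset and its preprocessing steps. Then, we step through the pipeline from a surname string to a vectorized minibatch using the Vocabulary, Vectorizer, and DataLoader classes. If you read through Experiment 3, you will recognize these, with only some small modifications.

We continue the section by describing the SurnameClassifier model and the thought process behind its design. The MLP is similar to the perceptron example we saw in Experiment 3, but in addition to the model change, this example introduces multiclass outputs and their corresponding loss functions. After describing the model, we walk through the training routine. The training routine is quite similar to the one in "Example: Classifying Sentiment of Restaurant Reviews," so for brevity we do not go into as much depth here as we did in that section; refer back to it for more detail.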

3.1.1 The Surname Dataset

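The surnames dataset is a collection of 10,000 surnames from 18 different nationalities, gathered by the authors from different name sources on the internet. This dataset is reused in several examples in the experiments of this course and has a few properties that make it interesting. The first property is that it is fairly imbalanced. The top three classes account for more than 60% of the data: 27% are English, 21% are Russian, and 14% are Arabic. The remaining 15 nationalities have decreasing frequency, a property that is endemic to language as well. The second property is that there is a valid and intuitive relationship between nationality of origin and surname orthography (spelling). Some spelling variations are very strongly tied to a nation of origin (such as "O'Neill," "Antonopoulos," "Nagasawa," or "Zhu").

To create the final dataset, we began with a less-processed version than the one included in the course's supplementary material and performed several dataset modification operations. The first was to reduce the imbalance: the original dataset was more than 70% Russian, perhaps due to a sampling bias or a proliferation of unique Russian surnames. For this, we subsampled this overrepresented class by selecting a randomized subset of the surnames labeled as Russian. Next, we grouped the dataset by nationality and split it into three sections: 70% to a training dataset, 15% to a validation dataset, and the last 15% to the testing dataset, such that the class label distributions are comparable across the splits.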
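The per-nationality 70/15/15 split described above might be sketched as follows. This is an assumption-laden illustration, not the preprocessing notebook's code: the function name `stratified_split`, the toy rows, and the seed are all made up.

```python
import random
from collections import Counter, defaultdict


def stratified_split(rows, train_pct=70, val_pct=15, seed=1337):
    """Label each (surname, nationality) row as 'train', 'val' or 'test',
    splitting every nationality separately so the class mix stays comparable."""
    by_nationality = defaultdict(list)
    for surname, nationality in rows:
        by_nationality[nationality].append(surname)

    rng = random.Random(seed)
    labeled = []
    for nationality, surnames in sorted(by_nationality.items()):
        rng.shuffle(surnames)
        n_train = len(surnames) * train_pct // 100
        n_val = len(surnames) * val_pct // 100
        for i, surname in enumerate(surnames):
            if i < n_train:
                split = "train"
            elif i < n_train + n_val:
                split = "val"
            else:
                split = "test"
            labeled.append((surname, nationality, split))
    return labeled


# Made-up toy rows, only to exercise the function.
rows = [("name%d" % i, "English") for i in range(20)]
rows += [("imya%d" % i, "Russian") for i in range(40)]
labeled = stratified_split(rows)
english = Counter(s for _, n, s in labeled if n == "English")
russian = Counter(s for _, n, s in labeled if n == "Russian")
print(english, russian)
```

Because each nationality is split on its own, both toy classes end up with the same 70/15/15 proportions in every split.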

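The implementation of SurnameDataset is almost identical to that of ReviewDataset in "Example: Classifying Sentiment of Restaurant Reviews," with only a minor difference in how the __getitem__ method is implemented. Recall that the dataset classes presented in this course inherit from PyTorch's Dataset class, and as such we need to implement two functions: the __getitem__ method, which returns a data point when given an index, and the __len__ method, which returns the length of the dataset. The difference between the example in "Example: Classifying Sentiment of Restaurant Reviews" and this example is in __getitem__, as shown in Example 4-5. Rather than returning a vectorized review, as in that earlier example, it returns a vectorized surname and the index corresponding to its nationality: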

Example 4-5. Implementing SurnameDataset.__getitem__()

from argparse import Namespace
from collections import Counter
import json
import os
import string

import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
# tqdm.tqdm_notebook is deprecated; the alias keeps the name used in the rest of the code
from tqdm.notebook import tqdm as tqdm_notebook

class Vocabulary(object):
    """Class to process text and extract vocabulary for mapping"""

    def __init__(self, token_to_idx=None, add_unk=True, unk_token="<UNK>"):
        """
        Args:
            token_to_idx (dict): a pre-existing map of tokens to indices
            add_unk (bool): a flag that indicates whether to add the UNK token
            unk_token (str): the UNK token to add into the Vocabulary
        """
        if token_to_idx is None:
            token_to_idx = {}
        self._token_to_idx = token_to_idx  # map from token to index

        # build the reverse map from index to token
        self._idx_to_token = {idx: token
                              for token, idx in self._token_to_idx.items()}

        self._add_unk = add_unk  # whether the UNK token is added
        self._unk_token = unk_token

        self.unk_index = -1  # -1 means "no UNK token"
        if add_unk:
            self.unk_index = self.add_token(unk_token)  # register UNK and remember its index

    def to_serializable(self):
        """ returns a dictionary that can be serialized """
        return {'token_to_idx': self._token_to_idx,
                'add_unk': self._add_unk,
                'unk_token': self._unk_token}

    @classmethod
    def from_serializable(cls, contents):
        """ instantiates the Vocabulary from a serialized dictionary """
        return cls(**contents)

    def add_token(self, token):
        """Update mapping dicts based on the token.

        Args:
            token (str): the item to add into the Vocabulary
        Returns:
            index (int): the integer corresponding to the token
        """
        try:
            index = self._token_to_idx[token]  # token already known: reuse its index
        except KeyError:
            # new token: assign the next free index
            index = len(self._token_to_idx)
            self._token_to_idx[token] = index
            self._idx_to_token[index] = token
        return index

    def add_many(self, tokens):
        """Add a list of tokens into the Vocabulary

        Args:
            tokens (list): a list of string tokens
        Returns:
            indices (list): a list of indices corresponding to the tokens
        """
        return [self.add_token(token) for token in tokens]

    def lookup_token(self, token):
        """Retrieve the index associated with the token
          or the UNK index if token isn't present.

        Args:
            token (str): the token to look up
        Returns:
            index (int): the index corresponding to the token
        Notes:
            `unk_index` needs to be >=0 (having been added into the Vocabulary)
              for the UNK functionality
        """
        if self.unk_index >= 0:
            return self._token_to_idx.get(token, self.unk_index)
        else:
            return self._token_to_idx[token]

    def lookup_index(self, index):
        """Return the token associated with the index

        Args:
            index (int): the index to look up
        Returns:
            token (str): the token corresponding to the index
        Raises:
            KeyError: if the index is not in the Vocabulary
        """
        if index not in self._idx_to_token:
            raise KeyError("the index (%d) is not in the Vocabulary" % index)
        return self._idx_to_token[index]

    def __str__(self):
        return "<Vocabulary(size=%d)>" % len(self)

    def __len__(self):
        return len(self._token_to_idx)

class SurnameDataset(Dataset):
    def __init__(self, surname_df, vectorizer):
        """
        Args:
            surname_df (pandas.DataFrame): the dataset
            vectorizer (SurnameVectorizer): vectorizer instantiated from dataset
        """
        self.surname_df = surname_df   # the full surnames dataset
        self._vectorizer = vectorizer  # the surname vectorizer

        # split into train / val / test according to the 'split' column
        self.train_df = self.surname_df[self.surname_df.split=='train']
        self.train_size = len(self.train_df)

        self.val_df = self.surname_df[self.surname_df.split=='val']
        self.validation_size = len(self.val_df)

        self.test_df = self.surname_df[self.surname_df.split=='test']
        self.test_size = len(self.test_df)

        # lookup table from split name to (dataframe, size)
        self._lookup_dict = {'train': (self.train_df, self.train_size),
                             'val': (self.val_df, self.validation_size),
                             'test': (self.test_df, self.test_size)}

        self.set_split('train')  # default to the training split

        # Class weights
        class_counts = surname_df.nationality.value_counts().to_dict()  # count per nationality

        def sort_key(item):
            return self._vectorizer.nationality_vocab.lookup_token(item[0])

        sorted_counts = sorted(class_counts.items(), key=sort_key)  # order counts by class index
        frequencies = [count for _, count in sorted_counts]
        self.class_weights = 1.0 / torch.tensor(frequencies, dtype=torch.float32)  # inverse class frequency

    @classmethod
    def load_dataset_and_make_vectorizer(cls, surname_csv):
        """Load dataset and make a new vectorizer from scratch

        Args:
            surname_csv (str): location of the dataset
        Returns:
            an instance of SurnameDataset
        """
        surname_df = pd.read_csv(surname_csv)
        train_surname_df = surname_df[surname_df.split=='train']  # build the vectorizer from training data only
        return cls(surname_df, SurnameVectorizer.from_dataframe(train_surname_df))

    @classmethod
    def load_dataset_and_load_vectorizer(cls, surname_csv, vectorizer_filepath):
        """Load dataset and the corresponding vectorizer.
        Used in the case in which the vectorizer has been cached for re-use

        Args:
            surname_csv (str): location of the dataset
            vectorizer_filepath (str): location of the saved vectorizer
        Returns:
            an instance of SurnameDataset
        """
        surname_df = pd.read_csv(surname_csv)
        vectorizer = cls.load_vectorizer_only(vectorizer_filepath)
        return cls(surname_df, vectorizer)

    @staticmethod
    def load_vectorizer_only(vectorizer_filepath):
        """a static method for loading the vectorizer from file

        Args:
            vectorizer_filepath (str): the location of the serialized vectorizer
        Returns:
            an instance of SurnameVectorizer
        """
        with open(vectorizer_filepath) as fp:
            return SurnameVectorizer.from_serializable(json.load(fp))

    def save_vectorizer(self, vectorizer_filepath):
        """saves the vectorizer to disk using json

        Args:
            vectorizer_filepath (str): the location to save the vectorizer
        """
        with open(vectorizer_filepath, "w") as fp:
            json.dump(self._vectorizer.to_serializable(), fp)

    def get_vectorizer(self):
        """ returns the vectorizer """
        return self._vectorizer

    def set_split(self, split="train"):
        """ selects the splits in the dataset using a column in the dataframe """
        self._target_split = split
        self._target_df, self._target_size = self._lookup_dict[split]

    def __len__(self):
        return self._target_size

    def __getitem__(self, index):
        """the primary entry point method for PyTorch datasets

        Args:
            index (int): the index to the data point
        Returns:
            a dictionary holding the data point's:
                features (x_surname)
                label (y_nationality)
        """
        row = self._target_df.iloc[index]

        surname_vector = \
            self._vectorizer.vectorize(row.surname)

        nationality_index = \
            self._vectorizer.nationality_vocab.lookup_token(row.nationality)

        return {'x_surname': surname_vector,
                'y_nationality': nationality_index}

    def get_num_batches(self, batch_size):
        """Given a batch size, return the number of batches in the dataset

        Args:
            batch_size (int)
        Returns:
            number of batches in the dataset
        """
        return len(self) // batch_size

def generate_batches(dataset, batch_size, shuffle=True,
                     drop_last=True, device="cpu"):
    """
    A generator function which wraps the PyTorch DataLoader. It will
      ensure each tensor is on the right device location.
    """
    dataloader = DataLoader(dataset=dataset, batch_size=batch_size,
                            shuffle=shuffle, drop_last=drop_last)

    for data_dict in dataloader:
        out_data_dict = {}  # holds the tensors after they are moved
        for name, tensor in data_dict.items():
            out_data_dict[name] = data_dict[name].to(device)  # move each tensor to the target device
        yield out_data_dict

3.1.2 Vocabulary, Vectorizer, and DataLoader

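To classify surnames using their characters, we use the Vocabulary, Vectorizer, and DataLoader to transform surname strings into vectorized minibatches. These are the same data structures used in "Example: Classifying Sentiment of Restaurant Reviews," exemplifying a polymorphism that treats the character tokens of surnames in the same way as the word tokens of Yelp reviews. Instead of vectorizing by mapping word tokens to integers, the data is vectorized by mapping characters to integers.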

THE VOCABULARY CLASS

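The Vocabulary class used in this example is exactly the same as the one used in "Example: Classifying Sentiment of Restaurant Reviews" to map the words in Yelp reviews to their corresponding integers. As a brief overview, the Vocabulary is a coordination of two Python dictionaries that form a bijection between tokens (characters, in this example) and integers; that is, the first dictionary maps characters to integer indices, and the second maps the integer indices to characters. The add_token method is used to add new tokens into the Vocabulary, the lookup_token method is used to retrieve an index, and the lookup_index method is used to retrieve a token given an index (which is useful in the inference stage). In contrast with the Vocabulary for Yelp reviews, we use a one-hot representation and neither count the frequency of characters nor restrict the Vocabulary to frequent items only. This is mainly because the dataset is small and most characters are frequent enough.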

THE SURNAMEVECTORIZER

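Whereas the Vocabulary converts individual tokens (characters) to integers, the SurnameVectorizer is responsible for applying the Vocabulary and converting a surname into a vector. The instantiation and use are very similar to the ReviewVectorizer in "Example: Classifying Sentiment of Restaurant Reviews," but with one key difference: the string is not split on whitespace. Surnames are sequences of characters, and each character is an individual token in our Vocabulary. However, until "Convolutional Neural Networks," we will ignore the sequence information and create a collapsed one-hot vector representation of the input by iterating over each character in the string input. We designate a special token, UNK, for characters not encountered previously. The UNK symbol is still used in the character Vocabulary because we instantiate the Vocabulary from the training data only, and there could be unique characters in the validation or testing data.

Although we use the collapsed one-hot representation in this example, later experiments cover other methods of vectorization that are alternatives to, and sometimes better than, the one-hot encoding. Specifically, in "Example: Classifying Surnames by Using a CNN," you will see a one-hot matrix in which each character is a position in the matrix and has its own one-hot vector. Then, in Experiment 5, you will learn about the embedding layer, the vectorization that returns a vector of integers, and how those are used to create a matrix of dense vectors. For now, take a look at the code for the SurnameVectorizer in Example 4-6.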

Example 4-6. Implementing SurnameVectorizer

class SurnameVectorizer(object):
    """ The Vectorizer which coordinates the Vocabularies and puts them to use"""

    def __init__(self, surname_vocab, nationality_vocab):
        """
        Args:
            surname_vocab (Vocabulary): maps characters to integers
            nationality_vocab (Vocabulary): maps nationalities to integers
        """
        self.surname_vocab = surname_vocab
        self.nationality_vocab = nationality_vocab

    def vectorize(self, surname):
        """
        Args:
            surname (str): the surname
        Returns:
            one_hot (np.ndarray): a collapsed one-hot encoding
        """
        vocab = self.surname_vocab
        one_hot = np.zeros(len(vocab), dtype=np.float32)  # all zeros, one slot per vocabulary entry
        for token in surname:
            # unseen characters map to the UNK index
            one_hot[vocab.lookup_token(token)] = 1
        return one_hot

    @classmethod
    def from_dataframe(cls, surname_df):
        """Instantiate the vectorizer from the dataset dataframe

        Args:
            surname_df (pandas.DataFrame): the surnames dataset
        Returns:
            an instance of the SurnameVectorizer
        """
        surname_vocab = Vocabulary(unk_token="@")      # character vocabulary with an UNK token
        nationality_vocab = Vocabulary(add_unk=False)  # nationality vocabulary without UNK

        for index, row in surname_df.iterrows():
            for letter in row.surname:
                surname_vocab.add_token(letter)          # register every character of the surname
            nationality_vocab.add_token(row.nationality)  # register the nationality label

        return cls(surname_vocab, nationality_vocab)

    @classmethod
    def from_serializable(cls, contents):
        # rebuild both vocabularies from their serialized form
        surname_vocab = Vocabulary.from_serializable(contents['surname_vocab'])
        nationality_vocab = Vocabulary.from_serializable(contents['nationality_vocab'])
        return cls(surname_vocab=surname_vocab, nationality_vocab=nationality_vocab)

    def to_serializable(self):
        return {'surname_vocab': self.surname_vocab.to_serializable(),
                'nationality_vocab': self.nationality_vocab.to_serializable()}

3.1.3 The Surname Classifier Model

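The SurnameClassifier is an implementation of the MLP introduced earlier in this experiment (Example 4-7). The first Linear layer maps the input vectors to an intermediate vector, and a nonlinearity is applied to that vector. A second Linear layer maps the intermediate vector to the prediction vector.

In the last step, the softmax operation is optionally applied to make sure the outputs sum to 1, so that they can be interpreted as "probabilities." The reason it is optional has to do with the mathematical formulation of the loss function we use, the cross-entropy loss. We examined the cross-entropy loss in "Loss Functions." Recall that the cross-entropy loss is most desirable for multiclass classification, but computing the softmax during training is not only wasteful but also not numerically stable in many situations.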

Example 4-7. The SurnameClassifier as an MLP

class SurnameClassifier(nn.Module):
    """ A 2-layer Multilayer Perceptron for classifying surnames """

    def __init__(self, input_dim, hidden_dim, output_dim):
        """
        Args:
            input_dim (int): the size of the input vectors
            hidden_dim (int): the output size of the first Linear layer
            output_dim (int): the output size of the second Linear layer
        """
        super(SurnameClassifier, self).__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)   # input -> hidden
        self.fc2 = nn.Linear(hidden_dim, output_dim)  # hidden -> output

    def forward(self, x_in, apply_softmax=False):
        """The forward pass of the classifier

        Args:
            x_in (torch.Tensor): an input data tensor.
                x_in.shape should be (batch, input_dim)
            apply_softmax (bool): a flag for the softmax activation
                should be false if used with the Cross Entropy losses
        Returns:
            the resulting tensor. tensor.shape should be (batch, output_dim)
        """
        intermediate_vector = F.relu(self.fc1(x_in))       # first Linear layer followed by ReLU
        prediction_vector = self.fc2(intermediate_vector)  # second Linear layer gives the prediction vector

        # optionally turn the scores into probabilities
        if apply_softmax:
            prediction_vector = F.softmax(prediction_vector, dim=1)

        return prediction_vector
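Why the classifier returns raw scores by default can be illustrated without PyTorch. The sketch below compares a naive "softmax, then log" cross-entropy with one computed directly from the scores using the log-sum-exp trick. Both function names and the example scores are our own assumptions; this is an illustration of the numerical issue, not the internals of CrossEntropyLoss.

```python
import math


def naive_cross_entropy(logits, target):
    """Softmax followed by a log; overflows when the scores are large."""
    exps = [math.exp(z) for z in logits]
    return -math.log(exps[target] / sum(exps))


def cross_entropy_from_logits(logits, target):
    """Negative log-probability of the target class via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


small = [2.0, 1.0, 0.1]        # made-up scores of moderate size
large = [1000.0, 999.0, 0.0]   # made-up scores that overflow exp()

print(naive_cross_entropy(small, 0), cross_entropy_from_logits(small, 0))

try:
    naive_cross_entropy(large, 0)
    naive_failed = False
except OverflowError:
    naive_failed = True
stable_large = cross_entropy_from_logits(large, 0)
print(naive_failed, stable_large)
```

On moderate scores the two versions agree; on large scores the naive version fails while the log-sum-exp version still returns a finite loss, which is the kind of shortcut a loss that works on raw scores can take.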

3.1.4 The Training Routine

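Although we use a different model, dataset, and loss function, the training routine remains the same. Thus, in Example 4-8, we show only the args and the major differences between the training routine in this example and the one in "Example: Classifying Sentiment of Restaurant Reviews."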

Example 4-8. The args for classifying surnames with an MLP

def make_train_state(args):
    return {'stop_early': False,                   # whether training should stop early
            'early_stopping_step': 0,              # epochs since the last improvement
            'early_stopping_best_val': 1e8,        # best validation loss so far (starts large)
            'learning_rate': args.learning_rate,
            'epoch_index': 0,                      # index of the current epoch
            'train_loss': [],                      # training loss per epoch
            'train_acc': [],                       # training accuracy per epoch
            'val_loss': [],                        # validation loss per epoch
            'val_acc': [],                         # validation accuracy per epoch
            'test_loss': -1,                       # -1 until computed
            'test_acc': -1,                        # -1 until computed
            'model_filename': args.model_state_file}


def update_train_state(args, model, train_state):
    """Handle the training state updates.

    Components:
     - Early Stopping: Prevent overfitting.
     - Model Checkpoint: Model is saved if the model is better

    :param args: main arguments
    :param model: model to train
    :param train_state: a dictionary representing the training state values
    :returns:
        a new train_state
    """
    # Save one model at least
    if train_state['epoch_index'] == 0:
        torch.save(model.state_dict(), train_state['model_filename'])
        train_state['early_stopping_best_val'] = train_state['val_loss'][-1]
        train_state['stop_early'] = False

    # Save model if performance improved
    elif train_state['epoch_index'] >= 1:
        loss_t = train_state['val_loss'][-1]  # most recent validation loss

        # If loss worsened
        if loss_t >= train_state['early_stopping_best_val']:
            # Update step
            train_state['early_stopping_step'] += 1
        # Loss decreased
        else:
            # Save the best model and record its loss. Without recording it, every
            # epoch would be compared against the initial 1e8 and early stopping
            # would never trigger.
            torch.save(model.state_dict(), train_state['model_filename'])
            train_state['early_stopping_best_val'] = loss_t

            # Reset early stopping step
            train_state['early_stopping_step'] = 0

        # Stop early ?
        train_state['stop_early'] = \
            train_state['early_stopping_step'] >= args.early_stopping_criteria

    return train_state

def compute_accuracy(y_pred, y_target):
    _, y_pred_indices = y_pred.max(dim=1)  # index of the highest score per row
    n_correct = torch.eq(y_pred_indices, y_target).sum().item()
    return n_correct / len(y_pred_indices) * 100


def set_seed_everywhere(seed, cuda):
    np.random.seed(seed)     # NumPy random seed
    torch.manual_seed(seed)  # PyTorch CPU random seed
    if cuda:
        torch.cuda.manual_seed_all(seed)  # PyTorch CUDA random seed


def handle_dirs(dirpath):
    # create the directory if it does not exist yet
    if not os.path.exists(dirpath):
        os.makedirs(dirpath)

args = Namespace(
    # Data and path information
    surname_csv="data/surnames/surnames_with_splits.csv",
    vectorizer_file="vectorizer.json",
    model_state_file="model.pth",
    save_dir="model_storage/ch4/surname_mlp",
    # Model hyper parameters
    hidden_dim=300,
    # Training hyper parameters
    seed=1337,
    num_epochs=100,
    early_stopping_criteria=5,
    learning_rate=0.001,
    batch_size=64,
    # Runtime options
    cuda=False,
    reload_from_files=False,
    expand_filepaths_to_save_dir=True,
)

if args.expand_filepaths_to_save_dir:
    # place the vectorizer and model files inside the save directory
    args.vectorizer_file = os.path.join(args.save_dir,
                                        args.vectorizer_file)

    args.model_state_file = os.path.join(args.save_dir,
                                         args.model_state_file)

    print("Expanded filepaths: ")
    print("\t{}".format(args.vectorizer_file))
    print("\t{}".format(args.model_state_file))

# Check CUDA
if not torch.cuda.is_available():
    args.cuda = False

args.device = torch.device("cuda" if args.cuda else "cpu")

print("Using CUDA: {}".format(args.cuda))

# Set seed for reproducibility
set_seed_everywhere(args.seed, args.cuda)

# handle dirs
handle_dirs(args.save_dir)

Output of the code above:

Expanded filepaths:
	model_storage/ch4/surname_mlp/vectorizer.json
	model_storage/ch4/surname_mlp/model.pth
Using CUDA: False
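The early-stopping bookkeeping in Example 4-8 can be hard to follow inside the training state dictionary. The sketch below isolates the idea in a hypothetical helper, `epochs_until_stop`, run over a made-up list of validation losses; `patience` plays the role of `early_stopping_criteria`.

```python
def epochs_until_stop(val_losses, patience):
    """Return the index of the epoch at which training would stop, or None."""
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss        # new best loss: remember it and reset the counter
            bad_epochs = 0
        else:
            bad_epochs += 1    # no improvement over the best loss so far
            if bad_epochs >= patience:
                return epoch
    return None


losses = [2.0, 1.8, 1.7, 1.75, 1.72, 1.71, 1.9]  # made-up validation losses
stop_at = epochs_until_stop(losses, patience=3)
never = epochs_until_stop(losses, patience=5)
print(stop_at, never)
```

Note that each epoch is compared against the best loss seen so far, not against the previous epoch, so a slow drift upward after the best epoch still counts toward stopping.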

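The most notable difference in training has to do with the kinds of outputs in the model and the loss function being used. In this example, the output is a multiclass prediction vector that can be turned into probabilities. As described in the model description, the loss functions that can be used for this output are limited to CrossEntropyLoss and NLLLoss. Because it is the simpler of the two to use, we use CrossEntropyLoss.

In Example 4-9, we show the instantiations for the dataset, the model, the loss function, and the optimizer. These instantiations should look nearly identical to those from "Example: Classifying Sentiment of Restaurant Reviews." In fact, this pattern repeats for every example in the later experiments of this course.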

Example 4-9. Instantiating the dataset, model, loss, and optimizer

if args.reload_from_files:
    # training from a checkpoint
    print("Reloading!")
    # load the dataset together with the previously saved vectorizer
    dataset = SurnameDataset.load_dataset_and_load_vectorizer(args.surname_csv,
                                                              args.vectorizer_file)
else:
    # create dataset and vectorizer
    print("Creating fresh!")
    dataset = SurnameDataset.load_dataset_and_make_vectorizer(args.surname_csv)
    dataset.save_vectorizer(args.vectorizer_file)  # cache the new vectorizer on disk

vectorizer = dataset.get_vectorizer()
classifier = SurnameClassifier(input_dim=len(vectorizer.surname_vocab),
                               hidden_dim=args.hidden_dim,
                               output_dim=len(vectorizer.nationality_vocab))

Output of the code above:

Creating fresh!

THE TRAINING LOOP

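Compared with the training loop in "Example: Classifying Sentiment of Restaurant Reviews," the training loop in this example is nearly identical except for the variable names. Specifically, Example 4-10 shows that different keys are used to get the data out of batch_dict. Aside from this cosmetic difference, the functionality of the training loop remains the same. Using the training data, compute the model output, the loss, and the gradients. Then, use the gradients to update the model.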

Example 4-10. A snippet of the training loop

classifier = classifier.to(args.device)  # move the model parameters to the target device
dataset.class_weights = dataset.class_weights.to(args.device)  # move the class weights as well

loss_func = nn.CrossEntropyLoss(dataset.class_weights)  # class-weighted cross-entropy loss
optimizer = optim.Adam(classifier.parameters(), lr=args.learning_rate)
# halve the learning rate when the validation loss stops decreasing
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer=optimizer,
                                                 mode='min', factor=0.5,
                                                 patience=1)

train_state = make_train_state(args)  # tracks metrics during training

epoch_bar = tqdm_notebook(desc='training routine',
                          total=args.num_epochs,
                          position=0)

# progress bar over the training batches
dataset.set_split('train')
train_bar = tqdm_notebook(desc='split=train',
                          total=dataset.get_num_batches(args.batch_size),
                          position=1,
                          leave=True)

# progress bar over the validation batches
dataset.set_split('val')
val_bar = tqdm_notebook(desc='split=val',
                        total=dataset.get_num_batches(args.batch_size),
                        position=1,
                        leave=True)

try:
    for epoch_index in range(args.num_epochs):
        train_state['epoch_index'] = epoch_index

        # Iterate over training dataset

        # setup: batch generator, set loss and acc to 0, set train mode on
        dataset.set_split('train')
        batch_generator = generate_batches(dataset,
                                           batch_size=args.batch_size,
                                           device=args.device)
        running_loss = 0.0
        running_acc = 0.0
        classifier.train()

        for batch_index, batch_dict in enumerate(batch_generator):
            # the training routine is these 5 steps:

            # --------------------------------------
            # step 1. zero the gradients
            optimizer.zero_grad()

            # step 2. compute the output
            y_pred = classifier(batch_dict['x_surname'])

            # step 3. compute the loss
            loss = loss_func(y_pred, batch_dict['y_nationality'])
            loss_t = loss.item()
            running_loss += (loss_t - running_loss) / (batch_index + 1)

            # step 4. use loss to produce gradients
            loss.backward()

            # step 5. use optimizer to take gradient step
            optimizer.step()
            # -----------------------------------------
            # compute the accuracy
            acc_t = compute_accuracy(y_pred, batch_dict['y_nationality'])
            running_acc += (acc_t - running_acc) / (batch_index + 1)

            # update bar
            train_bar.set_postfix(loss=running_loss, acc=running_acc,
                                  epoch=epoch_index)
            train_bar.update()

        train_state['train_loss'].append(running_loss)
        train_state['train_acc'].append(running_acc)

        # Iterate over val dataset

        # setup: batch generator, set loss and acc to 0; set eval mode on
        dataset.set_split('val')
        batch_generator = generate_batches(dataset,
                                           batch_size=args.batch_size,
                                           device=args.device)
        running_loss = 0.
        running_acc = 0.
        classifier.eval()

        for batch_index, batch_dict in enumerate(batch_generator):
            # compute the output
            y_pred = classifier(batch_dict['x_surname'])

            # compute the loss
            loss = loss_func(y_pred, batch_dict['y_nationality'])
            loss_t = loss.to("cpu").item()
            running_loss += (loss_t - running_loss) / (batch_index + 1)

            # compute the accuracy
            acc_t = compute_accuracy(y_pred, batch_dict['y_nationality'])
            running_acc += (acc_t - running_acc) / (batch_index + 1)
            val_bar.set_postfix(loss=running_loss, acc=running_acc,
                                epoch=epoch_index)
            val_bar.update()

        train_state['val_loss'].append(running_loss)
        train_state['val_acc'].append(running_acc)

        train_state = update_train_state(args=args, model=classifier,
                                         train_state=train_state)

        scheduler.step(train_state['val_loss'][-1])

        if train_state['stop_early']:
            break

        train_bar.n = 0
        val_bar.n = 0
        epoch_bar.update()
except KeyboardInterrupt:
    print("Exiting loop")


# compute the loss & accuracy on the test set using the best available model

classifier.load_state_dict(torch.load(train_state['model_filename']))

classifier = classifier.to(args.device)

dataset.class_weights = dataset.class_weights.to(args.device)

loss_func = nn.CrossEntropyLoss(dataset.class_weights)

dataset.set_split('test') # 設置數據集為測試模式

# 生成一個批處理生成器，用于迭代測試數據集

batch_generator = generate_batches(dataset,

batch_size=args.batch_size,

device=args.device)

# 初始化運行損失和運行準確率變量

running_loss = 0.

running_acc = 0.

classifier.eval()

# 遍歷測試數據集的批處理

for batch_index, batch_dict in enumerate(batch_generator):

# compute the output

y_pred = classifier(batch_dict['x_surname'])

# compute the loss

loss = loss_func(y_pred, batch_dict['y_nationality'])

loss_t = loss.item()

running_loss += (loss_t - running_loss) / (batch_index + 1)

# compute the accuracy

acc_t = compute_accuracy(y_pred, batch_dict['y_nationality'])

running_acc += (acc_t - running_acc) / (batch_index + 1)

# 將最終的平均損失和準確率保存到訓練狀態(tài)中

train_state['test_loss'] = running_loss

train_state['test_acc'] = running_acc

print("Test loss: {};".format(train_state['test_loss']))

print("Test Accuracy: {}".format(train_state['test_acc']))

Test loss: 1.819154896736145;

Test Accuracy: 46.68749999999999
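The running_loss and running_acc updates in the evaluation loop are an incremental mean: after each batch, the running value equals the plain average of all batch values seen so far. A minimal, dependency-free sketch of that identity follows; the batch losses are made-up numbers used only for illustration, and running_mean is a hypothetical helper name.

```python
def running_mean(values):
    """Incrementally average `values` the same way the evaluation loop does."""
    running = 0.0
    for batch_index, value in enumerate(values):
        running += (value - running) / (batch_index + 1)
    return running

# hypothetical per-batch losses, not taken from the experiment
batch_losses = [2.0, 1.5, 1.0, 1.5]
incremental = running_mean(batch_losses)
direct = sum(batch_losses) / len(batch_losses)
print(incremental, direct)
```

Because the update never stores the individual batch values, it needs constant memory regardless of how many batches the split contains.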

3.1.5 Model Evaluation and Prediction

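To understand a model's performance, you should analyze it with both quantitative and qualitative methods. Quantitatively, the error measured on held-out test data determines whether the classifier generalizes to unseen examples. Qualitatively, you can build an intuition for what the model has learned by looking at the classifier's top-k predictions for a new example.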

3.1.5.1 EVALUATING ON THE TEST DATASET

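To evaluate the SurnameClassifier on the test data, we follow the same routine as in the text classification example in “Example: Classifying Sentiment of Restaurant Reviews”: we set the dataset split to the test data, call classifier.eval(), and iterate over the test data in the same way as for the other splits. Calling classifier.eval() puts layers such as dropout into evaluation mode; the model parameters stay fixed because the evaluation loop never calls loss.backward() or optimizer.step().

The model reaches an accuracy of close to 50% on the test data (46.7% in the run shown above). If you run the training routine in the accompanying notebook, you will notice that performance on the training data is higher. This is because a model always fits the data it was trained on better, so performance on the training data is not indicative of performance on new data. If you are following along with the code, try different sizes of the hidden dimension; you should see performance improve. The improvement will not be large, however (especially compared with the model in “Example: Classifying Surnames by Using a CNN”). The main reason is that the collapsed one-hot vectorization is a weak representation. Although it compactly represents each surname as a single vector, it throws away the order of the characters, which can be vital for identifying the origin.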
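To make the loss of order information concrete, the sketch below builds a collapsed one-hot vector for two strings that contain the same characters in a different order. It is a simplified stand-in for the vectorizer, not the experiment's implementation: collapsed_one_hot and the toy alphabet are assumptions made for this illustration.

```python
def collapsed_one_hot(word, alphabet):
    """Mark which characters of `alphabet` occur in `word`; positions are ignored."""
    vector = [0] * len(alphabet)
    for character in word:
        vector[alphabet.index(character)] = 1
    return vector

# toy alphabet built from the example string (an assumption for this sketch)
alphabet = sorted(set("nagasawa"))
original = collapsed_one_hot("nagasawa", alphabet)
shuffled = collapsed_one_hot("sawanaga", alphabet)
print(original == shuffled)  # True: both strings contain the same characters
```

Any model that only sees these vectors cannot tell the two strings apart, which is the limitation the CNN example later addresses.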

3.1.5.2 CLASSIFYING A NEW SURNAME

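Example 4-11 shows the code for classifying a new surname. Given a surname as a string, the function first applies the vectorization process and then obtains the model prediction. Note that we set the apply_softmax flag so that the result contains probabilities. In the multinomial case, the model prediction is a list of class probabilities. We use the PyTorch tensor max() function to get the best class, the one with the highest predicted probability.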

Example 4-11. A function for performing nationality prediction

def predict_nationality(surname, classifier, vectorizer):
    """Predict the nationality from a new surname

    Args:
        surname (str): the surname to classify
        classifier (SurnameClassifier): an instance of the classifier
        vectorizer (SurnameVectorizer): the corresponding vectorizer
    Returns:
        a dictionary with the most likely nationality and its probability
    """
    # vectorize the surname
    vectorized_surname = vectorizer.vectorize(surname)
    # convert to a tensor and add a batch dimension: shape (1, input_dim)
    vectorized_surname = torch.tensor(vectorized_surname).view(1, -1)
    # run the classifier and apply softmax to get a probability distribution
    result = classifier(vectorized_surname, apply_softmax=True)

    # highest probability and the index of the class it belongs to
    probability_values, indices = result.max(dim=1)
    index = indices.item()

    # map the class index back to a nationality
    predicted_nationality = vectorizer.nationality_vocab.lookup_index(index)
    probability_value = probability_values.item()

    return {'nationality': predicted_nationality, 'probability': probability_value}

new_surname = input("Enter a surname to classify: ")
classifier = classifier.to("cpu")
prediction = predict_nationality(new_surname, classifier, vectorizer)
print("{} -> {} (p={:0.2f})".format(new_surname,
                                    prediction['nationality'],
                                    prediction['probability']))

Enter a surname to classify: McMahan

McMahan -> Irish (p=0.41)

3.1.5.3 RETRIEVING THE TOP-K PREDICTIONS FOR A NEW SURNAME

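It is often useful to look at more than just the best prediction. For example, it is standard practice in NLP to take the top k best predictions and rerank them with another model. PyTorch provides the torch.topk() function, which offers a convenient way to get these predictions, as demonstrated in Example 4-12.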

Example 4-12. Predicting the top-k nationalities

vectorizer.nationality_vocab.lookup_index(8)

'Irish'

def predict_topk_nationality(name, classifier, vectorizer, k=5):
    vectorized_name = vectorizer.vectorize(name)
    vectorized_name = torch.tensor(vectorized_name).view(1, -1)
    prediction_vector = classifier(vectorized_name, apply_softmax=True)
    probability_values, indices = torch.topk(prediction_vector, k=k)

    # returned size is 1,k
    probability_values = probability_values.detach().numpy()[0]
    indices = indices.detach().numpy()[0]

    results = []
    for prob_value, index in zip(probability_values, indices):
        nationality = vectorizer.nationality_vocab.lookup_index(index)
        results.append({'nationality': nationality,
                        'probability': prob_value})

    return results

new_surname = input("Enter a surname to classify: ")
classifier = classifier.to("cpu")

k = int(input("How many of the top predictions to see? "))
if k > len(vectorizer.nationality_vocab):
    print("Sorry! That's more than the # of nationalities we have.. defaulting you to max size :)")
    k = len(vectorizer.nationality_vocab)

predictions = predict_topk_nationality(new_surname, classifier, vectorizer, k=k)

print("Top {} predictions:".format(k))
print("===================")
for prediction in predictions:
    print("{} -> {} (p={:0.2f})".format(new_surname,
                                        prediction['nationality'],
                                        prediction['probability']))

Enter a surname to classify: McMahan

How many of the top predictions to see? 5

Top 5 predictions:

===================

McMahan -> Irish (p=0.41)

McMahan -> Scottish (p=0.25)

McMahan -> Czech (p=0.08)

McMahan -> Vietnamese (p=0.06)

McMahan -> German (p=0.05)
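The two steps behind the output above, turning raw scores into probabilities and keeping the k largest, can be sketched without PyTorch. This is only an illustration of the idea, not a replacement for F.softmax or torch.topk; the labels and scores are hypothetical, and softmax and top_k are helper names introduced here.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(probabilities, labels, k):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(zip(labels, probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# hypothetical labels and scores, chosen only for illustration
labels = ["Irish", "Scottish", "Czech", "German"]
scores = [2.0, 1.5, 0.3, -0.2]
best = top_k(softmax(scores), labels, k=2)
print(best)
```

Softmax preserves the ordering of the scores, so the top-k classes are the same whether they are selected before or after normalization; applying softmax first simply makes the reported values interpretable as probabilities.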

3.1.6 Regularizing MLPs: Weight Regularization and Structural Regularization (or Dropout)

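In Experiment 3, we explained how regularization addresses overfitting and studied two important types of weight regularization, L1 and L2. These weight regularization methods also apply to MLPs and to the convolutional neural networks that we introduce later in this experiment. In addition to weight regularization, for deep models (that is, models with multiple layers) such as the feed-forward networks discussed in this experiment, a structural regularization approach called dropout becomes very important.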

DROPOUT

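Simply put, during training, dropout drops the connections between units belonging to two adjacent layers with some probability. Why should that help? We begin with an intuitive (and humorous) explanation by Stephen Merity: “Dropout, simply described, is the concept that if you can learn how to do a task repeatedly whilst drunk, you should be able to do the task even better when sober. This insight has resulted in numerous state-of-the-art results and a nascent field.”

Neural networks, especially deep networks with a large number of layers, can create interesting coadaptation between units. “Coadaptation” is a term from neuroscience, but here it simply refers to a situation in which the connection between two units becomes excessively strong at the expense of connections between other units. This usually results in the model overfitting to the data. By probabilistically dropping connections between units, we can ensure that no single unit will always depend on another single unit, which leads to robust models. Dropout does not add parameters to the model, but it requires a single hyperparameter, the “drop probability”: the probability with which connections between units are dropped. It is common to set the drop probability to 0.5. Example 4-13 shows a reimplementation of the MLP with dropout.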

Example 4-13. MLP with dropout

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultilayerPerceptron(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        """
        Args:
            input_dim (int): the size of the input vectors
            hidden_dim (int): the output size of the first Linear layer
            output_dim (int): the output size of the second Linear layer
        """
        super(MultilayerPerceptron, self).__init__()
        # first fully connected layer: input -> hidden
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        # second fully connected layer: hidden -> output
        self.fc2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x_in, apply_softmax=False):
        """The forward pass of the MLP

        Args:
            x_in (torch.Tensor): an input data tensor.
                x_in.shape should be (batch, input_dim)
            apply_softmax (bool): a flag for the softmax activation
                should be false if used with the Cross Entropy losses
        Returns:
            the resulting tensor. tensor.shape should be (batch, output_dim)
        """
        # first linear layer followed by the ReLU nonlinearity
        intermediate = F.relu(self.fc1(x_in))
        # apply dropout before the second linear layer; training=self.training
        # turns it off after mlp.eval() (F.dropout defaults to training=True)
        output = self.fc2(F.dropout(intermediate, p=0.5, training=self.training))

        if apply_softmax:
            output = F.softmax(output, dim=1)
        return output

batch_size = 2  # number of samples input at once
input_dim = 3
hidden_dim = 100
output_dim = 4

# Initialize model
mlp = MultilayerPerceptron(input_dim, hidden_dim, output_dim)
print(mlp)

# x_input was not defined in this listing; a random batch is assumed here
x_input = torch.rand(batch_size, input_dim)
y_output = mlp(x_input, apply_softmax=False)
describe(y_output)  # describe() is the tensor-printing helper from earlier in the experiment

MultilayerPerceptron(

(fc1): Linear(in_features=3, out_features=100, bias=True)

(fc2): Linear(in_features=100, out_features=4, bias=True)

)

Type: torch.FloatTensor

Shape/size: torch.Size([2, 4])

Values:

tensor([[ 0.0442, -0.0526, 0.0178, 0.5090],

[ 0.1424, 0.0911, -0.0014, -0.1472]], grad_fn=)

Note that dropout is applied only during training and not during evaluation. As an exercise, try the SurnameClassifier model with dropout and see how it changes the results.
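The train/eval distinction can be illustrated with a small, dependency-free sketch of "inverted" dropout, the variant PyTorch's F.dropout uses: during training each activation is zeroed with probability p and the survivors are scaled by 1/(1-p); during evaluation the values pass through unchanged. The dropout function below, its rng argument, and the sample values are assumptions made for this illustration, not the library implementation.

```python
import random

def dropout(values, p, training, rng):
    """Inverted dropout: zero each value with probability p and rescale the rest."""
    if not training or p == 0.0:
        return list(values)  # evaluation mode: values pass through unchanged
    keep_scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * keep_scale for v in values]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
hidden = [0.5, 1.0, 1.5, 2.0]
train_out = dropout(hidden, p=0.5, training=True, rng=rng)
eval_out = dropout(hidden, p=0.5, training=False, rng=rng)
print(train_out)
print(eval_out)
```

The rescaling keeps the expected value of each activation the same in both modes, which is why no correction is needed at evaluation time.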

3.2 Convolutional Neural Networks

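In the first part of this experiment, we took an in-depth look at MLPs, neural networks built from a series of linear layers and nonlinear functions. MLPs are not the best tool for taking advantage of sequential patterns. For example, in the surnames dataset, surnames can have segments (of different lengths) that reveal quite a bit about their nation of origin (such as the “O’” in “O’Neill,” “opoulos” in “Antonopoulos,” “sawa” in “Nagasawa,” or “Zh” in “Zhu”). These segments can be of variable lengths, and the challenge is to capture them without encoding them explicitly.

In this section we cover the convolutional neural network (CNN), a type of neural network that is well suited to detecting spatial substructure (and creating meaningful spatial substructure as a consequence). CNNs accomplish this by having a small number of weights that they use to scan the input data tensors. From this scanning, they produce output tensors that represent the detection (or not) of substructures.

In the rest of this section, we first describe how a CNN works and the concerns you should have when designing one. We then dive into the CNN hyperparameters, with the goal of providing intuitions about how they behave and how they affect the outputs. Finally, we step through a few simple examples that illustrate the mechanics of CNNs. In “Example: Classifying Surnames by Using a CNN,” we delve into a more extensive example.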

HISTORICAL CONTEXT

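The name and basic functionality of CNNs stem from a classic mathematical operation called convolution. Convolutions have been used in various engineering disciplines, including digital signal processing and computer graphics. Classically, convolutions have used parameters specified by the programmer. The parameters are chosen to match some functional design, such as highlighting edges or dampening high-frequency sounds. In fact, many Photoshop filters are fixed convolution operations applied to an image. In deep learning and in this experiment, however, we learn the parameters of the convolution filter from data so that it is optimal for solving the task at hand.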

CNN Hyperparameters

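To understand what the different design decisions mean for a CNN, we show an example in Figure 4-6. In this example, a single “kernel” is applied to an input matrix. The exact mathematical expression for a convolution operation (a linear operator) is not important for understanding this section; the intuition you should take from the figure is that a kernel is a small square matrix that is applied at different positions in the input matrix in a systematic way.

Figure 4-6 A two-dimensional convolution operation. An input matrix is convolved with a single convolutional kernel to produce an output matrix (also called a feature map). The convolution is the application of the kernel to each position in the input matrix. In each application, the kernel multiplies the values of the input matrix with its own values and then sums up these products. In this example, the kernel has the following hyperparameter configuration: kernel_size=2, stride=1, padding=0, and dilation=1. These hyperparameters are explained in the following text.

Although classic convolutions are designed by specifying the values of the kernel, CNNs are designed by specifying hyperparameters that control the behavior of the CNN and then using gradient descent to find the best parameters for a given dataset. The two primary hyperparameters control the shape of the convolution (called kernel_size) and the positions the convolution will multiply in the input data tensor (called stride). Additional hyperparameters control how much the input data tensor is padded with 0s (called padding) and how far apart the multiplications should be when applied to the input data tensor (called dilation). In the following subsections, we develop intuitions for these hyperparameters in more detail.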
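The multiply-and-sum operation described for Figure 4-6 can be written out in a few lines of plain Python. The sketch below assumes square inputs and kernels, padding=0, and dilation=1; the 4×4 input and the 2×2 kernel are made-up values, not the ones drawn in the figure. As in PyTorch, the kernel is applied without flipping (strictly speaking a cross-correlation).

```python
def conv2d(matrix, kernel, stride=1):
    """Valid (padding=0, dilation=1) 2-D convolution on square nested lists."""
    k = len(kernel)
    out_size = (len(matrix) - k) // stride + 1
    output = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            total = 0
            for di in range(k):
                for dj in range(k):
                    total += matrix[i * stride + di][j * stride + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output

# hypothetical 4x4 input and 2x2 kernel, chosen only for illustration
matrix = [[1, 2, 3, 4],
          [5, 6, 7, 8],
          [9, 10, 11, 12],
          [13, 14, 15, 16]]
kernel = [[1, 0],
          [0, 1]]
feature_map = conv2d(matrix, kernel)
print(feature_map)
```

With kernel_size=2 and stride=1, the 4×4 input yields a 3×3 feature map; calling conv2d(matrix, kernel, stride=2) reduces it to 2×2.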

DIMENSION OF THE CONVOLUTION OPERATION

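The first concept to understand is the dimensionality of the convolution operation. In Figure 4-6 and the other figures in this section, we illustrate using a two-dimensional convolution, but there are convolutions of other dimensions that may be more suitable depending on the nature of the data. In PyTorch, convolutions can be one-dimensional, two-dimensional, or three-dimensional and are implemented by the Conv1d, Conv2d, and Conv3d modules, respectively. One-dimensional convolutions are useful for time series in which each time step has a feature vector. In this situation, we can learn patterns on the sequence dimension. Most convolution operations in NLP are one-dimensional. A two-dimensional convolution, on the other hand, tries to capture patterns along two directions in the data, for example, along the height and width dimensions of an image, which is why two-dimensional convolutions are popular for image processing. Similarly, in three-dimensional convolutions the patterns are captured along three dimensions in the data. For example, in video data, information lies in three dimensions: two dimensions represent the frame of the image, and the time dimension represents the sequence of frames. As far as this course is concerned, we use Conv1d primarily.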

CHANNELS

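Informally, channels refers to the feature dimension along each point in the input. For example, in images there are three channels for each pixel, corresponding to the RGB components. A similar concept can be carried over to text data when using convolutions. Conceptually, if the “pixels” in a text document are words, the number of channels is the size of the vocabulary. If we go finer-grained and consider convolution over characters, the number of channels is the size of the character set (which happens to be the vocabulary in this case). In PyTorch's convolution implementation, the number of channels in the input is the in_channels argument. The convolution operation can produce more than one channel in the output (out_channels). You can think of this as the convolution operator “mapping” the input feature dimension to an output feature dimension. Figure 4-7 and Figure 4-8 illustrate this concept.

Figure 4-7 A convolution operation shown with two input matrices (two input channels). The corresponding kernel also has two layers; it multiplies each layer separately and then sums the results. Configuration: input_channels=2, output_channels=1, kernel_size=2, stride=1, padding=0, and dilation=1.

Figure 4-8 A convolution operation with one input matrix (one input channel) and two convolutional kernels (two output channels). The kernels are applied to the input matrix individually and stacked in the output tensor. Configuration: input_channels=1, output_channels=2, kernel_size=2, stride=1, padding=0, and dilation=1.

It is difficult to know immediately how many output channels are appropriate for the problem at hand. To simplify this difficulty, let's say that the bounds are 1 and 1,024: we can have a convolutional layer with a single channel, up to a maximum of 1,024 channels. Now that we have bounds, the next thing to consider is how many input channels there are. A common design pattern is not to shrink the number of channels by more than a factor of two from one convolutional layer to the next. This is not a hard-and-fast rule, but it should give you some sense of what an appropriate number of out_channels looks like.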
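One practical consideration when choosing out_channels is the number of learnable parameters, which grows with the product of input and output channels. The sketch below counts them for a one-dimensional convolution, assuming the standard layout in which each output channel owns one kernel of shape (in_channels, kernel_size) plus an optional bias (no grouped convolutions). conv1d_parameter_count is a helper name introduced here for illustration.

```python
def conv1d_parameter_count(in_channels, out_channels, kernel_size, bias=True):
    """Number of learnable values in a 1-D convolution layer.

    Each output channel owns one kernel of shape (in_channels, kernel_size),
    plus one bias term when bias is enabled.
    """
    weights = out_channels * in_channels * kernel_size
    return weights + (out_channels if bias else 0)

# channel sizes that appear in the Conv1d examples of this experiment
print(conv1d_parameter_count(10, 16, 3))
print(conv1d_parameter_count(256, 256, 3))
```

Doubling both channel counts roughly quadruples the parameter count, so the channel width is usually the dominant cost of a convolutional layer.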

KERNEL SIZE

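The width of the kernel matrix is called the kernel size (kernel_size in PyTorch). In Figure 4-6 the kernel size was 2, and for contrast, we show a kernel of size 3 in Figure 4-9. The intuition you should develop is that convolutions combine spatially (or temporally) local information in the input, and the amount of local information per convolution is controlled by the kernel size. However, increasing the kernel size also decreases the size of the output (Dumoulin and Visin, 2016). This is why the output matrix is 2×2 in Figure 4-9 when the kernel size is 3, but 3×3 in Figure 4-6 when the kernel size is 2.

Figure 4-9 A convolution with kernel_size=3 applied to the input matrix. The result is a trade-off: more local information is used each time the kernel is applied to the matrix, but the output is smaller.

Additionally, you can think of the behavior of kernel size in NLP applications as being similar to the behavior of n-grams, which capture patterns in language by looking at groups of words. With smaller kernel sizes, smaller, more frequent patterns are captured, whereas larger kernel sizes lead to larger patterns, which might be more meaningful but occur less frequently. Small kernel sizes lead to fine-grained features in the output, whereas large kernel sizes lead to coarse-grained features.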
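The n-gram analogy can be made concrete for character-level input: a kernel of width kernel_size applied at stride 1 sees exactly the character n-grams of that length. The helper below, character_windows, is a name introduced for this sketch; it only lists the windows and performs no learning.

```python
def character_windows(text, kernel_size):
    """List the substrings a kernel of width `kernel_size` sees at stride 1."""
    return [text[i:i + kernel_size] for i in range(len(text) - kernel_size + 1)]

print(character_windows("Zhu", 2))
print(character_windows("Nagasawa", 4))
```

A kernel of width 2 can respond to the segment “Zh”, whereas capturing “sawa” in a single layer requires a width of at least 4 (or several stacked narrower layers).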

STRIDE

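Stride controls the step size between convolutions. If the stride is the same size as the kernel, the kernel computations do not overlap. On the other hand, if the stride is 1, the kernels are maximally overlapping. The output tensor can be deliberately shrunk to summarize information by increasing the stride, as demonstrated in Figure 4-10.

Figure 4-10 A convolutional kernel with kernel_size=2 applied to an input with the hyperparameter stride equal to 2. This causes the kernel to take larger steps, resulting in a smaller output matrix. This is useful for subsampling the input matrix more sparsely.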

PADDING

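Even though stride and kernel_size allow for controlling how much scope each computed feature value has, they have a detrimental, and sometimes unintended, side effect of shrinking the total size of the feature map (the output of a convolution). To counteract this, the input data tensor is artificially made larger in length (if 1D, 2D, or 3D), height (if 2D or 3D), and depth (if 3D) by appending and prepending 0s to each respective dimension. This means that the CNN will perform more convolutions, but the output shape can be controlled without compromising the desired kernel size, stride, or dilation. Figure 4-11 illustrates padding in action.

Figure 4-11 A convolution with kernel_size=2 applied to an input matrix that has height and width equal to 2. Because of padding (indicated by the dark-gray squares), however, the height and width of the input matrix can be made larger. This is most commonly used with a kernel of size 3 so that the output matrix exactly equals the size of the input matrix.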
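The size-preserving effect of padding with a kernel of size 3 can be checked with a one-dimensional sketch. The function conv1d below is a plain-Python illustration written for this purpose (stride 1, dilation 1, zeros added to both ends), not PyTorch's Conv1d; the sequence and kernel values are made up.

```python
def conv1d(sequence, kernel, padding=0):
    """1-D convolution at stride 1 with `padding` zeros added to both ends."""
    padded = [0] * padding + list(sequence) + [0] * padding
    k = len(kernel)
    return [sum(padded[i + j] * kernel[j] for j in range(k))
            for i in range(len(padded) - k + 1)]

sequence = [1, 2, 3, 4, 5]
kernel = [1, 1, 1]
without_padding = conv1d(sequence, kernel)
with_padding = conv1d(sequence, kernel, padding=1)
print(without_padding)
print(with_padding)
```

Without padding the output has three elements; with padding=1 it has five, the same length as the input, at the cost of border values that partly summarize the added zeros.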

DILATION

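Dilation controls how the convolutional kernel is applied to the input matrix. In Figure 4-12 we show that increasing the dilation from 1 (the default) to 2 means that the elements of the kernel are two spaces away from each other when applied to the input matrix. Another way to think about this is striding in the kernel itself: there is a step size between the elements in the kernel, or application of the kernel with “holes.” This can be useful for summarizing larger regions of the input space without an increase in the number of parameters. Dilated convolutions have proven very useful when convolution layers are stacked. Successive dilated convolutions (with dilation factors that grow from layer to layer) exponentially increase the size of the “receptive field,” that is, the size of the input space seen by the network before a prediction is made.

Figure 4-12 A convolution with kernel_size=2 applied to an input matrix with the hyperparameter dilation=2. The increase in dilation from its default value means that the elements of the kernel matrix are spread further apart as they multiply the input matrix. Increasing dilation further would accentuate this spread.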
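The growth of the receptive field can be computed directly. For stacked stride-1 convolutions, each layer adds (kernel_size - 1) × dilation input positions to the receptive field. The sketch below compares four undilated layers with four layers whose dilation doubles each time; receptive_field is a helper name introduced here, and the layer configuration is an illustrative choice rather than one used in this experiment.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 convolutions with the given dilations."""
    field = 1
    for dilation in dilations:
        field += (kernel_size - 1) * dilation
    return field

# four stacked layers with kernel_size=2 (illustrative choice)
print(receptive_field(2, [1, 1, 1, 1]))
print(receptive_field(2, [1, 2, 4, 8]))
```

Both stacks have the same number of parameters, but the doubling dilations cover 16 input positions instead of 5.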

3.3 Implementing CNNs in PyTorch

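In this section, we work through an end-to-end example that utilizes the concepts introduced in the previous section. Generally, the goal of neural network design is to find a configuration of hyperparameters that accomplishes a task. We again consider the now-familiar surname classification task introduced in “Example: Surname Classification with an MLP,” but we will use CNNs instead of an MLP. We still need to apply a final linear layer that learns to create a prediction vector from a feature vector created by a series of convolution layers. This implies that the goal is to determine a configuration of convolution layers that results in the desired feature vector. All CNN applications are like this: an initial set of convolutional layers extracts a feature map that becomes the input to some downstream processing. In classification, the downstream processing is almost always the application of a linear (or fc) layer.

The implementation walk-through in this course iterates over the design decisions to construct a feature vector. We begin by constructing an artificial data tensor that mirrors the shape of the actual data. The data tensor is three-dimensional: this is the shape of a minibatch of vectorized text data. If you use a one-hot vector for each character in a sequence of characters, a sequence of one-hot vectors is a matrix, and a minibatch of one-hot matrices is a three-dimensional tensor. Using the terminology of convolutions, the size of each one-hot vector (usually the size of the vocabulary) is the number of “input channels,” and the length of the character sequence is the “width.”

As illustrated in Example 4-14, the first step in constructing a feature vector is applying an instance of PyTorch's Conv1d class to the three-dimensional data tensor. By checking the size of the output, you can get a sense of how much the tensor has been reduced. We refer you to Figure 4-9 for a visual explanation of why the output tensor is shrinking.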

Example 4-14. Artificial data and using a Conv1d class

batch_size = 2

one_hot_size = 10

sequence_width = 7

data = torch.randn(batch_size, one_hot_size, sequence_width)

conv1 = nn.Conv1d(in_channels=one_hot_size, out_channels=16, kernel_size=3)

intermediate1 = conv1(data)

print(data.size())

print(intermediate1.size())

torch.Size([2, 10, 7])

torch.Size([2, 16, 5])

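There are three primary methods for reducing the output tensor further. The first method is to create additional convolutions and apply them in sequence. Eventually, the dimension corresponding to sequence_width (dim=2) will have size=1. We show the result of applying two additional convolutions in Example 4-15. In general, the process of applying convolutions to reduce the size of the output tensor is iterative and requires some guesswork. Our example is constructed so that after three convolutions, the resulting output has size=1 on the final dimension.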

Example 4-15. The iterative application of convolutions to data

conv2 = nn.Conv1d(in_channels=16, out_channels=32, kernel_size=3)

conv3 = nn.Conv1d(in_channels=32, out_channels=64, kernel_size=3)

intermediate2 = conv2(intermediate1)

intermediate3 = conv3(intermediate2)

print(intermediate2.size())

print(intermediate3.size())

torch.Size([2, 32, 3])

torch.Size([2, 64, 1])

y_output = intermediate3.squeeze()

print(y_output.size())

torch.Size([2, 64])

intermediate2.mean(dim=0).mean(dim=1).sum()

tensor(2.4638, grad_fn=)

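With each convolution, the size of the channel dimension is increased because the channel dimension is intended to be the feature vector for each data point. The final step to the tensor actually being a feature vector is to remove the pesky size=1 dimension. You can do this by using the squeeze() method. This method drops any dimensions that have size=1 and returns the result (pass dim=2, as Example 4-19 does, to avoid also dropping a batch dimension of size 1). The resulting feature vectors can then be used in conjunction with other neural network components, such as a linear layer, to compute prediction vectors.

There are two other methods for reducing a tensor to one feature vector per data point: flattening the remaining values into a feature vector, and averaging over the extra dimensions. The two methods are shown in Example 4-16. Using the first method, you simply flatten all vectors into a single vector using PyTorch's view() method. The second method uses some mathematical operation to summarize the information in the vectors. The most common operation is the arithmetic mean, but summing and using the max value along the feature map dimensions are also common. Each approach has its advantages and disadvantages. Flattening retains all of the information but can result in larger feature vectors than is desirable (or computationally feasible). Averaging becomes agnostic to the size of the extra dimensions but can lose information.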

Example 4-16. Two additional methods for reducing to feature vectors

# Method 2 of reducing to feature vectors

print(intermediate1.view(batch_size, -1).size())

# Method 3 of reducing to feature vectors

print(torch.mean(intermediate1, dim=2).size())

# print(torch.max(intermediate1, dim=2)[0].size())  # torch.max with dim returns (values, indices)

# print(torch.sum(intermediate1, dim=2).size())

torch.Size([2, 80])

torch.Size([2, 16])

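This method for designing a series of convolutions is empirically based: you start with the expected size of your data, play around with the series of convolutions, and eventually get a feature vector that suits you. Although this works well in practice, there is another method of computing the output size of a tensor given the convolution's hyperparameters and an input tensor, by using a mathematical formula derived from the convolution operation itself.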
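The formula in question, as documented for PyTorch's convolution layers, is L_out = floor((L_in + 2 × padding - dilation × (kernel_size - 1) - 1) / stride) + 1 for each spatial dimension. A minimal sketch follows; conv_output_size is a helper name introduced here, and the loop reproduces the sequence widths printed in Examples 4-14 and 4-15.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a convolution along one dimension."""
    effective_kernel = dilation * (kernel_size - 1) + 1
    return (input_size + 2 * padding - effective_kernel) // stride + 1

# the three kernel_size=3 convolutions from Examples 4-14 and 4-15
size = 7
sizes = []
for _ in range(3):
    size = conv_output_size(size, kernel_size=3)
    sizes.append(size)
print(sizes)

# a larger stride shrinks the output faster
print(conv_output_size(7, kernel_size=3, stride=2))
```

Running the formula over a planned stack of layers before building the model is a quick way to check that the final sequence dimension ends at the size you intend.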

3.4 Example: Classifying Surnames by Using a CNN

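To demonstrate the effectiveness of CNNs, let's apply a simple CNN model to the task of classifying surnames. Many of the details remain the same as in the earlier MLP example for this task, but what does change is the construction of the model and the vectorization process. The input to the model, rather than the collapsed one-hots we saw in the last example, will be a matrix of one-hots. This design will allow the CNN to get a better “view” of the arrangement of characters and encode the sequence information lost in the collapsed one-hot encoding used in “Example: Surname Classification with an MLP.”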

3.4.1 The SurnameDataset

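Although the surnames dataset was described in “Example: Surname Classification with an MLP,” we refer you to “The Surnames Dataset” for its description. Even though we use the same dataset as in “Example: Surname Classification with an MLP,” there is one difference in the implementation: the dataset is composed of a matrix of one-hot vectors rather than a collapsed one-hot vector. To accomplish this, the length of the longest surname is tracked and provided to the vectorizer as the number of columns to include in the matrix. The number of rows is the size of the one-hot vectors (the size of the vocabulary). Example 4-17 shows the change to SurnameDataset.__getitem__(); the change to SurnameVectorizer.vectorize() is shown in the next subsection.

We use the longest surname in the dataset to control the size of the one-hot matrix for two reasons. First, each minibatch of surname matrices is combined into a three-dimensional tensor, and there is a requirement that they all be the same size. Second, using the longest surname in the dataset means that each minibatch can be treated in the same way.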

Example 4-17. SurnameDataset modified for passing the maximum surname length

from argparse import Namespace

from collections import Counter

import json

import os

import string

import numpy as np

import pandas as pd

import torch

import torch.nn as nn

import torch.nn.functional as F

import torch.optim as optim

from torch.utils.data import Dataset, DataLoader

from tqdm.notebook import tqdm as tqdm_notebook  # tqdm.tqdm_notebook is deprecated

class Vocabulary(object):
    """Class to process text and extract vocabulary for mapping"""

    def __init__(self, token_to_idx=None, add_unk=True, unk_token="<UNK>"):
        """
        Args:
            token_to_idx (dict): a pre-existing map of tokens to indices
            add_unk (bool): a flag that indicates whether to add the UNK token
            unk_token (str): the UNK token to add into the Vocabulary
        """
        if token_to_idx is None:
            token_to_idx = {}
        self._token_to_idx = token_to_idx

        self._idx_to_token = {idx: token
                              for token, idx in self._token_to_idx.items()}

        self._add_unk = add_unk
        self._unk_token = unk_token

        self.unk_index = -1
        if add_unk:
            self.unk_index = self.add_token(unk_token)

    def to_serializable(self):
        """ returns a dictionary that can be serialized """
        return {'token_to_idx': self._token_to_idx,
                'add_unk': self._add_unk,
                'unk_token': self._unk_token}

    @classmethod
    def from_serializable(cls, contents):
        """ instantiates the Vocabulary from a serialized dictionary """
        return cls(**contents)

    def add_token(self, token):
        """Update mapping dicts based on the token.

        Args:
            token (str): the item to add into the Vocabulary
        Returns:
            index (int): the integer corresponding to the token
        """
        try:
            index = self._token_to_idx[token]
        except KeyError:
            index = len(self._token_to_idx)
            self._token_to_idx[token] = index
            self._idx_to_token[index] = token
        return index

    def add_many(self, tokens):
        """Add a list of tokens into the Vocabulary

        Args:
            tokens (list): a list of string tokens
        Returns:
            indices (list): a list of indices corresponding to the tokens
        """
        return [self.add_token(token) for token in tokens]

    def lookup_token(self, token):
        """Retrieve the index associated with the token
          or the UNK index if token isn't present.

        Args:
            token (str): the token to look up
        Returns:
            index (int): the index corresponding to the token
        Notes:
            `unk_index` needs to be >=0 (having been added into the Vocabulary)
              for the UNK functionality
        """
        if self.unk_index >= 0:
            return self._token_to_idx.get(token, self.unk_index)
        else:
            return self._token_to_idx[token]

    def lookup_index(self, index):
        """Return the token associated with the index

        Args:
            index (int): the index to look up
        Returns:
            token (str): the token corresponding to the index
        Raises:
            KeyError: if the index is not in the Vocabulary
        """
        if index not in self._idx_to_token:
            raise KeyError("the index (%d) is not in the Vocabulary" % index)
        return self._idx_to_token[index]

    def __str__(self):
        return "<Vocabulary(size=%d)>" % len(self)

    def __len__(self):
        return len(self._token_to_idx)

class SurnameDataset(Dataset):
    def __init__(self, surname_df, vectorizer):
        """
        Args:
            surname_df (pandas.DataFrame): the dataset
            vectorizer (SurnameVectorizer): vectorizer instantiated from dataset
        """
        self.surname_df = surname_df
        self._vectorizer = vectorizer

        self.train_df = self.surname_df[self.surname_df.split=='train']
        self.train_size = len(self.train_df)

        self.val_df = self.surname_df[self.surname_df.split=='val']
        self.validation_size = len(self.val_df)

        self.test_df = self.surname_df[self.surname_df.split=='test']
        self.test_size = len(self.test_df)

        self._lookup_dict = {'train': (self.train_df, self.train_size),
                             'val': (self.val_df, self.validation_size),
                             'test': (self.test_df, self.test_size)}

        self.set_split('train')

        # Class weights: inverse frequency of each nationality,
        # ordered by the nationality's index in the vocabulary
        class_counts = surname_df.nationality.value_counts().to_dict()

        def sort_key(item):
            return self._vectorizer.nationality_vocab.lookup_token(item[0])

        sorted_counts = sorted(class_counts.items(), key=sort_key)
        frequencies = [count for _, count in sorted_counts]
        self.class_weights = 1.0 / torch.tensor(frequencies, dtype=torch.float32)

    @classmethod
    def load_dataset_and_make_vectorizer(cls, surname_csv):
        """Load dataset and make a new vectorizer from scratch

        Args:
            surname_csv (str): location of the dataset
        Returns:
            an instance of SurnameDataset
        """
        surname_df = pd.read_csv(surname_csv)
        train_surname_df = surname_df[surname_df.split=='train']
        return cls(surname_df, SurnameVectorizer.from_dataframe(train_surname_df))

    @classmethod
    def load_dataset_and_load_vectorizer(cls, surname_csv, vectorizer_filepath):
        """Load dataset and the corresponding vectorizer.
        Used in the case that the vectorizer has been cached for re-use

        Args:
            surname_csv (str): location of the dataset
            vectorizer_filepath (str): location of the saved vectorizer
        Returns:
            an instance of SurnameDataset
        """
        surname_df = pd.read_csv(surname_csv)
        vectorizer = cls.load_vectorizer_only(vectorizer_filepath)
        return cls(surname_df, vectorizer)

    @staticmethod
    def load_vectorizer_only(vectorizer_filepath):
        """a static method for loading the vectorizer from file

        Args:
            vectorizer_filepath (str): the location of the serialized vectorizer
        Returns:
            an instance of SurnameVectorizer
        """
        with open(vectorizer_filepath) as fp:
            return SurnameVectorizer.from_serializable(json.load(fp))

    def save_vectorizer(self, vectorizer_filepath):
        """saves the vectorizer to disk using json

        Args:
            vectorizer_filepath (str): the location to save the vectorizer
        """
        with open(vectorizer_filepath, "w") as fp:
            json.dump(self._vectorizer.to_serializable(), fp)

    def get_vectorizer(self):
        """ returns the vectorizer """
        return self._vectorizer

    def set_split(self, split="train"):
        """ selects the splits in the dataset using a column in the dataframe """
        self._target_split = split
        self._target_df, self._target_size = self._lookup_dict[split]

    def __len__(self):
        return self._target_size

    def __getitem__(self, index):
        """the primary entry point method for PyTorch datasets

        Args:
            index (int): the index to the data point
        Returns:
            a dictionary holding the data point's features (x_surname)
            and label (y_nationality)
        """
        row = self._target_df.iloc[index]

        surname_matrix = \
            self._vectorizer.vectorize(row.surname)

        nationality_index = \
            self._vectorizer.nationality_vocab.lookup_token(row.nationality)

        return {'x_surname': surname_matrix,
                'y_nationality': nationality_index}

    def get_num_batches(self, batch_size):
        """Given a batch size, return the number of batches in the dataset

        Args:
            batch_size (int)
        Returns:
            number of batches in the dataset
        """
        return len(self) // batch_size

def generate_batches(dataset, batch_size, shuffle=True,
                     drop_last=True, device="cpu"):
    """
    A generator function which wraps the PyTorch DataLoader. It will
      ensure each tensor is on the right device location.
    """
    dataloader = DataLoader(dataset=dataset, batch_size=batch_size,
                            shuffle=shuffle, drop_last=drop_last)

    for data_dict in dataloader:
        out_data_dict = {}
        for name, tensor in data_dict.items():
            out_data_dict[name] = data_dict[name].to(device)
        yield out_data_dict

3.4.2 Vocabulary, Vectorizer, and DataLoader

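In this example, even though the Vocabulary and DataLoader are implemented in the same way as in “Example: Surname Classification with an MLP,” the Vectorizer's vectorize() method has been changed to fit the needs of a CNN model. Specifically, as we show in the code in Example 4-18, the function maps each character in the string to an integer and then uses that integer to construct a matrix of one-hot vectors. Importantly, each column in the matrix is a different one-hot vector. The primary reason for this is that the Conv1d layers we will use require the data tensors to have the batch on the 0th dimension, channels on the 1st dimension, and features on the 2nd.

In addition to the change to using a one-hot matrix, we also modify the Vectorizer to compute and save the maximum length of a surname as max_surname_length.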

Example 4-18. Implementing the Surname Vectorizer for CNNs

class SurnameVectorizer(object):
    """ The Vectorizer which coordinates the Vocabularies and puts them to use"""
    def __init__(self, surname_vocab, nationality_vocab, max_surname_length):
        """
        Args:
            surname_vocab (Vocabulary): maps characters to integers
            nationality_vocab (Vocabulary): maps nationalities to integers
            max_surname_length (int): the length of the longest surname
        """
        self.surname_vocab = surname_vocab
        self.nationality_vocab = nationality_vocab
        self._max_surname_length = max_surname_length

    def vectorize(self, surname):
        """
        Args:
            surname (str): the surname
        Returns:
            one_hot_matrix (np.ndarray): a matrix of one-hot vectors
        """
        # rows index characters (channels), columns index positions in the surname
        one_hot_matrix_size = (len(self.surname_vocab), self._max_surname_length)
        one_hot_matrix = np.zeros(one_hot_matrix_size, dtype=np.float32)

        for position_index, character in enumerate(surname):
            character_index = self.surname_vocab.lookup_token(character)
            one_hot_matrix[character_index][position_index] = 1

        return one_hot_matrix

    @classmethod
    def from_dataframe(cls, surname_df):
        """Instantiate the vectorizer from the dataset dataframe

        Args:
            surname_df (pandas.DataFrame): the surnames dataset
        Returns:
            an instance of the SurnameVectorizer
        """
        surname_vocab = Vocabulary(unk_token="@")
        nationality_vocab = Vocabulary(add_unk=False)
        max_surname_length = 0

        for index, row in surname_df.iterrows():
            max_surname_length = max(max_surname_length, len(row.surname))
            for letter in row.surname:
                surname_vocab.add_token(letter)
            nationality_vocab.add_token(row.nationality)

        return cls(surname_vocab, nationality_vocab, max_surname_length)

    @classmethod
    def from_serializable(cls, contents):
        surname_vocab = Vocabulary.from_serializable(contents['surname_vocab'])
        nationality_vocab = Vocabulary.from_serializable(contents['nationality_vocab'])
        return cls(surname_vocab=surname_vocab, nationality_vocab=nationality_vocab,
                   max_surname_length=contents['max_surname_length'])

    def to_serializable(self):
        return {'surname_vocab': self.surname_vocab.to_serializable(),
                'nationality_vocab': self.nationality_vocab.to_serializable(),
                'max_surname_length': self._max_surname_length}
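To see what vectorize() produces, the following dependency-free sketch builds the same kind of matrix with nested lists for a toy three-character vocabulary. The function one_hot_matrix, the char_to_index mapping, and max_length=5 are assumptions made for this illustration; the real vectorizer uses the Vocabulary class, NumPy, and the longest surname in the training split.

```python
def one_hot_matrix(surname, char_to_index, max_length):
    """Build a (vocabulary size) x (max_length) matrix with one column per character."""
    matrix = [[0] * max_length for _ in range(len(char_to_index))]
    for position, character in enumerate(surname):
        matrix[char_to_index[character]][position] = 1
    return matrix

# toy vocabulary and maximum length, chosen only for illustration
char_to_index = {"Z": 0, "h": 1, "u": 2}
matrix = one_hot_matrix("Zhu", char_to_index, max_length=5)
for row in matrix:
    print(row)
```

Each column is the one-hot vector of one character, and the two trailing all-zero columns are the padding that brings every surname to the same width so that matrices can be stacked into a minibatch.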

3.4.3 Reimplementing the SurnameClassifier with Convolutional Networks

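The model we use in this example is built using the methods we walked through in “Convolutional Neural Networks.” In fact, the “artificial” data that we created to test the convolutional layers in that section has the same three-dimensional layout (batch, channels, sequence width) as the data tensors produced from the surnames dataset by the Vectorizer in this example. As you can see in Example 4-19, there are both similarities to the sequence of Conv1d layers that we introduced in “Convolutional Neural Networks” and new additions that require explanation. Specifically, the model is similar to the one in “Convolutional Neural Networks” in that it uses a series of one-dimensional convolutions to incrementally compute more features, resulting in a single feature vector.

New in this example, however, are the Sequential and ELU PyTorch modules. The Sequential module is a convenience wrapper that encapsulates a linear sequence of operations. In this case, we use it to encapsulate the application of the Conv1d sequence. ELU is a nonlinearity similar to the ReLU introduced in Experiment 3, but rather than clipping values below 0, it exponentiates them. ELU has been shown to be a promising nonlinearity to use between convolutional layers (Clevert et al., 2015).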
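The difference between the two nonlinearities is easiest to see side by side. The sketch below writes both as scalar functions, using the standard definition ELU(x) = x for x > 0 and alpha × (exp(x) - 1) otherwise, with alpha=1.0 (the default of nn.ELU). The functions relu and elu are plain-Python stand-ins, not the PyTorch modules.

```python
import math

def relu(x):
    """Rectified linear unit: negative inputs become 0."""
    return max(0.0, x)

def elu(x, alpha=1.0):
    """Exponential linear unit: negative inputs become alpha * (exp(x) - 1)."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

for value in [-2.0, -0.5, 0.0, 1.5]:
    print(value, relu(value), round(elu(value), 4))
```

For positive inputs the two functions agree; for negative inputs ELU returns a small negative value that saturates toward -alpha instead of a hard zero, so the gradient does not vanish entirely.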

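In this example, we tie the number of channels for each of the convolutions to the num_channels hyperparameter. We could have alternatively chosen a different number of channels for each convolution operation separately, but doing so would entail optimizing more hyperparameters. We found that 256 was large enough for the model to achieve a reasonable performance.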

Example 4-19. The CNN-based SurnameClassifier

class SurnameClassifier(nn.Module):
    def __init__(self, initial_num_channels, num_classes, num_channels):
        """
        Args:
            initial_num_channels (int): size of the incoming feature vector
            num_classes (int): size of the output prediction vector
            num_channels (int): constant channel size to use throughout network
        """
        super(SurnameClassifier, self).__init__()

        self.convnet = nn.Sequential(
            nn.Conv1d(in_channels=initial_num_channels,
                      out_channels=num_channels, kernel_size=3),
            nn.ELU(),
            nn.Conv1d(in_channels=num_channels, out_channels=num_channels,
                      kernel_size=3, stride=2),
            nn.ELU(),
            nn.Conv1d(in_channels=num_channels, out_channels=num_channels,
                      kernel_size=3, stride=2),
            nn.ELU(),
            nn.Conv1d(in_channels=num_channels, out_channels=num_channels,
                      kernel_size=3),
            nn.ELU()
        )
        self.fc = nn.Linear(num_channels, num_classes)

    def forward(self, x_surname, apply_softmax=False):
        """The forward pass of the classifier

        Args:
            x_surname (torch.Tensor): an input data tensor.
                x_surname.shape should be (batch, initial_num_channels, max_surname_length)
            apply_softmax (bool): a flag for the softmax activation
                should be false if used with the Cross Entropy losses
        Returns:
            the resulting tensor. tensor.shape should be (batch, num_classes)
        """
        features = self.convnet(x_surname).squeeze(dim=2)

        prediction_vector = self.fc(features)

        if apply_softmax:
            prediction_vector = F.softmax(prediction_vector, dim=1)

        return prediction_vector

3.4.4 The Training Routine

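The training routine consists of the following now-familiar sequence of operations: instantiate the dataset, instantiate the model, instantiate the loss function, instantiate the optimizer, iterate over the dataset's training partition and update the model parameters, iterate over the dataset's validation partition and measure the performance, and then repeat the dataset iterations a certain number of times. This is the third training routine implementation in the book so far, and this sequence of operations should be internalized. We will not describe the specific training routine in any more detail for this example because it is the exact same routine from “Example: Surname Classification with an MLP.” The input arguments, however, are different, which you can see in Example 4-20.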

Example 4-20. Input arguments to the CNN surname classifier

def make_train_state(args):
    return {'stop_early': False,
            'early_stopping_step': 0,
            'early_stopping_best_val': 1e8,
            'learning_rate': args.learning_rate,
            'epoch_index': 0,
            'train_loss': [],
            'train_acc': [],
            'val_loss': [],
            'val_acc': [],
            'test_loss': -1,
            'test_acc': -1,
            'model_filename': args.model_state_file}

def update_train_state(args, model, train_state):
    """Handle the training state updates.

    Components:
     - Early Stopping: Prevent overfitting.
     - Model Checkpoint: Model is saved if the model is better

    :param args: main arguments
    :param model: model to train
    :param train_state: a dictionary representing the training state values
    :returns:
        a new train_state
    """

    # Save one model at least
    if train_state['epoch_index'] == 0:
        torch.save(model.state_dict(), train_state['model_filename'])
        train_state['stop_early'] = False

    # Save model if performance improved
    elif train_state['epoch_index'] >= 1:
        loss_tm1, loss_t = train_state['val_loss'][-2:]

        # If loss worsened
        if loss_t >= train_state['early_stopping_best_val']:
            # Update step
            train_state['early_stopping_step'] += 1
        # Loss decreased
        else:
            # Save the best model and remember its validation loss; without
            # this update every epoch would be compared against the initial 1e8
            if loss_t < train_state['early_stopping_best_val']:
                torch.save(model.state_dict(), train_state['model_filename'])
                train_state['early_stopping_best_val'] = loss_t

            # Reset early stopping step
            train_state['early_stopping_step'] = 0

        # Stop early ?
        train_state['stop_early'] = \
            train_state['early_stopping_step'] >= args.early_stopping_criteria

    return train_state
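The early-stopping idea behind update_train_state can be reduced to a few lines that are easy to trace by hand. The sketch below is a simplified, dependency-free version and not the routine used in this experiment: run_early_stopping and patience are names introduced here (patience plays the role of args.early_stopping_criteria), the validation losses are made up, and unlike the function above it also treats the first epoch as a candidate for the best loss.

```python
def run_early_stopping(val_losses, patience):
    """Return (stop_epoch, best_loss) for a list of per-epoch validation losses.

    Training stops once the loss has failed to improve on the best value
    for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss              # a checkpoint would be saved here
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            return epoch, best_loss
    return len(val_losses) - 1, best_loss

# hypothetical validation losses, not taken from the experiment
losses = [1.9, 1.7, 1.6, 1.65, 1.62, 1.61, 1.8]
stop_epoch, best = run_early_stopping(losses, patience=3)
print(stop_epoch, best)
```

The comparison is always made against the best loss seen so far rather than the previous epoch, so a slow drift upward still counts toward the patience budget.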

def compute_accuracy(y_pred, y_target):
    y_pred_indices = y_pred.max(dim=1)[1]
    n_correct = torch.eq(y_pred_indices, y_target).sum().item()
    return n_correct / len(y_pred_indices) * 100

args = Namespace(
    # Data and Path information
    surname_csv="data/surnames/surnames_with_splits.csv",
    vectorizer_file="vectorizer.json",
    model_state_file="model.pth",
    save_dir="model_storage/ch4/cnn",
    # Model hyper parameters
    hidden_dim=100,
    num_channels=256,
    # Training hyper parameters
    seed=1337,
    learning_rate=0.001,
    batch_size=128,
    num_epochs=100,
    early_stopping_criteria=5,
    dropout_p=0.1,
    # Runtime options
    cuda=False,
    reload_from_files=False,
    expand_filepaths_to_save_dir=True,
    catch_keyboard_interrupt=True
)

if args.expand_filepaths_to_save_dir:
    args.vectorizer_file = os.path.join(args.save_dir,
                                        args.vectorizer_file)

    args.model_state_file = os.path.join(args.save_dir,
                                         args.model_state_file)

    print("Expanded filepaths: ")
    print("\t{}".format(args.vectorizer_file))
    print("\t{}".format(args.model_state_file))

# Check CUDA
if not torch.cuda.is_available():
    args.cuda = False

args.device = torch.device("cuda" if args.cuda else "cpu")
print("Using CUDA: {}".format(args.cuda))

def set_seed_everywhere(seed, cuda):
    np.random.seed(seed)
    torch.manual_seed(seed)
    if cuda:
        torch.cuda.manual_seed_all(seed)

def handle_dirs(dirpath):
    if not os.path.exists(dirpath):
        os.makedirs(dirpath)

# Set seed for reproducibility
set_seed_everywhere(args.seed, args.cuda)

# handle dirs
handle_dirs(args.save_dir)

Expanded filepaths:

model_storage/ch4/cnn/vectorizer.json

model_storage/ch4/cnn/model.pth

Using CUDA: False

if args.reload_from_files:
    # training from a checkpoint
    dataset = SurnameDataset.load_dataset_and_load_vectorizer(args.surname_csv,
                                                              args.vectorizer_file)
else:
    # create dataset and vectorizer
    dataset = SurnameDataset.load_dataset_and_make_vectorizer(args.surname_csv)
    dataset.save_vectorizer(args.vectorizer_file)

vectorizer = dataset.get_vectorizer()

classifier = SurnameClassifier(initial_num_channels=len(vectorizer.surname_vocab),
                               num_classes=len(vectorizer.nationality_vocab),
                               num_channels=args.num_channels)

classifier = classifier.to(args.device)
dataset.class_weights = dataset.class_weights.to(args.device)

loss_func = nn.CrossEntropyLoss(weight=dataset.class_weights)
optimizer = optim.Adam(classifier.parameters(), lr=args.learning_rate)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer=optimizer,
                                                 mode='min', factor=0.5,
                                                 patience=1)

train_state = make_train_state(args)

epoch_bar = tqdm_notebook(desc='training routine',
                          total=args.num_epochs,
                          position=0)

dataset.set_split('train')
train_bar = tqdm_notebook(desc='split=train',
                          total=dataset.get_num_batches(args.batch_size),
                          position=1,
                          leave=True)
dataset.set_split('val')
val_bar = tqdm_notebook(desc='split=val',
                        total=dataset.get_num_batches(args.batch_size),
                        position=1,
                        leave=True)

try:
    for epoch_index in range(args.num_epochs):
        train_state['epoch_index'] = epoch_index

        # Iterate over training dataset

        # setup: batch generator, set loss and acc to 0, set train mode on
        dataset.set_split('train')
        batch_generator = generate_batches(dataset,
                                           batch_size=args.batch_size,
                                           device=args.device)
        running_loss = 0.0
        running_acc = 0.0
        classifier.train()

        for batch_index, batch_dict in enumerate(batch_generator):
            # the training routine is these 5 steps:

            # --------------------------------------
            # step 1. zero the gradients
            optimizer.zero_grad()

            # step 2. compute the output
            y_pred = classifier(batch_dict['x_surname'])

            # step 3. compute the loss
            loss = loss_func(y_pred, batch_dict['y_nationality'])
            loss_t = loss.item()
            running_loss += (loss_t - running_loss) / (batch_index + 1)

            # step 4. use loss to produce gradients
            loss.backward()

            # step 5. use optimizer to take gradient step
            optimizer.step()
            # -----------------------------------------
            # compute the accuracy
            acc_t = compute_accuracy(y_pred, batch_dict['y_nationality'])
            running_acc += (acc_t - running_acc) / (batch_index + 1)

            # update bar
            train_bar.set_postfix(loss=running_loss, acc=running_acc,
                                  epoch=epoch_index)
            train_bar.update()

        train_state['train_loss'].append(running_loss)
        train_state['train_acc'].append(running_acc)

        # Iterate over val dataset

        # setup: batch generator, set loss and acc to 0; set eval mode on
        dataset.set_split('val')
        batch_generator = generate_batches(dataset,
                                           batch_size=args.batch_size,
                                           device=args.device)
        running_loss = 0.
        running_acc = 0.
        classifier.eval()

        for batch_index, batch_dict in enumerate(batch_generator):
            # compute the output
            y_pred = classifier(batch_dict['x_surname'])

            # compute the loss
            loss = loss_func(y_pred, batch_dict['y_nationality'])
            loss_t = loss.item()
            running_loss += (loss_t - running_loss) / (batch_index + 1)

            # compute the accuracy
            acc_t = compute_accuracy(y_pred, batch_dict['y_nationality'])
            running_acc += (acc_t - running_acc) / (batch_index + 1)
            val_bar.set_postfix(loss=running_loss, acc=running_acc,
                                epoch=epoch_index)
            val_bar.update()

        train_state['val_loss'].append(running_loss)
        train_state['val_acc'].append(running_acc)

        train_state = update_train_state(args=args, model=classifier,
                                         train_state=train_state)

        scheduler.step(train_state['val_loss'][-1])

        if train_state['stop_early']:
            break

        train_bar.n = 0
        val_bar.n = 0
        epoch_bar.update()
except KeyboardInterrupt:
    print("Exiting loop")

classifier.load_state_dict(torch.load(train_state['model_filename']))

classifier = classifier.to(args.device)
dataset.class_weights = dataset.class_weights.to(args.device)
loss_func = nn.CrossEntropyLoss(dataset.class_weights)

dataset.set_split('test')
batch_generator = generate_batches(dataset,
                                   batch_size=args.batch_size,
                                   device=args.device)
running_loss = 0.
running_acc = 0.
classifier.eval()

for batch_index, batch_dict in enumerate(batch_generator):
    # compute the output
    y_pred = classifier(batch_dict['x_surname'])

    # compute the loss
    loss = loss_func(y_pred, batch_dict['y_nationality'])
    loss_t = loss.item()
    running_loss += (loss_t - running_loss) / (batch_index + 1)

    # compute the accuracy
    acc_t = compute_accuracy(y_pred, batch_dict['y_nationality'])
    running_acc += (acc_t - running_acc) / (batch_index + 1)

train_state['test_loss'] = running_loss
train_state['test_acc'] = running_acc

print("Test loss: {};".format(train_state['test_loss']))
print("Test Accuracy: {}".format(train_state['test_acc']))

Test loss: 1.9216371824343998;

Test Accuracy: 60.7421875
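The loops above never store per-batch losses; they keep a running average with the update running_loss += (loss_t - running_loss) / (batch_index + 1). The following is a minimal sketch, in plain Python, of why that update yields the ordinary arithmetic mean. The function name running_mean and the loss values are made up for illustration and are not part of the original notebook.

```python
def running_mean(values):
    """Average `values` incrementally, one item at a time."""
    mean = 0.0
    for index, value in enumerate(values):
        # Same rule as: running_loss += (loss_t - running_loss) / (batch_index + 1)
        mean += (value - mean) / (index + 1)
    return mean


batch_losses = [2.0, 1.0, 3.0, 2.0]  # hypothetical per-batch losses
incremental = running_mean(batch_losses)
direct = sum(batch_losses) / len(batch_losses)

print(incremental, direct)
```

Both values agree, so the number appended to train_state at the end of an epoch is the mean of the per-batch losses (or accuracies) seen in that epoch.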

3.4.5 Model Evaluation and Prediction

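To understand a model's performance, you need both quantitative and qualitative measures. The basic components of these two measures are described next. We encourage you to extend them to explore the model and what it has learned.

Evaluating on the Test Dataset

Just as the training routine did not change between the example in "Example: Surname Classification with an MLP" and this example, the code that performs the evaluation did not change either. To summarize, the classifier's eval() method is called to put the model in evaluation mode, and the test dataset is iterated over. Note that eval() only changes the behavior of layers such as dropout and batch normalization; it does not by itself disable gradient tracking. This model's test set accuracy is about 56% (the particular run printed above reached about 60.7%), compared with about 50% for the MLP. Although these numbers are by no means upper bounds for these architectures, the improvement obtained with a relatively simple CNN model should be enough to encourage you to try CNNs on textual data.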
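The distinction between evaluation mode and gradient tracking can be checked directly. The sketch below uses a small throwaway model (not the surname classifier) and assumes only standard PyTorch calls: eval() makes dropout a no-op, while torch.no_grad() is what stops autograd from recording operations.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Throwaway model for illustration only; the layer sizes are arbitrary.
model = nn.Sequential(nn.Linear(4, 8), nn.Dropout(p=0.5), nn.Linear(8, 3))
x = torch.rand(2, 4)

model.eval()  # dropout is disabled, so repeated forward passes agree
out_a = model(x)
out_b = model(x)
same_in_eval = torch.equal(out_a, out_b)
tracks_grad_in_eval = out_a.requires_grad  # eval() alone still builds the autograd graph

with torch.no_grad():  # this context manager disables gradient tracking
    out_c = model(x)
tracks_grad_in_no_grad = out_c.requires_grad

print(same_in_eval, tracks_grad_in_eval, tracks_grad_in_no_grad)
```

Wrapping the evaluation loop in torch.no_grad() is optional for correctness here, but it would avoid the memory cost of building a graph that is never used.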

Classifying or retrieving top predictions for a new surname

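In this example, one part of the predict_nationality() function has changed, as shown in Example 4-21: rather than using the view() method to reshape the newly created data tensor to add a batch dimension, we use PyTorch's unsqueeze() function to add a dimension of size 1 where the batch should be. The same change is reflected in the predict_topk_nationality() function.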
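As a small illustration of that change, the sketch below compares the two ways of adding a batch dimension. The matrix size (77, 17) is a made-up stand-in for a one-hot surname matrix, not a value taken from the dataset.

```python
import torch

# Hypothetical (vocabulary size, maximum surname length) one-hot matrix.
vectorized_surname = torch.zeros(77, 17)

# unsqueeze(0) inserts a size-1 dimension at position 0 without naming the other sizes.
with_unsqueeze = vectorized_surname.unsqueeze(0)

# view() gives the same result but requires spelling out the full target shape.
with_view = vectorized_surname.view(1, 77, 17)

print(with_unsqueeze.shape, with_view.shape)
```

The unsqueeze() version keeps working if the vocabulary size or maximum length changes, which is one reason to prefer it in a prediction helper.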

Example 4-21. Using the trained model to make predictions

def predict_nationality(surname, classifier, vectorizer):
    """Predict the nationality from a new surname

    Args:
        surname (str): the surname to classify
        classifier (SurnameClassifier): an instance of the classifier
        vectorizer (SurnameVectorizer): the corresponding vectorizer
    Returns:
        a dictionary with the most likely nationality and its probability
    """
    vectorized_surname = vectorizer.vectorize(surname)
    # add a batch dimension of size 1 at position 0
    vectorized_surname = torch.tensor(vectorized_surname).unsqueeze(0)
    result = classifier(vectorized_surname, apply_softmax=True)

    probability_values, indices = result.max(dim=1)
    index = indices.item()

    predicted_nationality = vectorizer.nationality_vocab.lookup_index(index)
    probability_value = probability_values.item()

    return {'nationality': predicted_nationality, 'probability': probability_value}

new_surname = input("Enter a surname to classify: ")
classifier = classifier.cpu()
prediction = predict_nationality(new_surname, classifier, vectorizer)
print("{} -> {} (p={:0.2f})".format(new_surname,
                                    prediction['nationality'],
                                    prediction['probability']))

Enter a surname to classify: Mchan

Mchan -> Irish (p=0.30)

def predict_topk_nationality(surname, classifier, vectorizer, k=5):
    """Predict the top K nationalities from a new surname

    Args:
        surname (str): the surname to classify
        classifier (SurnameClassifier): an instance of the classifier
        vectorizer (SurnameVectorizer): the corresponding vectorizer
        k (int): the number of top nationalities to return
    Returns:
        list of dictionaries, each dictionary is a nationality and a probability
    """
    vectorized_surname = vectorizer.vectorize(surname)
    # add a batch dimension of size 1 at position 0
    vectorized_surname = torch.tensor(vectorized_surname).unsqueeze(dim=0)
    prediction_vector = classifier(vectorized_surname, apply_softmax=True)
    probability_values, indices = torch.topk(prediction_vector, k=k)

    # returned size is 1,k
    probability_values = probability_values[0].detach().numpy()
    indices = indices[0].detach().numpy()

    results = []
    for kth_index in range(k):
        nationality = vectorizer.nationality_vocab.lookup_index(indices[kth_index])
        probability_value = probability_values[kth_index]
        results.append({'nationality': nationality,
                        'probability': probability_value})
    return results

new_surname = input("Enter a surname to classify: ")

k = int(input("How many of the top predictions to see? "))
if k > len(vectorizer.nationality_vocab):
    print("Sorry! That's more than the # of nationalities we have.. defaulting you to max size :)")
    k = len(vectorizer.nationality_vocab)

predictions = predict_topk_nationality(new_surname, classifier, vectorizer, k=k)

print("Top {} predictions:".format(k))
print("===================")
for prediction in predictions:
    print("{} -> {} (p={:0.2f})".format(new_surname,
                                        prediction['nationality'],
                                        prediction['probability']))

Enter a surname to classify: Mchan

How many of the top predictions to see? 5

Top 5 predictions:

===================

Mchan -> Irish (p=0.30)

Mchan -> German (p=0.28)

Mchan -> English (p=0.19)

Mchan -> Scottish (p=0.09)

Mchan -> Russian (p=0.06)

3.5 Miscellaneous Topics in CNNs

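To conclude our discussion, we outline a few additional topics that go beyond the basic convolution itself but play a major role in how CNNs are commonly used. In particular, you will see descriptions of pooling operations, batch normalization, network-in-network connections, and residual connections.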

3.5.1 Pooling Operation

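Pooling is an operation that summarizes a higher-dimensional feature map as a lower-dimensional one. The output of a convolution is a feature map, and the values in the feature map summarize some region of the input. Because convolutions overlap as they are computed, many of the computed features can be redundant; pooling condenses this possibly redundant feature map. Formally, pooling is an arithmetic operator such as sum, mean, or max applied systematically over a local region of a feature map, and the resulting operations are known as sum pooling, average pooling, and max pooling, respectively. Pooling can also serve as a way to improve the statistical strength of a larger but weaker feature map by turning it into a smaller but stronger one. Figure 4-13 illustrates pooling.

Figure 4-13. The pooling operation shown here is functionally identical to a convolution: it is applied to different positions in the input matrix. However, rather than multiplying and summing the values of the input matrix, the pooling operation applies some function G that pools the values. G can be any operation, but summing, finding the max, and computing the average are the most common.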
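The three pooling variants can be tried on a tiny feature map. This is a minimal sketch using PyTorch's functional pooling calls; the input values are made up, and sum pooling is obtained here by rescaling average pooling, which is an implementation choice for this illustration rather than a dedicated PyTorch operator.

```python
import torch
import torch.nn.functional as F

# One feature map of shape (batch=1, channels=1, length=6); the values are arbitrary.
feature_map = torch.tensor([[[1., 3., 2., 8., 4., 6.]]])

kernel_size = 2  # non-overlapping windows; stride defaults to kernel_size

max_pooled = F.max_pool1d(feature_map, kernel_size=kernel_size)
avg_pooled = F.avg_pool1d(feature_map, kernel_size=kernel_size)
sum_pooled = avg_pooled * kernel_size  # sum over a window = mean * window size

print(max_pooled)
print(avg_pooled)
print(sum_pooled)
```

Each output has length 3 instead of 6: every window of two neighboring values has been summarized by a single number.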

3.5.2 Batch Normalization (BatchNorm)

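Batch normalization, or BatchNorm, is an often-used tool in designing networks. BatchNorm applies a transformation to the output of a CNN by scaling the activations to have zero mean and unit variance. The mean and variance values it uses for the Z-transform are updated per batch so that fluctuations in any single batch do not shift or affect them too much. BatchNorm allows models to be less sensitive to the initialization of the parameters and simplifies the tuning of learning rates (Ioffe and Szegedy, 2015). In PyTorch, BatchNorm is defined in the nn module. Example 4-22 shows how to instantiate and use BatchNorm with Conv1d layers.

Example 4-22. Using a Conv1d layer with batch normalization

conv1 = nn.Conv1d(in_channels=one_hot_size, out_channels=16, kernel_size=3)  # 16 output channels (the number of kernels in this layer), kernel size 3

conv2 = nn.Conv1d(in_channels=16, out_channels=32, kernel_size=3)  # second 1D convolution; 16 input channels

conv3 = nn.Conv1d(in_channels=32, out_channels=64, kernel_size=3)  # third 1D convolution; 32 input channels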

conv1_bn = nn.BatchNorm1d(num_features=16)

conv2_bn = nn.BatchNorm1d(num_features=32)

intermediate1 = conv1_bn(F.relu(conv1(data)))

intermediate2 = conv2_bn(F.relu(conv2(intermediate1)))

intermediate3 = conv3(intermediate2)

print(intermediate1.size())

print(intermediate2.size())

print(intermediate3.size())

torch.Size([2, 16, 5])

torch.Size([2, 32, 3])

torch.Size([2, 64, 1])

Note: BatchNorm computes its statistics over the batch and sequence dimensions. In other words, the input to each BatchNorm1d is a tensor of size (B, C, L) (where B=batch, C=channels, and L=length). For each channel, the (B, L) slice should have zero mean after normalization. This reduces covariate shift.

intermediate2.mean(dim=(0, 2))  # mean of intermediate2 over the batch and sequence-length dimensions

tensor([ 2.9802e-08, 3.9736e-08, -7.4506e-09, -9.9341e-09, -9.9341e-09,

2.9802e-08, -1.2418e-09, 3.4769e-08, -9.9341e-09, 7.7610e-11,

-4.9671e-09, 9.9341e-09, 4.9671e-09, -1.4901e-08, 1.9868e-08,

2.4835e-09, -3.9736e-08, 2.4835e-09, 9.9341e-09, -1.9868e-08,

-2.9802e-08, -4.9671e-09, -4.9671e-09, -9.9341e-09, 9.9341e-09,

-4.9671e-09, -1.9868e-08, 1.8626e-09, -3.5390e-08, -6.2088e-09,

0.0000e+00, 0.0000e+00], grad_fn=)
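The near-zero means printed above follow from how BatchNorm1d normalizes in training mode. The sketch below reproduces that normalization by hand on a random (B, C, L) tensor and compares it with the layer's output. It assumes a freshly constructed layer, whose learnable scale and shift start at 1 and 0, and the tensor sizes are chosen only to resemble the example above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

x = torch.rand(2, 16, 5)              # (B, C, L), arbitrary values
bn = nn.BatchNorm1d(num_features=16)  # a new module starts in training mode
expected = bn(x)

# Per-channel statistics taken over the batch (dim 0) and length (dim 2) dimensions.
mean = x.mean(dim=(0, 2), keepdim=True)
var = x.var(dim=(0, 2), unbiased=False, keepdim=True)  # biased variance is used for normalizing

# With the initial weight of 1 and bias of 0, the affine step leaves the result unchanged.
manual = (x - mean) / torch.sqrt(var + bn.eps)

matches = torch.allclose(manual, expected, atol=1e-5)
channel_means = expected.mean(dim=(0, 2))

print(matches)
print(channel_means.abs().max())
```

The per-channel means are zero only up to floating-point error, which is why the printed tensor above shows values on the order of 1e-08 rather than exact zeros.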

3.5.3 Network-in-Network Connections (1x1 Convolutions)

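A Network-in-Network (NiN) connection is a convolutional kernel with kernel_size=1, and it has a few interesting properties. In particular, a 1x1 convolution acts like a fully connected linear layer across the channels. This is useful when mapping from a feature map with many channels to a shallower feature map. In Figure 4-14, we show a single NiN connection applied to an input matrix; it reduces two channels to a single channel. NiN, or 1x1, convolutions thus provide an inexpensive way to incorporate additional nonlinearity with few parameters (Lin et al., 2013).

Figure 4-14. An example of a 1×1 convolution operation. Observe how the 1×1 convolution reduces the number of channels from two to one.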
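The claim that a 1x1 convolution behaves like a linear layer across channels can be verified numerically. This is a minimal sketch with arbitrary sizes: the weights of a Conv1d with kernel_size=1 are copied into an nn.Linear, which is then applied independently at every sequence position.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

x = torch.rand(2, 4, 7)  # (B, C_in, L); sizes are arbitrary

conv = nn.Conv1d(in_channels=4, out_channels=2, kernel_size=1)
linear = nn.Linear(in_features=4, out_features=2)

# Copy the convolution's parameters into the linear layer.
with torch.no_grad():
    linear.weight.copy_(conv.weight.squeeze(-1))  # (2, 4, 1) -> (2, 4)
    linear.bias.copy_(conv.bias)

conv_out = conv(x)  # (B, C_out, L)

# nn.Linear acts on the last dimension, so move channels last and back again.
linear_out = linear(x.permute(0, 2, 1)).permute(0, 2, 1)

matches = torch.allclose(conv_out, linear_out, atol=1e-6)
print(conv_out.shape, matches)
```

The same four-to-two channel mapping is applied at each of the seven positions, so the layer mixes channels without looking at neighboring positions.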

3.5.4 Residual Connections/Residual Block

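One of the most significant trends in CNNs is the residual connection, which enables really deep networks (more than 100 layers). It is also called a skip connection. If we let the convolution function be represented as conv, the output of a residual block is as follows: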

output = conv(input) + input

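There is an implicit trick to this operation, however, which is shown in Figure 4-15. For the input to be added to the output of the convolution, they must have the same shape. To accomplish this, the standard practice is to apply padding before the convolution. In Figure 4-15, the padding is of size 1 for a convolution of size 3.

Figure 4-15. A residual connection is a method for adding the original matrix to the output of a convolution. This is shown visually above: the convolutional layer is applied to the input matrix and the result is added to the input matrix. A common hyperparameter setting for creating outputs that are the same size as the inputs is kernel_size=3 and padding=1. In general, any odd kernel size with padding=floor(kernel_size/2) and a stride of 1 results in an output that is the same size as its input. See Figure 4-11 for a visual explanation of padding and convolutions. The matrix produced by the convolutional layer is added to the input, and the final result is the output of the residual connection computation.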
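The shape requirement can be made concrete with a small module. This is a minimal sketch of output = conv(input) + input for 1D inputs, using padding = kernel_size // 2 with a stride of 1 so that odd kernel sizes preserve the sequence length. The class name ResidualBlock1d and all sizes are made up for illustration; real residual blocks typically also include nonlinearities and normalization.

```python
import torch
import torch.nn as nn


class ResidualBlock1d(nn.Module):
    """Hypothetical minimal residual block: output = conv(input) + input."""

    def __init__(self, channels, kernel_size=3):
        super().__init__()
        if kernel_size % 2 == 0:
            raise ValueError("kernel_size must be odd to preserve the length")
        # Same number of input and output channels, and "same" padding,
        # so that conv(x) has exactly the shape of x.
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=kernel_size // 2)

    def forward(self, x):
        return self.conv(x) + x


x = torch.rand(2, 16, 10)  # (B, C, L), arbitrary sizes
output_shapes = {}
for kernel_size in (1, 3, 5, 7):
    block = ResidualBlock1d(channels=16, kernel_size=kernel_size)
    output_shapes[kernel_size] = tuple(block(x).shape)

print(output_shapes)
```

Every kernel size in the loop yields an output of shape (2, 16, 10), so the addition in forward() is always well defined.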


Article link: http://gantiao.com.cn/post/19229155.html