PyTorch – The Built-in LSTM Network torch.nn.LSTM: Parameters Explained, with Usage Examples - StubbornHuang Blog

1 torch.nn.LSTM

torch.nn.LSTM is PyTorch's built-in LSTM module.

For each element of the input sequence, torch.nn.LSTM applies the following classic LSTM computation:

\begin{array}{c} i_{t}=\sigma\left(W_{i i} x_{t}+b_{i i}+W_{h i} h_{t-1}+b_{h i}\right) \\ f_{t}=\sigma\left(W_{i f} x_{t}+b_{i f}+W_{h f} h_{t-1}+b_{h f}\right) \\ g_{t}=\tanh \left(W_{i g} x_{t}+b_{i g}+W_{h g} h_{t-1}+b_{h g}\right) \\ o_{t}=\sigma\left(W_{i o} x_{t}+b_{i o}+W_{h o} h_{t-1}+b_{h o}\right) \\ c_{t}=f_{t} \odot c_{t-1}+i_{t} \odot g_{t} \\ h_{t}=o_{t} \odot \tanh \left(c_{t}\right) \end{array}

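where $h_{t}$ is the hidden state at time $t$, $c_{t}$ is the cell state at time $t$, $x_{t}$ is the input at time $t$, and $h_{t-1}$ is the hidden state at time $t-1$ (or the initial hidden state at time $0$). $i_{t}$, $f_{t}$, $g_{t}$ and $o_{t}$ are the input, forget, cell and output gates, respectively; $\sigma$ is the sigmoid function and $\odot$ is the Hadamard (element-wise) product.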
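The sketch below checks the equations against PyTorch by recomputing one time step by hand. It assumes PyTorch's documented parameter layout, in which weight_ih_l0 stacks $W_{ii}, W_{if}, W_{ig}, W_{io}$ row-wise (shape (4*hidden_size, input_size)) and weight_hh_l0 does the same for the hidden-to-hidden matrices; lstm_step is a hypothetical helper written only for this illustration, and the sizes are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_size, hidden_size = 4, 3
lstm = nn.LSTM(input_size, hidden_size, num_layers=1)

# One time step of one sequence: x is (L=1, N=1, H_in), states are (1, 1, hidden_size)
x = torch.randn(1, 1, input_size)
h0 = torch.randn(1, 1, hidden_size)
c0 = torch.randn(1, 1, hidden_size)
_, (h1_ref, c1_ref) = lstm(x, (h0, c0))


def lstm_step(x_t, h_prev, c_prev, W_ih, W_hh, b_ih, b_hh):
    """Hypothetical helper: one step of the LSTM equations for a batch of row vectors."""
    # The four gates' pre-activations, stacked in PyTorch's order i, f, g, o
    gates = x_t @ W_ih.T + b_ih + h_prev @ W_hh.T + b_hh
    i, f, g, o = gates.chunk(4, dim=-1)
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
    g = torch.tanh(g)
    c_t = f * c_prev + i * g          # c_t = f_t ⊙ c_{t-1} + i_t ⊙ g_t
    h_t = o * torch.tanh(c_t)         # h_t = o_t ⊙ tanh(c_t)
    return h_t, c_t


with torch.no_grad():
    h1, c1 = lstm_step(x[0], h0[0], c0[0],
                       lstm.weight_ih_l0, lstm.weight_hh_l0,
                       lstm.bias_ih_l0, lstm.bias_hh_l0)

print(torch.allclose(h1, h1_ref[0], atol=1e-5))  # True
print(torch.allclose(c1, c1_ref[0], atol=1e-5))  # True
```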

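In a multi-layer LSTM, the input $x_{t}^{(l)}$ of the $l$-th layer ($l \ge 2$) is the hidden state $h_{t}^{(l-1)}$ of the previous layer multiplied by dropout $\delta_{t}^{(l-1)}$, where each $\delta_{t}^{(l-1)}$ is a Bernoulli random variable that is $0$ with probability dropout. This dropout is only applied in training mode.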
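A minimal sketch of where this dropout acts: with num_layers=2 and a non-zero dropout, two forward passes in training mode give different outputs because the masks between the layers are resampled each time, while in evaluation mode the forward pass is deterministic. The sizes and the seed are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Dropout sits between layers, so it needs num_layers >= 2
# (with num_layers=1 PyTorch emits a warning and the dropout has no effect)
lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, dropout=0.5)
x = torch.randn(5, 3, 10)

lstm.train()                 # training mode: new dropout masks on every forward pass
out_a, _ = lstm(x)
out_b, _ = lstm(x)

lstm.eval()                  # evaluation mode: dropout disabled
out_c, _ = lstm(x)
out_d, _ = lstm(x)

print(torch.allclose(out_a, out_b))  # False: different random masks
print(torch.allclose(out_c, out_d))  # True
```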

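If proj_size > 0 is specified, an LSTM with projections is used. This changes the LSTM cell in the following ways.

First, the dimension of $h_{t}$ changes from hidden_size to proj_size (the dimensions of $W_{hi}$ change accordingly).

Second, the output hidden state of each layer is multiplied by a learnable projection matrix: $h_{t}=W_{hr}h_{t}$. As a consequence, the outputs of the LSTM also have different shapes; the exact shapes are given in the input/output descriptions in Section 1.2. More details on LSTMs with projections can be found in https://arxiv.org/abs/1402.1128.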
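A short sketch of these shape changes, assuming PyTorch 1.8 or later (where proj_size is available). The cell state keeps hidden_size, while the hidden state and the output shrink to proj_size; the projection matrix $W_{hr}$ is exposed as weight_hr_l0. The sizes are arbitrary.

```python
import torch
import torch.nn as nn

# proj_size must be smaller than hidden_size
lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=1, proj_size=5)
x = torch.randn(7, 3, 10)            # (L, N, H_in)
out, (h_n, c_n) = lstm(x)

print(out.shape)                 # torch.Size([7, 3, 5]): H_out = proj_size
print(h_n.shape)                 # torch.Size([1, 3, 5])
print(c_n.shape)                 # torch.Size([1, 3, 20]): cell state keeps hidden_size
print(lstm.weight_hr_l0.shape)   # torch.Size([5, 20]): W_hr is (proj_size, hidden_size)
print(lstm.weight_hh_l0.shape)   # torch.Size([80, 5]): recurrent weights now read proj_size inputs
```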

1.1 Creating torch.nn.LSTM

Signature

torch.nn.LSTM(*args, **kwargs)

Parameters

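input_size: the expected number of features in the input $x$, i.e. the size of each input element. With the default layout the LSTM takes input of shape (seq_len, batch, input_size), so input_size is, for example, the dimension of each word vector;
hidden_size: the number of features in the hidden state $h$;
num_layers: the number of recurrent layers, default 1. For example, setting num_layers=2 stacks two LSTMs into a stacked LSTM, where the second LSTM takes the outputs of the first LSTM and computes the final results;
bias: default True. If False, the layer does not use the bias weights b_ih and b_hh;
batch_first: default False. If True, the input and output tensors are laid out as (batch, seq, feature) instead of (seq, batch, feature). This does not apply to the hidden and cell states h_0, c_0, h_n and c_n;
dropout: default 0. If non-zero, a Dropout layer is applied to the outputs of each LSTM layer except the last one, with dropout probability equal to this value;
bidirectional: default False. If True, the LSTM becomes bidirectional;
proj_size: default 0. If greater than 0, an LSTM with projections of the corresponding size is used.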
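To see how these parameters shape the module, the sketch below lists the learnable tensors of a two-layer bidirectional LSTM (batch_first and dropout change the behavior but add no parameters). The sizes are arbitrary; the parameter names (weight_ih_l0, bias_hh_l1, the _reverse suffix for the backward direction) are the ones PyTorch uses.

```python
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2,
               bias=True, batch_first=True, dropout=0.2,
               bidirectional=True)

# Every layer and direction has its own weights; "_reverse" marks the backward direction
shapes = {name: tuple(p.shape) for name, p in lstm.named_parameters()}
for name, shape in shapes.items():
    print(name, shape)

print(len(shapes))                 # 16 = 2 layers x 2 directions x 4 tensors
print(shapes["weight_ih_l0"])      # (80, 10): 4 gates * hidden_size rows, input_size columns
print(shapes["weight_ih_l1"])      # (80, 40): layer 2 reads both directions, 2 * hidden_size

# bias=False drops b_ih and b_hh entirely
no_bias = nn.LSTM(10, 20, bias=False)
print([name for name, _ in no_bias.named_parameters()])  # ['weight_ih_l0', 'weight_hh_l0']
```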

1.2 Using torch.nn.LSTM

Call form

output,(h_n,c_n) = LSTM(input,(h_0,c_0))

Inputs

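input: a tensor of shape $(L,H_{in})$ for unbatched input, $(L,N,H_{in})$ for batched input when batch_first=False, or $(N,L,H_{in})$ when batch_first=True, containing the features of the input sequence. The input can also be a packed variable-length sequence; see torch.nn.utils.rnn.pack_padded_sequence() or torch.nn.utils.rnn.pack_sequence() for details.
h_0: a tensor of shape $(D * \text{num\_layers},H_{out})$ for unbatched input or $(D * \text{num\_layers},N,H_{out})$ for batched input, containing the initial hidden state for each sequence in the batch.
c_0: a tensor of shape $(D * \text{num\_layers},H_{cell})$ for unbatched input or $(D * \text{num\_layers},N,H_{cell})$ for batched input, containing the initial cell state for each sequence in the batch. If (h_0, c_0) is not provided, both default to zeros.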

The symbols above are defined as:

\begin{aligned} N &=\text{batch size} \\ L &=\text{sequence length} \\ D &=2 \text{ if bidirectional=True otherwise } 1 \\ H_{in} &=\text{input\_size} \\ H_{cell} &=\text{hidden\_size} \\ H_{out} &=\text{proj\_size if proj\_size} > 0 \text{ otherwise hidden\_size} \end{aligned}
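The notation can be turned into a small shape calculator. expected_shapes below is a hypothetical helper that encodes the formulas for $D$, $H_{cell}$ and $H_{out}$; the loop checks it against real nn.LSTM outputs for every combination of num_layers, bidirectional, proj_size and batch_first (proj_size needs PyTorch 1.8 or later). The sizes are arbitrary.

```python
import itertools

import torch
import torch.nn as nn


def expected_shapes(L, N, H_in, hidden_size, num_layers=1,
                    bidirectional=False, proj_size=0, batch_first=False):
    """Hypothetical helper: expected (output, h_n, c_n) shapes for batched input."""
    D = 2 if bidirectional else 1
    H_cell = hidden_size
    H_out = proj_size if proj_size > 0 else hidden_size
    output = (N, L, D * H_out) if batch_first else (L, N, D * H_out)
    h_n = (D * num_layers, N, H_out)    # batch_first never affects the states
    c_n = (D * num_layers, N, H_cell)
    return output, h_n, c_n


L, N, H_in, hidden = 7, 3, 10, 20
for num_layers, bidir, proj, bf in itertools.product([1, 2], [False, True], [0, 5], [False, True]):
    lstm = nn.LSTM(H_in, hidden, num_layers, bidirectional=bidir,
                   proj_size=proj, batch_first=bf)
    x = torch.randn(N, L, H_in) if bf else torch.randn(L, N, H_in)
    out, (h_n, c_n) = lstm(x)
    got = (tuple(out.shape), tuple(h_n.shape), tuple(c_n.shape))
    assert got == expected_shapes(L, N, H_in, hidden, num_layers, bidir, proj, bf), got
print("all shape checks passed")
```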

In summary, for batched input with batch_first=False and proj_size=0, the inputs of torch.nn.LSTM have the following shapes:

input: (seq_len, batch_size, input_size)
h_0: (num_directions * num_layers, batch_size, hidden_size)
c_0: (num_directions * num_layers, batch_size, hidden_size)
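A sketch of the cases described above, assuming PyTorch 1.11 or later for unbatched input: batch_first=True changes only the input and output layout, not h_0/c_0; omitting (h_0, c_0) behaves like passing zeros; and a 2-D input is treated as a single unbatched sequence. The sizes are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, batch_first=True)

# Batched input with batch_first=True: (N, L, H_in) = (8, 5, 10)
x = torch.randn(8, 5, 10)
# h_0 / c_0 keep the (D * num_layers, N, H_out) layout regardless of batch_first
h0 = torch.zeros(2, 8, 20)
c0 = torch.zeros(2, 8, 20)
out, (h_n, c_n) = lstm(x, (h0, c0))
print(out.shape, h_n.shape)              # torch.Size([8, 5, 20]) torch.Size([2, 8, 20])

# Omitting (h_0, c_0) is the same as passing zeros
out_default, _ = lstm(x)
print(torch.allclose(out, out_default))  # True

# Unbatched input: input is (L, H_in), states are (D * num_layers, H_out)
x_single = torch.randn(5, 10)
out_single, (h_single, c_single) = lstm(x_single)
print(out_single.shape, h_single.shape)  # torch.Size([5, 20]) torch.Size([2, 20])
```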

The dimensions of input can be understood as follows:

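seq_len: the length of the sequence, i.e. the number of time steps. For text, it is the number of words in each sentence, which is usually fixed to one length by padding or truncation; for stock data, it is the number of records within a given time window. It also determines how many time steps the LSTM unrolls over when processing the input;
batch_size: the number of sequences fed in at once. For text, it is the number of sentences; for stock data, it is the number of time windows;
input_size: the dimension of each element. For text, it is the dimension of the vector representing each word; for stock data, it is the number of values recorded at each time point, such as the low, high, average, 5-day moving average and 10-day moving average prices.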
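Both interpretations can be made concrete with made-up data. In this sketch the vocabulary size, the embedding size and the 100-day, 5-feature price series are hypothetical: nn.Embedding turns token ids into word vectors (input_size), and Tensor.unfold cuts the series into overlapping 10-day windows (seq_len), one window per batch element.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Text: 8 sentences (batch_size) of 5 token ids each (seq_len); each word becomes a 10-dim vector
vocab_size = 1000                                   # hypothetical vocabulary size
token_ids = torch.randint(0, vocab_size, (5, 8))    # (seq_len, batch_size)
embed = nn.Embedding(vocab_size, 10)
text_input = embed(token_ids)                       # (seq_len=5, batch_size=8, input_size=10)

# Stock data: 100 days x 5 made-up features (e.g. low, high, average, 5-day MA, 10-day MA)
series = torch.randn(100, 5)
windows = series.unfold(0, 10, 1)                   # (91, 5, 10): 91 windows of 10 days each
stock_input = windows.permute(2, 0, 1).contiguous() # (seq_len=10, batch_size=91, input_size=5)

text_out, _ = nn.LSTM(10, 16)(text_input)
stock_out, _ = nn.LSTM(5, 16)(stock_input)
print(text_input.shape, text_out.shape)    # torch.Size([5, 8, 10]) torch.Size([5, 8, 16])
print(stock_input.shape, stock_out.shape)  # torch.Size([10, 91, 5]) torch.Size([10, 91, 16])
```

Here stock_input[t, w] is day w + t of the series, so each batch element is one window and each time step is one day.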

Outputs

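output: a tensor of shape $(L,D * H_{out})$ for unbatched input, $(L,N,D * H_{out})$ for batched input when batch_first=False, or $(N,L,D * H_{out})$ when batch_first=True, containing the output features $h_t$ from the last layer of the LSTM for each time step $t$. If a torch.nn.utils.rnn.PackedSequence (for example, one created by torch.nn.utils.rnn.pack_sequence()) is given as input, the output is also a packed sequence.
h_n: a tensor of shape $(D * \text{num\_layers},H_{out})$ for unbatched input or $(D * \text{num\_layers},N,H_{out})$ for batched input, containing the final hidden state for each sequence in the batch. For a bidirectional LSTM it contains both the final forward and the final reverse hidden states.
c_n: a tensor of shape $(D * \text{num\_layers},H_{cell})$ for unbatched input or $(D * \text{num\_layers},N,H_{cell})$ for batched input, containing the final cell state for each sequence in the batch.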
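A minimal sketch of the packed-sequence path, with made-up sequences of lengths 5, 3 and 2. pack_padded_sequence removes the padding before the LSTM sees it, pad_packed_sequence turns the packed output back into a padded tensor, and h_n holds each sequence's state at its real last step rather than after the padding. The sizes are arbitrary.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
lstm = nn.LSTM(input_size=4, hidden_size=6, batch_first=True)

# Three made-up sequences of lengths 5, 3 and 2, zero-padded to length 5: (N, L, H_in)
lengths = [5, 3, 2]
padded = torch.randn(3, 5, 4)
for i, n in enumerate(lengths):
    padded[i, n:] = 0.0

# Lengths are in decreasing order, as enforce_sorted=True (the default) requires
packed = pack_padded_sequence(padded, lengths, batch_first=True)
packed_out, (h_n, c_n) = lstm(packed)        # packed in, packed out

out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)
print(out.shape)      # torch.Size([3, 5, 6])
print(out_lengths)    # tensor([5, 3, 2])

# h_n[0, i] is the state after the last real step of sequence i
print(all(torch.allclose(h_n[0, i], out[i, n - 1]) for i, n in enumerate(lengths)))  # True
```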

In summary, for batched input with batch_first=False and proj_size=0, the outputs of torch.nn.LSTM have the following shapes:

output: (seq_len, batch_size, num_directions * hidden_size)
h_n: (num_directions * num_layers, batch_size, hidden_size)
c_n: (num_directions * num_layers, batch_size, hidden_size)
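For a bidirectional LSTM (num_directions = 2) the two directions are concatenated along the last dimension of output, and h_n can be viewed as (num_layers, num_directions, batch_size, hidden_size). The sketch below, with arbitrary sizes, checks that the forward direction's final state is the first half of output at the last time step, while the reverse direction's final state is the second half of output at time step 0.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, batch_size, input_size, hidden_size, num_layers = 5, 8, 10, 20, 2
lstm = nn.LSTM(input_size, hidden_size, num_layers, bidirectional=True)

x = torch.randn(seq_len, batch_size, input_size)
output, (h_n, c_n) = lstm(x)
print(output.shape)   # torch.Size([5, 8, 40]): num_directions * hidden_size
print(h_n.shape)      # torch.Size([4, 8, 20]): num_directions * num_layers
print(c_n.shape)      # torch.Size([4, 8, 20])

# Separate layers and directions: (num_layers, num_directions, batch_size, hidden_size)
h = h_n.view(num_layers, 2, batch_size, hidden_size)
forward_final = h[-1, 0]    # last layer, forward direction
reverse_final = h[-1, 1]    # last layer, reverse direction

print(torch.allclose(output[-1, :, :hidden_size], forward_final))  # True
print(torch.allclose(output[0, :, hidden_size:], reverse_final))   # True
```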

1.3 torch.nn.LSTM Usage Example

Suppose a sentence has 5 words, each represented by a 10-dimensional vector, and the batch size is 8:

# -*- coding: utf-8 -*-

import torch
import torch.nn as nn

if __name__ == '__main__':
    lstm_input = torch.randn(5,8,10) # lstm_input => (seq_len = 5,batch_size = 8,input_size = 10)

    lstm = nn.LSTM(10,20,1) # (input_size=10,hidden_size = 20,num_layers = 1)

    out,(h_n,c_n) = lstm(lstm_input) # out =>(seq_len = 5,batch_size = 8,D * hidden_size = 1 * 20)

    print(out.shape)

Output

torch.Size([5, 8, 20])
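nn.LSTM applies the same cell at every one of the seq_len time steps. To illustrate, the sketch below copies the weights of a single-layer LSTM with the same sizes as the example above into an nn.LSTMCell, unrolls the cell over the 5 time steps by hand, and compares the results with out, h_n and c_n.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(10, 20, 1)
cell = nn.LSTMCell(10, 20)

# Give the cell exactly the same parameters as layer 0 of the LSTM
with torch.no_grad():
    cell.weight_ih.copy_(lstm.weight_ih_l0)
    cell.weight_hh.copy_(lstm.weight_hh_l0)
    cell.bias_ih.copy_(lstm.bias_ih_l0)
    cell.bias_hh.copy_(lstm.bias_hh_l0)

lstm_input = torch.randn(5, 8, 10)        # (seq_len, batch_size, input_size)
out, (h_n, c_n) = lstm(lstm_input)

# Unroll by hand: one cell call per time step, starting from zero states
h = torch.zeros(8, 20)
c = torch.zeros(8, 20)
steps = []
for t in range(lstm_input.size(0)):
    h, c = cell(lstm_input[t], (h, c))
    steps.append(h)
manual_out = torch.stack(steps)           # (5, 8, 20)

print(torch.allclose(out, manual_out, atol=1e-5))   # True
print(torch.allclose(h_n[0], h, atol=1e-5))         # True
print(torch.allclose(c_n[0], c, atol=1e-5))         # True
```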

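1.4 Classifying the MNIST Dataset with an LSTM

Each 28×28 image is treated as a sequence of 28 rows (sequence_length = 28), each row being a 28-dimensional input (input_size = 28); the hidden state at the last time step is passed to a linear layer that predicts the digit class.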

import torch 
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms


# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Hyper-parameters
sequence_length = 28
input_size = 28
hidden_size = 128
num_layers = 2
num_classes = 10
batch_size = 100
num_epochs = 2
learning_rate = 0.01

# MNIST dataset
train_dataset = torchvision.datasets.MNIST(root='../../data/',
                                           train=True, 
                                           transform=transforms.ToTensor(),
                                           download=True)

test_dataset = torchvision.datasets.MNIST(root='../../data/',
                                          train=False, 
                                          transform=transforms.ToTensor())

# Data loader
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size, 
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size, 
                                          shuffle=False)

# Recurrent neural network (many-to-one)
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # Set initial hidden and cell states (zeros, created on the same device as the input)
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)

        # Forward propagate LSTM
        out, _ = self.lstm(x, (h0, c0))  # out: tensor of shape (batch_size, seq_length, hidden_size)

        # Decode the hidden state of the last time step
        out = self.fc(out[:, -1, :])
        return out

model = RNN(input_size, hidden_size, num_layers, num_classes).to(device)


# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# Train the model
total_step = len(train_loader)
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = images.reshape(-1, sequence_length, input_size).to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i+1) % 100 == 0:
            print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}' 
                   .format(epoch+1, num_epochs, i+1, total_step, loss.item()))

# Test the model
model.eval()
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.reshape(-1, sequence_length, input_size).to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Test Accuracy of the model on the 10000 test images: {} %'.format(100 * correct / total)) 

# Save the model checkpoint
torch.save(model.state_dict(), 'model.ckpt')

Output

Done!
Epoch [1/2], Step [100/600], Loss: 0.5480
Epoch [1/2], Step [200/600], Loss: 0.4485
Epoch [1/2], Step [300/600], Loss: 0.2460
Epoch [1/2], Step [400/600], Loss: 0.1604
Epoch [1/2], Step [500/600], Loss: 0.2246
Epoch [1/2], Step [600/600], Loss: 0.1616
Epoch [2/2], Step [100/600], Loss: 0.0647
Epoch [2/2], Step [200/600], Loss: 0.1051
Epoch [2/2], Step [300/600], Loss: 0.1356
Epoch [2/2], Step [400/600], Loss: 0.0415
Epoch [2/2], Step [500/600], Loss: 0.0389
Epoch [2/2], Step [600/600], Loss: 0.0801
Test Accuracy of the model on the 10000 test images: 97.45 %
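To make the reshaping in this example explicit, the sketch below traces the tensor shapes of the many-to-one model on one random batch standing in for MNIST data, so no download is needed. It reuses the script's hyper-parameters; for a single-direction LSTM, out[:, -1, :] equals the top layer's final hidden state h_n[-1].

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
sequence_length, input_size, hidden_size, num_layers, num_classes = 28, 28, 128, 2, 10
batch_size = 100

lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
fc = nn.Linear(hidden_size, num_classes)

images = torch.rand(batch_size, 1, 28, 28)            # random stand-in for one ToTensor() MNIST batch
x = images.reshape(-1, sequence_length, input_size)   # (100, 28, 28): each pixel row is one time step
out, (h_n, c_n) = lstm(x)                             # out: (100, 28, 128)
last = out[:, -1, :]                                  # (100, 128): top layer, last time step
logits = fc(last)                                     # (100, 10): one score per digit class

print(x.shape, out.shape, last.shape, logits.shape)
print(torch.allclose(last, h_n[-1]))                  # True
```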
