1 Introduction
In deep learning, simply stacking more layers to make a network deeper does not necessarily improve its performance. In addition, deep networks are prone to the vanishing gradient problem during training: as the gradient is back-propagated to earlier layers, repeated multiplication can make it vanishingly small.
ResNet introduces residual learning to address this degradation problem. Consider a stack of a few layers: when the input is x, denote the features learned by the stack as F(x). Now add a branch that skips directly to the output of the stack, so the final output becomes H(x) = F(x) + x, as shown in the figure below.
This kind of skip connection is called a shortcut connection (analogous to a short circuit in an electrical circuit). The two-layer structure above is called a BasicBlock and is typically used in ResNet18 and ResNet34, while ResNet50 and deeper networks use the three-layer residual structure below, called a Bottleneck.
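In code, the residual idea boils down to adding the block's input back onto its output before the final activation. Below is a minimal sketch of that idea (the layer shapes are placeholders, not taken from any particular ResNet variant):
import torch
from torch import nn

class TinyResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # F(x): two 3x3 convolutions that keep the spatial size unchanged
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        out = torch.relu(self.conv1(x))
        out = self.conv2(out)
        return torch.relu(out + x)  # H(x) = F(x) + x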
Before reading on, it is assumed that you already know what image convolution and fully connected layers are, and that you are comfortable with Python and PyTorch.
2 Convolution in PyTorch
2.1 torch.nn.Conv2d
See the official documentation for details: https://pytorch.org/docs/1.8.0/generated/torch.nn.Conv2d.html?highlight=nn%20conv2d#torch.nn.Conv2d
1. Function signature
torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')
2. Parameters
- in_channels: number of input channels
- out_channels: number of output channels
- kernel_size: size of the convolution kernel
- stride: stride of the convolution
- padding: amount of padding added to each side of the input
2.2 Convolution output size
In PyTorch, a batch of images is usually stored in a tensor of shape (b, c, h, w),
where
- b is the batch size
- c is the number of channels
- h is the image height
- w is the image width
After the convolution, the output size of an image is given by the following formula:
output = \left\lfloor \frac{input + 2 \times padding - kernel}{stride} \right\rfloor + 1
For example, if the input image size is (64, 64) with padding=1, kernel=3 and stride=2, the formula above gives an output size of (32, 32) after the convolution.
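The formula is easy to verify directly in PyTorch. The snippet below (the channel counts are arbitrary, chosen only for illustration) applies such a convolution to a 64 \times 64 input and prints the resulting shape:
import torch
from torch import nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=2, padding=1)
x = torch.randn(1, 3, 64, 64)   # (b, c, h, w)
print(conv(x).shape)            # torch.Size([1, 16, 32, 32])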
3 ResNet
ResNet comes in several variants, such as ResNet18, ResNet34 and ResNet50, as shown in the table below.
In the red box highlighted in the figure above, 3 \times 3 is the kernel size, 64 is the number of filters, and \times 2 is the number of times the block is repeated.
ResNet is built from three main parts:
- the input layer, conv1 + max pooling, usually counted as layer 0
- the ResBlocks, layers 1 through 4
- average pooling + a fully connected layer, as the final layer
3.1 ResNet18
Let us start with the simplest variant, ResNet18. Assume the input image is 224 \times 224 and the output has 1000 classes.
The ResNet18 architecture is shown below,
or, equivalently,
3.1.1 ResBlock
From the two ResNet18 architecture diagrams above, a ResBlock comes in two variants: one downsamples by using stride=2 in its convolution, the other keeps a stride of 1. Both cases need to be handled in code. The ResBlock implementation is as follows:
class ResBlock(nn.Module):
    def __init__(self, in_channels, out_channels, downsample):
        super().__init__()
        if downsample:
            # Downsampling block: the first conv uses stride 2, and the shortcut needs a
            # 1x1 conv so its output matches the new spatial size and channel count.
            self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=2, padding=1)
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=2),
                nn.BatchNorm2d(out_channels)
            )
        else:
            # Plain block: stride 1 and an identity shortcut.
            self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=1, padding=1)
            self.shortcut = nn.Sequential()
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.bn2 = nn.BatchNorm2d(out_channels)

    def forward(self, input):
        shortcut = self.shortcut(input)
        input = nn.ReLU()(self.bn1(self.conv1(input)))
        input = nn.ReLU()(self.bn2(self.conv2(input)))
        input = input + shortcut  # the residual connection: F(x) + x
        return nn.ReLU()(input)
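A quick way to confirm that both variants behave as intended is to push a dummy tensor through them; the shapes below are only an illustration:
import torch

block = ResBlock(64, 64, downsample=False)
down = ResBlock(64, 128, downsample=True)
x = torch.randn(1, 64, 56, 56)
print(block(x).shape)  # torch.Size([1, 64, 56, 56])  -- size unchanged
print(down(x).shape)   # torch.Size([1, 128, 28, 28]) -- channels doubled, spatial size halved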
3.1.2 Input layer (layer 0)
The input layer, i.e. layer 0, consists of a 7 \times 7 convolution followed by a 3 \times 3 max pooling layer; the code is as follows:
self.layer0 = nn.Sequential(
nn.Conv2d(in_channels, 64, kernel_size=7, stride=2, padding=3),
nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
nn.BatchNorm2d(64),
nn.ReLU()
)
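With a 224 \times 224 input, the stride-2 convolution and the stride-2 max pooling each halve the spatial size, so layer 0 outputs 64 feature maps of size 56 \times 56. A minimal check, using a standalone copy of the layer above with a 3-channel input assumed:
import torch
from torch import nn

layer0 = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU()
)
x = torch.randn(1, 3, 224, 224)
print(layer0(x).shape)  # torch.Size([1, 64, 56, 56])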
3.1.3 Final layer
The final layer consists of global average pooling followed by a fully connected layer; the code is as follows:
        self.gap = torch.nn.AdaptiveAvgPool2d(1)
        self.fc = torch.nn.Linear(512, outputs)  # ResNet18: 512 channels come out of the last ResBlock
Global average pooling is illustrated in the figure below:
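Put differently, AdaptiveAvgPool2d(1) collapses each feature map to a single value, so the classifier receives one number per channel regardless of the spatial resolution. A small illustration (the shapes are arbitrary):
import torch
from torch import nn

gap = nn.AdaptiveAvgPool2d(1)
x = torch.randn(1, 512, 7, 7)            # feature maps from the last ResBlock
y = torch.flatten(gap(x), start_dim=1)   # global average pooling, then flatten per sample
print(y.shape)                           # torch.Size([1, 512])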
3.1.4 ResNet18 code
Combining the pieces above gives the full ResNet18:
class ResNet18(nn.Module):
def __init__(self, in_channels, resblock, outputs=1000):
super().__init__()
self.layer0 = nn.Sequential(
nn.Conv2d(in_channels, 64, kernel_size=7, stride=2, padding=3),
nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
nn.BatchNorm2d(64),
nn.ReLU()
)
self.layer1 = nn.Sequential(
resblock(64, 64, downsample=False),
resblock(64, 64, downsample=False)
)
self.layer2 = nn.Sequential(
resblock(64, 128, downsample=True),
resblock(128, 128, downsample=False)
)
self.layer3 = nn.Sequential(
resblock(128, 256, downsample=True),
resblock(256, 256, downsample=False)
)
self.layer4 = nn.Sequential(
resblock(256, 512, downsample=True),
resblock(512, 512, downsample=False)
)
self.gap = torch.nn.AdaptiveAvgPool2d(1)
self.fc = torch.nn.Linear(512, outputs)
def forward(self, input):
input = self.layer0(input)
input = self.layer1(input)
input = self.layer2(input)
input = self.layer3(input)
input = self.layer4(input)
input = self.gap(input)
        input = torch.flatten(input, start_dim=1)  # keep the batch dimension
input = self.fc(input)
return input
We can use torchsummary to print a summary of the model above:
from torchsummary import summary
resnet18 = ResNet18(3, ResBlock, outputs=1000)
resnet18.to(torch.device("cuda:0" if torch.cuda.is_available() else "cpu"))
summary(resnet18, (3, 224, 224))
3.2 ResNet34
ResNet34 shares the input layer and the final layer with ResNet18; only the number of ResBlocks repeated in layer1 through layer4 changes. The ResNet34 code is as follows:
class ResNet34(nn.Module):
def __init__(self, in_channels, resblock, outputs=1000):
super().__init__()
self.layer0 = nn.Sequential(
nn.Conv2d(in_channels, 64, kernel_size=7, stride=2, padding=3),
nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
nn.BatchNorm2d(64),
nn.ReLU()
)
self.layer1 = nn.Sequential(
resblock(64, 64, downsample=False),
resblock(64, 64, downsample=False),
resblock(64, 64, downsample=False)
)
self.layer2 = nn.Sequential(
resblock(64, 128, downsample=True),
resblock(128, 128, downsample=False),
resblock(128, 128, downsample=False),
resblock(128, 128, downsample=False)
)
self.layer3 = nn.Sequential(
resblock(128, 256, downsample=True),
resblock(256, 256, downsample=False),
resblock(256, 256, downsample=False),
resblock(256, 256, downsample=False),
resblock(256, 256, downsample=False),
resblock(256, 256, downsample=False)
)
self.layer4 = nn.Sequential(
resblock(256, 512, downsample=True),
resblock(512, 512, downsample=False),
resblock(512, 512, downsample=False),
)
self.gap = torch.nn.AdaptiveAvgPool2d(1)
self.fc = torch.nn.Linear(512, outputs)
def forward(self, input):
input = self.layer0(input)
input = self.layer1(input)
input = self.layer2(input)
input = self.layer3(input)
input = self.layer4(input)
input = self.gap(input)
        input = torch.flatten(input, start_dim=1)  # keep the batch dimension
input = self.fc(input)
return input
We can again use torchsummary to inspect the network:
resnet34 = ResNet34(3, ResBlock, outputs=1000)
resnet34.to(torch.device("cuda:0" if torch.cuda.is_available() else "cpu"))
summary(resnet34, (3, 224, 224))
3.3 ResNet50, ResNet101, ResNet152
As shown in the figure above, the left side is the ResBlock structure used in ResNet18 and ResNet34, and the right side is the bottleneck structure used in ResNet50, ResNet101 and ResNet152.
3.3.1 ResBottleneckBlock
The most obvious difference between ResNet34 and ResNet50 is the residual block, as shown in the figure above. We rewrite this component as a new module called ResBottleneckBlock.
class ResBottleneckBlock(nn.Module):
    def __init__(self, in_channels, out_channels, downsample):
        super().__init__()
        self.downsample = downsample
        # Bottleneck: 1x1 conv to reduce the channels, 3x3 conv, then 1x1 conv to expand again.
        self.conv1 = nn.Conv2d(in_channels, out_channels//4, kernel_size=1, stride=1)
        self.conv2 = nn.Conv2d(out_channels//4, out_channels//4, kernel_size=3, stride=2 if downsample else 1, padding=1)
        self.conv3 = nn.Conv2d(out_channels//4, out_channels, kernel_size=1, stride=1)
        self.shortcut = nn.Sequential()
        # A projection shortcut is needed whenever the spatial size or the channel count changes.
        if self.downsample or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=2 if self.downsample else 1),
                nn.BatchNorm2d(out_channels)
            )
        self.bn1 = nn.BatchNorm2d(out_channels//4)
        self.bn2 = nn.BatchNorm2d(out_channels//4)
        self.bn3 = nn.BatchNorm2d(out_channels)

    def forward(self, input):
        shortcut = self.shortcut(input)
        input = nn.ReLU()(self.bn1(self.conv1(input)))
        input = nn.ReLU()(self.bn2(self.conv2(input)))
        input = nn.ReLU()(self.bn3(self.conv3(input)))
        input = input + shortcut
        return nn.ReLU()(input)
Combining the code above with the figure, we can create three different kinds of ResBottleneckBlock (a quick shape check follows the list):
- Left: ResBottleneckBlock(64, 256, downsample=False)
- Middle: ResBottleneckBlock(256, 256, downsample=False)
- Right: ResBottleneckBlock(256, 512, downsample=True)
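As a sanity check, a dummy tensor can be run through each of the three configurations, assuming the ResBottleneckBlock class above; the expected shapes are noted in the comments:
import torch

left = ResBottleneckBlock(64, 256, downsample=False)
middle = ResBottleneckBlock(256, 256, downsample=False)
right = ResBottleneckBlock(256, 512, downsample=True)
x = torch.randn(1, 64, 56, 56)
y = left(x)
print(y.shape)          # torch.Size([1, 256, 56, 56])  -- channels expanded by the projection shortcut
print(middle(y).shape)  # torch.Size([1, 256, 56, 56])  -- identity shortcut, nothing changes
print(right(y).shape)   # torch.Size([1, 512, 28, 28])  -- channels doubled, spatial size halved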
3.3.2 Putting it all together
To make one class cover all of the ResNet variants, a new boolean argument useBottleneck is added to the ResNet18/34 code above; it specifies whether the bottleneck block is used. Set useBottleneck to False to build ResNet18/34, and to True to build ResNet50/101/152.
The final code is as follows:
class ResNet(nn.Module):
def __init__(self, in_channels, resblock, repeat, useBottleneck=False, outputs=1000):
super().__init__()
self.layer0 = nn.Sequential(
nn.Conv2d(in_channels, 64, kernel_size=7, stride=2, padding=3),
nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
nn.BatchNorm2d(64),
nn.ReLU()
)
if useBottleneck:
filters = [64, 256, 512, 1024, 2048]
else:
filters = [64, 64, 128, 256, 512]
self.layer1 = nn.Sequential()
self.layer1.add_module('conv2_1', resblock(filters[0], filters[1], downsample=False))
for i in range(1, repeat[0]):
self.layer1.add_module('conv2_%d'%(i+1,), resblock(filters[1], filters[1], downsample=False))
self.layer2 = nn.Sequential()
self.layer2.add_module('conv3_1', resblock(filters[1], filters[2], downsample=True))
for i in range(1, repeat[1]):
self.layer2.add_module('conv3_%d' % (i+1,), resblock(filters[2], filters[2], downsample=False))
self.layer3 = nn.Sequential()
self.layer3.add_module('conv4_1', resblock(filters[2], filters[3], downsample=True))
for i in range(1, repeat[2]):
            self.layer3.add_module('conv4_%d' % (i+1,), resblock(filters[3], filters[3], downsample=False))
self.layer4 = nn.Sequential()
self.layer4.add_module('conv5_1', resblock(filters[3], filters[4], downsample=True))
for i in range(1, repeat[3]):
            self.layer4.add_module('conv5_%d' % (i+1,), resblock(filters[4], filters[4], downsample=False))
self.gap = torch.nn.AdaptiveAvgPool2d(1)
self.fc = torch.nn.Linear(filters[4], outputs)
def forward(self, input):
input = self.layer0(input)
input = self.layer1(input)
input = self.layer2(input)
input = self.layer3(input)
input = self.layer4(input)
input = self.gap(input)
input = torch.flatten(input, start_dim=1)
input = self.fc(input)
return input
We can then build the different ResNet variants as follows:
# resnet18
resnet18 = ResNet(3, ResBlock, [2, 2, 2, 2], useBottleneck=False, outputs=1000)
resnet18.to(torch.device("cuda:0" if torch.cuda.is_available() else "cpu"))
summary(resnet18, (3, 224, 224))
# resnet34
resnet34 = ResNet(3, ResBlock, [3, 4, 6, 3], useBottleneck=False, outputs=1000)
resnet34.to(torch.device("cuda:0" if torch.cuda.is_available() else "cpu"))
summary(resnet34, (3, 224, 224))
# resnet50
resnet50 = ResNet(3, ResBottleneckBlock, [3, 4, 6, 3], useBottleneck=True, outputs=1000)
resnet50.to(torch.device("cuda:0" if torch.cuda.is_available() else "cpu"))
summary(resnet50, (3, 224, 224))
# resnet101
resnet101 = ResNet(3, ResBottleneckBlock, [3, 4, 23, 3], useBottleneck=True, outputs=1000)
resnet101.to(torch.device("cuda:0" if torch.cuda.is_available() else "cpu"))
summary(resnet101, (3, 224, 224))
# resnet152
resnet152 = ResNet(3, ResBottleneckBlock, [3, 8, 36, 3], useBottleneck=True, outputs=1000)
resnet152.to(torch.device("cuda:0" if torch.cuda.is_available() else "cpu"))
summary(resnet152, (3, 224, 224))
4 Complete ResNet code
The complete ResNet code is as follows:
import torch
from torchsummary import summary
from torch import nn
class ResBlock(nn.Module):
def __init__(self, in_channels, out_channels, downsample):
super().__init__()
if downsample:
self.conv1 = nn.Conv2d(
in_channels, out_channels, kernel_size=3, stride=2, padding=1)
self.shortcut = nn.Sequential(
nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=2),
nn.BatchNorm2d(out_channels)
)
else:
self.conv1 = nn.Conv2d(
in_channels, out_channels, kernel_size=3, stride=1, padding=1)
self.shortcut = nn.Sequential()
self.conv2 = nn.Conv2d(out_channels, out_channels,
kernel_size=3, stride=1, padding=1)
self.bn1 = nn.BatchNorm2d(out_channels)
self.bn2 = nn.BatchNorm2d(out_channels)
def forward(self, input):
shortcut = self.shortcut(input)
input = nn.ReLU()(self.bn1(self.conv1(input)))
input = nn.ReLU()(self.bn2(self.conv2(input)))
input = input + shortcut
return nn.ReLU()(input)
class ResBottleneckBlock(nn.Module):
def __init__(self, in_channels, out_channels, downsample):
super().__init__()
self.downsample = downsample
self.conv1 = nn.Conv2d(in_channels, out_channels//4,
kernel_size=1, stride=1)
self.conv2 = nn.Conv2d(
out_channels//4, out_channels//4, kernel_size=3, stride=2 if downsample else 1, padding=1)
self.conv3 = nn.Conv2d(out_channels//4, out_channels, kernel_size=1, stride=1)
if self.downsample or in_channels != out_channels:
self.shortcut = nn.Sequential(
nn.Conv2d(in_channels, out_channels, kernel_size=1,
stride=2 if self.downsample else 1),
nn.BatchNorm2d(out_channels)
)
else:
self.shortcut = nn.Sequential()
self.bn1 = nn.BatchNorm2d(out_channels//4)
self.bn2 = nn.BatchNorm2d(out_channels//4)
self.bn3 = nn.BatchNorm2d(out_channels)
def forward(self, input):
shortcut = self.shortcut(input)
input = nn.ReLU()(self.bn1(self.conv1(input)))
input = nn.ReLU()(self.bn2(self.conv2(input)))
input = nn.ReLU()(self.bn3(self.conv3(input)))
input = input + shortcut
return nn.ReLU()(input)
class ResNet(nn.Module):
def __init__(self, in_channels, resblock, repeat, useBottleneck=False, outputs=1000):
super().__init__()
self.layer0 = nn.Sequential(
nn.Conv2d(in_channels, 64, kernel_size=7, stride=2, padding=3),
nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
nn.BatchNorm2d(64),
nn.ReLU()
)
if useBottleneck:
filters = [64, 256, 512, 1024, 2048]
else:
filters = [64, 64, 128, 256, 512]
self.layer1 = nn.Sequential()
self.layer1.add_module('conv2_1', resblock(filters[0], filters[1], downsample=False))
for i in range(1, repeat[0]):
self.layer1.add_module('conv2_%d'%(i+1,), resblock(filters[1], filters[1], downsample=False))
self.layer2 = nn.Sequential()
self.layer2.add_module('conv3_1', resblock(filters[1], filters[2], downsample=True))
for i in range(1, repeat[1]):
self.layer2.add_module('conv3_%d' % (
i+1,), resblock(filters[2], filters[2], downsample=False))
self.layer3 = nn.Sequential()
self.layer3.add_module('conv4_1', resblock(filters[2], filters[3], downsample=True))
for i in range(1, repeat[2]):
            self.layer3.add_module('conv4_%d' % (i+1,), resblock(filters[3], filters[3], downsample=False))
self.layer4 = nn.Sequential()
self.layer4.add_module('conv5_1', resblock(filters[3], filters[4], downsample=True))
for i in range(1, repeat[3]):
            self.layer4.add_module('conv5_%d' % (i+1,), resblock(filters[4], filters[4], downsample=False))
self.gap = torch.nn.AdaptiveAvgPool2d(1)
self.fc = torch.nn.Linear(filters[4], outputs)
def forward(self, input):
input = self.layer0(input)
input = self.layer1(input)
input = self.layer2(input)
input = self.layer3(input)
input = self.layer4(input)
input = self.gap(input)
# torch.flatten()
# https://stackoverflow.com/questions/60115633/pytorch-flatten-doesnt-maintain-batch-size
input = torch.flatten(input, start_dim=1)
input = self.fc(input)
return input
1. ResNet18
resnet18 = ResNet(3, ResBlock, [2, 2, 2, 2], useBottleneck=False, outputs=1000)
resnet18.to(torch.device("cuda:0" if torch.cuda.is_available() else "cpu"))
summary(resnet18, (3, 224, 224))
2. ResNet34
resnet34 = ResNet(3, ResBlock, [3, 4, 6, 3], useBottleneck=False, outputs=1000)
resnet34.to(torch.device("cuda:0" if torch.cuda.is_available() else "cpu"))
summary(resnet34, (3, 224, 224))
3. ResNet50
resnet50 = ResNet(3, ResBottleneckBlock, [3, 4, 6, 3], useBottleneck=True, outputs=1000)
resnet50.to(torch.device("cuda:0" if torch.cuda.is_available() else "cpu"))
summary(resnet50, (3, 224, 224))
4. ResNet101
resnet101 = ResNet(3, ResBottleneckBlock, [3, 4, 23, 3], useBottleneck=True, outputs=1000)
resnet101.to(torch.device("cuda:0" if torch.cuda.is_available() else "cpu"))
summary(resnet101, (3, 224, 224))
5. ResNet152
resnet152 = ResNet(3, ResBottleneckBlock, [3, 8, 36, 3], useBottleneck=True, outputs=1000)
resnet152.to(torch.device("cuda:0" if torch.cuda.is_available() else "cpu"))
summary(resnet152, (3, 224, 224))
See also: https://github.com/weiaicunzai/pytorch-cifar100/blob/master/models/resnet.py