1 Why initialize a network model's weights?
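After a network architecture has been designed and built in PyTorch, an important next step is usually to initialize the weights of some of its layers. The code below builds a 3D convolutional network, C3D, and uses the private method __init_weight to initialize the weights of the nn.Conv3d and nn.BatchNorm3d modules in the network. Note that C3D as written contains no nn.BatchNorm3d layers, so only the nn.Conv3d branch actually takes effect here.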

    import torch
    import torch.nn as nn


    class C3D(nn.Module):
        """The C3D network."""

        def __init__(self, num_classes, pretrained=False):
            super().__init__()

            self.conv1 = nn.Conv3d(3, 64, kernel_size=(3, 3, 3), padding=(1, 1, 1))
            self.pool1 = nn.MaxPool3d(kernel_size=(1, 2, 2), stride=(1, 2, 2))

            self.conv2 = nn.Conv3d(64, 128, kernel_size=(3, 3, 3), padding=(1, 1, 1))
            self.pool2 = nn.MaxPool3d(kernel_size=(2, 2, 2), stride=(2, 2, 2))

            self.conv3a = nn.Conv3d(128, 256, kernel_size=(3, 3, 3), padding=(1, 1, 1))
            self.conv3b = nn.Conv3d(256, 256, kernel_size=(3, 3, 3), padding=(1, 1, 1))
            self.pool3 = nn.MaxPool3d(kernel_size=(2, 2, 2), stride=(2, 2, 2))

            self.conv4a = nn.Conv3d(256, 512, kernel_size=(3, 3, 3), padding=(1, 1, 1))
            self.conv4b = nn.Conv3d(512, 512, kernel_size=(3, 3, 3), padding=(1, 1, 1))
            self.pool4 = nn.MaxPool3d(kernel_size=(2, 2, 2), stride=(2, 2, 2))

            self.conv5a = nn.Conv3d(512, 512, kernel_size=(3, 3, 3), padding=(1, 1, 1))
            self.conv5b = nn.Conv3d(512, 512, kernel_size=(3, 3, 3), padding=(1, 1, 1))
            self.pool5 = nn.MaxPool3d(kernel_size=(2, 2, 2), stride=(2, 2, 2), padding=(0, 1, 1))

            self.fc6 = nn.Linear(8192, 4096)
            self.fc7 = nn.Linear(4096, 4096)
            self.fc8 = nn.Linear(4096, num_classes)

            self.dropout = nn.Dropout(p=0.5)
            self.relu = nn.ReLU()

            # Initialize weights once all layers have been created.
            self.__init_weight()

            if pretrained:
                # The pretrained-weight loader is not shown in this excerpt.
                self.__load_pretrained_weights()

        def forward(self, x):
            x = self.relu(self.conv1(x))
            x = self.pool1(x)

            x = self.relu(self.conv2(x))
            x = self.pool2(x)

            x = self.relu(self.conv3a(x))
            x = self.relu(self.conv3b(x))
            x = self.pool3(x)

            x = self.relu(self.conv4a(x))
            x = self.relu(self.conv4b(x))
            x = self.pool4(x)

            x = self.relu(self.conv5a(x))
            x = self.relu(self.conv5b(x))
            x = self.pool5(x)

            # Flatten to (batch, 8192) for the fully connected layers.
            x = x.view(-1, 8192)
            x = self.relu(self.fc6(x))
            x = self.dropout(x)
            x = self.relu(self.fc7(x))
            x = self.dropout(x)

            logits = self.fc8(x)
            return logits

        def __init_weight(self):
            # Kaiming (He) normal initialization for every 3D convolution;
            # any BatchNorm3d layer gets affine scale 1 and shift 0.
            # nn.Linear layers keep PyTorch's default initialization.
            for m in self.modules():
                if isinstance(m, nn.Conv3d):
                    nn.init.kaiming_normal_(m.weight)
                elif isinstance(m, nn.BatchNorm3d):
                    nn.init.ones_(m.weight)
                    nn.init.zeros_(m.bias)


    if __name__ == "__main__":
        inputs = torch.rand(1, 3, 16, 112, 112)
        net = C3D(num_classes=101, pretrained=False)
        outputs = net(inputs)  # call the module rather than forward() directly
        print(outputs.size())  # torch.Size([1, 101])

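
To make the kaiming_normal_ call in __init_weight more concrete, here is a minimal sketch that checks what it does to one convolution layer. It assumes the default arguments of kaiming_normal_ (fan-in mode, gain sqrt(2)), under which the weights should be drawn from a zero-mean normal distribution with standard deviation sqrt(2 / fan_in). The layer shape is borrowed from conv2 above, and the seed is an arbitrary choice for reproducibility.

```python
import math

import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

# Same shape as conv2 in the C3D network above.
conv = nn.Conv3d(64, 128, kernel_size=(3, 3, 3), padding=(1, 1, 1))
nn.init.kaiming_normal_(conv.weight)

# For a 3D convolution, fan_in = in_channels * kT * kH * kW.
fan_in = conv.in_channels * math.prod(conv.kernel_size)
expected_std = math.sqrt(2.0 / fan_in)

empirical_mean = conv.weight.mean().item()
empirical_std = conv.weight.std().item()

print(f"fan_in        = {fan_in}")
print(f"expected std  = {expected_std:.5f}")
print(f"empirical std = {empirical_std:.5f}")
print(f"empirical mean= {empirical_mean:.6f}")
```

The empirical standard deviation should land very close to the expected one, which shows that the scale of the weights is tied to the number of inputs each output unit sees.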
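So why should a network model's weights be initialized?
- A suitable weight initialization speeds up the convergence of the neural network;
- Skipping weight initialization, or choosing an unsuitable one, can cause vanishing or exploding gradients. Weights are therefore initialized so that signals in a deep network neither vanish nor explode during forward or backward propagation.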
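
The two points above can be illustrated with a small experiment. The sketch below is not taken from the original article: it builds a hypothetical 20-layer, bias-free ReLU MLP (the helper name probe, the depth, the width, and the stand-in loss are all assumptions chosen for illustration) and compares three weight scales: a standard deviation that is too small, the Kaiming value sqrt(2 / fan_in), and one that is too large. For each it reports the RMS of the output activations and the gradient norm at the first layer.

```python
import math

import torch
import torch.nn as nn


def probe(weight_std, depth=20, width=128, seed=0):
    """Return (output RMS, first-layer gradient norm) for a bias-free ReLU MLP
    whose weights are all drawn from N(0, weight_std**2)."""
    torch.manual_seed(seed)
    layers = []
    for _ in range(depth):
        linear = nn.Linear(width, width, bias=False)
        nn.init.normal_(linear.weight, mean=0.0, std=weight_std)
        layers += [linear, nn.ReLU()]
    # float64 keeps the extreme cases from underflowing or overflowing.
    net = nn.Sequential(*layers).double()

    x = torch.randn(32, width, dtype=torch.float64)
    out = net(x)
    out.sum().backward()  # stand-in loss, used only to obtain gradients

    out_rms = out.pow(2).mean().sqrt().item()
    grad_norm = net[0].weight.grad.norm().item()
    return out_rms, grad_norm


width = 128
small = probe(0.01, width=width)                      # too small
kaiming = probe(math.sqrt(2.0 / width), width=width)  # Kaiming scale
large = probe(1.0, width=width)                       # too large

for name, (rms, grad) in [("too small", small), ("kaiming", kaiming), ("too large", large)]:
    print(f"{name:>9}: output RMS = {rms:.3e}, first-layer grad norm = {grad:.3e}")
```

With the too-small scale both numbers shrink toward zero (vanishing), with the too-large scale they blow up (exploding), and with the Kaiming scale they stay at a moderate magnitude, which is what lets training make steady progress.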
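Author: StubbornHuang
Copyright notice: This is an original article by the site owner. Please cite the original link when reposting.
Original title: Deep Learning – Why Initialize Network Model Weights?
Original link: https://www.stubbornhuang.com/2328/
Published: 2022-08-26 8:54:27
Last modified: 2023-06-21 18:11:46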