PyTorch – Resuming training from a checkpoint: optimizer.step() raises RuntimeError Expected all tensors to be on the same device, but found cuda:0 - StubbornHuang Blog

StubbornHuang | PyTorch | Published 2023-05-08

1 Resuming training from a checkpoint: optimizer.step() raises RuntimeError Expected all tensors to be on the same device, but found cuda:0

When implementing checkpoint resumption in PyTorch, the saved file needs to contain the state_dict of the model, the optimizer, and the lr_scheduler together, for example:

torch.save({
    'epoch': epoch,
    'model_state_dict': self.model.state_dict(),
    'optimizer_state_dict': self.optimizer.state_dict(),
    'scheduler_state_dict': self.lr_scheduler.state_dict(),
}, model_save_path)

Then, when loading the checkpoint, you need to restore the state of the optimizer and the lr_scheduler in addition to the model weights, for example:

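# check_point_state_dict is assumed to be the dict returned by torch.load(model_save_path);
# modified_weights is this project's own helper for post-processing the saved model weights.
model_weights = modified_weights(check_point_state_dict['model_state_dict'])
optimizer.load_state_dict(check_point_state_dict["optimizer_state_dict"])
lr_scheduler.load_state_dict(check_point_state_dict["scheduler_state_dict"])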
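The save and load steps above can be tried end to end with a small sketch. Everything here is illustrative rather than taken from the original project: the tiny nn.Linear model, the SGD optimizer with momentum, the StepLR scheduler, and the in-memory buffer standing in for model_save_path are all assumptions.

```python
import io

import torch
from torch import nn

# Hypothetical tiny model, optimizer, and scheduler, used only for illustration.
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

# Run one training step so the optimizer has state (momentum buffers) worth saving.
model(torch.randn(8, 4)).sum().backward()
optimizer.step()
lr_scheduler.step()

# Save to an in-memory buffer; a real project would pass a file path instead.
buffer = io.BytesIO()
torch.save({
    'epoch': 0,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'scheduler_state_dict': lr_scheduler.state_dict(),
}, buffer)
buffer.seek(0)

# Resume: rebuild the objects first, then restore each state_dict.
check_point_state_dict = torch.load(buffer, map_location='cpu')
resumed_model = nn.Linear(4, 2)
resumed_optimizer = torch.optim.SGD(resumed_model.parameters(), lr=0.1, momentum=0.9)
resumed_scheduler = torch.optim.lr_scheduler.StepLR(resumed_optimizer, step_size=10, gamma=0.5)

resumed_model.load_state_dict(check_point_state_dict['model_state_dict'])
resumed_optimizer.load_state_dict(check_point_state_dict['optimizer_state_dict'])
resumed_scheduler.load_state_dict(check_point_state_dict['scheduler_state_dict'])

# Training would continue from the epoch after the saved one.
start_epoch = check_point_state_dict['epoch'] + 1
```

This sketch stays on the CPU throughout, so it does not trigger the device error discussed in this post; it only shows which pieces need to be saved and restored together.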

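An easy mistake to make here is that the restored optimizer state can end up on the CPU (for instance, when the checkpoint is loaded onto the CPU and the optimizer state is restored before the model is moved to the GPU), while the rest of the resumed training runs on the GPU. If the optimizer state is left as it is, optimizer.step() then fails with: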

RuntimeError: Expected all tensors to be on the same device, but found cuda:0

What this really means is that the optimizer's state tensors are not on the GPU, so the fix is to move them there, as in the example code below:

optimizer.load_state_dict(check_point_state_dict["optimizer_state_dict"])
for state in optimizer.state.values():
    for k, v in state.items():
        if isinstance(v, torch.Tensor):
            state[k] = v.to(self.output_device)

Here self.output_device is the index of the GPU that the project trains on.
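The same loop can be wrapped in a small helper, together with a check that shows where the optimizer state currently lives. This is a minimal sketch under stated assumptions: the names move_optimizer_state and optimizer_state_devices are hypothetical, the model and SGD optimizer are placeholders, and the device falls back to the CPU when no GPU is available so the snippet runs anywhere.

```python
import torch
from torch import nn


def move_optimizer_state(optimizer, device):
    """Move every tensor in the optimizer's per-parameter state to `device`."""
    for state in optimizer.state.values():
        for k, v in state.items():
            if isinstance(v, torch.Tensor):
                state[k] = v.to(device)


def optimizer_state_devices(optimizer):
    """Return the set of device types that hold optimizer state tensors."""
    return {
        v.device.type
        for state in optimizer.state.values()
        for v in state.values()
        if isinstance(v, torch.Tensor)
    }


# Assumption: use the first GPU when present, otherwise stay on the CPU.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# Placeholder model and optimizer; SGD with momentum keeps one buffer per parameter.
model = nn.Linear(4, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
model(torch.randn(8, 4, device=device)).sum().backward()
optimizer.step()

# After loading a checkpoint, call the helper before the next optimizer.step().
move_optimizer_state(optimizer, device)
print(optimizer_state_devices(optimizer))
optimizer.step()
```

If the printed set contains a device type other than the one the model is on, the optimizer state still needs to be moved before training continues.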

With this change in place, the error is resolved.
