PyTorch: resuming training from a checkpoint, optimizer.step() raises RuntimeError: Expected all tensors to be on the same device, but found cuda:0 - StubbornHuang Blog

StubbornHuang | PyTorch | Published 2023-05-08

1 Resuming training from a checkpoint: optimizer.step() raises RuntimeError: Expected all tensors to be on the same device, but found cuda:0

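To support resuming training from a checkpoint in PyTorch, the checkpoint file needs to store the state_dict of the model, the optimizer, and the lr_scheduler together, for example: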

torch.save({
    'epoch': epoch,
    'model_state_dict': self.model.state_dict(),
    'optimizer_state_dict': self.optimizer.state_dict(),
    'scheduler_state_dict': self.lr_scheduler.state_dict(),
}, model_save_path)

Then, when loading the checkpoint, you need to restore the optimizer and lr_scheduler state in addition to the model weights, for example:

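# Read back the checkpoint written by torch.save above
check_point_state_dict = torch.load(model_save_path)
# modified_weights is the author's own helper; its definition is not shown in this post
model_weights = modified_weights(check_point_state_dict['model_state_dict'])
optimizer.load_state_dict(check_point_state_dict["optimizer_state_dict"])
lr_scheduler.load_state_dict(check_point_state_dict["scheduler_state_dict"])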
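The save and load steps above can be put together in a small end-to-end sketch. Everything here is illustrative rather than taken from the original project: the `nn.Linear` model, the Adam optimizer, the `StepLR` scheduler, and the temporary checkpoint path are stand-ins. The sketch also assumes the model is moved to the target device before the optimizer state is loaded.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Hypothetical stand-ins for the project's model, optimizer and scheduler.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = nn.Linear(4, 2).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

# Run one training step so the optimizer actually holds state tensors.
loss = model(torch.randn(8, 4, device=device)).sum()
loss.backward()
optimizer.step()
lr_scheduler.step()

# Save everything needed to resume (same layout as in the post).
model_save_path = os.path.join(tempfile.mkdtemp(), "checkpoint.pt")
torch.save({
    'epoch': 0,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'scheduler_state_dict': lr_scheduler.state_dict(),
}, model_save_path)

# Resume: rebuild the objects, move the model to the device first,
# then load the three state dicts.
check_point_state_dict = torch.load(model_save_path, map_location="cpu")
resumed_model = nn.Linear(4, 2).to(device)
resumed_optimizer = torch.optim.Adam(resumed_model.parameters(), lr=1e-3)
resumed_scheduler = torch.optim.lr_scheduler.StepLR(
    resumed_optimizer, step_size=10, gamma=0.5
)

resumed_model.load_state_dict(check_point_state_dict['model_state_dict'])
resumed_optimizer.load_state_dict(check_point_state_dict['optimizer_state_dict'])
resumed_scheduler.load_state_dict(check_point_state_dict['scheduler_state_dict'])
start_epoch = check_point_state_dict['epoch'] + 1  # continue with the next epoch
```

Training would then continue from `start_epoch` using the `resumed_*` objects.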

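A common pitfall at this point is that the optimizer's state tensors end up on the CPU while training continues on the GPU. This typically happens when optimizer.load_state_dict() is called while the model's parameters are still on the CPU, because the loaded state is placed on the device of the corresponding parameters at the time of the call. If nothing else is changed, optimizer.step() then fails with: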

RuntimeError: Expected all tensors to be on the same device, but found cuda:0
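Before changing anything, it can help to confirm where the optimizer's tensors actually live. The following is a minimal diagnostic sketch; the helper names `optimizer_state_devices` and `parameter_devices` are made up for this example, and the small SGD setup exists only to give the optimizer some state to inspect.

```python
import torch
import torch.nn as nn


def optimizer_state_devices(optimizer):
    """Return the devices (as strings) of all tensors in the optimizer state."""
    devices = set()
    for state in optimizer.state.values():
        for value in state.values():
            if isinstance(value, torch.Tensor):
                devices.add(str(value.device))
    return devices


def parameter_devices(optimizer):
    """Return the devices (as strings) of all parameters the optimizer updates."""
    devices = set()
    for group in optimizer.param_groups:
        for param in group["params"]:
            devices.add(str(param.device))
    return devices


# CPU-only demonstration: one step with momentum creates a momentum buffer.
model = nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
model(torch.randn(5, 3)).sum().backward()
optimizer.step()

print("state tensors on:", optimizer_state_devices(optimizer))
print("parameters on:   ", parameter_devices(optimizer))
```

If the two sets differ after loading a checkpoint, that mismatch is the likely source of the error. Depending on the PyTorch version, some optimizers keep a scalar `step` counter as a CPU tensor by design, so a lone CPU entry is not always a problem.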

In other words, the optimizer's state is not on the GPU, so the fix is to move it there after loading. Example code:

optimizer.load_state_dict(check_point_state_dict["optimizer_state_dict"])
for state in optimizer.state.values():
    for k, v in state.items():
        if isinstance(v, torch.Tensor):
            state[k] = v.to(self.output_device)

Here self.output_device is the index of the GPU used by the project.
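The same loop can be wrapped in a small reusable function. This is a sketch under stated assumptions: `move_optimizer_state` is a hypothetical helper name, and the SGD-with-momentum setup is only there to create state on the CPU before the model is moved.

```python
import torch
import torch.nn as nn


def move_optimizer_state(optimizer, device):
    """Move every tensor stored in the optimizer state to `device`."""
    for state in optimizer.state.values():
        for key, value in state.items():
            if isinstance(value, torch.Tensor):
                state[key] = value.to(device)


# Build some optimizer state on the CPU first.
model = nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
model(torch.randn(5, 3)).sum().backward()
optimizer.step()

# Move the model and the optimizer state to the training device together.
# Falls back to the CPU when no GPU is available, so the sketch still runs.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device)
move_optimizer_state(optimizer, device)
```

Calling the helper right after `optimizer.load_state_dict(...)` mirrors the fix shown above.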

With this change in place, the error is resolved.
