C++ – YOLO Letterbox Image Preprocessing: Resizing Images Without Distortion - StubbornHuang Blog


StubbornHuang · C++ · Published 2023-07-17

1 The letterbox preprocessing method

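When deploying YOLO-family object detectors, or other vision deep learning models, a model exported with a static input shape requires every input image to be resized to that fixed size first. A plain resize cannot preserve the aspect ratio: resizing a 1920x1080 image to 640x640, for example, inevitably stretches or squashes the content. Feeding such distorted images to the model lowers its accuracy.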
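To make the distortion concrete: a plain resize from 1920x1080 to 640x640 applies a different scale factor to each axis.

```latex
s_x = \frac{640}{1920} = \frac{1}{3}, \qquad
s_y = \frac{640}{1080} \approx 0.593, \qquad
\frac{s_y}{s_x} = \frac{1920}{1080} \approx 1.78
```

Relative to its width, every object is stretched vertically by a factor of about 1.78, so a circle in the original frame becomes a tall ellipse.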

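The mainstream solution today is letterbox preprocessing: it resizes the image to the target size while keeping the original aspect ratio, so the image is not distorted and the accuracy loss caused by distortion is avoided.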

For example, here is a 1920x1080 image:

[Figure: the original 1920x1080 image]

After letterboxing it to 640x640, the resized image looks like this:

[Figure: the letterboxed 640x640 image]

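The idea is simple: first scale the image by a single ratio so that its aspect ratio is kept, then pad the dimension that falls short of the target size. A C++ implementation follows: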
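In symbols, writing W x H for the original size and w x h for the target size (notation introduced here, not used in the code), the scaling and padding step works out as follows, with the 1920x1080 to 640x640 example plugged in on the last line:

```latex
r = \min\left(\frac{h}{H}, \frac{w}{W}\right), \qquad
W' = \operatorname{round}(W r), \quad H' = \operatorname{round}(H r) \\
d_w = \frac{w - W'}{2}, \qquad d_h = \frac{h - H'}{2} \\
r = \min\left(\frac{640}{1080}, \frac{640}{1920}\right) = \frac{1}{3}, \quad
W' = 640, \quad H' = 360, \quad d_w = 0, \quad d_h = 140
```

So the image is first resized to 640x360 and then gets 140 rows of padding above and 140 below.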

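#include <algorithm>
#include <cmath>
#include <iostream>
#include <string>

#include "opencv2/opencv.hpp"

// Resize `image` to `new_shape` while keeping its aspect ratio, and fill the
// remaining area with `color`. Returns the factor (1 / r) that maps
// coordinates on `out_image` back to coordinates on `image`.
static float LetterBoxImage(
    const cv::Mat& image,
    cv::Mat& out_image,
    const cv::Size& new_shape = cv::Size(640, 640),
    const cv::Scalar& color = cv::Scalar(114, 114, 114)
)
{
    cv::Size shape = image.size();
    // One ratio for both axes, so the aspect ratio is preserved.
    float r = std::min((float)new_shape.height / (float)shape.height, (float)new_shape.width / (float)shape.width);

    // Size of the image after the aspect-preserving resize, before padding.
    int newUnpad[2]{ (int)std::round((float)shape.width * r), (int)std::round((float)shape.height * r) };

    cv::Mat tmp;
    if (shape.width != newUnpad[0] || shape.height != newUnpad[1]) {
        cv::resize(image, tmp, cv::Size(newUnpad[0], newUnpad[1]));
    }
    else {
        tmp = image.clone();
    }

    // Total padding per axis, split evenly between the two sides.
    float dw = new_shape.width - newUnpad[0];
    float dh = new_shape.height - newUnpad[1];

    dw /= 2.0f;
    dh /= 2.0f;

    // The -0.1 / +0.1 nudge gives the extra pixel of an odd padding to the bottom/right.
    int top = int(std::round(dh - 0.1f));
    int bottom = int(std::round(dh + 0.1f));
    int left = int(std::round(dw - 0.1f));
    int right = int(std::round(dw + 0.1f));

    cv::copyMakeBorder(tmp, out_image, top, bottom, left, right, cv::BORDER_CONSTANT, color);

    return 1.0f / r;
}

int main()
{
    std::string image_path = "C:\\Users\\HuangWang\\Desktop\\do1.mp4_20230704_102040.312.jpg";
    cv::Mat input_image = cv::imread(image_path);
    if (input_image.empty()) {
        std::cerr << "Failed to read image: " << image_path << std::endl;
        return -1;
    }

    cv::Mat out_image;
    // Keep `scale`: it is needed later to map predictions back to the original image.
    float scale = LetterBoxImage(input_image, out_image, cv::Size(640, 640), cv::Scalar(128, 128, 128));
    std::cout << "scale = " << scale << std::endl;

    cv::imwrite("./out.jpg", out_image);
    return 0;
}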
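The padding arithmetic in LetterBoxImage, in particular the ±0.1 nudge before rounding, is easier to check without OpenCV. Below is a minimal sketch that repeats the same computations on plain numbers; `LetterBoxGeometry` and `ComputeLetterBoxGeometry` are hypothetical names introduced for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper type: the sizes LetterBoxImage computes internally.
struct LetterBoxGeometry {
    int resized_w;                  // newUnpad[0]
    int resized_h;                  // newUnpad[1]
    int top, bottom, left, right;   // borders passed to copyMakeBorder
    float scale;                    // 1 / r, the value LetterBoxImage returns
};

// Hypothetical helper: same arithmetic as LetterBoxImage, no image data involved.
inline LetterBoxGeometry ComputeLetterBoxGeometry(int src_w, int src_h, int dst_w, int dst_h) {
    assert(src_w > 0 && src_h > 0 && dst_w > 0 && dst_h > 0);
    // One ratio for both axes keeps the aspect ratio.
    float r = std::min(static_cast<float>(dst_h) / src_h, static_cast<float>(dst_w) / src_w);

    LetterBoxGeometry g{};
    g.resized_w = static_cast<int>(std::round(src_w * r));
    g.resized_h = static_cast<int>(std::round(src_h * r));

    // Half of the leftover space on each axis.
    float dw = (dst_w - g.resized_w) / 2.0f;
    float dh = (dst_h - g.resized_h) / 2.0f;

    // When dh is k + 0.5, top rounds down to k and bottom rounds up to k + 1.
    g.top = static_cast<int>(std::round(dh - 0.1f));
    g.bottom = static_cast<int>(std::round(dh + 0.1f));
    g.left = static_cast<int>(std::round(dw - 0.1f));
    g.right = static_cast<int>(std::round(dw + 0.1f));
    g.scale = 1.0f / r;
    return g;
}
```

For a 1000x999 input and a 640x640 target, r = 0.64 and the resized height is 639, so the single leftover row goes to the bottom (top = 0, bottom = 1). When the total padding is even, as in the 1920x1080 case, both sides get the same amount.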

My post "Python – Resizing Images with the Letterbox Method to Prevent Distortion" also implements a Python version of the letterbox method, if you are interested.

After letterboxing the input image and running inference on it, the predictions are expressed in the coordinates of the letterboxed image. How do we map them back to the original image?

This is also simple. The LetterBoxImage function above returns a scale factor, from which we can compute the offsets in the x and y directions:

// Left/top padding of the letterboxed image, converted to original-image pixels
int x_offset = (input_w * scale - image.cols) / 2;
int y_offset = (input_h * scale - image.rows) / 2;
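Why does this offset work? Write s for scale (so s = 1/r), W for image.cols, x_lb for a coordinate on the letterboxed image and x_orig for the matching coordinate on the original image (notation introduced here). Ignoring the rounding of the resized size, the left padding in letterboxed pixels is d_w = (input_w - W r)/2, and the offset is exactly that padding converted to original-image pixels:

```latex
x_{\text{offset}} = \frac{\text{input\_w} \cdot s - W}{2}
  = s \cdot \frac{\text{input\_w} - W r}{2}
  = s \, d_w,
\qquad
x_{\text{orig}} = s \, x_{\text{lb}} - x_{\text{offset}} = s \, (x_{\text{lb}} - d_w)
```

In the 1920x1080 to 640x640 example, y_offset = (640 x 3 - 1080)/2 = 420 = 3 x 140, i.e. the 140-row top bar scaled back up to original pixels. The same reasoning applies to y with input_h and image.rows.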

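Here input_w and input_h are the width and height of the letterboxed image, image.cols and image.rows are the width and height of the original image, and scale is the factor returned by LetterBoxImage. With scale, x_offset and y_offset in hand, results predicted on the letterboxed image can be mapped back to the original image. Using YOLO detection boxes as an example: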

// detection_boxes stores [x1, y1, x2, y2] for detection i, in letterboxed-image pixels;
// result.back() is the Detection being filled for detection i.
Detection& detection = result.back();
detection.box.x = detection_boxes[4 * i] * scale - x_offset;
detection.box.y = detection_boxes[4 * i + 1] * scale - y_offset;
detection.box.width = detection_boxes[4 * i + 2] * scale - x_offset - detection.box.x;
detection.box.height = detection_boxes[4 * i + 3] * scale - y_offset - detection.box.y;

In other words, multiply each coordinate predicted on the letterboxed image by scale, then subtract the corresponding offset.
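Putting the two steps together, a self-contained sketch of the back-mapping might look like the following. `BoxXYXY`, `BoxXYWH` and `UnLetterBox` are hypothetical names; unlike the snippet above, this version keeps the offsets as float instead of truncating them to int, and it does not clip boxes to the image bounds.

```cpp
#include <cassert>

// Hypothetical box type: corner coordinates, same [x1, y1, x2, y2] layout as detection_boxes.
struct BoxXYXY {
    float x1, y1, x2, y2;
};

// Hypothetical result type with the x / y / width / height fields used above.
struct BoxXYWH {
    float x, y, width, height;
};

// Map a box predicted on the letterboxed input back to the original image.
// scale: value returned by LetterBoxImage (1 / r)
// input_w, input_h: letterboxed size; image_w, image_h: original size
inline BoxXYWH UnLetterBox(const BoxXYXY& b, float scale,
                           int input_w, int input_h, int image_w, int image_h) {
    assert(scale > 0.0f);
    // Left/top padding expressed in original-image pixels.
    float x_offset = (input_w * scale - image_w) / 2.0f;
    float y_offset = (input_h * scale - image_h) / 2.0f;

    BoxXYWH out{};
    out.x = b.x1 * scale - x_offset;
    out.y = b.y1 * scale - y_offset;
    out.width = b.x2 * scale - x_offset - out.x;
    out.height = b.y2 * scale - y_offset - out.y;
    return out;
}
```

For example, with scale = 3 (1920x1080 letterboxed to 640x640), the box (100, 140, 200, 500) on the letterboxed image maps to x = 300, y = 0, width = 300, height = 1080 on the original frame.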

References

https://github.com/zhiqwang/yolort/blob/293b378fa2c7d1bc76fac75309b62a951680ac35/deployment/tensorrt/main.cpp
