Mediapipe – Wrapping Mediapipe HolisticTracking into a dynamic link library (dll/so) to embed whole-body keypoint detection, gesture recognition, and arm raise/lower detection in desktop applications
1 Mediapipe HolisticTracking
Mediapipe Holistic Tracking can track and detect a total of 543 keypoints across the body, face and hands in real time: 468 face landmarks, 33 body (pose) landmarks, and 21 landmarks per hand (468 + 33 + 2 × 21 = 543).
For the storage order of all keypoints and the annotated index reference diagrams, see my other article: https://www.stubbornhuang.com/1916/
A demo of the test program for my wrapped dll:
1.1 Why wrap Mediapipe Holistic Tracking
I previously wrapped Mediapipe's hand tracking feature, HandTracking, into a dynamic library, with the goal of building gesture recognition and arm raise/lower detection on top of it. HandTracking works fine for gesture recognition, but for arm raise/lower detection it mainly comes down to comparing the wrist keypoint's position against the frame height, which makes that detection rather inaccurate.
Given this problem, and the experience gained from wrapping HandTracking, wrapping Mediapipe's HolisticTracking feature and detecting whole-body keypoints directly means that arm raise/lower detection only needs to compare the y coordinates of the wrist and the elbow, which is clearly the more accurate approach.
2 Setting up the Mediapipe C++ build environment on Windows
You can refer to my article:
for building the Mediapipe C++ environment on Windows.
If you can get through the build steps above, the build environment is set up correctly and you can move on to the steps below.
3 Wrapping Mediapipe HolisticTracking
The dll code, build files and test program are open-sourced on Github:
Feel free to star it!
The bazel build file and source files needed for this project are in the dll/holistic_tracking_dll folder of the repository; the test project for the wrapped dll is the MediapipeTest.sln solution under dll_use_example.
3.1 dll interface design
The dll interface is designed as follows:
#ifndef HOLISTIC_TRACKING_API_H
#define HOLISTIC_TRACKING_API_H
#define EXPORT
/* Export/import macro definition for the dynamic link library */
#include <malloc.h>
#ifdef _WIN32
#ifdef EXPORT
#define EXPORT_API __declspec(dllexport)
#else
#define EXPORT_API __declspec(dllimport)
#endif
#else
#include <stdlib.h>
#ifdef EXPORT
#define EXPORT_API __attribute__((visibility ("default")))
#else
#endif
#endif
#ifdef __cplusplus
extern "C" {
#endif
#ifndef EXPORT_API
#define EXPORT_API
#endif
/*
@brief Initialize Google Mediapipe
@param[in] model_path Path of the graph (model) file to load
@return Operation result
0 failure
1 success
*/
EXPORT_API int MediapipeHolisticTrackingInit(const char* model_path);
/*
@brief Detect a single video frame
@param[in] image_width Width of the video frame
@param[in] image_height Height of the video frame
@param[in] image_data Pixel data of the video frame (8-bit BGR, 3 channels)
@param[out] detect_result Detection results (arm up/down states and hand gestures)
@param[in] show_result_image Whether to display the rendered result image
@return Operation result
0 failure
1 success
*/
EXPORT_API int MediapipeHolisticTrackingDetectFrameDirect(int image_width, int image_height, void* image_data, int* detect_result, bool show_result_image = false);
/*
@brief Run detection on the camera
@param[in] show_image Whether to display the rendered result image
@return Operation result
0 failure
1 success
*/
EXPORT_API int MediapipeHolisticTrackingDetectCamera(bool show_image = false);
/*
@brief Release Google Mediapipe resources
@return Operation result
0 failure
1 success
*/
EXPORT_API int MediapipeHolisticTrackingRelease();
#ifdef __cplusplus
}
#endif
#endif // !HOLISTIC_TRACKING_API_H
3.2 dll call flow
First, call the MediapipeHolisticTrackingInit interface function with the path of the holistic graph (model) file to initialize the holistic graph.
Then call the MediapipeHolisticTrackingDetectFrameDirect interface function for each OpenCV video frame to be detected; the detection results are returned through the int array pointer you pass in. If you need the detected keypoint coordinates of each body part, you can add a new detection interface that returns coordinate data, following the project code.
Alternatively, use the MediapipeHolisticTrackingDetectCamera interface function to open a camera with OpenCV inside the dll and run holistic tracking directly; this function is only intended for testing.
Finally, once all video frames have been processed, call MediapipeHolisticTrackingRelease to release the resources.
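To make this call sequence concrete, here is a minimal sketch of a Windows caller that loads the dll at runtime via LoadLibrary/GetProcAddress and feeds it camera frames with OpenCV. The dll file name, the graph file path and the camera loop are illustrative assumptions; the actual test project under dll_use_example wraps the same calls in its own helper code.
#include <windows.h>
#include <opencv2/opencv.hpp>

typedef int (*InitFunc)(const char*);
typedef int (*DetectFrameFunc)(int, int, void*, int*, bool);
typedef int (*ReleaseFunc)();

int main()
{
    // Load the dll built in section 4 (name and location are assumptions).
    HMODULE dll = LoadLibraryA("MediapipeHolisticTracking.dll");
    if (!dll) return -1;

    auto init = (InitFunc)GetProcAddress(dll, "MediapipeHolisticTrackingInit");
    auto detect = (DetectFrameFunc)GetProcAddress(dll, "MediapipeHolisticTrackingDetectFrameDirect");
    auto release = (ReleaseFunc)GetProcAddress(dll, "MediapipeHolisticTrackingRelease");
    if (!init || !detect || !release) return -1;

    // Initialize the graph with the holistic tracking pbtxt (path is an assumption).
    if (init("mediapipe/graphs/holistic_tracking/holistic_tracking_cpu.pbtxt") != 1) return -1;

    cv::VideoCapture capture(0);
    cv::Mat frame;
    int detect_result[4] = { 0 };  // [0]/[1] arm up-down, [2]/[3] hand gestures

    while (capture.read(frame) && !frame.empty())
    {
        // The dll expects 8-bit BGR data, which is what cv::VideoCapture delivers.
        detect(frame.cols, frame.rows, frame.data, detect_result, /*show_result_image=*/true);
        if (cv::waitKey(1) == 27) break;  // ESC to quit
    }

    release();
    FreeLibrary(dll);
    return 0;
}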
3.3 Wrapping process
3.3.1 How to determine the graph's input and output streams
For holistic tracking on the CPU, the file holistic_tracking_cpu.pbtxt can be found under the mediapipe\mediapipe\graphs\holistic_tracking folder of the official Mediapipe repository.
Its content is as follows:
# Tracks and renders pose + hands + face landmarks.
# CPU image. (ImageFrame)
input_stream: "input_video"
# CPU image with rendered results. (ImageFrame)
output_stream: "output_video"
# Throttles the images flowing downstream for flow control. It passes through
# the very first incoming image unaltered, and waits for downstream nodes
# (calculators and subgraphs) in the graph to finish their tasks before it
# passes through another image. All images that come in while waiting are
# dropped, limiting the number of in-flight images in most part of the graph to
# 1. This prevents the downstream nodes from queuing up incoming images and data
# excessively, which leads to increased latency and memory usage, unwanted in
# real-time mobile applications. It also eliminates unnecessarily computation,
# e.g., the output produced by a node may get dropped downstream if the
# subsequent nodes are still busy processing previous inputs.
node {
calculator: "FlowLimiterCalculator"
input_stream: "input_video"
input_stream: "FINISHED:output_video"
input_stream_info: {
tag_index: "FINISHED"
back_edge: true
}
output_stream: "throttled_input_video"
node_options: {
[type.googleapis.com/mediapipe.FlowLimiterCalculatorOptions] {
max_in_flight: 1
max_in_queue: 1
# Timeout is disabled (set to 0) as first frame processing can take more
# than 1 second.
in_flight_timeout: 0
}
}
}
node {
calculator: "HolisticLandmarkCpu"
input_stream: "IMAGE:throttled_input_video"
output_stream: "POSE_LANDMARKS:pose_landmarks"
output_stream: "POSE_ROI:pose_roi"
output_stream: "POSE_DETECTION:pose_detection"
output_stream: "FACE_LANDMARKS:face_landmarks"
output_stream: "LEFT_HAND_LANDMARKS:left_hand_landmarks"
output_stream: "RIGHT_HAND_LANDMARKS:right_hand_landmarks"
}
# Gets image size.
node {
calculator: "ImagePropertiesCalculator"
input_stream: "IMAGE:throttled_input_video"
output_stream: "SIZE:image_size"
}
# Converts pose, hands and face landmarks to a render data vector.
node {
calculator: "HolisticTrackingToRenderData"
input_stream: "IMAGE_SIZE:image_size"
input_stream: "POSE_LANDMARKS:pose_landmarks"
input_stream: "POSE_ROI:pose_roi"
input_stream: "LEFT_HAND_LANDMARKS:left_hand_landmarks"
input_stream: "RIGHT_HAND_LANDMARKS:right_hand_landmarks"
input_stream: "FACE_LANDMARKS:face_landmarks"
output_stream: "RENDER_DATA_VECTOR:render_data_vector"
}
# Draws annotations and overlays them on top of the input images.
node {
calculator: "AnnotationOverlayCalculator"
input_stream: "IMAGE:throttled_input_video"
input_stream: "VECTOR:render_data_vector"
output_stream: "IMAGE:output_video"
}
Lines 3-7 of holistic_tracking_cpu.pbtxt show that the graph's original input stream is input_video and its final output stream is output_video. Lines 39-48 of the file contain the following node:
node {
calculator: "HolisticLandmarkCpu"
input_stream: "IMAGE:throttled_input_video"
output_stream: "POSE_LANDMARKS:pose_landmarks"
output_stream: "POSE_ROI:pose_roi"
output_stream: "POSE_DETECTION:pose_detection"
output_stream: "FACE_LANDMARKS:face_landmarks"
output_stream: "LEFT_HAND_LANDMARKS:left_hand_landmarks"
output_stream: "RIGHT_HAND_LANDMARKS:right_hand_landmarks"
}
This node exposes the POSE_LANDMARKS, FACE_LANDMARKS, LEFT_HAND_LANDMARKS and RIGHT_HAND_LANDMARKS output streams, which carry the body, face and hand keypoints we need. With these stream names, we can attach the output streams we want to the graph when initializing the holistic graph.
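For example, attaching a poller to the pose_landmarks output stream during graph initialization looks roughly like the sketch below; it mirrors the Mediapipe_InitGraph function shown in full in section 3.3.2.1, and the helper function name is purely illustrative.
#include <memory>
#include <utility>

#include "mediapipe/framework/calculator_framework.h"

// Sketch: register a poller for one of the output streams named in the pbtxt.
// The graph must already be Initialize()d from holistic_tracking_cpu.pbtxt,
// and pollers must be attached before StartRun().
absl::Status AttachPoseLandmarksPoller(
    mediapipe::CalculatorGraph& graph,
    std::unique_ptr<mediapipe::OutputStreamPoller>& pose_poller)
{
    auto poller_or = graph.AddOutputStreamPoller("pose_landmarks");
    if (!poller_or.ok())
        return poller_or.status();
    pose_poller = std::make_unique<mediapipe::OutputStreamPoller>(std::move(poller_or.value()));
    return graph.StartRun({});
}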
3.3.2 Code implementation
3.3.2.1 Holistic tracking keypoint detection class
HolisticTrackingDetect.h
#ifndef HOLISTIC_TRACKING_DETECT_H
#define HOLISTIC_TRACKING_DETECT_H
#include <cstdlib>
#include "absl/flags/flag.h"
#include "absl/flags/parse.h"
#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/formats/image_frame.h"
#include "mediapipe/framework/formats/image_frame_opencv.h"
#include "mediapipe/framework/port/file_helpers.h"
#include "mediapipe/framework/port/opencv_highgui_inc.h"
#include "mediapipe/framework/port/opencv_imgproc_inc.h"
#include "mediapipe/framework/port/opencv_video_inc.h"
#include "mediapipe/framework/port/parse_text_proto.h"
#include "mediapipe/framework/port/status.h"
#include "mediapipe/framework/formats/detection.pb.h"
#include "mediapipe/framework/formats/landmark.pb.h"
#include "mediapipe/framework/formats/rect.pb.h"
namespace GoogleMediapipeDetect {
class HolisticTrackingDetect
{
public:
HolisticTrackingDetect();
virtual ~HolisticTrackingDetect();
public:
int InitModel(const char* model_path);
int DetectImageDirect(int image_width, int image_height, void* image_data, int* detect_result, bool show_result_image = false);
int DetectCamera(bool show_image = false);
int Release();
private:
absl::Status Mediapipe_InitGraph(const char* model_path);
absl::Status Mediapipe_RunMPPGraph_Direct(int image_width, int image_height, void* image_data, int* detect_result, bool show_result_image = false);
absl::Status Mediapipe_RunMPPGraph_Camera(bool show_image = false);
absl::Status Mediapipe_ReleaseGraph();
private:
bool m_bIsInit;
bool m_bIsRelease;
mediapipe::CalculatorGraph m_Graph;
const char* m_Video_InputStreamName;
const char* m_Video_OutputStreamName;
const char* m_PoseLandmarks_OutputStreamName;
const char* m_LeftHandLandmarks_OutputStreamName;
const char* m_RightHandLandmarks_OutputStreamName;
const char* m_FaceLandmarks_OutputStreamName;
std::unique_ptr<mediapipe::OutputStreamPoller> m_pVideoPoller;
std::unique_ptr<mediapipe::OutputStreamPoller> m_pPoseLandmarksPoller;
std::unique_ptr<mediapipe::OutputStreamPoller> m_pLeftHandLandmarksPoller;
std::unique_ptr<mediapipe::OutputStreamPoller> m_pRightHandLandmarksPoller;
std::unique_ptr<mediapipe::OutputStreamPoller> m_pFaceLandmarksPoller;
};
}
#endif // !HOLISTIC_TRACKING_DETECT_H
HolisticTrackingDetect.cpp
#include <vector>
#include "HolisticTrackingDetect.h"
#include "GestureRecognition.h"
#include "ArmUpAndDownRecognition.h"
GoogleMediapipeDetect::HolisticTrackingDetect::HolisticTrackingDetect()
{
m_bIsInit = false;
m_bIsRelease = false;
m_Video_InputStreamName = "input_video";
m_Video_OutputStreamName = "output_video";
m_PoseLandmarks_OutputStreamName = "pose_landmarks";
m_LeftHandLandmarks_OutputStreamName = "left_hand_landmarks";
m_RightHandLandmarks_OutputStreamName = "right_hand_landmarks";
m_FaceLandmarks_OutputStreamName = "face_landmarks";
m_pVideoPoller = nullptr;
m_pPoseLandmarksPoller = nullptr;
m_pLeftHandLandmarksPoller = nullptr;
m_pRightHandLandmarksPoller = nullptr;
m_pFaceLandmarksPoller = nullptr;
}
GoogleMediapipeDetect::HolisticTrackingDetect::~HolisticTrackingDetect()
{
if (m_bIsInit && !m_bIsRelease)
{
Release();
}
}
int GoogleMediapipeDetect::HolisticTrackingDetect::InitModel(const char* model_path)
{
absl::Status run_status = Mediapipe_InitGraph(model_path);
if (!run_status.ok())
return 0;
m_bIsInit = true;
return 1;
}
int GoogleMediapipeDetect::HolisticTrackingDetect::DetectImageDirect(int image_width, int image_height, void* image_data, int* detect_result, bool show_result_image)
{
if (!m_bIsInit)
return 0;
absl::Status run_status = Mediapipe_RunMPPGraph_Direct(image_width, image_height, image_data, detect_result,show_result_image);
if (!run_status.ok()) {
return 0;
}
return 1;
}
int GoogleMediapipeDetect::HolisticTrackingDetect::DetectCamera(bool show_image)
{
if (!m_bIsInit)
return 0;
absl::Status run_status = Mediapipe_RunMPPGraph_Camera(show_image);
if (!run_status.ok()) {
return 0;
}
return 1;
}
int GoogleMediapipeDetect::HolisticTrackingDetect::Release()
{
absl::Status run_status = Mediapipe_ReleaseGraph();
if (!run_status.ok()) {
return 0;
}
m_bIsRelease = true;
return 1;
}
absl::Status GoogleMediapipeDetect::HolisticTrackingDetect::Mediapipe_InitGraph(const char* model_path)
{
std::string calculator_graph_config_contents;
MP_RETURN_IF_ERROR(mediapipe::file::GetContents(model_path, &calculator_graph_config_contents));
std::cout << "mediapipe::file::GetContents success" << std::endl;
mediapipe::CalculatorGraphConfig config =
mediapipe::ParseTextProtoOrDie<mediapipe::CalculatorGraphConfig>(
calculator_graph_config_contents);
MP_RETURN_IF_ERROR(m_Graph.Initialize(config));
std::cout << "m_Graph.Initialize(config) success" << std::endl;
// 1 Video output stream
auto videoOutputStream = m_Graph.AddOutputStreamPoller(m_Video_OutputStreamName);
assert(videoOutputStream.ok());
m_pVideoPoller = std::make_unique<mediapipe::OutputStreamPoller>(std::move(videoOutputStream.value()));
// 2 PoseLandmarks output stream
mediapipe::StatusOrPoller poseLandmarks = m_Graph.AddOutputStreamPoller(m_PoseLandmarks_OutputStreamName);
assert(poseLandmarks.ok());
m_pPoseLandmarksPoller = std::make_unique<mediapipe::OutputStreamPoller>(std::move(poseLandmarks.value()));
// 3 LeftHandLandmarks output stream
mediapipe::StatusOrPoller leftHandLandmarks = m_Graph.AddOutputStreamPoller(m_LeftHandLandmarks_OutputStreamName);
assert(leftHandLandmarks.ok());
m_pLeftHandLandmarksPoller = std::make_unique<mediapipe::OutputStreamPoller>(std::move(leftHandLandmarks.value()));
// 4 RightHandLandmarks output stream
mediapipe::StatusOrPoller rightHandLandmarks = m_Graph.AddOutputStreamPoller(m_RightHandLandmarks_OutputStreamName);
assert(rightHandLandmarks.ok());
m_pRightHandLandmarksPoller = std::make_unique<mediapipe::OutputStreamPoller>(std::move(rightHandLandmarks.value()));
// 5 FaceLandmarks output stream
mediapipe::StatusOrPoller faceLandmarks = m_Graph.AddOutputStreamPoller(m_FaceLandmarks_OutputStreamName);
assert(faceLandmarks.ok());
m_pFaceLandmarksPoller = std::make_unique<mediapipe::OutputStreamPoller>(std::move(faceLandmarks.value()));
MP_RETURN_IF_ERROR(m_Graph.StartRun({}));
std::cout << "----------------Graph StartRun Success---------------------" << std::endl;
return absl::OkStatus();
}
absl::Status GoogleMediapipeDetect::HolisticTrackingDetect::Mediapipe_RunMPPGraph_Direct(int image_width, int image_height, void* image_data, int* detect_result, bool show_result_image)
{
/*----- 1 Build a cv::Mat from the raw frame data -----*/
cv::Mat camera_frame(cv::Size(image_width, image_height), CV_8UC3, (uchar*)image_data);
cv::cvtColor(camera_frame, camera_frame, cv::COLOR_BGR2RGB);
// Flip the input image horizontally
cv::flip(camera_frame, camera_frame, 1);
//std::cout << "cv::Mat constructed" << std::endl;
/*----- 2 Convert the OpenCV Mat to a mediapipe::ImageFrame -----*/
auto input_frame = absl::make_unique<mediapipe::ImageFrame>(
mediapipe::ImageFormat::SRGB, camera_frame.cols, camera_frame.rows,
mediapipe::ImageFrame::kDefaultAlignmentBoundary);
cv::Mat input_frame_mat = mediapipe::formats::MatView(input_frame.get());
camera_frame.copyTo(input_frame_mat);
//std::cout << "将OpenCV Mat转换为ImageFrame完成" << std::endl;
/*----- 3 发送图片到图中推理 -----*/
size_t frame_timestamp_us =
(double)cv::getTickCount() / (double)cv::getTickFrequency() * 1e6;
MP_RETURN_IF_ERROR(m_Graph.AddPacketToInputStream(
m_Video_InputStreamName, mediapipe::Adopt(input_frame.release())
.At(mediapipe::Timestamp(frame_timestamp_us))));
//std::cout << "发送图片到图中推理完成" << std::endl;
/*----- 4 得到结果 -----*/
// 1 视频输出结果帧
mediapipe::Packet packet;
if (!m_pVideoPoller->Next(&packet))
{
return absl::InvalidArgumentError("no next packet");
}
if (show_result_image)
{
// Get the mediapipe::ImageFrame result from the video output stream
auto& output_frame = packet.Get<mediapipe::ImageFrame>();
// Convert the mediapipe::ImageFrame to cv::Mat
cv::Mat output_frame_mat = mediapipe::formats::MatView(&output_frame);
// Display the cv::Mat result
cv::cvtColor(output_frame_mat, output_frame_mat, cv::COLOR_RGB2BGR);
cv::Mat dst;
cv::resize(output_frame_mat, dst, cv::Size(output_frame_mat.cols, output_frame_mat.rows));
cv::imshow("MediapipeHolistic", dst);
cv::waitKey(1);
}
// 2 PoseLandmarks
mediapipe::Packet poseLandmarksPacket;
int left_arm_result = (int)ArmUpDown::NoResult;
int right_arm_result = (int)ArmUpDown::NoResult;
if (m_pPoseLandmarksPoller->QueueSize() != 0)
{
if (m_pPoseLandmarksPoller->Next(&poseLandmarksPacket))
{
auto& output_landmarks = poseLandmarksPacket.Get<mediapipe::NormalizedLandmarkList>();
//std::cout << "PoseLandmarks size:" << output_landmarks.landmark_size() << std::endl;
std::vector<Point2D> posePoints;
posePoints.clear();
for (int i = 0; i < output_landmarks.landmark_size(); ++i)
{
Point2D tempPoint2D;
const mediapipe::NormalizedLandmark landmark = output_landmarks.landmark(i);
float x = landmark.x() * camera_frame.cols;
float y = landmark.y() * camera_frame.rows;
tempPoint2D.x = x;
tempPoint2D.y = y;
posePoints.emplace_back(tempPoint2D);
}
ArmUpAndDownRecognition armUpAndDownRecognition;
armUpAndDownRecognition.RecognizeProcess(posePoints,left_arm_result,right_arm_result);
//std::cout << "手臂抬手放手识别结果:" << poseDetectResult << std::endl;
}
}
detect_result[0] = left_arm_result;
detect_result[1] = right_arm_result;
// 3 LeftHandLandmarks
mediapipe::Packet leftHandLandmarksPacket;
int leftHandDetectResult = Gesture::NoGesture;
if (m_pLeftHandLandmarksPoller->QueueSize() > 0)
{
if (m_pLeftHandLandmarksPoller->Next(&leftHandLandmarksPacket))
{
auto& output_landmarks = leftHandLandmarksPacket.Get<mediapipe::NormalizedLandmarkList>();
//std::cout << "LeftHandLandmarks size:" << output_landmarks.landmark_size() << std::endl;
std::vector<Point2D> singleGesturePoints;
singleGesturePoints.clear();
for (int i = 0; i < output_landmarks.landmark_size(); ++i)
{
Point2D tempPoint2D;
const mediapipe::NormalizedLandmark landmark = output_landmarks.landmark(i);
float x = landmark.x() * camera_frame.cols;
float y = landmark.y() * camera_frame.rows;
tempPoint2D.x = x;
tempPoint2D.y = y;
singleGesturePoints.emplace_back(tempPoint2D);
}
GestureRecognition gestureRecognition;
leftHandDetectResult = gestureRecognition.RecognizeProcess(singleGesturePoints);
//std::cout << "左手手势识别结果:" << leftHandDetectResult << std::endl;
}
}
detect_result[2] = leftHandDetectResult;
// 4 RightHandLandmarks
mediapipe::Packet rightHandLandmarksPacket;
int rightHandDetectResult = Gesture::NoGesture;
if (m_pRightHandLandmarksPoller->QueueSize() > 0)
{
if (m_pRightHandLandmarksPoller->Next(&rightHandLandmarksPacket))
{
auto& output_landmarks = rightHandLandmarksPacket.Get<mediapipe::NormalizedLandmarkList>();
//std::cout << "RightHandLandmarks size:" << output_landmarks.landmark_size() << std::endl;
std::vector<Point2D> singleGesturePoints;
singleGesturePoints.clear();
for (int i = 0; i < output_landmarks.landmark_size(); ++i)
{
Point2D tempPoint2D;
const mediapipe::NormalizedLandmark landmark = output_landmarks.landmark(i);
float x = landmark.x() * camera_frame.cols;
float y = landmark.y() * camera_frame.rows;
tempPoint2D.x = x;
tempPoint2D.y = y;
singleGesturePoints.emplace_back(tempPoint2D);
}
GestureRecognition gestureRecognition;
rightHandDetectResult = gestureRecognition.RecognizeProcess(singleGesturePoints);
//std::cout << "右手手势识别结果:" << rightHandDetectResult << std::endl;
}
}
detect_result[3] = rightHandDetectResult;
// 5 FaceLandmarks (left commented out; not used by the current interfaces)
//mediapipe::Packet faceLandmarksPacket;
//if (m_pFaceLandmarksPoller->QueueSize() > 0)
//{
// if (m_pFaceLandmarksPoller->Next(&faceLandmarksPacket))
// {
// auto& output_landmarks = faceLandmarksPacket.Get<mediapipe::NormalizedLandmarkList>();
// std::cout << "FaceLandmarks size:" << output_landmarks.landmark_size() << std::endl;
// for (int i = 0; i < output_landmarks.landmark_size(); ++i)
// {
// const mediapipe::NormalizedLandmark landmark = output_landmarks.landmark(i);
// float x = landmark.x() * camera_frame.cols;
// float y = landmark.y() * camera_frame.rows;
// }
// }
//}
return absl::OkStatus();
}
absl::Status GoogleMediapipeDetect::HolisticTrackingDetect::Mediapipe_RunMPPGraph_Camera(bool show_image)
{
std::string cvWindowName = "MediapipeHolistic";
// Open the camera with OpenCV
cv::VideoCapture capture(0);
if (!capture.isOpened())
{
return absl::InvalidArgumentError("cv camera is not open");
}
bool grab_frames = true;
while (grab_frames) {
// Grab a video frame from the camera
cv::Mat camera_frame_raw;
capture >> camera_frame_raw;
if (camera_frame_raw.empty())
break;
cv::Mat camera_frame;
cv::cvtColor(camera_frame_raw, camera_frame, cv::COLOR_BGR2RGB);
cv::flip(camera_frame, camera_frame, 1);
// Convert the OpenCV Mat to a mediapipe::ImageFrame
auto input_frame = absl::make_unique<mediapipe::ImageFrame>(
mediapipe::ImageFormat::SRGB, camera_frame.cols, camera_frame.rows,
mediapipe::ImageFrame::kDefaultAlignmentBoundary);
cv::Mat input_frame_mat = mediapipe::formats::MatView(input_frame.get());
camera_frame.copyTo(input_frame_mat);
// Send the frame into the graph for inference
size_t frame_timestamp_us =
(double)cv::getTickCount() / (double)cv::getTickFrequency() * 1e6;
MP_RETURN_IF_ERROR(m_Graph.AddPacketToInputStream(
m_Video_InputStreamName, mediapipe::Adopt(input_frame.release())
.At(mediapipe::Timestamp(frame_timestamp_us))));
// Fetch the inference result
mediapipe::Packet packet;
if (!m_pVideoPoller->Next(&packet)) break;
if (show_image)
{
// Get the mediapipe::ImageFrame result from the video output stream
auto& output_frame = packet.Get<mediapipe::ImageFrame>();
// Convert the mediapipe::ImageFrame to cv::Mat
cv::Mat output_frame_mat = mediapipe::formats::MatView(&output_frame);
// Display the cv::Mat result
cv::cvtColor(output_frame_mat, output_frame_mat, cv::COLOR_RGB2BGR);
cv::Mat dst;
cv::resize(output_frame_mat, dst, cv::Size(output_frame_mat.cols / 2, output_frame_mat.rows / 2));
cv::imshow(cvWindowName, dst);
cv::waitKey(1);
}
}
if (show_image)
cv::destroyWindow(cvWindowName);
return absl::OkStatus();
}
absl::Status GoogleMediapipeDetect::HolisticTrackingDetect::Mediapipe_ReleaseGraph()
{
// Only "input_video" is a graph input stream; the other names held by this class
// are output streams, so closing them as input streams would fail.
MP_RETURN_IF_ERROR(m_Graph.CloseInputStream(m_Video_InputStreamName));
return m_Graph.WaitUntilDone();
}
If you look closely at the detection class above and compare it with the pbtxt file from section 3.3.1, the graph-initialization function Mediapipe_InitGraph attaches pollers for the output streams we need, and the detection functions Mediapipe_RunMPPGraph_Direct and Mediapipe_RunMPPGraph_Camera then read and process the data from those streams.
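For reference, Mediapipe_RunMPPGraph_Direct writes four values into the int array handed in through the dll interface. A caller-side sketch of how those slots can be read follows; the ArmUpDown and Gesture enums come from the project's TrackingDataStructure.h and GestureRecognition.h, and whether the caller sees them through the GoogleMediapipeDetect namespace or keeps its own copy is an integration detail, so treat the snippet as an assumption-laden sketch.
#include <iostream>

#include "GestureRecognition.h"      // Gesture enum
#include "TrackingDataStructure.h"   // ArmUpDown enum

// detect_result layout as filled by Mediapipe_RunMPPGraph_Direct:
//   [0] left arm  (ArmUpDown)   [1] right arm (ArmUpDown)
//   [2] left hand (Gesture)     [3] right hand (Gesture)
void HandleDetectResult(const int detect_result[4])
{
    using namespace GoogleMediapipeDetect;  // assumed namespace of the enums

    if (detect_result[0] == (int)ArmUpDown::ArmUp)
        std::cout << "left arm is raised" << std::endl;
    if (detect_result[1] == (int)ArmUpDown::ArmDown)
        std::cout << "right arm is lowered" << std::endl;
    if (detect_result[3] == (int)Gesture::Five)
        std::cout << "right hand shows an open palm" << std::endl;
}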
3.3.2.2 Gesture recognition class
GestureRecognition.h
#ifndef GESTURE_RECOGNITION_H
#define GESTURE_RECOGNITION_H
#include <vector>
#include "GestureRecognition.h"
#include "TrackingDataStructure.h"
namespace GoogleMediapipeDetect {
class GestureRecognition
{
public:
GestureRecognition();
virtual ~GestureRecognition();
public:
int RecognizeProcess(const std::vector<Point2D>& single_hand_joint_points);
private:
float Vector2DAngle(const Vector2D& vec1, const Vector2D& vec2);
};
}
#endif // !GESTURE_RECOGNITION_H
GestureRecognition.cpp
#include "GestureRecognition.h"
GoogleMediapipeDetect::GestureRecognition::GestureRecognition()
{
}
GoogleMediapipeDetect::GestureRecognition::~GestureRecognition()
{
}
int GoogleMediapipeDetect::GestureRecognition::RecognizeProcess(const std::vector<Point2D>& single_hand_joint_points)
{
if (single_hand_joint_points.size() != 21)
return Gesture::NoGesture;
// Thumb angle
Vector2D thumb_vec1;
thumb_vec1.x = single_hand_joint_points[0].x - single_hand_joint_points[2].x;
thumb_vec1.y = single_hand_joint_points[0].y - single_hand_joint_points[2].y;
Vector2D thumb_vec2;
thumb_vec2.x = single_hand_joint_points[3].x - single_hand_joint_points[4].x;
thumb_vec2.y = single_hand_joint_points[3].y - single_hand_joint_points[4].y;
float thumb_angle = Vector2DAngle(thumb_vec1, thumb_vec2);
//std::cout << "thumb_angle = " << thumb_angle << std::endl;
//std::cout << "thumb.y = " << single_hand_joint_vector[0].y << std::endl;
// Index finger angle
Vector2D index_vec1;
index_vec1.x = single_hand_joint_points[0].x - single_hand_joint_points[6].x;
index_vec1.y = single_hand_joint_points[0].y - single_hand_joint_points[6].y;
Vector2D index_vec2;
index_vec2.x = single_hand_joint_points[7].x - single_hand_joint_points[8].x;
index_vec2.y = single_hand_joint_points[7].y - single_hand_joint_points[8].y;
float index_angle = Vector2DAngle(index_vec1, index_vec2);
//std::cout << "index_angle = " << index_angle << std::endl;
// Middle finger angle
Vector2D middle_vec1;
middle_vec1.x = single_hand_joint_points[0].x - single_hand_joint_points[10].x;
middle_vec1.y = single_hand_joint_points[0].y - single_hand_joint_points[10].y;
Vector2D middle_vec2;
middle_vec2.x = single_hand_joint_points[11].x - single_hand_joint_points[12].x;
middle_vec2.y = single_hand_joint_points[11].y - single_hand_joint_points[12].y;
float middle_angle = Vector2DAngle(middle_vec1, middle_vec2);
//std::cout << "middle_angle = " << middle_angle << std::endl;
// Ring finger angle
Vector2D ring_vec1;
ring_vec1.x = single_hand_joint_points[0].x - single_hand_joint_points[14].x;
ring_vec1.y = single_hand_joint_points[0].y - single_hand_joint_points[14].y;
Vector2D ring_vec2;
ring_vec2.x = single_hand_joint_points[15].x - single_hand_joint_points[16].x;
ring_vec2.y = single_hand_joint_points[15].y - single_hand_joint_points[16].y;
float ring_angle = Vector2DAngle(ring_vec1, ring_vec2);
//std::cout << "ring_angle = " << ring_angle << std::endl;
// Pinky angle
Vector2D pink_vec1;
pink_vec1.x = single_hand_joint_points[0].x - single_hand_joint_points[18].x;
pink_vec1.y = single_hand_joint_points[0].y - single_hand_joint_points[18].y;
Vector2D pink_vec2;
pink_vec2.x = single_hand_joint_points[19].x - single_hand_joint_points[20].x;
pink_vec2.y = single_hand_joint_points[19].y - single_hand_joint_points[20].y;
float pink_angle = Vector2DAngle(pink_vec1, pink_vec2);
//std::cout << "pink_angle = " << pink_angle << std::endl;
// Classify the gesture from the finger angles
float angle_threshold = 65;
float thumb_angle_threshold = 40;
int result = -1;
if ((thumb_angle > thumb_angle_threshold) && (index_angle > angle_threshold) && (middle_angle > angle_threshold) && (ring_angle > angle_threshold) && (pink_angle > angle_threshold))
result = Gesture::Fist;
else if ((thumb_angle > 5) && (index_angle < angle_threshold) && (middle_angle > angle_threshold) && (ring_angle > angle_threshold) && (pink_angle > angle_threshold))
result = Gesture::One;
else if ((thumb_angle > thumb_angle_threshold) && (index_angle < angle_threshold) && (middle_angle < angle_threshold) && (ring_angle > angle_threshold) && (pink_angle > angle_threshold))
result = Gesture::Two;
else if ((thumb_angle > thumb_angle_threshold) && (index_angle < angle_threshold) && (middle_angle < angle_threshold) && (ring_angle < angle_threshold) && (pink_angle > angle_threshold))
result = Gesture::Three;
else if ((thumb_angle > thumb_angle_threshold) && (index_angle < angle_threshold) && (middle_angle < angle_threshold) && (ring_angle < angle_threshold) && (pink_angle < angle_threshold))
result = Gesture::Four;
else if ((thumb_angle < thumb_angle_threshold) && (index_angle < angle_threshold) && (middle_angle < angle_threshold) && (ring_angle < angle_threshold) && (pink_angle < angle_threshold))
result = Gesture::Five;
else if ((thumb_angle < thumb_angle_threshold) && (index_angle > angle_threshold) && (middle_angle > angle_threshold) && (ring_angle > angle_threshold) && (pink_angle < angle_threshold))
result = Gesture::Six;
else if ((thumb_angle < thumb_angle_threshold) && (index_angle > angle_threshold) && (middle_angle > angle_threshold) && (ring_angle > angle_threshold) && (pink_angle > angle_threshold))
result = Gesture::ThumbUp;
else if ((thumb_angle > 5) && (index_angle > angle_threshold) && (middle_angle < angle_threshold) && (ring_angle < angle_threshold) && (pink_angle < angle_threshold))
result = Gesture::Ok;
else
result = Gesture::NoGesture;
return result;
}
float GoogleMediapipeDetect::GestureRecognition::Vector2DAngle(const Vector2D& vec1, const Vector2D& vec2)
{
double PI = 3.141592653;
float t = (vec1.x * vec2.x + vec1.y * vec2.y) / (sqrt(pow(vec1.x, 2) + pow(vec1.y, 2)) * sqrt(pow(vec2.x, 2) + pow(vec2.y, 2)));
float angle = acos(t) * (180 / PI);
return angle;
}
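A note on how these angle checks behave: for each finger, the first vector points from a mid-finger joint back toward the wrist (landmark 0), and the second from the fingertip toward the joint just below it. When the finger is extended, both vectors point roughly the same way and the angle stays below the threshold; when the finger is curled, the fingertip vector flips and the angle exceeds it. Driving the recognizer from outside looks roughly like the sketch below; the Point2D type and the Gesture enum come from the project's TrackingDataStructure.h and GestureRecognition.h, and their namespace placement is an assumption.
#include <vector>

#include "GestureRecognition.h"
#include "TrackingDataStructure.h"

// Sketch: classify one hand from its 21 pixel-space keypoints, the same way
// the detection class in section 3.3.2.1 does for each hand stream.
int ClassifyHand(const std::vector<GoogleMediapipeDetect::Point2D>& hand_points)
{
    GoogleMediapipeDetect::GestureRecognition recognizer;
    // Returns a Gesture enum value; NoGesture if the point count is not 21
    // or no rule matches.
    return recognizer.RecognizeProcess(hand_points);
}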
3.3.2.3 Arm raise/lower detection class
ArmUpAndDownRecognition.h
#ifndef ARM_UP_AND_DOWN_RECOGNITION_H
#define ARM_UP_AND_DOWN_RECOGNITION_H
#include <vector>
#include "TrackingDataStructure.h"
namespace GoogleMediapipeDetect {
class ArmUpAndDownRecognition
{
public:
ArmUpAndDownRecognition();
virtual ~ArmUpAndDownRecognition();
public:
bool RecognizeProcess(const std::vector<Point2D>& pose_joint_points,int& left_arm_result,int& right_arm_result);
};
}
#endif // !ARM_UP_AND_DOWN_RECOGNITION_H
ArmUpAndDownRecognition.cpp
#include "ArmUpAndDownRecognition.h"
GoogleMediapipeDetect::ArmUpAndDownRecognition::ArmUpAndDownRecognition()
{
}
GoogleMediapipeDetect::ArmUpAndDownRecognition::~ArmUpAndDownRecognition()
{
}
bool GoogleMediapipeDetect::ArmUpAndDownRecognition::RecognizeProcess(const std::vector<Point2D>& pose_joint_points, int& left_arm_result, int& right_arm_result)
{
if (pose_joint_points.size() != 33)
return false;
Point2D left_elbow = pose_joint_points[13];
Point2D right_elbow = pose_joint_points[14];
Point2D left_wrist = pose_joint_points[15];
Point2D right_wrist = pose_joint_points[16];
// Left arm: compare wrist and elbow y coordinates (image y grows downward)
if (left_wrist.y > left_elbow.y)
{
left_arm_result = (int)ArmUpDown::ArmDown;
}
else if (left_wrist.y < left_elbow.y)
{
left_arm_result = (int)ArmUpDown::ArmUp;
}
else
{
left_arm_result = (int)ArmUpDown::NoResult;
}
// Right arm
if (right_wrist.y > right_elbow.y)
{
right_arm_result = ArmUpDown::ArmDown;
}
else if (right_wrist.y < right_elbow.y)
{
right_arm_result = ArmUpDown::ArmUp;
}
else
{
right_arm_result = ArmUpDown::NoResult;
}
return true;
}
For the other project-related classes (TrackingDataStructure.h and so on), see the Github project; a rough sketch of the supporting types follows.
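The Point2D and Vector2D structs and the Gesture and ArmUpDown enums used throughout the code above are defined in those repository files. The following is only a rough idea of what they need to provide for the code in this article to compile; field names are inferred from usage and the enum values are assumptions, so check the repository for the authoritative definitions.
// Sketch of the supporting types (see TrackingDataStructure.h and
// GestureRecognition.h in the repository for the real definitions).
namespace GoogleMediapipeDetect {

struct Point2D { float x = 0.0f; float y = 0.0f; };
struct Vector2D { float x = 0.0f; float y = 0.0f; };

// Unscoped enums, matching the implicit int conversions used in the code above.
// The numeric values here are assumptions.
enum ArmUpDown { NoResult = -1, ArmUp = 1, ArmDown = 2 };
enum Gesture { NoGesture = -1, One = 1, Two, Three, Four, Five, Six, ThumbUp, Ok, Fist };

} // namespace GoogleMediapipeDetect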
4 Building and debugging the project with bazel
4.1 Building
4.1.1 File locations and the bazel BUILD file
Create a new folder named holistic_tracking_dll under the mediapipe\mediapipe\examples\desktop directory of the Mediapipe repository.
Copy all files under dll/holistic_tracking_dll in the Github project into the mediapipe\mediapipe\examples\desktop\holistic_tracking_dll folder.
The bazel BUILD file used to compile this project is as follows:
# Copyright 2020 The MediaPipe Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
licenses(["notice"])
package(default_visibility = ["//mediapipe/examples:__subpackages__"])
cc_binary(
name = "holistic_tracking_cpu",
deps = [
"//mediapipe/examples/desktop:demo_run_graph_main",
"//mediapipe/graphs/holistic_tracking:holistic_tracking_cpu_graph_deps",
],
)
cc_binary(
name = "MediapipeHolisticTracking",
srcs = ["HolisticTrackingApi.h","HolisticTrackingApi.cpp","HolisticTrackingDetect.h","HolisticTrackingDetect.cpp","GestureRecognition.h","GestureRecognition.cpp","TrackingDataStructure.h","ArmUpAndDownRecognition.h","ArmUpAndDownRecognition.cpp"],
linkshared=True,
deps = [
"//mediapipe/graphs/holistic_tracking:holistic_tracking_cpu_graph_deps",
],
)
# Linux only
cc_binary(
name = "holistic_tracking_gpu",
deps = [
"//mediapipe/examples/desktop:demo_run_graph_main_gpu",
"//mediapipe/graphs/holistic_tracking:holistic_tracking_gpu_deps",
],
)
Of these targets,
cc_binary(
name = "MediapipeHolisticTracking",
srcs = ["HolisticTrackingApi.h","HolisticTrackingApi.cpp","HolisticTrackingDetect.h","HolisticTrackingDetect.cpp","GestureRecognition.h","GestureRecognition.cpp","TrackingDataStructure.h","ArmUpAndDownRecognition.h","ArmUpAndDownRecognition.cpp"],
linkshared=True,
deps = [
"//mediapipe/graphs/holistic_tracking:holistic_tracking_cpu_graph_deps",
],
)
is the build configuration that produces the shared library for this project.
4.1.2 Building
4.1.2.1 Release mode
Example command:
bazel build -c opt --define MEDIAPIPE_DISABLE_GPU=1 --action_env PYTHON_BIN_PATH="D:\\Anaconda\\python.exe" mediapipe/examples/desktop/holistic_tracking_dll:MediapipeHolisticTracking --verbose_failures
4.1.2.2 Debug (debuggable) mode
Example command:
bazel build -c dbg --define MEDIAPIPE_DISABLE_GPU=1 --action_env PYTHON_BIN_PATH="D:\\Anaconda\\python.exe" mediapipe/examples/desktop/holistic_tracking_dll:MediapipeHolisticTracking --verbose_failures
A dll built in Debug mode contains debug symbols, so you can set breakpoints and debug it directly in Visual Studio.
4.1.2.3 Build output directory
If the build completes successfully, MediapipeHolisticTracking.dll will be generated in the mediapipe\bazel-bin\mediapipe\examples\desktop\holistic_tracking_dll directory.
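If you want to confirm that the four interface functions were exported correctly from the generated dll, you can inspect it from a Visual Studio developer command prompt; this is a generic verification step rather than part of the original workflow, run from the mediapipe repository root:
dumpbin /EXPORTS bazel-bin\mediapipe\examples\desktop\holistic_tracking_dll\MediapipeHolisticTracking.dll
The export table should list MediapipeHolisticTrackingInit, MediapipeHolisticTrackingDetectFrameDirect, MediapipeHolisticTrackingDetectCamera and MediapipeHolisticTrackingRelease.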
4.2 Debugging
This section shows how to debug the dynamic library.
First build in Debug mode as described in section 4.1.2.2. After the build finishes, MediapipeHolisticTracking.dll and MediapipeHolisticTracking.pdb are generated under mediapipe\bazel-bin\mediapipe\examples\desktop\holistic_tracking_dll. Seeing a PDB file should feel familiar to Visual Studio users.
Then copy MediapipeHolisticTracking.dll and MediapipeHolisticTracking.pdb into the directory of the program that calls the dll, set breakpoints in the source files, and run under the debugger; the breakpoints will be hit and you can debug as usual.
Author: StubbornHuang
Copyright notice: this is an original article by the site owner; please include a link to the original when reposting.
Original title: Mediapipe – Wrapping Mediapipe HolisticTracking into a dynamic link library (dll/so) to embed whole-body keypoint detection, gesture recognition, and arm raise/lower detection in desktop applications
Original link: https://www.stubbornhuang.com/1919/
Published: 2022-01-20 17:21:45
Last modified: 2023-06-26 20:47:49
Comments
Hello author, when building the DLLs for Hand Tracking and Holistic Tracking, why does Hand Tracking require copying the BUILD file from the mediapipe\mediapipe\examples\desktop\hand_tracking folder of the original Mediapipe repository into the newly created mediapipe\mediapipe\examples\desktop\hand_tracking_test directory,
while Holistic Tracking does not?
Ah, I see now. What you copied was the BUILD file, and then you modified it.
Good afternoon, sorry, I don't understand Chinese. Hi, I really enjoyed the tutorial. It seemed very easy to follow and useful. Unfortunately, when I tried to build the dll myself, only the main camera window was shown, without a mediapipe window. It is connected with the newer version of mediapipe, but I don't know how to change the dll to work with a newer version.
Then I tried to use your version instead, but for some reason mediapipe v0.8.6 doesn't build on my machine. Strangely, it tries to load some bazel rules repository (SHA256 (https://github.com/bazelbuild/rules_cc/archive/main.zip) = 98525a5a98b72d17ae37c01f6f70b2c5d31a8d6582e65c44410da4c92b17a669) and fails to build it.
I am only starting to learn mediapipe and your resource is the only place with good information on this tool. So, if you can help with any ideas on how to solve the first or the second issue, I would really appreciate it.
Thanks in advance!
Hello. Thank you very much, I'm glad you found this tutorial useful. Please take a look at the open-source GitHub repository linked in this article; you should be able to run the example from it successfully. This tutorial is based on MediaPipe v0.8.6, but I have also tested it on the newer MediaPipe: using the latest source code and the corresponding models, this dynamic link library still runs successfully.
Hello! Thanks for your answer. I've been trying to follow the process you mentioned the whole day but did not succeed. I feel like I am doing something completely wrong at some point. I will try to describe my steps and maybe it will be easier to find the mistake.
1. I downloaded the newest mediapipe git project and was able to run hello world example.
2. I downloaded your git project and was able to run it.
3. To rebuild the dll (for now without any changes) I copied the holistic_tracking_dll folder with all files to the "..\mediapipe\mediapipe\examples\desktop" folder.
4. I build it using cmd line “bazel build -c dbg --define MEDIAPIPE_DISABLE_GPU=1 --action_env PYTHON_BIN_PATH="my python.exe directory" mediapipe/examples/desktop/holistic_tracking_dll:MediapipeHolisticTracking --verbose_failures”.
5. Build succeeds and files are generated in “..\mediapipe\bazel-bin\mediapipe\examples\desktop\holistic_tracking_dll”.
6. Then, from this folder I copy MediapipeHolisticTracking.dll, which is now 28.5 MB instead of 5.6 MB in the original project. And replace the original dll in “..\GoogleMediapipePackageDll\dll_use_example\MediapipePackageDllTest\bin\MediapipeTest\x64\Debug” with the new one.
7. I replace modules in “..\GoogleMediapipePackageDll\dll_use_example\MediapipePackageDllTest\projectfile\mediapipe\modules” with the newer modules from “..\mediapipe\mediapipe\modules”.
8. This way there is a new dll and new modules.
However, when I run the solution, I see only a console window and a default camera window.
The output in VS states that I have : “Exception thrown at 0x00007FFADCBACD29 (KernelBase.dll) in MediapipeTest.exe: WinRT originate error - 0xC00D36B3 : 'the stream number provided was invalid.'”. Moreover, there are multiple similar errors: “4754 [D3DDeviceManager] Error: ........\private\impl\mft\mft-framework\D3DDeviceManager.cpp(438):ERROR: Failed to create ID3D11DeviceContext”
What am I doing wrong?
I am very sorry, that you have to help me in your free time. However, I just can’t find any other resource, which explains mediapipe.
Thanks in advance!
Hello!
1. You built it with "bazel build -c dbg --define MEDIAPIPE_DISABLE_GPU=1 --action_env PYTHON_BIN_PATH="my python.exe directory"", which is debug mode, so the compiled file is large.
You should build with "bazel build -c opt --define MEDIAPIPE_DISABLE_GPU=1 --action_env PYTHON_BIN_PATH="my python.exe directory"" for release mode;
2. In addition to replacing the files in the "..\GoogleMediapipePackageDll\dll_use_example\MediapipePackageDllTest\projectfile\mediapipe\modules" folder, you must also replace "holistic_tracking_cpu.pbtxt" and "hand_tracking_desktop_live.pbtxt" in the debug directory.
Hello!
Thanks for your recommendations. Unfortunately, after following both steps, running the application still results in only the camera being shown. Do you have any other ideas about what could be the source of these issues? Maybe it has something to do with the BUILD files or the bazel version?
Kernel.dll and other errors I’ve solved by using cv::VideoCapture cap(0, cv::CAP_DSHOW); instead of cv::VideoCapture cap(0);
However, it did not fix the mediapipe.
It may also be useful to know that everything works without any problems until line 314 in main.cpp, so
“mediapipeHolisticTrackingDll.m_MediapipeHolisticTrackingDetectFrameDirect(copyMat.cols, copyMat.rows, (void*)pImageData, pdetect_result,true)”
returns false, so probably the problem is in HolisticTrackingDetect.cpp in Mediapipe_RunMPPGraph_Direct function. Which means that this function can’t find packets (line 209).
if (!m_pVideoPoller->Next(&videoPacket))
{
return absl::InvalidArgumentError("no next packet");
}
Hope this information would be enough. Thanks for helping me along my way.
Thank you in advance for the help.
Best regards,
Andrei
Hello.
So, do you confirm that the video frame of your camera input is correct?
Hello!
Well, I think so. I mean, I can only see the camera capture from main.cpp; the window that should be opened inside the dll never appears. It returns "no next packet" before imshow.
if(m_pVideoPoller != nullptr)
{
mediapipe::Packet videoPacket;
if(m_pVideoPoller->QueueSize() != 0)
{
if (!m_pVideoPoller->Next(&videoPacket))
{
return absl::InvalidArgumentError("no next packet");
}
if(show_result_image)
{
// Get the mediapipe::ImageFrame result from the video output stream
auto& output_frame = videoPacket.Get<mediapipe::ImageFrame>();
// Convert the mediapipe::ImageFrame to cv::Mat
cv::Mat output_frame_mat = mediapipe::formats::MatView(&output_frame);
// Display the cv::Mat result
cv::cvtColor(output_frame_mat, output_frame_mat, cv::COLOR_RGB2BGR);
cv::imshow("MediapipeHolistic", output_frame_mat);
}
}
}
I think it can be connected with the dll being built the wrong way, yet I don't know what could be wrong. My release dll for the newest mediapipe is 6692 KB, is it the same for you?
Hello!
Sorry, I have not encountered such problems. Maybe you can debug it. If that doesn't work, test with MediaPipe v0.8.6; if v0.8.6 runs successfully, then something is probably wrong with your current build.
Hello!
I've solved the build error by using mediapipe 0.8.10.1. Thanks for your support!
Hello,
You're welcome.