Contents
Preface
1. Pose Estimation
  1.1 Pose Landmarks
  1.2 Legacy solution API
  1.3 New solution API
  1.4 Push-up Counting
2. Hand Tracking
  2.1 Hand Landmarks
  2.2 API Usage
  2.3 Recognizing Gestures
References

Preface
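MediaPipe is an open-source framework from Google that gives developers a simple yet powerful toolkit for building vision and perception applications. It ships a set of pre-trained machine learning models together with tools for processing multimedia data, and covers tasks such as pose estimation, hand tracking, face detection and tracking, face landmarks, object detection, image segmentation, and language detection.

MediaPipe is cross-platform: it can be deployed on mobile (Android, iOS), web, desktop, edge devices, and IoT platforms, and it supports C++, Python, Java, Swift, Objective-C, JavaScript, and other languages.

In this article we use Python to apply MediaPipe to two tasks: pose estimation and hand tracking.

GitHub: https://github.com/google/mediapipe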
1. Pose Estimation
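1.1 Pose Landmarks

| Index | Body part | Pose Landmark |
|---|---|---|
| 0 | Nose | PoseLandmark.NOSE |
| 1 | Left eye (inner) | PoseLandmark.LEFT_EYE_INNER |
| 2 | Left eye | PoseLandmark.LEFT_EYE |
| 3 | Left eye (outer) | PoseLandmark.LEFT_EYE_OUTER |
| 4 | Right eye (inner) | PoseLandmark.RIGHT_EYE_INNER |
| 5 | Right eye | PoseLandmark.RIGHT_EYE |
| 6 | Right eye (outer) | PoseLandmark.RIGHT_EYE_OUTER |
| 7 | Left ear | PoseLandmark.LEFT_EAR |
| 8 | Right ear | PoseLandmark.RIGHT_EAR |
| 9 | Mouth (left) | PoseLandmark.MOUTH_LEFT |
| 10 | Mouth (right) | PoseLandmark.MOUTH_RIGHT |
| 11 | Left shoulder | PoseLandmark.LEFT_SHOULDER |
| 12 | Right shoulder | PoseLandmark.RIGHT_SHOULDER |
| 13 | Left elbow | PoseLandmark.LEFT_ELBOW |
| 14 | Right elbow | PoseLandmark.RIGHT_ELBOW |
| 15 | Left wrist | PoseLandmark.LEFT_WRIST |
| 16 | Right wrist | PoseLandmark.RIGHT_WRIST |
| 17 | Left pinky | PoseLandmark.LEFT_PINKY |
| 18 | Right pinky | PoseLandmark.RIGHT_PINKY |
| 19 | Left index finger | PoseLandmark.LEFT_INDEX |
| 20 | Right index finger | PoseLandmark.RIGHT_INDEX |
| 21 | Left thumb | PoseLandmark.LEFT_THUMB |
| 22 | Right thumb | PoseLandmark.RIGHT_THUMB |
| 23 | Left hip | PoseLandmark.LEFT_HIP |
| 24 | Right hip | PoseLandmark.RIGHT_HIP |
| 25 | Left knee | PoseLandmark.LEFT_KNEE |
| 26 | Right knee | PoseLandmark.RIGHT_KNEE |
| 27 | Left ankle | PoseLandmark.LEFT_ANKLE |
| 28 | Right ankle | PoseLandmark.RIGHT_ANKLE |
| 29 | Left heel | PoseLandmark.LEFT_HEEL |
| 30 | Right heel | PoseLandmark.RIGHT_HEEL |
| 31 | Left toe | PoseLandmark.LEFT_FOOT_INDEX |
| 32 | Right toe | PoseLandmark.RIGHT_FOOT_INDEX |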
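Each landmark is reported with `x` and `y` normalized to the image width and height, so drawing or measuring in pixels needs a conversion step. The snippet below is a minimal sketch of that conversion for a few indices from the table. The helper name `to_pixel`, the clamping behavior, and the sample coordinates are illustrative assumptions, not part of the MediaPipe API.

```python
from typing import Dict, Tuple

# Indices taken from the landmark table above.
NOSE = 0
LEFT_SHOULDER = 11
RIGHT_SHOULDER = 12


def to_pixel(x_norm: float, y_norm: float, width: int, height: int) -> Tuple[int, int]:
    """Convert a normalized landmark position to integer pixel coordinates.

    Hypothetical helper: values are clamped so that landmarks predicted
    slightly outside the frame can still be drawn.
    """
    px = min(max(int(x_norm * width), 0), width - 1)
    py = min(max(int(y_norm * height), 0), height - 1)
    return px, py


# Made-up normalized (x, y) values for three of the 33 landmarks.
sample: Dict[int, Tuple[float, float]] = {
    NOSE: (0.50, 0.25),
    LEFT_SHOULDER: (0.75, 0.50),
    RIGHT_SHOULDER: (0.25, 0.50),
}

pixels = {idx: to_pixel(x, y, width=640, height=480) for idx, (x, y) in sample.items()}
print(pixels)
```

With a real result, the same helper could be fed `landmark.x` and `landmark.y` from the detected pose instead of the sample dictionary.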
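1.2 Legacy solution API

MediaPipe provides a solution API for quick detection. This API stopped receiving updates on May 10, 2023, but it still works for now. Pose estimation is available through mediapipe.solutions.pose.Pose, which accepts the following options: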
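| Option | Description | Values | Default |
|---|---|---|---|
| static_image_mode | If False, the input images are treated as a video stream: the solution tries to detect the most prominent person in the first image and, on success, localizes the pose landmarks. In subsequent images it only tracks those landmarks without invoking detection again until tracking is lost, which reduces computation and latency. If True, person detection runs on every input image, which suits a batch of static, possibly unrelated images. | Boolean | False |
| model_complexity | Complexity of the model. Accuracy and inference latency generally increase with model complexity. | {0, 1, 2} | 1 |
| smooth_landmarks | If True, the solution filters landmarks across input images to reduce jitter. Ignored if static_image_mode is also True. | Boolean | True |
| enable_segmentation | If True, a segmentation mask is generated in addition to the pose landmarks. | Boolean | False |
| smooth_segmentation | If True, segmentation masks are filtered across input images to reduce jitter. Ignored if enable_segmentation is False or static_image_mode is True. | Boolean | True |
| min_detection_confidence | Minimum confidence value from the person-detection model for a detection to be considered successful. | Float [0.0, 1.0] | 0.5 |
| min_tracking_confidence | Minimum confidence value from the pose-tracking model for the landmarks to be considered successfully tracked; otherwise person detection is invoked automatically on the next input image. A higher value makes the solution more robust at the cost of higher latency. Ignored if static_image_mode is True, in which case person detection simply runs on every image. | Float [0.0, 1.0] | 0.5 |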
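The interplay between static_image_mode and min_tracking_confidence is easier to see with a toy model. The sketch below only mimics the behavior described in the table (detect once, then track until the score drops below the threshold). It is not MediaPipe's internal implementation, and the function name and the per-frame scores are made up for illustration.

```python
from typing import List


def count_detector_runs(tracking_scores: List[float],
                        static_image_mode: bool = False,
                        min_tracking_confidence: float = 0.5) -> int:
    """Count how often the (expensive) person detector would be invoked.

    tracking_scores holds one hypothetical tracking confidence per frame.
    """
    runs = 0
    tracking = False  # True while landmarks from the previous frame are usable
    for score in tracking_scores:
        if static_image_mode or not tracking:
            runs += 1  # detection runs on this frame
        # In video mode, tracking continues only while the score stays high enough.
        tracking = (not static_image_mode) and score >= min_tracking_confidence
    return runs


scores = [0.9, 0.8, 0.3, 0.9, 0.9]
print(count_detector_runs(scores))                                 # video mode
print(count_detector_runs(scores, static_image_mode=True))         # every frame
print(count_detector_runs(scores, min_tracking_confidence=0.85))   # stricter threshold
```

In this toy model a stricter tracking threshold triggers more detector runs on the same sequence, which matches the latency trade-off noted for min_tracking_confidence.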
    # Pose estimation on a single image
    import cv2
    import mediapipe as mp


    def main():
        FILE_PATH = 'data/1.png'
        img = cv2.imread(FILE_PATH)
        mp_pose = mp.solutions.pose
        pose = mp_pose.Pose(static_image_mode=True,
                            min_detection_confidence=0.5,
                            min_tracking_confidence=0.5)

        # MediaPipe expects RGB input, while OpenCV loads images as BGR
        res = pose.process(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

        img_copy = img.copy()
        if res.pose_landmarks is not None:
            mp_drawing = mp.solutions.drawing_utils
            # mp_drawing.draw_landmarks(
            #     img_copy, res.pose_landmarks, mp.solutions.pose.POSE_CONNECTIONS)
            mp_drawing.draw_landmarks(
                img_copy,
                res.pose_landmarks,
                mp_pose.POSE_CONNECTIONS,  # frozenset defining which landmarks are connected
                mp_drawing.DrawingSpec(color=(255, 255, 255),  # landmark color
                                       thickness=2, circle_radius=2),
                mp_drawing.DrawingSpec(color=(174, 139, 45),  # connection color
                                       thickness=2, circle_radius=2),
            )
        cv2.imshow('MediaPipe Pose Estimation', img_copy)
        cv2.waitKey(0)


    if __name__ == '__main__':
        main()

    # Pose estimation on a video
    import cv2
    import mediapipe as mp


    def video():
        # Read from the camera
        # cap = cv2.VideoCapture(0)
        # Read from a video file
        cap = cv2.VideoCapture('data/1.mp4')
        mp_pose = mp.solutions.pose
        mp_drawing = mp.solutions.drawing_utils
        pose = mp_pose.Pose(static_image_mode=False,
                            min_detection_confidence=0.5,
                            min_tracking_confidence=0.5)

        while cap.isOpened():
            ret, frame = cap.read()
            if not ret:
                break
                # For a camera, skip the empty frame instead:
                # continue

            # Convert the BGR frame to RGB
            rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

            # Run pose estimation
            results = pose.process(rgb_frame)

            if results.pose_landmarks is not None:
                # Draw the landmarks and their connections
                mp_drawing.draw_landmarks(frame, results.pose_landmarks,
                                          mp_pose.POSE_CONNECTIONS)

            # Show the result
            cv2.imshow('MediaPipe Pose Estimation', frame)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break

        # Release resources
        cap.release()
        cv2.destroyAllWindows()


    if __name__ == '__main__':
        video()

1.3 New solution API
The legacy API cannot detect more than one pose per image; the new API can detect several.
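| Option | Description | Values | Default |
|---|---|---|---|
| running_mode | Running mode of the task. IMAGE: single-image input. VIDEO: decoded video frames. LIVE_STREAM: a live stream of input data, for example from a camera; in this mode a result listener must be set so that results are received asynchronously. | {IMAGE, VIDEO, LIVE_STREAM} | IMAGE |
| num_poses | Maximum number of poses the pose landmarker can detect. | Integer > 0 | 1 |
| min_pose_detection_confidence | Minimum confidence score for a pose detection to be considered successful. | Float [0.0, 1.0] | 0.5 |
| min_pose_presence_confidence | Minimum confidence score of the pose presence score in pose landmark detection. | Float [0.0, 1.0] | 0.5 |
| min_tracking_confidence | Minimum confidence score for pose tracking to be considered successful. | Float [0.0, 1.0] | 0.5 |
| output_segmentation_masks | Whether to output a segmentation mask for each detected pose. | Boolean | False |
| result_callback | Sets the result listener that receives landmark results asynchronously when the Pose Landmarker is in LIVE_STREAM mode. Can only be used when the running mode is LIVE_STREAM. | ResultListener | N/A |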
    from mediapipe import solutions
    from mediapipe.framework.formats import landmark_pb2
    import cv2
    import numpy as np
    import mediapipe as mp


    def draw_landmarks_on_image(image, detection_result):
        pose_landmarks_list = detection_result.pose_landmarks
        annotated_image = np.copy(image)

        # Loop through the detected poses to visualize.
        for pose_landmarks in pose_landmarks_list:
            # The drawing utilities expect a NormalizedLandmarkList proto.
            pose_landmarks_proto = landmark_pb2.NormalizedLandmarkList()
            pose_landmarks_proto.landmark.extend([
                landmark_pb2.NormalizedLandmark(x=landmark.x, y=landmark.y, z=landmark.z)
                for landmark in pose_landmarks
            ])
            # Draw the pose landmarks.
            solutions.drawing_utils.draw_landmarks(
                annotated_image,
                pose_landmarks_proto,
                solutions.pose.POSE_CONNECTIONS,
                solutions.drawing_styles.get_default_pose_landmarks_style())
        return annotated_image


    def newSolution():
        BaseOptions = mp.tasks.BaseOptions
        PoseLandmarker = mp.tasks.vision.PoseLandmarker
        PoseLandmarkerOptions = mp.tasks.vision.PoseLandmarkerOptions
        VisionRunningMode = mp.tasks.vision.RunningMode

        model_path = 'data/pose_landmarker_heavy.task'
        options = PoseLandmarkerOptions(
            base_options=BaseOptions(model_asset_path=model_path),
            running_mode=VisionRunningMode.IMAGE,
            num_poses=10)

        FILE_PATH = 'data/4.jpg'
        image = cv2.imread(FILE_PATH)                 # BGR copy used for drawing and display
        img = mp.Image.create_from_file(FILE_PATH)    # MediaPipe image used for detection
        with PoseLandmarker.create_from_options(options) as detector:
            res = detector.detect(img)
        image = draw_landmarks_on_image(image, res)
        cv2.imshow('MediaPipe Pose Estimation', image)
        cv2.waitKey(0)


    if __name__ == '__main__':
        newSolution()

1.4 Push-up Counting
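The arm's bend angle (shoulder, elbow, wrist) tells us whether the body is in the up or the down position, and each transition from up to down is counted as one push-up.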
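The two ingredients, the elbow angle and the up/down state machine, can be checked in isolation without a video. The sketch below uses plain `(x, y)` tuples and a hand-written angle sequence. The 70 and 160 degree thresholds follow the script in this section, while the function names and sample values are illustrative assumptions.

```python
import math
from typing import Iterable, Tuple

Point = Tuple[float, float]


def joint_angle(a: Point, b: Point, c: Point) -> float:
    """Angle at b, in degrees within [0, 180], between segments b->a and b->c."""
    radians = (math.atan2(c[1] - b[1], c[0] - b[0])
               - math.atan2(a[1] - b[1], a[0] - b[0]))
    angle = abs(math.degrees(radians))
    return angle if angle <= 180 else 360 - angle


def count_reps(angles: Iterable[float], down: float = 70, up: float = 160) -> int:
    """Count transitions from arms extended (> up) to arms bent (< down)."""
    counter = 0
    arms_extended = False
    for angle in angles:
        if arms_extended and angle < down:
            counter += 1           # reached the bottom of a push-up
            arms_extended = False
        elif not arms_extended and angle > up:
            arms_extended = True   # back at the top, ready for the next rep
    return counter


# Shoulder above the elbow, wrist to the side: a right angle at the elbow.
print(joint_angle((0.0, 1.0), (0.0, 0.0), (1.0, 0.0)))
# Made-up per-frame average elbow angles covering two full repetitions.
print(count_reps([170, 120, 60, 110, 165, 65, 170]))
```

Using two thresholds with a gap between them (hysteresis) keeps small frame-to-frame jitter around a single cut-off from being counted as extra repetitions.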
    import cv2
    import mediapipe as mp
    import numpy as np

    mp_drawing = mp.solutions.drawing_utils
    mp_pose = mp.solutions.pose


    def calculate_angle(a, b, c):
        # Angle at b (degrees, 0-180) formed by the landmarks a, b and c
        radians = np.arctan2(c.y - b.y, c.x - b.x) - \
            np.arctan2(a.y - b.y, a.x - b.x)
        angle = np.abs(np.degrees(radians))
        return angle if angle <= 180 else 360 - angle


    def angle_of_arm(landmarks, shoulder, elbow, wrist):
        shoulder_coord = landmarks[mp_pose.PoseLandmark[shoulder].value]
        elbow_coord = landmarks[mp_pose.PoseLandmark[elbow].value]
        wrist_coord = landmarks[mp_pose.PoseLandmark[wrist].value]
        return calculate_angle(shoulder_coord, elbow_coord, wrist_coord)


    def count_push_up(landmarks, counter, status):
        left_arm_angle = angle_of_arm(landmarks, 'LEFT_SHOULDER', 'LEFT_ELBOW', 'LEFT_WRIST')
        right_arm_angle = angle_of_arm(landmarks, 'RIGHT_SHOULDER', 'RIGHT_ELBOW', 'RIGHT_WRIST')
        avg_arm_angle = (left_arm_angle + right_arm_angle) // 2

        # status is True while the arms are extended (up position)
        if status:
            if avg_arm_angle < 70:
                counter += 1
                status = False
        else:
            if avg_arm_angle > 160:
                status = True
        return counter, status


    def main():
        cap = cv2.VideoCapture('data/test.mp4')
        counter = 0
        status = False
        with mp_pose.Pose(min_detection_confidence=0.7,
                          min_tracking_confidence=0.7) as pose:
            while cap.isOpened():
                success, image = cap.read()
                if not success:
                    print('empty camera')
                    break

                # MediaPipe expects RGB input, while OpenCV delivers BGR frames
                result = pose.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
                if result.pose_landmarks:
                    mp_drawing.draw_landmarks(image, result.pose_landmarks,
                                              mp_pose.POSE_CONNECTIONS)
                    counter, status = count_push_up(result.pose_landmarks.landmark,
                                                    counter, status)
                cv2.putText(image, text=str(counter), org=(100, 100),
                            fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=4,
                            color=(255, 255, 255), thickness=2, lineType=cv2.LINE_AA)
                cv2.imshow('push-up counter', image)
                key = cv2.waitKey(1)
                if key == ord('q'):
                    break
        cap.release()
        cv2.destroyAllWindows()


    if __name__ == '__main__':
        main()

2. Hand Tracking
2.1 Hand Landmarks

2.2 API Usage

Image
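| Option | Description | Values | Default |
|---|---|---|---|
| static_image_mode | If False, the input images are treated as a video stream: the solution tries to detect hands in the first input image and, on success, localizes the hand landmarks. In subsequent images, once all max_num_hands hands are detected and their landmarks localized, it simply tracks those landmarks without invoking detection again until it loses track of any hand. This reduces latency and is ideal for processing video frames. If True, hand detection runs on every input image, which suits a batch of static, possibly unrelated images. | Boolean | False |
| max_num_hands | Maximum number of hands to detect. | Integer | 2 |
| model_complexity | Complexity of the model. Accuracy and inference latency generally increase with model complexity. | {0, 1} | 1 |
| min_detection_confidence | Minimum confidence value from the detection model for a detection to be considered successful. | Float [0.0, 1.0] | 0.5 |
| min_tracking_confidence | Minimum confidence value from the hand-tracking model for the hand landmarks to be considered successfully tracked; otherwise detection is invoked automatically on the next input image. A higher value makes the solution more robust at the cost of higher latency. Ignored if static_image_mode is True, in which case hand detection simply runs on every image. | Float [0.0, 1.0] | 0.5 |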
    # Hand tracking on a single image
    import cv2
    import mediapipe as mp

    mp_hands = mp.solutions.hands


    def main():
        cv2.namedWindow('MediaPipe Hand', cv2.WINDOW_NORMAL)
        hands = mp_hands.Hands(static_image_mode=False, max_num_hands=2,
                               min_detection_confidence=0.5,
                               min_tracking_confidence=0.5)

        img = cv2.imread('data/finger/1.jpg')
        rgb_frame = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

        # Run hand tracking
        results = hands.process(rgb_frame)

        if results.multi_hand_landmarks:
            # Draw the hand landmarks and their connections
            mp_drawing = mp.solutions.drawing_utils
            for hand_landmarks in results.multi_hand_landmarks:
                mp_drawing.draw_landmarks(img, hand_landmarks,
                                          mp_hands.HAND_CONNECTIONS)

        # Show the result
        cv2.imshow('MediaPipe Hand', img)
        cv2.waitKey(0)


    if __name__ == '__main__':
        main()

    # Hand tracking on a video
    import cv2
    import mediapipe as mp

    mp_hands = mp.solutions.hands


    def video():
        hands = mp_hands.Hands(static_image_mode=False, max_num_hands=2,
                               min_detection_confidence=0.4,
                               min_tracking_confidence=0.4)
        mp_drawing = mp.solutions.drawing_utils

        # Read the video
        cap = cv2.VideoCapture('data/hand.mp4')

        while cap.isOpened():
            ret, frame = cap.read()
            if not ret:
                break

            # Convert the BGR frame to RGB
            rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

            # Run hand tracking
            results = hands.process(rgb_frame)

            if results.multi_hand_landmarks:
                # Draw the hand landmarks and their connections
                for hand_landmarks in results.multi_hand_landmarks:
                    mp_drawing.draw_landmarks(frame, hand_landmarks,
                                              mp_hands.HAND_CONNECTIONS)

            # Show the result
            cv2.imshow('MediaPipe Hand Tracking', frame)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break

        # Release resources
        cap.release()
        cv2.destroyAllWindows()


    if __name__ == '__main__':
        video()

2.3 Recognizing Gestures
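Gestures are predicted with a k-nearest-neighbors (KNN) classifier applied to an embedding computed from the hand landmarks.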
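As a rough illustration of the idea, KNN labels a query embedding by majority vote among its k closest training embeddings. The sketch below is a hand-rolled version in plain Python, not the scikit-learn classifier used in this section. The 2-D toy embeddings are made up, and only the label names are borrowed from the gesture list in the script.

```python
import math
from collections import Counter
from typing import Sequence, Tuple

Sample = Tuple[Sequence[float], str]


def knn_predict(train: Sequence[Sample], query: Sequence[float], k: int = 3) -> str:
    """Return the majority label among the k training samples nearest to query."""
    # Sort the training samples by Euclidean distance to the query.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D embeddings; the real embedding in this section has 5 features.
train_set = [
    ((0.0, 0.0), "five_sign"), ((0.1, 0.2), "five_sign"), ((0.2, 0.1), "five_sign"),
    ((1.0, 1.0), "ok"), ((0.9, 1.1), "ok"), ((1.1, 0.9), "ok"),
]

print(knn_predict(train_set, (0.95, 1.0)))
print(knn_predict(train_set, (0.05, 0.1)))
```

Because the vote depends only on distances, the quality of the embedding (here, how well it separates gestures regardless of hand position and scale) matters more than the classifier itself.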
    import mediapipe as mp
    import numpy as np
    import cv2
    from mediapipe.framework.formats.landmark_pb2 import NormalizedLandmarkList
    from sklearn.neighbors import KNeighborsClassifier

    mp_drawing = mp.solutions.drawing_utils
    mp_drawing_styles = mp.solutions.drawing_styles
    mp_hands = mp.solutions.hands


    # Compress the hand landmarks into a small feature vector
    class Embedder(object):
        def __init__(self):
            self._landmark_names = mp.solutions.hands.HandLandmark

        def __call__(self, landmarks):
            # Handles both a 3-dim dataset and a single inference result.
            if isinstance(landmarks, np.ndarray):
                if landmarks.ndim == 3:  # for dataset
                    embeddings = []
                    for lmks in landmarks:
                        embedding = self.__call__(lmks)
                        embeddings.append(embedding)
                    return np.array(embeddings)
                elif landmarks.ndim == 2:  # for inference
                    assert landmarks.shape[0] == len(list(self._landmark_names)), \
                        'Unexpected number of landmarks: {}'.format(landmarks.shape[0])
                    # Normalize landmarks.
                    landmarks = self._normalize_landmarks(landmarks)
                    # Get embedding.
                    embedding = self._get_embedding(landmarks)
                    return embedding
                else:
                    print('ERROR: Can NOT embedding the data you provided !')
            else:
                if isinstance(landmarks, list):  # for dataset
                    embeddings = []
                    for lmks in landmarks:
                        embedding = self.__call__(lmks)
                        embeddings.append(embedding)
                    return np.array(embeddings)
                elif isinstance(landmarks, NormalizedLandmarkList):  # for inference
                    landmarks = np.array([[lmk.x, lmk.y, lmk.z]
                                          for lmk in landmarks.landmark], dtype=np.float32)
                    assert landmarks.shape[0] == len(list(self._landmark_names)), \
                        'Unexpected number of landmarks: {}'.format(landmarks.shape[0])
                    # Normalize landmarks.
                    landmarks = self._normalize_landmarks(landmarks)
                    # Get embedding.
                    embedding = self._get_embedding(landmarks)
                    return embedding
                else:
                    print('ERROR: Can NOT embedding the data you provided !')

        def _get_center(self, landmarks):
            # MIDDLE_FINGER_MCP: 9
            return landmarks[9]

        def _get_size(self, landmarks):
            # Hand size: twice the largest 2D distance from the center landmark
            landmarks = landmarks[:, :2]
            max_dist = np.max(np.linalg.norm(landmarks - self._get_center(landmarks), axis=1))
            return max_dist * 2

        def _normalize_landmarks(self, landmarks):
            landmarks = np.copy(landmarks)
            # Normalize: center on the hand and divide by its size
            center = self._get_center(landmarks)
            size = self._get_size(landmarks)
            landmarks = (landmarks - center) / size
            landmarks *= 100  # optional, but makes debugging easier.
            return landmarks

        def _get_embedding(self, landmarks):
            # We can add and delete any embedding features.
            # One dot product per finger, indicating how much that finger is bent.
            test = np.array([
                np.dot((landmarks[2] - landmarks[0]), (landmarks[3] - landmarks[4])),  # thumb bent
                np.dot((landmarks[5] - landmarks[0]), (landmarks[6] - landmarks[7])),
                np.dot((landmarks[9] - landmarks[0]), (landmarks[10] - landmarks[11])),
                np.dot((landmarks[13] - landmarks[0]), (landmarks[14] - landmarks[15])),
                np.dot((landmarks[17] - landmarks[0]), (landmarks[18] - landmarks[19])),
            ]).flatten()
            return test


    def init_knn(file='data/dataset_embedded.npz'):
        npzfile = np.load(file)
        X = npzfile['X']
        y = npzfile['y']
        neigh = KNeighborsClassifier(n_neighbors=5)
        neigh.fit(X, y)
        return neigh


    def hand_pose_recognition(stream_img):
        # For static images:
        stream_img = cv2.cvtColor(stream_img, cv2.COLOR_BGR2RGB)
        embedder = Embedder()
        neighbors = init_knn()
        with mp_hands.Hands(static_image_mode=True,
                            max_num_hands=2,
                            min_detection_confidence=0.5) as hands:
            results = hands.process(stream_img)
            if not results.multi_hand_landmarks:
                return ['no_gesture'], stream_img
            else:
                annotated_image = stream_img.copy()
                multi_landmarks = results.multi_hand_landmarks
                # KNN inference
                embeddings = embedder(multi_landmarks)
                hand_class = neighbors.predict(embeddings)
                # hand_class_prob = neighbors.predict_proba(embeddings)
                # print(hand_class_prob)
                for landmarks in results.multi_hand_landmarks:
                    mp_drawing.draw_landmarks(
                        annotated_image,
                        landmarks,
                        mp_hands.HAND_CONNECTIONS,
                        mp_drawing_styles.get_default_hand_landmarks_style(),
                        mp_drawing_styles.get_default_hand_connections_style())
                return hand_class, annotated_image


    # There are 10 gestures: 8 numbers (from 1 to 10, without 7 and 9),
    # plus the OK gesture and the Spider-Man (spider) gesture:
    # eight_sign, five_sign, four_sign, ok, one_sign, six_sign, spider, ten_sign, three_sign, two_sign
    def image():
        FILE_PATH = 'data/ok.png'
        img = cv2.imread(FILE_PATH)
        handclass, img_final = hand_pose_recognition(img)
        cv2.putText(img_final, text=str(handclass[0]), org=(200, 50),
                    fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=2,
                    color=(255, 255, 255), thickness=2, lineType=cv2.LINE_AA)
        cv2.imshow('test', cv2.cvtColor(img_final, cv2.COLOR_RGB2BGR))
        cv2.waitKey(0)


    def video():
        cap = cv2.VideoCapture('data/ok.mp4')
        while cap.isOpened():
            ret, frame = cap.read()
            if not ret:
                break
            handclass, img_final = hand_pose_recognition(frame)
            cv2.putText(img_final, text=str(handclass[0]), org=(50, 50),
                        fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=2,
                        color=(255, 0, 0), thickness=2, lineType=cv2.LINE_AA)
            cv2.imshow('test', cv2.cvtColor(img_final, cv2.COLOR_RGB2BGR))
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
        cap.release()
        cv2.destroyAllWindows()


    if __name__ == '__main__':
        video()

References
https://developers.google.cn/mediapipe/solutions/
https://github.com/googlesamples/mediapipe
https://github.com/Furkan-Gulsen/Sport-With-AI
https://github.com/Chuanfang-Neptune/DLAV-G9