BlazePose: On-device Real-time Body Pose Tracking
We present BlazePose, a lightweight convolutional neural network architecture for human pose estimation that is tailored for real-time inference on mobile devices. During inference, the network produces 33 body keypoints for a single person and runs at over 30 frames per second on a Pixel 2 phone. This makes it particularly suited to real-time use cases such as fitness tracking and sign language recognition. Our main contributions include a novel body pose tracking solution and a lightweight body pose estimation neural network that uses both heatmaps and regression to keypoint coordinates.

Human body pose estimation from images or video plays a central role in various applications such as health tracking, sign language recognition, and gestural control. This task is challenging due to a wide variety of poses, numerous degrees of freedom, and occlusions. The common approach is to produce heatmaps for each joint along with refining offsets for each coordinate. While this choice of heatmaps scales to multiple people with minimal overhead, it makes the model for a single person considerably larger than is suitable for real-time inference on mobile phones.

In this paper, we address this particular use case and demonstrate a significant speedup of the model with little to no quality degradation. In contrast to heatmap-based techniques, regression-based approaches, while less computationally demanding and more scalable, attempt to predict the mean coordinate values, often failing to address the underlying ambiguity.
We extend this idea in our work and use an encoder-decoder network architecture to predict heatmaps for all joints, followed by another encoder that regresses directly to the coordinates of all joints. The key insight behind our work is that the heatmap branch can be discarded during inference, making the network sufficiently lightweight to run on a mobile phone. Our pipeline consists of a lightweight body pose detector followed by a pose tracker network. The tracker predicts keypoint coordinates, the presence of the person on the current frame, and the refined region of interest for the current frame. When the tracker indicates that there is no human present, we re-run the detector network on the next frame.

The majority of modern object detection solutions rely on the Non-Maximum Suppression (NMS) algorithm for their last post-processing step. This works well for rigid objects with few degrees of freedom. However, this algorithm breaks down for scenarios that include highly articulated poses like those of humans, e.g. people waving or hugging. This is because multiple, ambiguous boxes satisfy the intersection over union (IoU) threshold for the NMS algorithm. To overcome this limitation, we focus on detecting the bounding box of a relatively rigid body part like the human face or torso. We observed that in many cases, the strongest signal to the neural network about the position of the torso is the person's face (as it has high-contrast features and fewer variations in appearance). To make such a person detector fast and lightweight, we make the strong, yet for AR applications valid, assumption that the head of the person should always be visible for our single-person use case.
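The detector-tracker hand-off described in this section can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `run_pipeline`, `detect`, and `track` are hypothetical names, the two networks are passed in as plain callables, and reusing the tracker's refined region of interest on the following frame is an assumption about how the pieces fit together.

```python
def run_pipeline(frames, detect, track):
    """Run a detector-then-tracker loop over a sequence of frames.

    Hypothetical interfaces (assumptions, not from the source):
      detect(frame)     -> region of interest, or None if nobody is found
      track(frame, roi) -> (keypoints, person_present, refined_roi)
    Returns one entry per frame: keypoints, or None when no person was found.
    """
    roi = None
    results = []
    for frame in frames:
        if roi is None:
            # No valid region of interest: fall back to the detector.
            roi = detect(frame)
            if roi is None:
                results.append(None)
                continue
        keypoints, person_present, refined_roi = track(frame, roi)
        if person_present:
            results.append(keypoints)
            # Assumption: the refined region seeds tracking on the next frame.
            roi = refined_roi
        else:
            results.append(None)
            # Tracker lost the person, so the detector runs on the next frame.
            roi = None
    return results
```

In this sketch the detector runs only on the first frame and on frames that follow a tracking failure, so the detector cost is not paid on every frame.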
This face detector predicts additional person-specific alignment parameters: the middle point between the person's hips, the size of the circle circumscribing the whole person, and incline (the angle between the lines connecting the two mid-shoulder and mid-hip points).

This allows us to be consistent with the respective datasets and inference networks. Compared to the majority of existing pose estimation solutions that detect keypoints using heatmaps, our tracking-based solution requires an initial pose alignment. We restrict our dataset to those cases where either the whole person is visible, or where hips and shoulders keypoints can be confidently annotated. To ensure the model supports heavy occlusions that are not present in the dataset, we use substantial occlusion-simulating augmentation. Our training dataset consists of 60K images with a single or few people in the scene in common poses and 25K images with a single person in the scene performing fitness exercises. All of these images were annotated by humans. We adopt a combined heatmap, offset, and regression approach, as shown in Figure 4. We use the heatmap and offset loss only in the training stage and remove the corresponding output layers from the model before running the inference.

Thus, we effectively use the heatmap to supervise the lightweight embedding, which is then utilized by the regression encoder network. This approach is partially inspired by the Stacked Hourglass approach of Newell et al. We actively utilize skip-connections between all the stages of the network to achieve a balance between high- and low-level features. However, the gradients from the regression encoder are not propagated back to the heatmap-trained features (note the gradient-stopping connections in Figure 4).
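The effect of a gradient-stopping connection can be illustrated with a toy scalar model and hand-written gradients. Everything here is an assumption made for illustration: the function name, the single weights `w` and `v`, and the squared-error losses are stand-ins and do not reflect the actual network or its loss functions.

```python
def toy_gradients(w, v, x, heat_target, coord_target):
    """Toy stand-in for a two-branch network with a gradient stop.

    feature = w * x               shared embedding, supervised by the heatmap loss
    coord   = v * stop(feature)   regression branch; stop() blocks gradients
    Returns (heat_loss, coord_loss, grad_w, grad_v).
    """
    feature = w * x
    # The forward value is unchanged by the gradient stop.
    coord = v * feature

    heat_loss = (feature - heat_target) ** 2
    coord_loss = (coord - coord_target) ** 2

    # The heatmap loss reaches w through the shared feature as usual.
    grad_w = 2.0 * (feature - heat_target) * x
    # Without the stop, coord_loss would add 2 * (coord - coord_target) * v * x
    # to grad_w. The gradient stop removes that term entirely.

    # The regression weight sees the feature as a constant input.
    grad_v = 2.0 * (coord - coord_target) * feature
    return heat_loss, coord_loss, grad_w, grad_v
```

For instance, with `w=0.5, v=2.0, x=2.0, heat_target=1.0, coord_target=5.0` the heatmap loss is already zero, so `grad_w` is zero even though the coordinate error is large; only `v` is updated by the regression loss.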
We have found this to not only improve the heatmap predictions, but also substantially increase the coordinate regression accuracy. A relevant pose prior is a vital part of the proposed solution. We deliberately limit supported ranges for the angle, scale, and translation during augmentation and data preparation when training. This allows us to lower the network capacity, making the network faster while requiring fewer computational and thus energy resources on the host device. Based on either the detection stage or the previous frame keypoints, we align the person so that the point between the hips is located at the center of the square image passed as the neural network input.
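The alignment step can be sketched as a small coordinate transform. This is a sketch under stated assumptions rather than the authors' code: the function names are hypothetical, image y is assumed to grow downward, the rotation is assumed to bring the mid-hip to mid-shoulder line upright, and the circumscribing-circle radius is assumed to be given and to map to half the crop size.

```python
import math


def midpoint(a, b):
    """Midpoint of two (x, y) points."""
    return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)


def alignment_from_keypoints(left_hip, right_hip, left_shoulder, right_shoulder):
    """Return (center, angle): the hip midpoint and the torso incline in radians.

    The angle is 0 for an upright torso (mid-shoulder directly above mid-hip
    in image coordinates, where y grows downward).
    """
    center = midpoint(left_hip, right_hip)
    mid_shoulder = midpoint(left_shoulder, right_shoulder)
    dx = mid_shoulder[0] - center[0]
    dy = mid_shoulder[1] - center[1]
    return center, math.atan2(dx, -dy)


def to_crop(point, center, angle, radius, size):
    """Map an image point into a size x size crop centered on the hip midpoint.

    The point is translated so the hip midpoint is the origin, rotated by
    -angle so the torso is upright, then scaled so that `radius` (assumed to be
    the circumscribing-circle radius) spans half the crop.
    """
    x = point[0] - center[0]
    y = point[1] - center[1]
    c, s = math.cos(-angle), math.sin(-angle)
    xr = c * x - s * y
    yr = s * x + c * y
    scale = size / (2.0 * radius)
    return (xr * scale + size / 2.0, yr * scale + size / 2.0)
```

With this transform the hip midpoint always lands at the crop center, and the mid-shoulder point lands directly above it regardless of how the person is tilted in the original frame.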