Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation

Jie Yang, Ailing Zeng, Shilong Liu, Feng Li, Ruimao Zhang, Lei Zhang
Feb 2023
Abstract
This paper presents a novel end-to-end framework with Explicit box Detection for multi-person Pose estimation, called ED-Pose, which unifies contextual learning between human-level (global) and keypoint-level (local) information. Unlike previous one-stage methods, ED-Pose recasts this task as two explicit box detection processes with a unified representation and regression supervision. First, we introduce a human detection decoder that extracts global features from encoded tokens. It provides a good initialization for the subsequent keypoint detection, which makes training converge quickly. Second, to bring in contextual information near keypoints, we treat pose estimation as a keypoint box detection problem and learn both box positions and contents for each keypoint. A human-to-keypoint detection decoder adopts an interactive learning strategy between human and keypoint features to further enhance global and local feature aggregation. Overall, ED-Pose is conceptually simple and requires neither post-processing nor dense heatmap supervision. It demonstrates effectiveness and efficiency compared with both two-stage and one-stage methods. Notably, explicit box detection boosts pose estimation performance by 4.5 AP on COCO and 9.9 AP on CrowdPose. For the first time, as a fully end-to-end framework with an L1 regression loss, ED-Pose surpasses heatmap-based top-down methods under the same backbone by 1.2 AP on COCO and achieves state-of-the-art performance with 76.6 AP on CrowdPose without bells and whistles. Code is available at https://github.com/IDEA-Research/ED-Pose.
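
As a rough illustration of the keypoint-as-box idea with L1 regression supervision described in the abstract, the sketch below wraps each keypoint in a box and computes a mean absolute error over box coordinates. This is not the authors' implementation: the `(cx, cy, w, h)` layout, the fixed box size, and the helper names `keypoint_to_box` and `l1_box_loss` are assumptions made for illustration only.

```python
from typing import List, Tuple

# Assumed layout: (cx, cy, w, h), all normalized to [0, 1].
Box = Tuple[float, float, float, float]


def keypoint_to_box(x: float, y: float, size: float) -> Box:
    """Wrap a keypoint in a square box centred on it.

    The fixed `size` is a hypothetical choice for this sketch; the
    abstract only says that box positions and contents are learned.
    """
    return (x, y, size, size)


def l1_box_loss(pred: List[Box], target: List[Box]) -> float:
    """Mean absolute error over every coordinate of every box."""
    if len(pred) != len(target):
        raise ValueError("pred and target must contain the same number of boxes")
    if not pred:
        return 0.0
    total = sum(
        abs(p - t)
        for pred_box, target_box in zip(pred, target)
        for p, t in zip(pred_box, target_box)
    )
    return total / (4 * len(pred))


# Usage: one ground-truth keypoint and a prediction shifted along x.
target_boxes = [keypoint_to_box(0.25, 0.5, 0.1)]
pred_boxes = [keypoint_to_box(0.5, 0.5, 0.1)]
loss = l1_box_loss(pred_boxes, target_boxes)
```

Representing a keypoint as a box rather than a point gives the regression target a spatial extent, which is one plausible way to read the abstract's remark about bringing in contextual information near keypoints.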
