ICON: Implicit Clothed humans Obtained from Normals

Yuliang Xiu, Jinlong Yang, Dimitrios Tzionas, Michael J. Black
Abstract
Current methods for learning realistic and animatable 3D clothed avatars need either posed 3D scans or 2D images with carefully controlled user poses. In contrast, our goal is to learn the avatar from only 2D images of people in unconstrained poses. Given a set of images, our method estimates a detailed 3D surface from each image and then combines these into an animatable avatar. Implicit functions are well suited to the first task, as they can capture details like hair or clothes. Current methods, however, are not robust to varied human poses and often produce 3D surfaces with broken or disembodied limbs, missing details, or non-human shapes. The problem is that these methods use global feature encoders that are sensitive to global pose. To address this, we propose ICON ("Implicit Clothed humans Obtained from Normals"), which uses local features instead. ICON has two main modules, both of which exploit the SMPL(-X) body model. First, ICON infers detailed clothed-human normals (front/back) conditioned on the SMPL(-X) normals. Second, a visibility-aware implicit surface regressor produces an iso-surface of a human occupancy field. Importantly, at inference time, a feedback loop alternates between refining the SMPL(-X) mesh using the inferred clothed normals and then refining the normals. Given multiple reconstructed frames of a subject in varied poses, we use SCANimate to produce an animatable avatar from them. Evaluation on the AGORA and CAPE datasets shows that ICON outperforms the state of the art in reconstruction, even with heavily limited training data. Additionally, it is much more robust to out-of-distribution samples, e.g., in-the-wild poses/images and out-of-frame cropping. ICON takes a step towards robust 3D clothed human reconstruction from in-the-wild images. This enables creating avatars directly from video with personalized and natural pose-dependent cloth deformation.
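To make the pipeline above concrete, here is a minimal Python sketch of the inference-time feedback loop as described in the abstract. All helper names (estimate_smpl, render_smpl_normals, predict_cloth_normals, refine_smpl, query_occupancy, extract_iso_surface) are hypothetical placeholders for illustration; they are not ICON's actual API, and the control flow is only a reading of the abstract, not the released implementation.

```python
# Hedged sketch of ICON's inference loop, per the abstract. Every helper
# called below is a hypothetical placeholder, NOT the real ICON API.

def reconstruct_clothed_human(image, num_refine_iters=3):
    # Initial SMPL(-X) body estimate from the input image (placeholder).
    smpl_mesh = estimate_smpl(image)

    cloth_normals = None
    # Feedback loop: alternate between (a) inferring detailed clothed-human
    # normal maps (front/back) conditioned on the current SMPL(-X) normals,
    # and (b) refining the SMPL(-X) mesh to agree with those normals.
    for _ in range(num_refine_iters):
        body_normals = render_smpl_normals(smpl_mesh)           # front/back maps
        cloth_normals = predict_cloth_normals(image, body_normals)
        smpl_mesh = refine_smpl(smpl_mesh, cloth_normals)

    # Visibility-aware implicit surface regression: predict occupancy for each
    # 3D query point from local, body-conditioned features, then extract the
    # iso-surface of the resulting occupancy field.
    occupancy = lambda points: query_occupancy(points, smpl_mesh, cloth_normals)
    return extract_iso_surface(occupancy, level=0.5)
```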

The "Ten Questions for Papers" (论文十问) were proposed by Dr. Harry Shum, who encourages readers to approach a paper with these ten questions in mind and use the useful information to build a mental model. Readers can write their own answers to the ten questions, which may be featured on this page.

  1. Q1
    What problem does the paper try to solve?
    Yuliang Xiu (paper author) 2022/01/26

    The current mainstream methods for single-image 3D clothed-human reconstruction based on implicit functions (PIFu, PIFuHD, PaMIR, ARCH, ARCH++) perform well on standing/fashion poses, but their generalization and robustness collapse on harder poses such as dancing, sports, kung fu, and parkour. Simply augmenting the training data costs money and GPUs, and yields only limited gains.

    So: can we train a model that is sufficiently robust to human pose using only a small amount of data?

    In addition, with the burst of work such as NASA, SCANimate, SNARF, MetaAvatar, and Neural-GIF, learning an animatable neural avatar from dynamic 3D human scans has gradually become a research hotspot. But acquiring high-quality dynamic human scans is expensive in money and labor, which makes it hard for ordinary users, or teams without multi-view capture rigs, to enter this field.

    So: is it possible to capture high-quality 3D human models directly from monocular images and video, feed them straight into these existing frameworks, and obtain an animatable avatar of acceptable quality?

    To address these two questions, we propose ICON.

  2. Q2
    Is this a new problem?
    Yuliang Xiu (paper author) 2022/01/26

    Yes and no. Human reconstruction is an old problem, and learning an animatable avatar from dynamic 3D scans is an old problem too. But how to raise image-based human reconstruction to a quality comparable with dynamic 3D scans, so that the two families of methods can be bridged smoothly: to my knowledge, no published paper has discussed this.

  3. Q3
    What scientific hypothesis does this paper aim to verify?
    Yuliang Xiu (paper author) 2022/01/26

    That a balance can be struck between a strong model prior (the SMPL prior) and geometric freedom (a model-free representation).

  4. Q4
    What related work is there? How can it be categorized? Which researchers in this area are worth following?
  5. Q5
    What is the key to the solution proposed in the paper?
  6. Q6
    How are the experiments in the paper designed?
  7. Q7
    What datasets are used for quantitative evaluation? Is the code open source?
  8. Q8
    Do the experiments and results adequately support the scientific hypothesis to be verified?
  9. Q9
    What, exactly, are this paper's contributions?
  10. Q10
    What's next? What follow-up work is worth pursuing?