ICON: Implicit Clothed humans Obtained from Normals
Yuliang Xiu, Jinlong Yang, Dimitrios Tzionas, Michael J. Black
Current methods for learning realistic and animatable 3D clothed avatars need either posed 3D scans or 2D images with carefully controlled user poses. In contrast, our goal is to learn the avatar from only 2D images of people in unconstrained poses. Given a set of images, our method estimates a detailed 3D surface from each image and then combines these into an animatable avatar. Implicit functions are well suited to the first task, as they can capture details like hair or clothes. Current methods, however, are not robust to varied human poses and often produce 3D surfaces with broken or disembodied limbs, missing details, or non-human shapes. The problem is that these methods use global feature encoders that are sensitive to global pose. To address this, we propose ICON ("Implicit Clothed humans Obtained from Normals"), which uses local features, instead. ICON has two main modules, both of which exploit the SMPL(-X) body model. First, ICON infers detailed clothed-human normals (front/back) conditioned on the SMPL(-X) normals. Second, a visibility-aware implicit surface regressor produces an iso-surface of a human occupancy field. Importantly, at inference time, a feedback loop alternates between refining the SMPL(-X) mesh using the inferred clothed normals and then refining the normals. Given multiple reconstructed frames of a subject in varied poses, we use SCANimate to produce an animatable avatar from them. Evaluation on the AGORA and CAPE datasets shows that ICON outperforms the state of the art in reconstruction, even with heavily limited training data. Additionally, it is much more robust to out-of-distribution samples, e.g., in-the-wild poses/images and out-of-frame cropping. ICON takes a step towards robust 3D clothed human reconstruction from in-the-wild images. This enables creating avatars directly from video with personalized and natural pose-dependent cloth deformation.
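The inference-time feedback loop described above, which alternates between refining the SMPL(-X) body estimate from the inferred clothed normals and re-inferring the normals from the refined body, can be sketched as follows. This is a minimal toy illustration of the alternating structure only; every function name, the toy 1-D "normal map" representation, and the averaging updates are illustrative stand-ins, not the actual ICON networks or API.

```python
def infer_clothed_normals(image, smpl_normals):
    """Stand-in for the front/back clothed-normal network conditioned on
    SMPL(-X) normals; here it just averages image evidence with the body prior."""
    return [0.5 * (i + s) for i, s in zip(image, smpl_normals)]

def refine_smpl(smpl_normals, clothed_normals):
    """Stand-in for refining the SMPL(-X) mesh toward the clothed normals."""
    return [0.5 * (s + c) for s, c in zip(smpl_normals, clothed_normals)]

def icon_feedback_loop(image, smpl_normals, n_iters=3):
    """Alternate: (a) refine the body using the clothed normals,
    (b) re-infer the clothed normals conditioned on the refined body."""
    clothed = infer_clothed_normals(image, smpl_normals)
    for _ in range(n_iters):
        smpl_normals = refine_smpl(smpl_normals, clothed)      # step (a)
        clothed = infer_clothed_normals(image, smpl_normals)   # step (b)
    return smpl_normals, clothed

# Toy example: both estimates move toward the "image evidence" over iterations.
image = [1.0, 1.0, 1.0]
initial_smpl = [0.0, 0.0, 0.0]
smpl, clothed = icon_feedback_loop(image, initial_smpl)
```

In this toy setup both estimates converge monotonically toward the image evidence, mirroring how the real loop lets a better body fit produce better normals and vice versa.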