Computer Graphics
TU Braunschweig

High-Fidelity Neural Human Motion Transfer from Monocular Video

Video-based human motion transfer creates video animations of humans following a source motion. Current methods show remarkable results for tightly clad subjects. However, the lack of temporally consistent handling of plausible clothing dynamics, including fine and high-frequency details, significantly limits the attainable visual quality. We address these limitations for the first time in the literature and present a new framework which performs high-fidelity and temporally consistent human motion transfer with natural pose-dependent non-rigid deformations for several types of loose garments. In contrast to previous techniques, we perform image generation in three successive stages, synthesizing human shape, structure, and appearance. Given a monocular RGB video of an actor, we train a stack of recurrent deep neural networks that generate these intermediate representations from 2D poses and their temporal derivatives. Splitting the difficult motion transfer problem into subtasks that are aware of the temporal motion context helps us to synthesize results with plausible dynamics and pose-dependent detail. It also allows artistic control of the results by manipulating individual framework stages. In our experiments, we significantly outperform the state of the art in terms of video realism. Our code and data will be made publicly available.
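The abstract describes networks that take 2D poses and their temporal derivatives as input and produce shape, structure, and appearance in sequence. The following is only a minimal sketch of that data flow, not the authors' implementation: the function names `pose_features` and `run_stages`, the backward-difference derivative, the array layout, and the stage interface are all assumptions made for illustration, and the stages are plain callables standing in for the trained recurrent networks.

```python
import numpy as np


def pose_features(poses: np.ndarray, fps: float = 1.0) -> np.ndarray:
    """Stack 2D joint positions with a finite-difference velocity estimate.

    poses: array of shape (T, J, 2) with T frames and J joints (assumed layout).
    Returns an array of shape (T, J * 4): positions followed by velocities.
    """
    if poses.ndim != 3 or poses.shape[-1] != 2:
        raise ValueError("expected poses with shape (T, J, 2)")
    velocity = np.zeros_like(poses, dtype=float)
    # Backward difference; the first frame has no predecessor, so it stays zero.
    velocity[1:] = (poses[1:] - poses[:-1]) * fps
    num_frames = poses.shape[0]
    return np.concatenate(
        [poses.reshape(num_frames, -1), velocity.reshape(num_frames, -1)],
        axis=1,
    )


def run_stages(features, stages):
    """Run hypothetical stages in order (e.g. shape, structure, appearance).

    Each stage is a callable (features, previous_output) -> output, so later
    stages can condition on the result of the stage before them.
    """
    outputs = []
    previous = None
    for stage in stages:
        previous = stage(features, previous)
        outputs.append(previous)
    return outputs
```

In this sketch, replacing one callable in `stages` changes only that step of the chain, which loosely mirrors the per-stage control mentioned in the abstract.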


Code & Dataset

Download code and dataset here.

Author(s): Moritz Kappel, Vladislav Golyanik, Mohamed Elgharib, Jann-Ole Henningson, Hans-Peter Seidel, Susana Castillo, Christian Theobalt, Marcus Magnor
Published: June 2021
Type: Article in conference proceedings
Book: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Presented at: Conference on Computer Vision and Pattern Recognition (CVPR) 2021
Note: Oral presentation
Project(s): Comprehensive Human Performance Capture from Monocular Video Footage; Neural Reconstruction and Rendering of Dynamic Real-World Scenes; Immersive Digital Reality

  title = {High-Fidelity Neural Human Motion Transfer from Monocular Video},
  author = {Kappel, Moritz and Golyanik, Vladislav and Elgharib, Mohamed and Henningson, Jann-Ole and Seidel, Hans-Peter and Castillo, Susana and Theobalt, Christian and Magnor, Marcus},
  booktitle = {{IEEE}/{CVF} Conference on Computer Vision and Pattern Recognition ({CVPR})},
  note = {Oral presentation},
  pages = {1541--1550},
  month = {Jun},
  year = {2021}