Computer Graphics
TU Braunschweig

Neural Reconstruction and Rendering of Dynamic Real-World Content from Monocular Video



Modern imaging systems enable the preservation of memories and experiences in a compact digital format. Beyond static images, their ability to record video at high temporal resolution additionally preserves the dynamics and motion of the captured scene. With the availability of high-quality smartphone cameras, countless videos are recorded and shared across the globe every day.

This widespread availability entails an ever-growing demand for methods and tools to further enhance such recordings. This thesis addresses the challenge of reconstructing and rendering dynamic representations of a scene captured in a single monocular video, enabling free spatiotemporal scene exploration and enhancing user engagement and immersion. While 3D reconstruction has been extensively studied for static image sequences and multi-view videos, the inherent absence of depth information in monocular videos (e.g., smartphone recordings) renders this problem highly ill-posed.

In this thesis, I explore three different methods to overcome the challenges associated with monocular video-based scene reconstruction. For this purpose, I leverage recent advances in machine learning and neural rendering techniques for interactive, photorealistic novel view synthesis, while bypassing the ambiguities of scene motion and depth using additional data-driven priors. Each presented method employs a unique combination of neural scene representation, rendering approach, and strategy for resolving monocular depth, tailored to the specific requirements of the given task: the first method combines deep image translation networks with human pose estimation to generate highly realistic 2D human avatars from a temporal context; the second method targets full 3D single-object reconstruction from monocularized multi-view video using neural radiance fields; and the third method addresses full dynamic 3D reconstruction of casual video recordings via differentiable point rasterization initialized from monocular depth estimates.

Together, the presented techniques demonstrate how monocular videos can be enhanced for immersive digital experiences, advancing the possibilities of video-based scene reconstruction.


Author: Moritz Kappel
Published: October 2025
Type: PhD Thesis
School: TU Braunschweig
Project: Neural Reconstruction and Rendering of Dynamic Real-World Scenes


@phdthesis{kappel2025neural,
  title = {Neural Reconstruction and Rendering of Dynamic Real-World Content from Monocular Video},
  author = {Kappel, Moritz},
  school = {{TU} Braunschweig},
  month = oct,
  year = {2025}
}
