HoloGarment: 360° Novel View Synthesis of In-the-Wild Garments

Johanna Karras1,2, Yingwei Li1, Yasamin Jafarian1, Ira Kemelmacher-Shlizerman1,2

1Google Research 2University of Washington

arXiv 2025

Novel view synthesis (NVS) of in-the-wild garments is a challenging task due to significant occlusions, complex human poses, and cloth deformations. Prior methods rely on synthetic 3D training data consisting of mostly unoccluded and static objects, leading to poor generalization on real-world clothing. In this paper, we propose HoloGarment (Hologram-Garment), a method that takes 1-3 images or a continuous video of a person wearing a garment and generates 360° novel views of the garment in a canonical pose. Our key insight is to bridge the domain gap between real and synthetic data with a novel implicit training paradigm leveraging a combination of large-scale real video data and small-scale synthetic 3D data to optimize a shared garment embedding space. During inference, the shared embedding space further enables construction of a garment “atlas” representation by finetuning a garment embedding on a specific real-world video. The atlas captures garment-specific geometry and texture across all viewpoints, independent of body pose or motion. Extensive experiments show that HoloGarment achieves state-of-the-art performance on NVS of in-the-wild garments from images and videos. Notably, our method robustly handles challenging real-world artifacts -- such as wrinkling, pose variation, and occlusion -- while maintaining photorealism, view consistency, fine texture details, and accurate geometry.

Gallery

Single View-to-360°

[Gallery: pairs of a single input garment image and the corresponding synthesized 360° novel views.]

Sparse Views-to-360°

[Gallery: multi-view input garment images and the corresponding synthesized 360° novel views.]

Video-to-360°

[Gallery: input videos and the corresponding synthesized 360° novel views.]

Method

Implicit Training Paradigm. Our key insight is an implicit training paradigm, where two or more distinct training tasks indirectly train a model to perform a target task for which ground-truth data is not available. Using a combination of real image/video data and synthetic 360° spin renderings of 3D assets, our garment encoder learns a garment embedding space shared between both domains that enables garment animation (task 1) and garment novel view synthesis (task 2). In doing so, we bypass the limitations of synthetic-only 3D datasets and handle challenging real-world garment images and videos.
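
A minimal PyTorch-style sketch may help make the shared-encoder setup concrete. This is not the paper's implementation: the module architectures, the L1 reconstruction losses, the conditioning dimensionality (`cond_dim`), and the batch keys (`ref_image`, `target_pose`, `target_frame`, `target_camera`, `target_view`) are illustrative assumptions. Only the overall structure follows the description above: two task heads share one garment encoder and are trained jointly on real video (animation) and synthetic 360° spins (NVS).

```python
import torch
import torch.nn as nn

class GarmentEncoder(nn.Module):
    """Maps a garment image to a shared garment embedding (hypothetical architecture)."""
    def __init__(self, dim=512):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=4, stride=4), nn.ReLU(),
            nn.Conv2d(64, dim, kernel_size=4, stride=4),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, image):                      # image: (B, 3, H, W)
        return self.backbone(image).flatten(1)     # (B, dim)

class ConditionedGenerator(nn.Module):
    """Placeholder for a pose/camera-conditioned image generator head.
    Both heads use the same conditioning dimensionality here purely for brevity."""
    def __init__(self, dim=512, cond_dim=72, out_hw=64):
        super().__init__()
        self.out_hw = out_hw
        self.net = nn.Sequential(
            nn.Linear(dim + cond_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 3 * out_hw * out_hw),
        )

    def forward(self, garment_emb, cond):
        x = self.net(torch.cat([garment_emb, cond], dim=-1))
        return x.view(-1, 3, self.out_hw, self.out_hw)

encoder = GarmentEncoder()
animation_head = ConditionedGenerator()   # task 1: garment animation on real video
nvs_head = ConditionedGenerator()         # task 2: 360° spins of synthetic 3D garments
optimizer = torch.optim.Adam(
    list(encoder.parameters())
    + list(animation_head.parameters())
    + list(nvs_head.parameters()),
    lr=1e-4,
)

def training_step(real_batch, synth_batch):
    # Task 1: animate a real garment -- reconstruct a video frame from a
    # reference image of the same garment plus the target body pose.
    emb_real = encoder(real_batch["ref_image"])
    pred_frame = animation_head(emb_real, real_batch["target_pose"])
    loss_anim = nn.functional.l1_loss(pred_frame, real_batch["target_frame"])

    # Task 2: novel view synthesis -- render a synthetic garment in a canonical
    # A-pose from a target camera on the 360° spin.
    emb_synth = encoder(synth_batch["ref_image"])
    pred_view = nvs_head(emb_synth, synth_batch["target_camera"])
    loss_nvs = nn.functional.l1_loss(pred_view, synth_batch["target_view"])

    # Both losses backpropagate through the same encoder, so real and synthetic
    # garments are pulled into one shared embedding space.
    loss = loss_anim + loss_nvs
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```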

Image-to-360°. Our implicit training approach enables us to train a robust garment embedding space on diverse, large-scale garment video data that is also compatible with the novel view synthesis task. As a result, we can accomplish the desired implicit task of real image-to-360° novel view synthesis. Given a real garment image (task 1) and a static spin pose sequence in a canonical A-pose (task 2), HoloGarment generates static 360° novel views of the input garment.
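
As a rough illustration of this inference path, reusing the hypothetical `encoder` and `nvs_head` from the sketch above: encode the input image once, then sweep the conditioning signal over a static spin around the canonical A-pose. The azimuth-in-the-first-slot encoding below is an assumption, not the paper's conditioning scheme.

```python
import math
import torch

@torch.no_grad()
def image_to_360(encoder, nvs_head, garment_image, num_views=36, cond_dim=72):
    """Render `num_views` evenly spaced novel views of the garment in a canonical A-pose."""
    emb = encoder(garment_image)                    # shared garment embedding
    views = []
    for i in range(num_views):
        azimuth = 2.0 * math.pi * i / num_views     # spin angle around the garment
        # Hypothetical conditioning vector: canonical A-pose plus spin azimuth.
        cond = torch.zeros(garment_image.shape[0], cond_dim)
        cond[:, 0] = azimuth
        views.append(nvs_head(emb, cond))
    return torch.stack(views, dim=1)                # (B, num_views, 3, H, W)
```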

Video-to-360° via Garment Atlas Finetuning. (Left) HoloGarment enables video-to-NVS by finetuning a garment-specific embedding, or "atlas", on a real-world video. By utilizing this atlas during inference, HoloGarment generates photorealistic 360° novel views of the garment. (Right) In single-view (top row) and multi-view (middle row) conditioning, poorly chosen input views negatively affect the quality of synthesized views. Atlas finetuning on video (bottom row) eliminates the dependency on input view selection by consolidating details from all video frames, improving garment texture detail and multi-view consistency.
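
The finetuning step can be sketched under the same assumptions as above: the pretrained networks are frozen and only a per-garment embedding (the atlas) is optimized to reconstruct the video frames, so it aggregates appearance cues from every viewpoint observed in the video. The initialization, optimizer settings, and loss here are illustrative choices, not the paper's.

```python
import torch

def finetune_atlas(encoder, animation_head, video_frames, video_poses,
                   steps=500, lr=1e-2):
    """video_frames: (T, 3, H, W) at the generator resolution; video_poses: (T, cond_dim)."""
    # Freeze the pretrained networks; only the garment atlas is optimized.
    for p in list(encoder.parameters()) + list(animation_head.parameters()):
        p.requires_grad_(False)

    # Initialize the atlas from the encoder's embedding of the first frame.
    atlas = encoder(video_frames[:1]).detach().clone().requires_grad_(True)
    optimizer = torch.optim.Adam([atlas], lr=lr)

    for _ in range(steps):
        # Reconstruct a randomly chosen frame from the atlas and that frame's pose.
        t = torch.randint(len(video_frames), (1,)).item()
        pred = animation_head(atlas, video_poses[t:t + 1])
        loss = torch.nn.functional.l1_loss(pred, video_frames[t:t + 1])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # At inference, the atlas replaces the per-image embedding as input to the
    # NVS head (e.g. the image_to_360 sketch above with emb = atlas).
    return atlas.detach()
```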

Method Comparisons

Comparisons to SOTA

Ablation Studies

Limitations

While our method improves over existing methods, it faces several limitations. Due to the limited diversity of the synthetic 3D garment dataset, HoloGarment struggles with unusual garment shapes (e.g., asymmetry or cut-outs). Our model also exhibits some bias towards garment categories that are more abundant in the 3D dataset, such as pants and t-shirts. See the supplementary for qualitative examples. A larger synthetic garment dataset may remedy such issues. Other future work includes speeding up atlas finetuning (currently ~30 minutes on a single TPU) and increasing resolution via a super-resolution network.

Bibtex

@InProceedings{Karras_HoloGarment_2025,
  author={Karras, Johanna and Li, Yingwei and Jafarian, Yasamin and Kemelmacher-Shlizerman, Ira},
  title={HoloGarment: 360° Novel View Synthesis of In-the-Wild Garments},
  month={August},
  year={2025},
}

Acknowledgements

This work was done when all authors were at Google. We are grateful for the kind support of the whole Google ARML Commerce organization.