Gallery
Single View-to-360°
Sparse Views-to-360°
Video-to-360°
Method
Implicit Training Paradigm. Our key insight is an implicit training paradigm, in which two or more distinct training tasks indirectly train a model to perform a target task for which ground-truth data is not available. Using a combination of real image/video data and synthetic 360° spin renderings of 3D assets, our garment encoder learns a shared garment embedding space across both domains that enables garment animation (task 1) and garment novel view synthesis (task 2). In doing so, we bypass the limitations of synthetic-only 3D datasets and can handle challenging real-world garment images and videos.
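As a minimal sketch of this paradigm, the snippet below shows one way a single garment encoder could be shared between the two explicit tasks, with real video batches supervising animation and synthetic spin renderings supervising novel view synthesis. All module names, batch keys, and the reconstruction loss are illustrative assumptions, not the released implementation.

```python
# Illustrative only: one shared garment encoder trained on two explicit tasks
# (animation on real video, NVS on synthetic spins) so that the implicit task
# (real image-to-360°) is learned without direct supervision.
import itertools
import random
import torch
import torch.nn.functional as F

class GarmentEncoder(torch.nn.Module):
    """Hypothetical encoder mapping a garment image to a shared embedding."""
    def __init__(self, dim=768):
        super().__init__()
        self.backbone = torch.nn.Sequential(
            torch.nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1),
            torch.nn.ReLU(),
            torch.nn.AdaptiveAvgPool2d(1),
            torch.nn.Flatten(),
            torch.nn.Linear(64, dim),
        )

    def forward(self, image):
        return self.backbone(image)

def train(encoder, generator, real_video_loader, synthetic_spin_loader, steps=1000):
    """Alternate batches from both domains; the encoder weights are shared."""
    opt = torch.optim.Adam(
        list(encoder.parameters()) + list(generator.parameters()), lr=1e-4)
    loaders = {"animation": itertools.cycle(real_video_loader),   # task 1: real videos
               "nvs": itertools.cycle(synthetic_spin_loader)}     # task 2: synthetic spins
    for _ in range(steps):
        batch = next(loaders[random.choice(["animation", "nvs"])])
        z = encoder(batch["garment_image"])              # shared embedding space
        pred = generator(z, batch["pose_sequence"])      # pose-conditioned frames
        loss = F.mse_loss(pred, batch["target_frames"])  # placeholder objective
        opt.zero_grad(); loss.backward(); opt.step()
```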

Image-to-360°. Our implicit training approach lets us train a robust garment embedding space on diverse, large-scale garment video data that is also compatible with the novel view synthesis task. As a result, we can accomplish the desired implicit task of real image-to-360° novel view synthesis. Given a real garment image (task 1) and a static spin pose sequence in a canonical A-pose (task 2), HoloGarment generates static 360° novel views of the input garment.
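To make that inference path concrete, here is a hedged sketch of how a single real image could be mapped through the shared embedding and rendered along a canonical A-pose spin; `encoder`, `generator`, and the `canonical_a_pose_spin` helper are assumed stand-ins for the trained components, not the method's actual API.

```python
# Hypothetical inference sketch for real image-to-360° novel view synthesis.
import torch

@torch.no_grad()
def image_to_360(encoder, generator, canonical_a_pose_spin, garment_image, num_views=36):
    """Render static 360° views of a garment from one real input image."""
    # Embed the single real image into the shared garment embedding space.
    z = encoder(garment_image.unsqueeze(0))
    # Static "spin" pose sequence: the same canonical A-pose body rendered at
    # evenly spaced azimuth angles around the vertical axis.
    azimuths = torch.linspace(0.0, 360.0, steps=num_views + 1)[:-1]
    spin_poses = canonical_a_pose_spin(azimuths)   # assumed pose-sequence helper
    # Condition the generator on (embedding, spin poses) to synthesize the views.
    return generator(z, spin_poses)
```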
Video-to-360° via Garment Atlas Finetuning. (Left) HoloGarment enables video-to-360° NVS by finetuning a garment-specific embedding, or "atlas", on a real-world video. Using this atlas during inference, HoloGarment generates photorealistic 360° novel views of the garment. (Right) With single-view (top row) and multi-view (middle row) conditioning, poorly chosen input views degrade the quality of the synthesized views. Atlas finetuning on video (bottom row) removes the dependence on input view selection by consolidating details from all video frames, improving garment texture detail and multi-view consistency.
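The atlas-finetuning step can be sketched as a small per-garment optimization: freeze the trained model, initialize an embedding from the video frames, and optimize only that embedding to reconstruct the observed frames. The initialization, loss, and step count below are assumptions made for illustration rather than the paper's exact procedure.

```python
# Minimal sketch of garment "atlas" finetuning on a real-world video.
import torch
import torch.nn.functional as F

def finetune_atlas(encoder, generator, video_frames, frame_poses, steps=500, lr=1e-2):
    """Optimize a per-garment embedding (atlas) on one video; model stays frozen."""
    # Initialize the atlas from per-frame embeddings so that finetuning can
    # consolidate detail from every frame of the video.
    with torch.no_grad():
        atlas = encoder(video_frames).mean(dim=0, keepdim=True)
    atlas = atlas.clone().requires_grad_(True)

    for p in list(encoder.parameters()) + list(generator.parameters()):
        p.requires_grad_(False)  # only the atlas is optimized

    opt = torch.optim.Adam([atlas], lr=lr)
    for _ in range(steps):
        recon = generator(atlas, frame_poses)          # re-render observed frames
        loss = F.mse_loss(recon, video_frames)         # placeholder reconstruction loss
        opt.zero_grad(); loss.backward(); opt.step()

    # At inference, this atlas replaces the single-image embedding when
    # conditioning the generator on the static canonical spin poses.
    return atlas
```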
Method Comparisons
Comparisons to SOTA
Ablation Studies
Limitations
While our method improves over existing methods, it has several limitations. Due to the limited diversity of the synthetic 3D garment dataset, HoloGarment struggles with unusual garment shapes (e.g., asymmetry or cut-outs). Our model also exhibits some bias towards garment categories that are more abundant in the 3D dataset, such as pants and t-shirts. See the supplementary for qualitative examples. A larger synthetic garment dataset may remedy such issues. Other future work includes speeding up atlas finetuning (currently ~30 minutes on a single TPU) and increasing output resolution via a super-resolution network.
Bibtex
@InProceedings{Karras_HoloGarment_2025,
  author = {Karras, Johanna and Li, Yingwei and Jafarian, Yasamin and Kemelmacher-Shlizerman, Ira},
  title = {HoloGarment: 360° Novel View Synthesis of In-the-Wild Garments},
  month = {August},
  year = {2025},
}
Acknowledgements
This work was done when all authors were at Google. We are grateful for the kind support of the whole Google ARML Commerce organization.