Given a person and a garment image, virtual try-on (VTO) aims to synthesize a realistic image of the person wearing the garment while preserving their original pose and identity. Although recent VTO methods excel at visualizing garment appearance, they largely overlook a crucial aspect of the try-on experience: the accuracy of garment fit, for example, depicting how an extra-large shirt looks on an extra-small person. A key obstacle is the absence of datasets that provide precise garment and body size information, particularly for "ill-fit" cases, where garments are significantly too large or too small. Consequently, current VTO methods default to generating well-fitted results regardless of the garment or person size.
In this paper, we take the first steps towards solving this open problem. We introduce FIT (Fit-Inclusive Try-on), a large-scale VTO dataset comprising over 1.13M try-on image triplets accompanied by precise body and garment measurements. We overcome the challenges of data collection via a scalable synthetic strategy: (1) We programmatically generate 3D garments using GarmentCode and drape them via physics simulation to capture realistic garment fit. (2) We employ a novel re-texturing framework to transform synthetic renderings into photorealistic images while strictly preserving geometry. (3) We introduce person identity preservation into our re-texturing model to generate paired person images (same person, different garments) for supervised training. Finally, we leverage our FIT dataset to train a baseline fit-aware virtual try-on model. Our data and results set the new state-of-the-art for fit-aware virtual try-on, as well as offer a robust benchmark for future research. We will make all data and code publicly available.
The FIT dataset consists of 1,137,282 training and 1,000 test samples, each a tuple $(I_{\text{try-on}}, I_{\text{p}}, I_g, m_p, m_g)$: a try-on image, a paired person image, a layflat garment image, and the corresponding body and garment measurements. FIT covers 168 distinct body shapes (82 men's, 86 women's) in sizes XS-3XL, 528 body poses, and 158,483 unique garment designs. The garments span a diverse range of fits, from loose to tight. Refer to our paper for additional dataset statistics.
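A minimal sketch of what one FIT sample might look like in code. The field names, file layout, and measurement keys below are illustrative assumptions, not the dataset's actual schema; only the five-element structure $(I_{\text{try-on}}, I_p, I_g, m_p, m_g)$ comes from the text.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical container for one FIT sample. Paths and measurement
# names are placeholders; the real release may differ.
@dataclass
class FITSample:
    try_on_path: str        # I_try-on: person wearing the target garment
    person_path: str        # I_p: same person wearing a different garment
    garment_path: str       # I_g: layflat image of the target garment
    body_measurements: Dict[str, float]     # m_p, e.g. chest/waist in cm
    garment_measurements: Dict[str, float]  # m_g, e.g. chest/length in cm

sample = FITSample(
    try_on_path="train/000001/tryon.png",
    person_path="train/000001/person.png",
    garment_path="train/000001/garment.png",
    body_measurements={"chest": 92.0, "waist": 78.0},
    garment_measurements={"chest": 118.0, "length": 72.0},
)
```

With measurements stored per-sample like this, an "ill-fit" pair is simply one where $m_g$ deviates strongly from $m_p$ (e.g. a 118 cm garment chest on a 92 cm body chest).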
Select a data sample to view the try-on image, paired-person image, layflat garment image, and measurements.
We compare FIT to related datasets. FIT is the first large-scale dataset for virtual try-on that provides photorealistic images, ill-fit examples, precise measurement information, and ground-truth paired-person images. For scale, we report the number of training images.
| Dataset | Realism | Ill-Fit | Size Info | Triplet | Scale |
|---|---|---|---|---|---|
| SV-VTO | β | β | β | β | 1,524 |
| SIZER | β | β | β | β | 2,000 |
| DeepFashion3D | β | β | β | β | 2,078 |
| ViTON-HD | β | β | β | β | 11,647 |
| LAION-Garment | β | β | β | β | 60K |
| SewFactory | β | β | β | β | 1M |
| GCD | β | β | β | β | 115K |
| Ours | ✓ | ✓ | ✓ | ✓ | 1.13M |
Try on different sizes of the same garment! (Interactive demo; shown: a size-S person wearing an XL garment.)
Prior virtual try-on methods excel at garment appearance transfer and generate aesthetically pleasing images. However, they lack precise measurement conditioning and instead hallucinate garment fit. In contrast, Fit-VTO generates high-quality try-on results while also being measurement-conditioned. Below, we show how Fit-VTO excels on synthetic FIT images with measurement conditioning, as well as on real-world VITON-HD images (without measurements).
(a) Overall workflow: We start by simulating a 3D garment on a target body via GarmentCode to render a synthetic image $I_s$. We generate a text prompt $p$ (via a VLM) and a composite normal map $I_n$ (stitching estimated normals with realistic head/feet details). These condition our re-texturing model $f_\text{texture}$ to produce the try-on image $I_{\text{try-on}}$. Finally, we use $f_\text{paired}$ to generate a paired person image $I_p$, and a VLM to synthesize a layflat garment image $I_g$.

(b) GarmentCode simulation: Given a sampled design template, we compute sewing patterns for a specific body size. Then, we cross-drape these patterns onto a different target body, using box-mesh realignment to prevent simulation failures, and extract ground-truth measurements.

(c) Paired-image generation: Using source and target garments draped on the same body, we derive an identity map $I_\text{id}$ by masking the garment in $I_{\text{try-on}}$. Conditioned on $I_\text{id}$, a paired normal map $I_n'$, and a paired prompt $p'$, $f_\text{paired}$ generates the paired person image $I_p$.
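The stages above can be sketched as a small orchestration function. Everything below is a stand-in stub for components described in the text (physics simulation, VLM prompting, the re-texturing and paired models); none of it is the authors' actual code.

```python
# Hypothetical sketch of the FIT data-generation pipeline.
# All functions are placeholder stubs returning tagged strings.

def simulate_garment(design, target_body):
    """GarmentCode step: compute sewing patterns, cross-drape onto the
    target body (with box-mesh realignment), render the synthetic image
    I_s, and extract ground-truth measurements m_p, m_g (stub)."""
    return {"I_s": f"render({design},{target_body})",
            "m_p": {"chest": 92.0}, "m_g": {"chest": 118.0}}

def vlm_prompt(image):
    """VLM-generated text prompt p describing the rendered outfit (stub)."""
    return f"prompt({image})"

def composite_normals(image):
    """Composite normal map I_n: estimated normals stitched with
    realistic head/feet details (stub)."""
    return f"normals({image})"

def f_texture(I_s, p, I_n):
    """Re-texturing model: synthetic render -> photorealistic try-on (stub)."""
    return f"tryon({I_s})"

def f_paired(I_tryon, I_n_paired, p_paired):
    """Paired model: same person, different garment, conditioned on the
    identity map derived from I_tryon plus paired normals/prompt (stub)."""
    return f"paired({I_tryon})"

def make_sample(design, target_body):
    sim = simulate_garment(design, target_body)
    p, I_n = vlm_prompt(sim["I_s"]), composite_normals(sim["I_s"])
    I_tryon = f_texture(sim["I_s"], p, I_n)
    I_p = f_paired(I_tryon, composite_normals("paired_render"), "paired_prompt")
    return I_tryon, I_p, sim["m_p"], sim["m_g"]
```

The key design point this sketch makes explicit: geometry and measurements are fixed by the physics simulation before any generative model runs, so re-texturing can only change appearance, never fit.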
Our architecture is a flow-based model built on the Flux.1-dev MMDiT architecture and fine-tuned with LoRA. Fit-VTO generates a try-on image $I_{\text{try-on}}$ given a layflat garment image $I_g$, a paired person image $I_p$, and person-garment measurements $m = [m_p, m_g]$. First, the image inputs $I_g$ and $I_p$ are separately encoded into latents by a pre-trained VAE encoder. We replace the text embeddings in Flux.1-dev with custom measurement embeddings $m_{\text{embed}}$ computed from $m$. The person latents are channel-concatenated with the noisy target latents $z_t$, while the layflat latents and $m_{\text{embed}}$ are sequence-wise concatenated with $z_t$. After processing through the diffusion transformer, the denoised latents are decoded by the VAE decoder.
@article{fitvto2026,
author = {Karras, Johanna and Wang, Yuanhao and Li, Yingwei and Kemelmacher-Shlizerman, Ira},
title = {FIT: A Large-Scale Dataset for Fit-Aware Virtual Try-On},
journal = {SIGGRAPH},
year = {2026},
}