ImageGem Dataset

NYU · Stanford
ICCV 2025



Our proposed ImageGem dataset and its applications. The left side illustrates image and generative model retrieval. On the right, we demonstrate a novel task of generative model personalization through LoRA weights-to-weights (W2W) space construction.

Abstract

We propose a dataset to enable the study of generative models that understand fine-grained individual preferences. We posit that a key challenge hindering the development of such generative models is the lack of in-the-wild, fine-grained user preference annotations. Our dataset features real-world interaction data from 57K different users, who collectively have built 242K customized LoRAs, written 3M text prompts, and created 5M generated images. Our dataset enables a range of applications. With aggregate-level user preferences from our dataset, we were able to train better preference alignment models. In addition, leveraging individual-level user preferences, we benchmark retrieval models and a vision-language model on personalized image retrieval and generative model recommendation, and highlight the room for improvement. Finally, we demonstrate that our dataset enables, for the first time, a generative model personalization paradigm: editing customized diffusion models in a latent weight space to align with individual user preferences.

Dataset Overview

Our dataset contains 4,916,134 images with 2,895,364 unique prompts, generated by 242,118 LoRA models, after applying a safety filter. We visualize the data distribution with WizMap, using grid tiles to display keywords extracted from image prompts or model tags.


The left panel shows a UMAP embedding of 1M images sampled from the dataset, while the right panel illustrates a contour plot of LoRA model checkpoints.
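As a rough sketch of how such an embedding map can be reproduced, the snippet below encodes a sample of images with CLIP and projects the features to 2-D with UMAP (the kind of coordinates a WizMap-style view consumes). The checkpoint name, sample list, and batching are illustrative assumptions, not the exact pipeline behind the figure.

```python
# Sketch: embed sampled images with CLIP, then project to 2-D with UMAP.
# Assumptions: openai/clip-vit-base-patch32 as the encoder and a
# hypothetical list of image paths; not the authors' actual pipeline.
import torch
import umap
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed(paths, batch_size=64):
    feats = []
    for i in range(0, len(paths), batch_size):
        images = [Image.open(p).convert("RGB") for p in paths[i:i + batch_size]]
        inputs = processor(images=images, return_tensors="pt").to(device)
        with torch.no_grad():
            f = model.get_image_features(**inputs)
        feats.append(torch.nn.functional.normalize(f, dim=-1).cpu())
    return torch.cat(feats).numpy()

sampled_image_paths = [...]  # hypothetical: paths to the image sample
features = embed(sampled_image_paths)
xy = umap.UMAP(n_components=2, metric="cosine").fit_transform(features)
```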

Aggregate-level Preference Alignment

We sampled three topics, Cars, Dogs, and Scenery, from both our ImageGem dataset and Pick-a-Pic, and trained DiffusionDPO models on each.



Qualitative comparison of DiffusionDPO results on images generated with out-of-distribution (OOD) prompts in the three topics, sampled from DiffusionDB. For each prompt, the random seed and all other hyperparameters are kept the same.
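For context, the DiffusionDPO objective (Wallace et al., 2023) trains the model to reduce its denoising error on the preferred image relative to a frozen reference model. Below is a minimal single-timestep sketch of that loss in PyTorch; the function signature, variable names, and β value are assumptions for illustration, not the training code used here.

```python
# Minimal sketch of the Diffusion-DPO preference loss; `model` and
# `ref_model` are noise-prediction networks. Signature is hypothetical.
import torch
import torch.nn.functional as F

def diffusion_dpo_loss(model, ref_model, noisy_w, noisy_l, t, noise, cond,
                       beta=5000.0):
    """noisy_w / noisy_l: latents of the preferred / rejected images,
    diffused with the same `noise` at timestep `t` under prompt `cond`."""
    err = lambda m, x: F.mse_loss(m(x, t, cond), noise,
                                  reduction="none").mean(dim=(1, 2, 3))
    model_diff = err(model, noisy_w) - err(model, noisy_l)
    with torch.no_grad():
        ref_diff = err(ref_model, noisy_w) - err(ref_model, noisy_l)
    # Push the model's winner-vs-loser error gap below the reference's.
    return -F.logsigmoid(-beta * (model_diff - ref_diff)).mean()
```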

Retrieval and Generative Recommendation

With rich individual preference data, we enable personalized image retrieval and generative model recommendation using a two-stage approach: collaborative filtering (CF) retrieves the top-k candidate items, followed by a vision-language model (VLM) for refined ranking.



Image ranking results from different recommendation models, where the VLM demonstrates superior performance.
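A minimal sketch of the two-stage pipeline is shown below; `cf_model`, `vlm_rank`, and the user-history format are hypothetical placeholders rather than the paper's actual interfaces.

```python
# Sketch of the two-stage recommendation pipeline: CF proposes top-k
# candidates, then a VLM reranks them against the user's liked examples.
# All interfaces here are hypothetical placeholders.
def recommend(user_id, cf_model, vlm_rank, user_history, k=50, n=10):
    # Stage 1: collaborative filtering retrieves top-k candidate item ids.
    candidates = cf_model.top_k(user_id, k=k)

    # Stage 2: the VLM scores how well each candidate matches the user's
    # past preferences, and candidates are re-sorted by that score.
    scores = [vlm_rank(user_history[user_id], item) for item in candidates]
    ranked = [item for _, item in
              sorted(zip(scores, candidates), reverse=True)]
    return ranked[:n]
```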

Generative Model Personalization

We construct a LoRA weight space with PCA. To handle large LoRA weights with varying ranks, we experimented with several methods, including standardizing LoRAs to rank 1 via SVD, or using only the feed-forward (FF) layers or attention value (attn-v) layers. Our results show that the SVD-based strategy yields the most robust transformations.



Editing results using the SVD-based W2W space for anime-to-realistic transformation and the reverse. The base model's outputs are shown in the first column, followed by results with increasing tuning strength. Each row uses a fixed generation seed.
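One plausible way to implement the SVD-based construction is sketched below: each LoRA update is reduced to rank 1 via SVD so that checkpoints of different ranks become comparable, then PCA over the flattened factors yields the W2W space. Layer names, shapes, and the number of components are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch: standardize each LoRA update (delta_W = B @ A) to rank 1 via
# SVD, flatten the factors, and fit PCA over all checkpoints.
import numpy as np
from sklearn.decomposition import PCA

def rank1_flatten(lora):
    """lora: dict mapping layer name -> (A, B) low-rank factors.
    Assumes every checkpoint covers the same set of layers."""
    parts = []
    for name in sorted(lora):
        A, B = lora[name]                          # delta_W = B @ A, any rank
        U, S, Vt = np.linalg.svd(B @ A, full_matrices=False)
        parts.append(U[:, 0] * np.sqrt(S[0]))      # rank-1 left factor
        parts.append(Vt[0] * np.sqrt(S[0]))        # rank-1 right factor
    return np.concatenate(parts)

# `loras` is a hypothetical list of LoRA checkpoints in the dict format above.
X = np.stack([rank1_flatten(l) for l in loras])
w2w = PCA(n_components=100).fit(X)                 # principal W2W directions
codes = w2w.transform(X)                           # per-model coordinates
```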


Building upon the anime-realistic (ani-real) transformation, we extend our approach to learn personalized editing directions within the W2W space in the human figure domain.



Each user’s visual preference is shown at the top, with generated samples below. Left images are from the unedited SDXL base model; right images are from the edited models.
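A hedged sketch of how such a personalized direction might be derived and applied follows, reusing `w2w` and `codes` from the previous sketch: the direction is taken as the difference between the mean W2W codes of a user's liked and disliked LoRAs, then applied with a tuning strength alpha. This construction and the index arrays are illustrative assumptions, not the paper's exact method.

```python
# Sketch (assumption): a personalized editing direction in the W2W space.
# `liked_idx` / `disliked_idx` are hypothetical index arrays into `codes`.
import numpy as np

def preference_direction(codes, liked_idx, disliked_idx):
    d = codes[liked_idx].mean(axis=0) - codes[disliked_idx].mean(axis=0)
    return d / np.linalg.norm(d)

def edit_model(w2w, code, direction, alpha):
    """Shift a model's W2W coordinates along the preference direction and
    decode back to flattened rank-1 LoRA weights; larger alpha gives a
    stronger edit."""
    shifted = code + alpha * direction
    return w2w.inverse_transform(shifted[None])[0]
```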
