We propose a dataset to enable the study of generative models that understand fine-grained individual preferences. We posit that a key challenge hindering the development of such generative models is the lack of in-the-wild, fine-grained user preference annotations. Our dataset features real-world interaction data from 57K different users, who collectively have built 242K customized LoRAs, written 3M text prompts, and created 5M generated images. Our dataset enables a range of applications. With aggregate-level user preferences from our dataset, we train better preference alignment models. In addition, leveraging individual-level user preferences, we benchmark the performance of retrieval models and a vision-language model on personalized image retrieval and generative model recommendation, and highlight room for improvement. Finally, we demonstrate that our dataset enables, for the first time, a generative model personalization paradigm that edits customized diffusion models in a latent weight space to align with individual user preferences.
Our dataset contains 4,916,134 images with 2,895,364 unique prompts, generated with 242,118 LoRA models, after applying a safety filter. We visualize our data distribution with WizMap, using grid tiles to display keywords extracted from image prompts or model tags.
We sampled three topics (Cars, Dogs, and Scenery) from both our ImageGEM dataset and Pick-a-Pic, and trained DiffusionDPO models on each subset.
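For context, Diffusion-DPO fine-tunes a diffusion model on preference pairs by comparing its denoising error against a frozen reference model on the preferred versus the rejected image. Below is a minimal PyTorch sketch of the per-pair loss; the function name, signature, and the beta value are illustrative, and the timestep weighting from the original formulation is omitted.

```python
import torch
import torch.nn.functional as F

def diffusion_dpo_loss(eps_w, eps_l, pred_w, pred_l, ref_pred_w, ref_pred_l, beta=2000.0):
    """Per-pair Diffusion-DPO loss (simplified sketch).

    eps_w, eps_l:           noise added to the preferred / rejected image at the sampled timestep
    pred_w, pred_l:         noise predictions of the trained UNet
    ref_pred_w, ref_pred_l: noise predictions of the frozen reference UNet
    beta:                   preference-strength hyperparameter (illustrative value)
    """
    # Squared denoising error, averaged over all non-batch dimensions
    def err(eps, pred):
        return ((eps - pred) ** 2).flatten(1).mean(dim=1)

    # Improvement of the trained model over the reference on each image
    diff_w = err(eps_w, pred_w) - err(eps_w, ref_pred_w)   # preferred ("winner") image
    diff_l = err(eps_l, pred_l) - err(eps_l, ref_pred_l)   # rejected ("loser") image

    # Encourage a larger improvement on the preferred image than on the rejected one
    return -F.logsigmoid(-beta * (diff_w - diff_l)).mean()
```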
With rich individual preference data, we enable personalized image retrieval and generative model recommendation using a two-stage approach: collaborative filtering (CF) retrieves top-k candidate items, and a vision-language model (VLM) then re-ranks them, as sketched below.
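A minimal sketch of this two-stage pipeline, assuming precomputed user and item embeddings from a CF model and a generic VLM scoring callable; all names here (recommend, user_emb, item_emb, vlm_score) are illustrative placeholders rather than our released API.

```python
import numpy as np

def recommend(user_id, user_emb, item_emb, vlm_score, user_history, k=50, top_n=10):
    """Two-stage recommendation: CF retrieval followed by VLM re-ranking.

    user_emb:  dict mapping user id -> latent vector from a CF model (e.g., matrix factorization)
    item_emb:  dict mapping item id -> latent vector from the same CF model
    vlm_score: callable (user_history, item_id) -> float preference score from a VLM
    """
    u = user_emb[user_id]

    # Stage 1: collaborative filtering retrieves the top-k candidates by dot-product score
    cf_scores = {i: float(np.dot(u, v)) for i, v in item_emb.items()}
    candidates = sorted(cf_scores, key=cf_scores.get, reverse=True)[:k]

    # Stage 2: the vision-language model re-ranks candidates given the user's history
    reranked = sorted(candidates, key=lambda i: vlm_score(user_history, i), reverse=True)
    return reranked[:top_n]
```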
We construct a LoRA weight space with PCA. To handle large LoRA weights of varying ranks, we experimented with several strategies before applying PCA, including standardizing LoRAs to rank 1 and selecting only feed-forward (FF) layers or attention value (attn_v) layers. Our results show that the SVD-based rank-1 standardization yields the most robust transformations.
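As an illustrative sketch of the SVD-based strategy under simplifying assumptions (low-rank updates small enough to decompose directly, a fixed layer ordering across checkpoints, and an arbitrary choice of 100 principal components), one could standardize each LoRA to rank 1 and then fit PCA as follows; the helper names are not from our codebase.

```python
import numpy as np
from sklearn.decomposition import PCA

def rank1_feature(lora_layers):
    """Flatten a LoRA checkpoint into a fixed-length vector.

    lora_layers: dict mapping layer name -> (A, B) low-rank factors with delta_W = B @ A.
    Each delta_W is truncated to rank 1 via SVD so LoRAs of different ranks
    become comparable, then the rank-1 factors are concatenated.
    """
    parts = []
    for name in sorted(lora_layers):              # fixed layer order across checkpoints
        A, B = lora_layers[name]
        delta_w = B @ A                           # full low-rank weight update
        U, S, Vt = np.linalg.svd(delta_w, full_matrices=False)
        parts.append(np.sqrt(S[0]) * U[:, 0])     # rank-1 left factor
        parts.append(np.sqrt(S[0]) * Vt[0, :])    # rank-1 right factor
    return np.concatenate(parts)

# Stack the standardized LoRA vectors and fit PCA to obtain the latent weight space
features = np.stack([rank1_feature(l) for l in all_loras])   # all_loras: list of checkpoints
weight_space = PCA(n_components=100).fit(features)
coords = weight_space.transform(features)                    # coordinates of each LoRA
```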
Building upon the ani-real transformation, we extend our approach to learn personalized editing directions within the W2W space in the human figure domain.
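One simple way to obtain such a direction, shown here as an illustrative sketch rather than our exact method, is to fit a linear classifier separating a user's liked and disliked LoRAs in the PCA weight space and move along its normal vector; the classifier choice and the edit scale alpha are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def personalized_direction(liked_coords, disliked_coords):
    """Fit a linear separator between a user's liked and disliked LoRAs in the
    PCA weight space; its (normalized) normal vector serves as an editing direction."""
    X = np.vstack([liked_coords, disliked_coords])
    y = np.concatenate([np.ones(len(liked_coords)), np.zeros(len(disliked_coords))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    w = clf.coef_[0]
    return w / np.linalg.norm(w)

def edit_lora(coords, direction, alpha=1.0):
    """Move a LoRA's coordinates along the user's preference direction; the edited
    weights can then be recovered with weight_space.inverse_transform (see above)."""
    return coords + alpha * direction
```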