CTRLorALTer: Conditional LoRAdapter for Efficient Zero-Shot Control & Altering of T2I Models

Authors: Nick Stracke, Stefan Andreas Baumann, Josh Susskind, Miguel Angel Bautista Martin, Björn Ommer

Text-to-image generative models have become a prominent and powerful tool that excels at generating high-resolution realistic images. However, guiding the generative process of these models to consider detailed forms of conditioning reflecting style and/or structure information remains an open problem. In this paper, we present LoRAdapter, an approach that unifies both style and structure conditioning under the same formulation using a novel conditional LoRA block that enables zero-shot control. LoRAdapter is an efficient, powerful, and architecture-agnostic approach to condition text-to-image diffusion models, which enables fine-grained control conditioning during generation and outperforms recent state-of-the-art approaches.

Figure 1: LoRAdapter allows structure and style control of the image generation process of text-to-image models in a zero-shot manner. Our approach enables powerful fine-grained and efficient unified control over both structure and style conditioning using conditional LoRA blocks.
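
As a rough illustration of what a conditional LoRA block could look like, the sketch below adds a low-rank update to a frozen linear layer and modulates the low-rank activations with a scale and shift predicted from a conditioning vector. This is a minimal sketch under stated assumptions, not the authors' implementation: the class name `ConditionalLoRALinear`, the scale-and-shift form of the modulation, the parameter names, and the use of NumPy are all hypothetical choices made for illustration.

```python
import numpy as np


class ConditionalLoRALinear:
    """Hypothetical conditional LoRA layer (illustrative, not the paper's code).

    Computes: y = x W^T + (scale(c) * (x A^T) + shift(c)) B^T
    where W is frozen and A, B, W_scale, W_shift are the adapter weights.
    """

    def __init__(self, d_in, d_out, d_cond, rank=4, seed=0):
        rng = np.random.default_rng(seed)
        # Frozen base weight, standing in for a pretrained layer.
        self.W = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)
        # Low-rank down projection (d_in -> rank).
        self.A = rng.standard_normal((rank, d_in)) / np.sqrt(d_in)
        # Low-rank up projection (rank -> d_out), zero-initialised so the
        # adapter contributes nothing before training.
        self.B = np.zeros((d_out, rank))
        # Assumed conditioning maps: conditioning vector -> per-rank scale/shift.
        self.W_scale = np.zeros((rank, d_cond))
        self.W_shift = np.zeros((rank, d_cond))

    def __call__(self, x, cond):
        # x: (batch, d_in), cond: (batch, d_cond)
        base = x @ self.W.T                      # frozen path
        low = x @ self.A.T                       # (batch, rank)
        scale = 1.0 + cond @ self.W_scale.T      # identity scale when untrained
        shift = cond @ self.W_shift.T
        return base + (scale * low + shift) @ self.B.T


if __name__ == "__main__":
    layer = ConditionalLoRALinear(d_in=8, d_out=6, d_cond=3, rank=2)
    x = np.ones((5, 8))
    cond = np.ones((5, 3))
    print(layer(x, cond).shape)
```

Because `B` starts at zero in this sketch, the layer initially reproduces the frozen layer's output, and the conditioning only affects the result once the adapter weights have been trained.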

Related readings and updates.

Controlling Language and Diffusion Models by Transporting Activations

April 10, 2025 | Research areas: Computer Vision, Methods and Algorithms, Speech and Natural Language Processing | Conference: ICLR

Large generative models are becoming increasingly capable and more widely deployed to power production applications, but getting these models to produce exactly what’s desired can still be challenging. Fine-grained control over these models’ outputs is important to meet user expectations and to mitigate potential misuses, ensuring the models’ reliability and safety. To address these issues, Apple machine learning researchers have developed a new…

Conditional Generation of Synthetic Geospatial Images from Pixel-Level and Feature-Level Inputs

September 29, 2021 | Research area: Computer Vision | Conference: BayLearn

Training robust supervised deep learning models for many geospatial applications of computer vision is difficult due to a dearth of class-balanced and diverse training data. Conversely, obtaining enough training data for many applications is financially prohibitive or may be infeasible, especially when the application involves modeling rare or extreme events. Synthetically generating data (and labels) using a generative model that can sample from a…
