UICoder: Finetuning Large Language Models to Generate User Interface Code through Automated Feedback

Authors: Jason Wu†**, Eldon Schoop, Alan Leung, Titus Barik, Jeffrey P. Bigham, Jeffrey Nichols

Large language models (LLMs) struggle to consistently generate UI code that compiles and produces visually relevant designs. Existing approaches to improving generation rely on expensive human feedback or on distilling a proprietary model. In this paper, we explore the use of automated feedback (compilers and multi-modal models) to guide LLMs to generate high-quality UI code. Our method starts with an existing LLM and iteratively produces improved models: the original model self-generates a large synthetic dataset, and automated tools aggressively filter, score, and de-duplicate the data into a refined, higher-quality dataset. The original LLM is improved by finetuning on this refined dataset. We applied our approach to several open-source LLMs and compared the resulting performance to baseline models using both automated metrics and human preferences. Our evaluation shows that the resulting models outperform all other downloadable baselines and approach the performance of larger proprietary models.

** Work done while at Apple
† Carnegie Mellon University
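The self-improvement loop summarized in the abstract (self-generate, filter, score, de-duplicate, finetune, repeat) can be sketched as follows. This is a minimal, hypothetical Python illustration, not the authors' implementation. `Sample`, `refine_dataset`, `self_improve`, and the `compiles`, `score`, `generate`, and `finetune` callables are names invented here. The abstract does not say how scoring thresholds or de-duplication work, so the fixed `min_score` cutoff and the exact-match de-duplication after whitespace normalization are assumptions. The toy compiler and scorer only stand in for a real compiler and a real multi-modal model.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List


@dataclass(frozen=True)
class Sample:
    prompt: str  # natural-language description of the desired UI
    code: str    # candidate UI code generated by the model


def refine_dataset(
    samples: Iterable[Sample],
    compiles: Callable[[str], bool],   # hypothetical stand-in for a real compiler check
    score: Callable[[Sample], float],  # hypothetical stand-in for a multi-modal relevance score
    min_score: float,                  # assumed fixed cutoff; not specified in the abstract
) -> List[Sample]:
    """Filter, score, and de-duplicate self-generated samples (illustrative sketch)."""
    # 1. Filter: discard samples whose code does not compile.
    compiled = [s for s in samples if compiles(s.code)]
    # 2. Score: rank survivors best-first (stable sort) and drop those below the cutoff.
    scored = [(score(s), s) for s in compiled]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    kept = [s for value, s in scored if value >= min_score]
    # 3. De-duplicate: keep the first (highest-scoring) sample per whitespace-normalized code.
    seen = set()
    refined = []
    for s in kept:
        key = " ".join(s.code.split())
        if key not in seen:
            seen.add(key)
            refined.append(s)
    return refined


def self_improve(
    model: Any,
    prompts: List[str],
    generate: Callable[[Any, str], List[str]],     # hypothetical: model -> candidate code strings
    finetune: Callable[[Any, List[Sample]], Any],  # hypothetical: returns the finetuned model
    compiles: Callable[[str], bool],
    score: Callable[[Sample], float],
    min_score: float,
    rounds: int,
) -> Any:
    """Repeat: self-generate, refine, finetune. Assumes each round builds on the previous model."""
    for _ in range(rounds):
        samples = [Sample(p, c) for p in prompts for c in generate(model, p)]
        refined = refine_dataset(samples, compiles, score, min_score)
        model = finetune(model, refined)
    return model


# --- Toy usage with hypothetical stand-ins (not real tools) ---
def toy_compiles(code: str) -> bool:
    # Pretend that "compiling" just means the braces are balanced.
    return code.count("{") == code.count("}")


def toy_score(s: Sample) -> float:
    # Pretend relevance is the fraction of prompt words that appear in the code.
    words = s.prompt.lower().split()
    return sum(w in s.code.lower() for w in words) / len(words)


samples = [
    Sample("login button", 'Button("login") { }'),
    Sample("login button", 'Button("login")   { }'),  # whitespace-only duplicate
    Sample("login button", 'Button("login") {'),      # fails the toy compile check
    Sample("profile image", 'Text("hello") { }'),     # compiles but scores 0.0
]
refined = refine_dataset(samples, toy_compiles, toy_score, min_score=0.5)
print([s.code for s in refined])  # ['Button("login") { }']
```

The sketch filters on compilation before scoring because only code that compiles can be rendered and judged visually. It scores before de-duplicating, so when two samples collide, the higher-scoring one survives. The abstract does not state whether the original pipeline uses this ordering.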

Related readings and updates.

Can External Validation Tools Improve Annotation Quality for LLM-as-a-Judge?

July 25, 2025 | Research areas: Speech and Natural Language Processing; Tools, Platforms, Frameworks | Conference: ACL

Pairwise preferences over model responses are widely collected to evaluate and provide feedback to large language models (LLMs). Given two alternative model responses to the same input, a human or AI annotator selects the “better” response. Such data can provide a feedback signal in domains where traditional hard-coded metrics are difficult to obtain (e.g., the quality of chat interactions), thereby helping measure model progress or model…

BISCUIT: Scaffolding LLM-Generated Code with Ephemeral UIs in Computational Notebooks

August 3, 2024 | Research areas: Human-Computer Interaction; Tools, Platforms, Frameworks | Conference: IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC)

This paper was accepted at the IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC) 2024.

Programmers frequently engage with machine learning tutorials in computational notebooks and have been adopting code generation technologies based on large language models (LLMs). However, they encounter difficulties in understanding and working with code produced by LLMs. To mitigate these challenges, we introduce a novel workflow into…
