NLU Workshop Talk: Towards Practical Use of Large Pre-Trained Language Models: Addressing Errors and Inconsistencies
Authors: Chris Manning (Stanford University)
BalCapRL: A Balanced Framework for RL-Based MLLM Image Captioning
May 11, 2026 | Research areas: Computer Vision, Methods and Algorithms
Image captioning is one of the most fundamental tasks in computer vision. Owing to its open-ended nature, it has received significant attention in the era of multimodal large language models (MLLMs). In pursuit of ever more detailed and accurate captions, recent work has increasingly turned to reinforcement learning (RL). However, existing RL-based captioning methods and evaluation metrics often emphasize a narrow notion of caption quality, inducing…
Large-Scale High-Quality 3D Gaussian Head Reconstruction from Multi-View Captures
May 8, 2026 | Research area: Computer Vision
We propose HeadsUp, a scalable feed-forward method for reconstructing high-quality 3D Gaussian heads from large-scale multi-camera setups. Our method employs an efficient encoder-decoder architecture that compresses the input views into a compact latent representation, which is then decoded into a set of UV-parameterized 3D Gaussians anchored to a neutral head template. This UV representation decouples the number of 3D Gaussians…
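The pipeline sketched in the abstract — compress multi-view input into a compact latent, then decode it into a UV map of Gaussian attributes anchored to a template — can be illustrated with a toy NumPy version. Everything here is an assumption for illustration: the names, dimensions, per-Gaussian parameter layout, and the random projections standing in for the paper's learned encoder and decoder are all hypothetical, not the authors' implementation.

```python
# Hypothetical sketch of a HeadsUp-style feed-forward pipeline.
# Dimensions, names, and the random projections are assumptions;
# the actual encoder/decoder are learned neural networks.
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 256   # size of the compact latent (assumed)
UV_RES = 64        # UV map resolution (assumed)
GAUSS_PARAMS = 14  # per-Gaussian attributes, e.g. 3 offset + 4 rotation
                   # + 3 scale + 3 color + 1 opacity (assumed layout)

def encode(views: np.ndarray) -> np.ndarray:
    """Compress V input views of shape (V, H, W, 3) into one latent vector.
    Stand-in for the learned encoder: mean-pool over views, then project."""
    pooled = views.reshape(views.shape[0], -1).mean(axis=0)
    proj = rng.standard_normal((pooled.size, LATENT_DIM)) / np.sqrt(pooled.size)
    return pooled @ proj

def decode(latent: np.ndarray) -> np.ndarray:
    """Decode the latent into UV-parameterized Gaussian attributes:
    one Gaussian per UV texel, anchored to a neutral head template."""
    proj = rng.standard_normal(
        (LATENT_DIM, UV_RES * UV_RES * GAUSS_PARAMS)) / np.sqrt(LATENT_DIM)
    return (latent @ proj).reshape(UV_RES, UV_RES, GAUSS_PARAMS)

# The UV parameterization decouples the Gaussian count from the number
# of input views: more cameras do not change the output resolution.
for n_views in (4, 16):
    gauss_uv = decode(encode(rng.standard_normal((n_views, 32, 32, 3))))
    assert gauss_uv.shape == (UV_RES, UV_RES, GAUSS_PARAMS)
```

The point of the sketch is the shape contract: the latent has a fixed size regardless of how many cameras contribute, and the decoded Gaussian set is fixed at one Gaussian per UV texel, which is what the abstract means by the UV representation decoupling the number of 3D Gaussians from the capture setup.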