Apple Workshop on Privacy-Preserving Machine Learning: Private Federated Learning (PFL) framework
Authors: Mona Chitnis (Apple), Filip Granqvist (Apple)
VSAS-Bench: Real-Time Evaluation of Visual Streaming Assistant Models
May 22, 2026 | Research areas: Computer Vision; Data Science and Annotation | Conference: CVPR
Streaming vision-language models (VLMs) continuously generate responses given an instruction prompt and an online stream of input frames, a core mechanism for real-time visual assistants. Existing VLM evaluation frameworks predominantly assess models in offline settings. In contrast, the performance of a streaming VLM depends on additional metrics beyond pure video understanding, including proactiveness, which reflects the timeliness of the model’s…
BalCapRL: A Balanced Framework for RL-Based MLLM Image Captioning
May 11, 2026 | Research areas: Computer Vision; Methods and Algorithms
Image captioning is one of the most fundamental tasks in computer vision. Owing to its open-ended nature, it has received significant attention in the era of multimodal large language models (MLLMs). In pursuit of ever more detailed and accurate captions, recent work has increasingly turned to reinforcement learning (RL). However, existing captioning-RL methods and evaluation metrics often emphasize a narrow notion of caption quality, inducing…