Apple Workshop on Privacy-Preserving Machine Learning: Matrix Factorization DP-FTRL outperforms DP-SGD for cross-device federated learning and centralized training
Authors: Brendan McMahan (Google)
Stochastic KV Routing: Enabling Adaptive Depth-Wise Cache Sharing
May 5, 2026 | Research areas: Methods and Algorithms; Speech and Natural Language Processing
Serving transformer language models with high throughput requires caching Key-Values (KVs) to avoid redundant computation during autoregressive generation. The memory footprint of KV caching is significant and heavily impacts serving costs. This work proposes to lessen these memory requirements. While recent work has largely addressed KV cache reduction via compression and eviction along the temporal axis, we argue that the depth dimension offers…
PORTool: Importance-Aware Policy Optimization with Rewarded Tree for Multi-Tool-Integrated Reasoning
May 4, 2026 | Research areas: Speech and Natural Language Processing; Tools, Platforms, Frameworks
Multi-tool-integrated reasoning enables LLM-empowered tool-use agents to solve complex tasks by interleaving natural-language reasoning with calls to external tools. However, training such agents using outcome-only rewards suffers from credit-assignment ambiguity, obscuring which intermediate steps (or tool-use decisions) lead to success or failure. In this paper, we propose PORTool, an importance-aware policy-optimization algorithm that…