Paper | July 2024

How Far Can Transformers Reason? The Globality Barrier and Inductive Scratchpad

Authors: Emmanuel Abbe, Samy Bengio, Aryo Lotfi, Colin Sandon, Omid Saremi

Can Transformers predict new syllogisms by composing established ones? More generally, what type of targets can be learned by such models from scratch? Recent work shows that Transformers can be Turing-complete in terms of expressivity, but this does not address the learnability objective. This paper puts forward the notion of globality degree to capture when weak learning is efficiently achievable by regular Transformers, where the globality measures the least number of tokens required, in addition to the token histogram, to correlate nontrivially with the target. As shown experimentally, and theoretically under additional assumptions, distributions with high globality cannot be learned efficiently. In particular, syllogisms cannot be composed on long chains. Furthermore, we show that (i) an agnostic scratchpad cannot help to break the globality barrier, (ii) an educated scratchpad can help if it breaks the globality barrier at each step, and (iii) a notion of ‘inductive scratchpad’ can both break the globality barrier and improve the out-of-distribution generalization.
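
The "least number of tokens required to correlate nontrivially with the target" idea can be made concrete with a toy example. The sketch below is an illustration only, not the paper's definition or experiments: it takes uniformly random bits, sets the target to the parity of a chosen subset of positions, and searches by brute force for the smallest number of observed positions that allows better-than-chance prediction. It deliberately ignores the token histogram that the globality degree also conditions on, and the names `best_accuracy` and `min_tokens_needed` are hypothetical.

```python
from itertools import combinations, product


def best_accuracy(n_bits, support, observed):
    """Best accuracy any predictor can reach when it sees only the bits at
    positions `observed`, the target is the parity of the bits at positions
    `support`, and the input is uniform over {0, 1}^n_bits."""
    counts = {}  # observed values -> [number of inputs with y=0, with y=1]
    for x in product((0, 1), repeat=n_bits):
        key = tuple(x[i] for i in observed)
        y = sum(x[i] for i in support) % 2
        counts.setdefault(key, [0, 0])[y] += 1
    # The optimal predictor outputs the majority label for each observed value.
    correct = sum(max(pair) for pair in counts.values())
    return correct / 2 ** n_bits


def min_tokens_needed(n_bits, support):
    """Smallest number of observed positions that beats chance (0.5)."""
    for m in range(n_bits + 1):
        for observed in combinations(range(n_bits), m):
            if best_accuracy(n_bits, support, observed) > 0.5:
                return m
    return None


if __name__ == "__main__":
    # Toy setting (assumed for illustration): 6 bits, parity of the first 4.
    print(min_tokens_needed(6, (0, 1, 2, 3)))
```

In this toy setting, any set of observed positions that misses even one bit of the parity leaves the target perfectly balanced, so no predictor beats chance until all of the relevant bits are seen together. That is the flavor of target for which there is no partial signal to build on incrementally.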

Related readings and updates.

How Global Calibration Strengthens Multiaccuracy

July 25, 2025 | Research areas: Fairness, Methods and Algorithms | Conference: FOCS

Multiaccuracy and multicalibration are multigroup fairness notions for prediction that have found numerous applications in learning and computational complexity. They can be achieved from a single learning primitive: weak agnostic learning. Here we investigate the power of multiaccuracy as a learning primitive, both with and without the additional assumption of calibration. We find that multiaccuracy in itself is rather weak, but that the…

Can Global Semantic Context Improve Neural Language Models?

September 27, 2018 | Research area: Speech and Natural Language Processing

Entering text on your iPhone, discovering news articles you might enjoy, finding out answers to questions you may have, and many other language-related tasks depend upon robust natural language processing (NLP) models. Word embeddings are a category of NLP models that mathematically map words to numerical vectors. This capability makes it fairly straightforward to find numerically similar vectors or vector clusters, then reverse the mapping to get relevant linguistic information. Such models are at the heart of familiar apps like News, search, Siri, keyboards, and Maps. In this article, we explore whether we can improve word predictions for the QuickType keyboard using global semantic context.
