Paper, December 2025

IMPACT: Inflectional Morphology Probes Across Complex Typologies

Authors: Mohammed J. Saeed, Tommi Vehvilainen, Evgeny Fedoseev, Sevil Caliskan, Tatiana Vodolazova

Large Language Models (LLMs) have shown significant progress on various multilingual benchmarks and are increasingly used to generate and evaluate text in non-English languages. However, while they may produce fluent outputs, it remains unclear to what extent these models truly grasp the underlying linguistic complexity of those languages, particularly in morphology. To investigate this, we introduce IMPACT, a synthetically generated evaluation framework focused on inflectional morphology and designed to evaluate LLM performance across five morphologically rich languages: Arabic, Russian, Finnish, Turkish, and Hebrew. IMPACT includes unit-test-style cases covering both shared and language-specific phenomena, from basic verb inflections (e.g., tense, number, gender) to unique features such as Arabic’s reverse gender agreement and vowel harmony in Finnish and Turkish. We assess eight multilingual LLMs that, despite strong English performance, struggle with other languages and uncommon morphological patterns, especially when judging ungrammatical examples. We also show that Chain-of-Thought prompting and Thinking Models can degrade performance. Our work exposes gaps in LLMs’ handling of linguistic complexity, pointing to clear room for improvement. To support further research, we publicly release the IMPACT framework.
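The actual IMPACT case format is not described here. As a rough illustration of what a unit-test-style morphology probe could look like, the sketch below builds a grammatical/ungrammatical minimal pair for Turkish plural vowel harmony and scores yes/no judgments separately for each label, since models are reported to struggle most with ungrammatical examples. Everything in it is a hypothetical assumption, not the authors' data or evaluation code: the names `plural`, `minimal_pair`, `to_prompt`, and `accuracy_by_label`, the prompt wording, and the simplified two-way -lar/-ler rule, which takes lowercase stems and ignores loanword exceptions such as saat → saatler.

```python
from dataclasses import dataclass
from typing import Dict, List

# Simplified Turkish two-way vowel harmony classes (lowercase input assumed).
FRONT_VOWELS = set("eiöü")
BACK_VOWELS = set("aıou")


def last_vowel(stem: str) -> str:
    """Return the last harmony-relevant vowel of a lowercase stem."""
    for ch in reversed(stem):
        if ch in FRONT_VOWELS or ch in BACK_VOWELS:
            return ch
    raise ValueError(f"no vowel found in {stem!r}")


def plural(stem: str) -> str:
    """Hypothetical rule: -ler after a front vowel, -lar after a back vowel.

    Ignores loanword exceptions (e.g. saat -> saatler).
    """
    return stem + ("ler" if last_vowel(stem) in FRONT_VOWELS else "lar")


@dataclass(frozen=True)
class ProbeCase:
    stem: str
    form: str
    grammatical: bool


def minimal_pair(stem: str) -> List[ProbeCase]:
    """Pair the rule-conforming plural with the harmony-violating one."""
    good = plural(stem)
    bad = stem + ("lar" if good.endswith("ler") else "ler")
    return [ProbeCase(stem, good, True), ProbeCase(stem, bad, False)]


def to_prompt(case: ProbeCase) -> str:
    """Hypothetical yes/no grammaticality-judgment prompt."""
    return (
        f"Is '{case.form}' a grammatical Turkish plural of '{case.stem}'? "
        "Answer yes or no."
    )


def accuracy_by_label(cases: List[ProbeCase], answers: List[str]) -> Dict[str, float]:
    """Score yes/no answers separately for grammatical and ungrammatical cases."""
    if len(cases) != len(answers):
        raise ValueError("expected one answer per case")
    hits = {True: 0, False: 0}
    totals = {True: 0, False: 0}
    for case, answer in zip(cases, answers):
        said_yes = answer.strip().lower().startswith("yes")
        totals[case.grammatical] += 1
        hits[case.grammatical] += int(said_yes == case.grammatical)

    def rate(label: bool) -> float:
        return hits[label] / totals[label] if totals[label] else float("nan")

    return {"grammatical": rate(True), "ungrammatical": rate(False)}


if __name__ == "__main__":
    cases = minimal_pair("ev") + minimal_pair("kitap")
    print([c.form for c in cases])  # ['evler', 'evlar', 'kitaplar', 'kitapler']
    print(to_prompt(cases[1]))
    # A model that accepts every form scores well only on grammatical cases.
    print(accuracy_by_label(cases, ["yes", "yes", "yes", "no"]))
    # {'grammatical': 1.0, 'ungrammatical': 0.5}
```

Reporting the two labels separately, rather than one pooled accuracy, keeps a "yes"-biased model from hiding its failures on ungrammatical forms.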
