English-Czech Output Bias in LLMs: A Geometry-Based Case Study

Authors

DOI:

https://doi.org/10.34190/icair.5.1.4326

Keywords:

language model evaluation, multilingual text generation, response length analysis, educational AI prompts, Czech-English comparison

Abstract

The rapid integration of large language models (LLMs) into educational, professional, and public discourse has prompted increasing scrutiny of their multilingual capabilities. While English dominates as a testing and training language, understanding LLM performance in less-resourced languages such as Czech is critical for equitable AI deployment. This study investigates a subtle but systematic bias in LLM behaviour: the relative verbosity of responses in Czech versus English within the domain of elementary geometry. We compiled a dataset of 48 paired mathematical prompts, posed in both Czech and English to six prominent LLMs (ChatGPT, Claude, Gemini, Mistral Large, Copilot Quick-Nuance, and Copilot Deep-Thinker), yielding 576 total responses (48 prompts × 2 languages × 6 models). Each model was accessed in a controlled language-specific context to ensure fair comparison. Using two surface-level metrics, word count and character count, we observed a consistent pattern: English responses were significantly longer than Czech ones across all models. Paired t-tests and Wilcoxon signed-rank tests confirmed the robustness of these differences, and effect size estimation (Cohen’s d) showed medium to large effects in both metrics. Notably, Czech, despite its richer morphology, did not yield longer outputs even in character count, contradicting initial assumptions. We interpret these findings in the context of known architectural and training imbalances in LLM development, particularly differences in how text is segmented and processed, alongside the relative abundance of English-language data. While stylistic conventions and user context may also influence response length, our results consistently indicate that LLMs, even those marketed as multilingual, tend to produce more verbose output in English.
This raises concerns about potential discrepancies in explanation quality across languages, which may have implications for fairness and pedagogical effectiveness in multilingual educational settings. The study lays the groundwork for follow-up research that will move beyond surface metrics toward semantic content analysis of mathematical reasoning across languages. Future work will assess whether English verbosity corresponds to greater mathematical depth or whether Czech responses deliver equivalent content more concisely. This line of inquiry is vital for ensuring fairness, clarity, and effectiveness in multilingual AI deployment, especially in contexts such as mathematics education, where explanation quality directly impacts learning outcomes.
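The surface-level comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the toy sentences, the whitespace-based word count, and the choice of the paired variant of Cohen's d (mean difference divided by the standard deviation of the differences) are all assumptions made for illustration, and every function name below is hypothetical. Only the t statistic is computed here; p-values and the Wilcoxon signed-rank test would normally come from a statistics library such as SciPy (`scipy.stats.ttest_rel`, `scipy.stats.wilcoxon`).

```python
import math
import statistics


def word_count(text: str) -> int:
    """Count whitespace-separated tokens (an assumed definition of 'word')."""
    return len(text.split())


def char_count(text: str) -> int:
    """Count characters, including spaces and punctuation."""
    return len(text)


def paired_differences(english: list[int], czech: list[int]) -> list[int]:
    """Per-prompt difference (English minus Czech) for one metric."""
    if len(english) != len(czech):
        raise ValueError("paired samples must have equal length")
    return [e - c for e, c in zip(english, czech)]


def cohens_d_paired(diffs: list[int]) -> float:
    """Paired-samples Cohen's d: mean difference / SD of the differences."""
    return statistics.mean(diffs) / statistics.stdev(diffs)


def paired_t_statistic(diffs: list[int]) -> float:
    """t statistic of a paired t-test: mean difference / standard error."""
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))


# Hypothetical toy responses to the same three prompts, NOT data from the study.
english_responses = [
    "A triangle has three sides and three interior angles.",
    "The area of a square equals the side length squared.",
    "A circle is the set of points at a fixed distance from a centre.",
]
czech_responses = [
    "Trojúhelník má tři strany.",
    "Obsah čtverce je a na druhou.",
    "Kružnice je množina bodů stejně vzdálených od středu.",
]

en_words = [word_count(r) for r in english_responses]
cs_words = [word_count(r) for r in czech_responses]
word_diffs = paired_differences(en_words, cs_words)

d_words = cohens_d_paired(word_diffs)
t_words = paired_t_statistic(word_diffs)

print("word-count differences (EN - CS):", word_diffs)
print("paired Cohen's d:", round(d_words, 3))
print("paired t statistic:", round(t_words, 3))
```

The same pipeline applies to character counts by swapping `word_count` for `char_count`. With only three toy pairs the numbers carry no statistical meaning; the sketch only shows how each prompt contributes one paired difference per model and metric.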

Author Biographies

Michaela Tichá, Jan Evangelista Purkyně University in Ústí nad Labem

Michaela Tichá is an Assistant Professor at the Department of Mathematics, Faculty of Science, Jan Evangelista Purkyně University in Ústí nad Labem. Her research focuses on statistics and on the use of artificial intelligence in mathematics education, with an interest in innovative teaching approaches and the development of digital competencies in education.

Jiří Přibyl, Jan Evangelista Purkyně University in Ústí nad Labem

Jiří Přibyl is an Assistant Professor at the Department of Mathematics, Faculty of Science, Jan Evangelista Purkyně University in Ústí nad Labem. His research focuses on geometry and mathematics education, with a special interest in the use of origami as a didactic tool and as a means of exploring mathematical concepts through hands-on activities and visual thinking.

Magdalena Krátká, Jan Evangelista Purkyně University in Ústí nad Labem

Magdalena Krátká is an Assistant Professor at the Department of Mathematics, Faculty of Science, Jan Evangelista Purkyně University in Ústí nad Labem. Her research focuses on mathematics education and teacher training, with publications on problem-solving, factors influencing student failure, and reflective practice of teachers and pre-service teacher education.

Published

2025-12-04