Investigating minimum text lengths for lexical diversity indices

Fred Zenker, Kristopher Kyle

Research output: Contribution to journal › Article › peer-review

30 Citations (Scopus)


Lexical diversity (LD) is an important feature of a second language (L2) writer's lexical knowledge, and indices of LD have been widely used in the field of writing assessment (e.g., Cumming et al., 2006; Engber, 1995). Research with longer native speaker (L1) texts has indicated, however, that many commonly used LD indices are sensitive to text length and may conflate lexical breadth and fluency. Because of the importance of measuring LD in L2 writing assessment research, it is essential to know the degree to which particular LD indices are resistant to text length effects and the minimum text lengths at which these indices produce stable values. In this study, we investigate text length effects for nine indices of LD in a corpus of 4542 L2 argumentative essays. The results indicate that MATTR (Covington & McFall, 2010) and two versions of MTLD (McCarthy, 2005; McCarthy & Jarvis, 2010) are the most stable of the indices included in the study. MATTR performs particularly well, maintaining a high degree of stability across all text lengths. Comparisons based on essay prompt and proficiency level are also discussed.
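The text-length problem described above can be made concrete with a small sketch contrasting the simple type-token ratio (TTR) with the moving-average type-token ratio (MATTR; Covington & McFall, 2010). This is an illustration under stated assumptions, not the authors' implementation: the function names `ttr` and `mattr` are hypothetical, the window size of 50 tokens is an assumed default, tokens are assumed to be pre-tokenized, and falling back to plain TTR when a text is shorter than the window is an assumption of this sketch. MATTR computes the TTR of every window of consecutive tokens and averages the results, so its value does not shrink simply because a text gets longer.

```python
from collections import Counter


def ttr(tokens):
    """Simple type-token ratio: number of distinct types / number of tokens."""
    if not tokens:
        raise ValueError("need at least one token")
    return len(set(tokens)) / len(tokens)


def mattr(tokens, window=50):
    """Moving-average type-token ratio (sketch after Covington & McFall, 2010).

    Averages the TTR of every run of `window` consecutive tokens.
    The default window of 50 is an assumption of this sketch. If the text
    is not longer than the window, this sketch simply returns plain TTR.
    """
    if len(tokens) <= window:
        return ttr(tokens)
    # Type counts for the first window (tokens[0:window]).
    counts = Counter(tokens[:window])
    total = len(counts) / window
    n_windows = 1
    # Slide the window one token at a time, updating counts incrementally.
    for i in range(window, len(tokens)):
        outgoing = tokens[i - window]
        counts[outgoing] -= 1
        if counts[outgoing] == 0:
            del counts[outgoing]
        counts[tokens[i]] += 1
        total += len(counts) / window
        n_windows += 1
    return total / n_windows


# Toy demonstration: repeating a text lowers TTR but leaves MATTR unchanged.
sentence = "the cat sat on the mat".split()
long_text = sentence * 10
print(round(ttr(sentence), 3), round(ttr(long_text), 3))  # 0.833 0.083
print(round(mattr(long_text, window=6), 3))  # 0.833
```

The toy text is deliberately artificial: repeating the same six tokens ten times divides TTR by ten, while every six-token window still contains five types, so MATTR stays at 5/6. Real essays behave less tidily, which is why the study examines stability empirically across text lengths.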

Original language: English
Article number: 100505
Journal: Assessing Writing
Publication status: Published, January 2021

Bibliographical note

Publisher Copyright:
© 2020 Elsevier Inc.

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Education
  • Linguistics and Language

