Features of lexical richness in children's books: Comparisons with child-directed speech

  • Nicola Dawson (University of Oxford)
  • Yaling Hsiao (University of Oxford)
  • Alvin Wei Ming Tan (University of Oxford)
  • Nilanjana Banerji (Oxford University Press)
  • Kate Nation (University of Oxford)


Access to children’s books via shared reading may be a particularly rich source of linguistic input in the early years. To understand how exposure to book language supports children’s learning, it is important to identify how book language differs from everyday conversation. We created a picture book corpus from 160 texts commonly read to children aged 0-5 years (around 320,000 words). We first quantified how the language of children’s books differs from child-directed speech (compiled from 10 corpora in the CHILDES UK database, around 3.8 million words) on measures of lexical richness (diversity, density, sophistication), part of speech distributions, and structural properties. We also identified the words occurring in children’s books that are most uniquely representative of book language. We found that children’s book language is lexically denser, more lexically diverse, and comprises a larger proportion of rarer word types than child-directed speech. Nouns and adjectives are more common in book language, whereas pronouns are more common in child-directed speech. Book words are more structurally complex in terms of both number of phonemes and morphological structure. They are also later acquired, more abstract, and more emotionally arousing than the words more common in child-directed speech. Written language provides unique linguistic input even in the pre-school years, well before children can read for themselves.
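
To make two of the lexical richness measures named above concrete, the following is a minimal Python sketch of lexical diversity (as a type-token ratio) and lexical density (the proportion of content words). It is an illustration under stated assumptions, not the authors' procedure: the tokenizer, the small FUNCTION_WORDS list, and the function names are all hypothetical, and the abstract does not specify how each measure was operationalised.

```python
import re

# Hypothetical, deliberately tiny closed-class word list (an assumption for
# illustration; a real analysis would rely on part-of-speech tagging).
FUNCTION_WORDS = {
    "the", "a", "an", "and", "or", "but", "of", "to", "in", "on", "at",
    "is", "are", "was", "do", "it", "you", "he", "she", "we", "they", "i",
    "that", "this",
}


def tokenize(text):
    """Lowercase the text and split it into simple alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(tokens):
    """Lexical diversity: number of distinct word types per token."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def lexical_density(tokens):
    """Lexical density: share of tokens that are not function words."""
    if not tokens:
        return 0.0
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(content) / len(tokens)


if __name__ == "__main__":
    # Made-up example sentences, not drawn from either corpus.
    samples = {
        "book-like": "The enormous crocodile crept through the tangled jungle",
        "speech-like": "Do you want it? You do want it, do you?",
    }
    for label, text in samples.items():
        toks = tokenize(text)
        print(label, round(type_token_ratio(toks), 2), round(lexical_density(toks), 2))
```

A raw type-token ratio falls as a text gets longer, so comparing corpora of very different sizes (here around 320,000 versus 3.8 million words) would require a size-controlled variant, for example computing the ratio over equal-sized samples.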

Keywords: lexical richness, book language, children, child-directed speech, language acquisition, literacy


Published on 16 Mar 2021
Peer Reviewed