How Do the Kids Speak? Improving Educational Use of Text Mining with Child-Directed Language Models

Organisciak, P., Newman, M., Eby, D., Acar, S. and Dumas, D. (2023), "How do the kids speak? Improving educational use of text mining with child-directed language models", Information and Learning Sciences, https://doi.org/10.1108/ILS-06-2022-0082

26 Pages Posted: 20 Jan 2023 Last revised: 23 Jan 2023

Peter Organisciak

University of Denver

Michele Newman

University of Washington

David Eby

University of Illinois at Urbana-Champaign

Selcuk Acar

University of North Texas

Denis Dumas

University of Georgia

Date Written: January 19, 2023

Abstract

Purpose
Most educational assessments tend to be constructed in a close-ended format, which is easier to score consistently and more affordable. However, recent work has leveraged computational text methods from the information sciences to make open-ended measurement more effective and reliable for older students. This study asks whether such text applications need to be adapted when used with samples of elementary-aged children.

Design/methodology/approach
This study introduces domain-adapted semantic models for child-specific text analysis, to support better educational assessment of elementary-aged children. A corpus compiled from a multi-modal mix of spoken and written child-directed sources is presented and used to train a children’s language model, which is evaluated against standard non-age-specific semantic models.

Findings
Child-oriented language is found to differ in vocabulary and word sense use from general English, while exhibiting lower gender and race biases. The model is evaluated in an educational application of divergent thinking measurement and shown to improve on generalized English models.
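
As an illustration of how a semantic model can feed into divergent thinking measurement, the sketch below scores a response by its semantic distance from the prompt, a common approach in automated originality scoring. The abstract does not specify the scoring method, so this is an assumption rather than the authors' implementation; the function names and the two-dimensional toy vectors are hypothetical, standing in for embeddings from a general or child-directed model.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("zero-length vector has no direction")
    return dot / (norm_u * norm_v)

def semantic_distance(model, prompt, response_words):
    """Return 1 - cosine similarity between the prompt vector and the
    mean vector of the response words found in the model.

    `model` is a plain dict mapping word -> list of floats (hypothetical
    stand-in for a trained embedding model). Returns None when no
    response word is in the vocabulary.
    """
    known = [model[w] for w in response_words if w in model]
    if prompt not in model or not known:
        return None
    dims = len(model[prompt])
    mean_vec = [sum(vec[i] for vec in known) / len(known) for i in range(dims)]
    return 1.0 - cosine_similarity(model[prompt], mean_vec)

# Toy vectors, invented purely for illustration (not real embeddings).
toy_model = {
    "brick": [1.0, 0.0],
    "wall": [0.9, 0.1],
    "doorstop": [0.2, 0.8],
}

common_use = semantic_distance(toy_model, "brick", ["wall"])
unusual_use = semantic_distance(toy_model, "brick", ["doorstop"])
print(common_use < unusual_use)  # the less typical use is farther from the prompt
```

Under this scheme, swapping the general-English vectors for vectors from a child-directed model changes only the `model` argument, which is why the choice of training corpus can affect the resulting scores.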

Originality
Research in computational measurement of open-ended responses has thus far used models of language trained on general English sources or domain-specific sources such as textbooks. This paper is the first to study age-specific language models for educational assessment. Additionally, while there have been several targeted, high-quality corpora of child-created or child-directed speech, the corpus presented here is the first developed with the breadth and scale required for large-scale text modeling.

Research limitations/implications
The findings demonstrate the need for age-specific language models in the growing domain of automated divergent thinking and strongly encourage the same for other educational uses of computational text analysis by showing a measurable difference in the language of children.

Social implications
Representing children’s language more accurately in automated educational assessment allows for fairer and more equitable testing. Further, child-specific language models have fewer gender and race biases.

Keywords: educational data mining, text mining, learning, assessment, language modeling, divergent thinking

Suggested Citation

Organisciak, Peter and Newman, Michele and Eby, David and Acar, Selcuk and Dumas, Denis, How Do the Kids Speak? Improving Educational Use of Text Mining with Child-Directed Language Models (January 19, 2023). Information and Learning Sciences, https://doi.org/10.1108/ILS-06-2022-0082, Available at SSRN: https://ssrn.com/abstract=4329061

Peter Organisciak (Contact Author)

University of Denver

2201 S. Gaylord St
Denver, CO 80208-2685
United States
