Shin, D., Lee, J. H., & Kim, K. (2025). Exploratory study on developing and evaluating a GenAI-based speech scoring system for L2 teaching practitioners. Innovation in Language Learning and Teaching, 1–17. https://doi.org/10.1080/17501229.2025.2596668 (not yet registered)
Published online: 20 Dec 2025
Free Access https://www.tandfonline.com/eprint/FK6JTNDAWWCGMSCE7HBD/full?target=10.1080/17501229.2025.2596668#abstract
ABSTRACT
Given the increasing demand for automated speech scoring systems in the L2 domain, several tools have been introduced, most of which were developed by large publishing and testing companies. This study proposes a more affordable and accessible approach to developing a customized speech scoring system for L2 teaching practitioners by leveraging generative AI platforms. Specifically, ChatGPT's My GPTs feature was used to build a customized L2 automated speech scoring tool capable of evaluating fluency and pronunciation in L2 speech samples. A total of 50 speech samples from Korean secondary-level EFL students were scored by the GPT-based tool and by two human raters. The results showed a strong positive correlation between the GPT-based tool's scores and those of the human raters. Additionally, an analysis of rater fit statistics using the many-facet Rasch model showed that the GPT-based tool's fit statistics fell within acceptable ranges, suggesting its potential as a reliable evaluator of L2 learners' speech. The study has several limitations: a small sample size, a single task type and a single L1 group, and the current GPT system's inability to assess speaking domains beyond fluency and pronunciation. Directions for future research are proposed on the basis of these limitations.
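The abstract reports two checks of the GPT-based tool against human raters: score correlation and many-facet Rasch rater fit. The following Python sketch shows how both quantities are commonly computed. It is not the authors' code. The abstract does not name the correlation coefficient, so Pearson's r is an assumption here. The fit function also assumes that the model-expected score and model variance for each rating have already been obtained from a fitted many-facet Rasch model, for example from dedicated MFRM software. It does not estimate the model itself. The function names `pearson_r` and `rater_fit_mean_squares` are hypothetical, and the toy scores are illustrative values, not study data.

```python
import math
from typing import Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two raters' scores on the same speech samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length score lists with at least two samples")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a rater gives identical scores")
    return cov / math.sqrt(ss_x * ss_y)


def rater_fit_mean_squares(
    observed: Sequence[float],
    expected: Sequence[float],
    variance: Sequence[float],
) -> Tuple[float, float]:
    """Return (infit, outfit) mean-squares for one rater (hypothetical helper).

    Assumes `expected` and `variance` come from an already fitted
    many-facet Rasch model; this function does not estimate the model.
    """
    if not (len(observed) == len(expected) == len(variance)) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and equal in length")
    if any(w <= 0 for w in variance):
        raise ValueError("model variances must be positive")
    sq_resid = [(x - e) ** 2 for x, e in zip(observed, expected)]
    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(r / w for r, w in zip(sq_resid, variance)) / len(sq_resid)
    # Infit: sum of squared residuals divided by sum of model variances (information-weighted).
    infit = sum(sq_resid) / sum(variance)
    return infit, outfit


if __name__ == "__main__":
    # Hypothetical toy scores on a 1-5 scale, NOT data from the study.
    gpt_scores = [3, 4, 2, 5, 4]
    human_scores = [3, 5, 2, 4, 4]
    print(f"Pearson r = {pearson_r(gpt_scores, human_scores):.3f}")

    # Hypothetical model-expected scores and variances for the GPT rater's ratings.
    expected = [3.2, 4.1, 2.3, 4.6, 3.9]
    variance = [0.8, 0.7, 0.6, 0.5, 0.7]
    infit, outfit = rater_fit_mean_squares(gpt_scores, expected, variance)
    print(f"infit = {infit:.2f}, outfit = {outfit:.2f}")
```

Infit weights each residual by its model variance. This makes it more sensitive to unexpected ratings on samples that match the rater's typical range. Outfit is unweighted and reacts more strongly to isolated outliers. Mean-square values near 1.0 indicate ratings consistent with the model. Ranges such as 0.5 to 1.5 are often cited as productive for measurement, though the abstract does not state which criterion this study applied.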
KEYWORDS: