Shin, Lee, & Kim (2025)...a GenAI-based speech scoring system...

Shin, D., Lee, J. H., & Kim, K. (2025). Exploratory study on developing and evaluating a GenAI-based speech scoring system for L2 teaching practitioners. Innovation in Language Learning and Teaching, 1–17. <a href="https://doi.org/10.1080/17501229.2025.2596668" target="_blank" class="ke-link">https://doi.org/10.1080/17501229.2025.2596668</a>

(Before registration) Published online: 20 Dec 2025. Free Access: <a href="https://www.tandfonline.com/eprint/FK6JTNDAWWCGMSCE7HBD/full?target=10.1080/17501229.2025.2596668#abstract" target="_blank" class="ke-link">https://www.tandfonline.com/eprint/FK6JTNDAWWCGMSCE7HBD/full?target=10.1080/17501229.2025.2596668#abstract</a>

ABSTRACT

Given the increasing demand for automated speech scoring systems in the L2 domain, several tools have been introduced, most of which were developed by large publishing and testing companies. This study proposes a more affordable and accessible approach to developing a customized speech scoring system for L2 teaching practitioners by leveraging generative AI platforms. Specifically, ChatGPT’s My GPTs feature was employed to develop a customized L2 automated speech scoring tool capable of evaluating fluency and pronunciation in L2 speech samples. A total of 50 speech samples from Korean secondary-level EFL students were scored by the GPT-based tool and two human raters. The results showed a strong positive correlation between the GPT-based tool’s scores and those of the human raters. Additionally, an analysis of rater fit statistics using the many-facet Rasch model indicated that the GPT-based tool’s scoring fell within acceptable ranges, suggesting its potential as a reliable evaluator of L2 learners’ speech.
Future research directions are provided based on the limitations of this study; these limitations include a small sample size, the employment of a single task type and L1 group, and the current GPT system’s inability to assess speaking domains beyond fluency and pronunciation.

KEYWORDS:
<ul style="list-style-type: disc;" data-ke-list-type="disc"><li>Automated speech scoring, ChatGPT, GenAI, many-facet Rasch model, rater fit analysis</li></ul>