Shin, D., & Lee, J. H. (2024). Exploratory study on the potential of ChatGPT as a rater of second language writing. Education and Information Technologies. https://doi.org/10.1007/s10639-024-12817-6
Abstract
In recent years, various strategies have been employed to integrate ChatGPT into second language (L2) teaching and learning. In line with such efforts, this study investigates the potential of ChatGPT as an automated writing evaluation (AWE) tool for L2 assessment, given the lack of systematic, quantitative comparisons between human ratings and the ratings of GPT-based scoring chatbots. We took an innovative approach by utilising ChatGPT’s new feature called ‘My GPTs’, a customised chatbot builder based on GPT-4. The dataset consisted of 50 English essays written by Korean secondary-level EFL students, which were rated by the developed GPT-based scoring chatbot and by two in-service English teachers. The intraclass correlation coefficient results suggested a strong similarity between the human raters’ scores and ChatGPT’s scores. However, the results based on the many-faceted Rasch model further revealed that ChatGPT showed a slightly greater deviation from the model than its human counterparts. This study demonstrates the potential of ChatGPT in AWE as an accessible tool that supplements L2 teachers’ ratings.
Keywords: Automated writing evaluation · ChatGPT · Many-faceted Rasch model · Rater evaluation
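As a rough illustration of the intraclass correlation coefficient mentioned in the abstract, the following minimal sketch computes one common form, ICC(2,1) (two-way random effects, absolute agreement, single rater). The abstract does not state which ICC form the study used, so the choice of ICC(2,1) is an assumption, and the function name `icc_2_1` and the example scores are hypothetical rather than taken from the study.

```python
def icc_2_1(scores):
    """ICC(2,1) for a table of scores: one row per essay, one column per rater.

    Assumption: two-way random effects, absolute agreement, single rater.
    The study itself may have used a different ICC form.
    """
    n = len(scores)        # number of essays
    k = len(scores[0])     # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-essay mean square
    msc = ss_cols / (k - 1)              # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


if __name__ == "__main__":
    # Hypothetical scores: columns are teacher A, teacher B, GPT-based chatbot
    example = [
        [4, 4, 5],
        [2, 3, 2],
        [5, 5, 5],
        [3, 3, 4],
        [1, 2, 2],
    ]
    print(round(icc_2_1(example), 3))
```

Values near 1 indicate that the raters assign nearly identical scores to each essay. Because this form measures absolute agreement, a rater who is consistently one point more lenient lowers the coefficient even when the rank ordering of essays is the same.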