Human versus machine: The effectiveness of ChatGPT in automated essay scoring

Authors: Jen Manning, Jeffrey Baldwin, Natasha Powell
Source: Innovations in Education and Teaching International, ISSN 1470-3297, Vol. 62, No. 5, 2025, pp. 1500-1513
Language: English
Abstract
As ChatGPT continues to reshape student engagement and instructional design, it is crucial to examine its practical implications. This study aims to evaluate the effectiveness of ChatGPT3.5 and ChatGPT4 as potential automated essay scoring (AES) systems. Fifty authentic, student-written annotated bibliographies were evaluated by three human raters (HRs), ChatGPT3.5, and ChatGPT4, each performing three rounds of grading. Statistical analyses were conducted to determine if the AI evaluations were comparable to the evaluations of HRs in terms of accuracy, reliability, and consistency. The findings reveal that although AI-generated evaluations occasionally aligned with more lenient evaluations by certain individual HRs, overall, the performance of the GPT models did not align with that of HRs.
