Identifying Generative Artificial Intelligence Chatbot Use on Multiple-Choice, General Chemistry Exams Using Rasch Analysis

Benjamin Sorenson; Kenneth Hanson

Affiliation: Florida State University, United States
Published in: Journal of Chemical Education, ISSN 0021-9584, Vol. 101, No. 8, 2024, pp. 3216-3223
Language: English
Abstract
Generative artificial intelligence (AI) technology is expected to have a profound impact on chemical education. While there are certainly positive uses, some of which are already being implemented, there is reasonable concern about its use for cheating. Efforts are underway to detect generative AI use on open-ended questions, lab reports, and essays, but its detection on multiple-choice exams is largely unexplored. Here we propose the use of Rasch analysis to identify the distinctive behavioral pattern of ChatGPT on General Chemistry II multiple-choice exams. While raw statistics (e.g., average, ability, outfit) were insufficient to readily identify ChatGPT instances, a strategy of fixing the ability scale on high-success questions and then refitting the outcomes dramatically amplified its outlier behavior in terms of the Z-standardized outfit statistic and ability displacement. With the detection threshold set to give a true positive rate (TPR) of 1.0, a false positive rate (FPR) below 0.1 was obtained on a majority of the 20 exams investigated. Furthermore, the receiver operating characteristic (ROC) curve (i.e., TPR vs FPR) yielded outstanding areas under the curve (AUC) of >0.9 for nearly all exams. While the limitations of this method are described and the analysis is by no means exhaustive, these outcomes suggest that the distinctive behavior patterns of generative AI chatbots can be identified using Rasch modeling and fit statistics.
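The fit statistics and detection metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code and does not reproduce their anchoring on high-success questions; it assumes person abilities and item difficulties (in logits) are already estimated. It computes a person's outfit mean-square (MNSQ) under the dichotomous Rasch model and converts it to a Z-standardized value (ZSTD) with the Wilson-Hilferty cube-root transform, one common approximation. It then scores a detector by the FPR at the threshold that catches every positive (TPR = 1.0) and by ROC AUC. All function names and the toy response patterns are hypothetical.

```python
import math
from typing import Sequence


def rasch_prob(theta: float, b: float) -> float:
    """Dichotomous Rasch model: probability that a person of ability theta
    answers an item of difficulty b correctly (both in logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def person_outfit(responses: Sequence[int], theta: float,
                  difficulties: Sequence[float]) -> tuple[float, float]:
    """Return (outfit MNSQ, outfit ZSTD) for one person.

    MNSQ is the unweighted mean of squared standardized residuals.
    ZSTD uses the Wilson-Hilferty cube-root transform, with the model SD of
    the mean-square derived from each item's fourth central moment.
    """
    n_items = len(responses)
    z2_sum = 0.0
    c_over_w2 = 0.0
    for x, b in zip(responses, difficulties):
        p = rasch_prob(theta, b)
        w = p * (1.0 - p)            # model variance of the 0/1 response
        c = w * (1.0 - 3.0 * w)      # fourth central moment of Bernoulli(p)
        z2_sum += (x - p) ** 2 / w   # squared standardized residual
        c_over_w2 += c / (w * w)
    mnsq = z2_sum / n_items
    # Model SD of MNSQ; clamp to avoid division by zero when all p == 0.5.
    q = math.sqrt(max(c_over_w2 / n_items ** 2 - 1.0 / n_items, 1e-12))
    zstd = (mnsq ** (1.0 / 3.0) - 1.0) * (3.0 / q) + q / 3.0
    return mnsq, zstd


def fpr_at_full_recall(pos_scores: Sequence[float],
                       neg_scores: Sequence[float]) -> float:
    """FPR when the threshold is the lowest positive score (so TPR = 1.0)."""
    threshold = min(pos_scores)
    return sum(s >= threshold for s in neg_scores) / len(neg_scores)


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """ROC AUC as the Mann-Whitney probability that a random positive
    outscores a random negative (ties count one half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Toy exam: five items from easy to hard, person ability fixed at 0 logits.
    b = [-2.0, -1.0, 0.0, 1.0, 2.0]
    consistent = [1, 1, 1, 0, 0]   # passes easy items, misses hard ones
    aberrant = [0, 0, 1, 1, 1]     # hypothetical unexpected pattern
    print(person_outfit(consistent, 0.0, b))
    print(person_outfit(aberrant, 0.0, b))
```

Because outfit is unweighted, it is most sensitive to surprising responses on items far from the person's ability (e.g., missing very easy questions), which is why it is a natural candidate for flagging unusual response patterns. In practice the outfit ZSTD (or another score, such as ability displacement) of each examinee would be passed to `fpr_at_full_recall` and `roc_auc` with known positive and negative labels.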