
Chais_2026

Proceedings of the 21st Chais Conference for the Study of Innovation and Learning Technologies: Learning in the Digital Era
I. Blau, D. Olenik-Shemesh, N. Geri, A. Caspi, Y. Sidi, Y. Eshet-Alkalai, Y. Kalman, & N. Brandel (Eds.), Ra'anana: The Open University of Israel

Who Is the Most Accurate Evaluator? Alignment of ChatGPT and peer grades with instructor grades across varying levels of student project quality

Maya Usher
HIT Holon Institute of Technology
mayau@hit.ac.il

Montathar Faraon
Kristianstad University
montathar.faraon@hkr.se

Abstract

The integration of generative artificial intelligence (GenAI) tools into assessment processes in higher education raises critical questions about their alignment with human evaluations. The effectiveness of AI chatbots such as ChatGPT in generating assessments comparable to those of human evaluators remains unclear, with limited research offering direct comparisons in educational contexts. This quantitative study examines grading alignment across three evaluators of student group projects (ChatGPT, peers, and the course instructor) and determines whether evaluator agreement varies by project quality. The study involved 184 undergraduate students who submitted a group project and provided peer assessments of their classmates' work. The projects were categorized into three quality levels: low, medium, and high. The analyses revealed that alignment with instructor grading varied systematically by both grading source and project quality. ChatGPT struggled to identify weaker projects and tended to assign them inflated grades, whereas students were better at identifying weaker work but were overly strict toward high-quality projects. In addition, discrepancies from the instructor's grades were larger for ChatGPT's evaluations than for those of peers. Alignment between ChatGPT's grades and the instructor's improved as project quality increased, whereas peer–instructor alignment was strongest for lower-quality work. These findings support a cautious integration of ChatGPT into assessment processes, with the final decision remaining with informed and critical human judgment.

Keywords: ChatGPT, Generative artificial intelligence (GenAI), Higher education, Peer assessment, Peer feedback.
