
Can Generative Pre-trained Transformers (GPT) Pass Assessments in Higher Education Programming Courses?

Jaromir Savelka, Arav Agarwal, Christopher Bogart, Yifan Song, Majd Sakr
Mar 2023
We evaluated the capability of generative pre-trained transformers (GPT) to pass assessments in introductory and intermediate Python programming courses at the postsecondary level. Discussions of potential uses (e.g., exercise generation, code explanation) and misuses (e.g., cheating) of this emerging technology in programming education have intensified, but to date there has not been a rigorous analysis of the models' capabilities in the realistic context of a full-fledged programming course with a diverse set of assessment instruments. We evaluated GPT on three Python courses that employ assessments ranging from simple multiple-choice questions (no code involved) to complex programming projects with code bases distributed across multiple files (599 exercises overall). Further, we studied whether and how successfully GPT models leverage feedback provided by an auto-grader. We found that the current models are not capable of passing the full spectrum of assessments typically involved in a Python programming course (<70% on even entry-level modules). Yet it is clear that a straightforward application of these easily accessible models could enable a learner to obtain a non-trivial portion of the overall available score (>55%) in introductory and intermediate courses alike. While the models exhibit remarkable capabilities, including correcting solutions based on an auto-grader's feedback, some limitations exist (e.g., poor handling of exercises requiring complex chains of reasoning steps). These findings can be leveraged by instructors wishing to adapt their assessments so that GPT becomes a valuable assistant for a learner as opposed to an end-to-end solution.