Do you learn less with AI?

This study by Hamsa Bastani and colleagues, which appeared during my blog holiday, shows how AI can appear to help learning while actually leading to less of it.

How did they investigate the impact of generative AI, specifically GPT-4, on students’ learning? In a randomised controlled trial with nearly a thousand students at a secondary school in Turkey, two GPT-based tutors were tested during maths lessons: GPT Base and GPT Tutor. The results show that although access to GPT-4 significantly improves performance during practice sessions (48% improvement with GPT Base and 127% with GPT Tutor), it can ultimately be detrimental to learning. Students who used GPT Base (the simple version, which mimics a standard ChatGPT interface) scored 17% worse on unaided exams than students who never had access to AI.

Students who used GPT Base did worse on unaided exams because they became too dependent on the AI while practising. They learned less deeply and were less able to solve problems independently. GPT Base offered solutions and answers without actively guiding students through the learning process. As a result, they developed weaker problem-solving and critical thinking skills, which led to poorer performance on exams where they had to work without help.

However, this negative effect was largely absent with GPT Tutor, the second version the researchers deployed. This may be because its prompts were specifically designed to safeguard learning by providing guidance instead of just giving answers. GPT Tutor helped students solve problems step by step, encouraged active thinking, and supported the development of problem-solving skills. As a result, students were better able to understand and apply what they had practised on their own, and their unaided exam performance did not suffer in the way it did with GPT Base.

But the researchers warn at the end of their article:

ChatGPT is highly unreliable and often provides incorrect responses. Our results suggest that students are either unable to detect these failures or unwilling to spend the effort needed to check correctness. While GPT Tutor appears to largely mitigate these negative effects, substantial work is required to enable generative AI to positively enhance rather than diminish education.

Abstract of the study:

Generative artificial intelligence (AI) is poised to revolutionize how humans work, and has already demonstrated promise in significantly improving human productivity. However, a key remaining question is how generative AI affects learning, namely, how humans acquire new skills as they perform tasks. This kind of skill learning is critical to long-term productivity gains, especially in domains where generative AI is fallible and human experts must check its outputs. We study the impact of generative AI, specifically OpenAI’s GPT-4, on human learning in the context of math classes at a high school. In a field experiment involving nearly a thousand students, we have deployed and evaluated two GPT based tutors, one that mimics a standard ChatGPT interface (called GPT Base) and one with prompts designed to safeguard learning (called GPT Tutor). These tutors comprise about 15% of the curriculum in each of three grades. Consistent with prior work, our results show that access to GPT-4 significantly improves performance (48% improvement for GPT Base and 127% for GPT Tutor). However, we additionally find that when access is subsequently taken away, students actually perform worse than those who never had access (17% reduction for GPT Base). That is, access to GPT-4 can harm educational outcomes. These negative learning effects are largely mitigated by the safeguards included in GPT Tutor. Our results suggest that students attempt to use GPT-4 as a “crutch” during practice problem sessions, and when successful, perform worse on their own. Thus, to maintain long-term productivity, we must be cautious when deploying generative AI to ensure humans continue to learn critical skills.
