Last year, I wrote here about what I called the silence of the countervoice in AI. The issue is fairly simple: many chatbots are designed to be helpful, friendly, and affirming. That makes interactions pleasant, but it can also mean that genuine disagreement appears less often. Or, as I summarised it at the time: if we are not careful, large language models may mostly make us more comfortable, but not necessarily wiser.
Not hallucinations, but flattery
When people talk about problems with AI, they usually think of hallucinations: systems that simply invent incorrect information. But a study by Battista and Griffiths looks at something different: sycophancy.
In this context, sycophancy means that a model tends to produce answers that align with what the user appears to believe. Not necessarily because those answers are true, but because they keep the conversation flowing smoothly. At first glance, this may seem like a minor issue. In practice, it may have fairly fundamental consequences for how people form beliefs.
Becoming more certain without getting closer to the truth
The authors first analyse this problem theoretically. They start from a fairly classical idea in cognitive science: people have hypotheses about the world and use new information to update those hypotheses. If that information comes from reality, you can gradually move closer to the truth. A nice principle.
But when a large language model such as ChatGPT generates answers based on the user’s hypothesis, something different happens. The examples you receive tend to fit the idea you already had. The result is somewhat paradoxical: your confidence in that idea can increase, even though you have not actually received new independent information. In effect, you end up confirming your own hypothesis with data that have already been filtered through that same hypothesis.
You are, in a sense, dancing with yourself.
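A toy simulation makes that loop concrete. The sketch below is my own illustration, not the authors' model: a simple Bayesian learner weighs two candidate rules for number triples, the guess "each number is two more than the last" and the broader truth "the numbers simply increase", and treats every example it sees as an independent observation. The specific rules, the 1-to-20 range, and the size-based likelihood are all assumptions made for the example.

```python
import random

# Two candidate rules for number triples (a, b, c) with values 1..20:
def guess_rule(t):
    # The user's hypothesis: each number is two more than the previous one.
    a, b, c = t
    return b == a + 2 and c == b + 2

def true_rule(t):
    # The actual rule: the numbers simply increase.
    a, b, c = t
    return a < b < c

triples = [(a, b, c) for a in range(1, 21)
                     for b in range(1, 21)
                     for c in range(1, 21)]
guess_set = [t for t in triples if guess_rule(t)]
true_set = [t for t in triples if true_rule(t)]

def likelihood(t, rule_set):
    # Size-principle likelihood: a rule predicts each triple it allows
    # with probability 1 / (number of triples it allows), and others not at all.
    return 1.0 / len(rule_set) if t in rule_set else 0.0

def confidence_in_guess(evidence):
    # Bayesian update from a 50/50 prior, treating every example
    # as an independent draw from whichever rule is true.
    p_guess, p_true = 0.5, 0.5
    for t in evidence:
        w_guess = p_guess * likelihood(t, guess_set)
        w_true = p_true * likelihood(t, true_set)
        p_guess, p_true = w_guess / (w_guess + w_true), w_true / (w_guess + w_true)
    return p_guess

random.seed(0)
sycophantic = random.sample(guess_set, 5)   # examples generated from the user's own guess
independent = random.sample(true_set, 5)    # examples sampled from the actual rule

print("confidence in own guess, sycophantic examples:", round(confidence_in_guess(sycophantic), 3))
print("confidence in own guess, independent examples:", round(confidence_in_guess(independent), 3))
```

Fed examples generated from its own guess, the learner ends up nearly certain of that guess; fed examples drawn independently from the true rule, it abandons the guess almost immediately. The data never lied; they were simply chosen to agree.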
The experiment
To test this idea, the researchers used a variation of Peter Wason’s classic 2-4-6 task. Participants had to discover the rule behind a sequence of numbers while interacting with an AI agent. The AI provided different kinds of feedback: confirming, disconfirming, random, or simply the default behaviour of a chatbot.
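For readers who do not know the task, here is a rough sketch of what such feedback conditions could look like. The hidden rule, the number range, and the labels are my assumptions for illustration, not the paper's materials, and the "default chatbot" condition used an actual LLM, so it is not simulated here.

```python
import random

def true_rule(a, b, c):
    # The hidden rule in the classic 2-4-6 task: the numbers simply increase.
    return a < b < c

def user_rule(a, b, c):
    # A typical first guess after seeing "2, 4, 6": increments of two.
    return b == a + 2 and c == b + 2

all_triples = [(a, b, c) for a in range(1, 21)
                         for b in range(1, 21)
                         for c in range(1, 21)]

def feedback(condition, n=3):
    """Return n example triples, each labelled true/false by the hidden rule."""
    if condition == "confirming":       # only examples that fit the user's guess
        pool = [t for t in all_triples if user_rule(*t)]
    elif condition == "disconfirming":  # valid examples that break the user's guess
        pool = [t for t in all_triples if true_rule(*t) and not user_rule(*t)]
    else:                               # "random": arbitrary triples
        pool = all_triples
    return [(t, true_rule(*t)) for t in random.sample(pool, n)]

random.seed(1)
for condition in ("confirming", "disconfirming", "random"):
    print(condition, feedback(condition))
```

A confirming agent can only ever echo the user's guess back; only the disconfirming and random conditions can surface a triple that fits the real rule while breaking the guess.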
The results were striking.
When participants received random examples that were not aligned with their hypothesis, about 29.5% discovered the correct rule. With a standard AI chatbot, that figure dropped to about 5.9%. At the same time, participants became more confident in their own hypothesis when the AI responded affirmatively. In short, participants gave fewer correct answers, yet were more confident that they had it right.
AI as an echo
What makes this study interesting is that the problem does not necessarily lie with the user. Even a fairly rational reasoner can be misled if the information source systematically generates examples that fit the existing hypothesis.
In that situation, the AI effectively becomes an epistemic echo.
That fits rather well with the point I made in my earlier blog post: when technology filters out the countervoice, thinking becomes more comfortable, but also more fragile.
Perhaps good AI should sometimes disagree
None of this means that AI is useless for thinking or learning. But it does suggest that the design of these systems matters. An AI that always confirms is a pleasant conversational partner (although…). But an AI that sometimes offers counterexamples or challenges your hypothesis is probably a better thinking partner.
And perhaps that reflects a fairly old lesson: if all your conversation partners agree with you, you usually don’t learn very much.