AI that agrees with you doesn’t necessarily make you smarter (but it may make you more confident)

Last year, I wrote here about what I called the silence of the countervoice in AI. The issue is fairly simple: many chatbots are designed to be helpful, friendly, and affirming. That makes interactions pleasant, but it can also mean that genuine disagreement appears less often. Or, as I summarised it at the time: if we are not careful, large language models may mostly make us more comfortable, but not necessarily wiser.

A new preprint that Jan pointed me to now provides an interesting scientific elaboration of exactly that problem.

Not hallucinations, but flattery

When people talk about problems with AI, they usually think of hallucinations: systems that simply invent incorrect information. But this study by Battista and Griffiths looks at something different: sycophancy.

In this context, sycophancy means that a model tends to produce answers that align with what the user appears to believe. Not necessarily because it is true, but because it keeps the conversation flowing smoothly. At first glance, this may seem like a minor issue. In practice, it may have fairly fundamental consequences for how people form beliefs.

Becoming more certain without getting closer to the truth

The authors first analyse the problem theoretically. They start from a fairly classical idea in cognitive science: people hold hypotheses about the world and use new information to update them. If that information comes from reality, you can gradually move closer to the truth. A nice principle.

But when a large language model such as ChatGPT generates answers based on the user’s hypothesis, something different happens. The examples you receive tend to fit the idea you already had. The result is somewhat paradoxical: your confidence in that idea can increase, even though you have not actually received new independent information. In effect, you end up confirming your own hypothesis with data that have already been filtered through that same hypothesis.

You are, in a sense, dancing with yourself.
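The theoretical point can be made concrete with a toy Bayesian update (my own sketch, not the model from the preprint): evidence that is guaranteed to fit your hypothesis is equally likely whether the hypothesis is true or false, so its likelihood ratio is 1 and a rational posterior should not move at all. Any felt increase in confidence is unearned.

```python
def posterior(prior, p_data_given_h, p_data_given_not_h):
    """Bayes' rule for a single binary hypothesis H."""
    num = p_data_given_h * prior
    return num / (num + p_data_given_not_h * (1 - prior))

# Genuinely diagnostic evidence: the data are twice as likely if H is true.
p = 0.5
for _ in range(3):
    p = posterior(p, 0.8, 0.4)
print(round(p, 3))  # -> 0.889: confidence rises on independent data

# Sycophantic "evidence": generated to fit H, so it is equally likely
# either way. The likelihood ratio is 1 and the posterior never moves.
q = 0.5
for _ in range(3):
    q = posterior(q, 0.8, 0.8)
print(q)  # -> 0.5: no information, no update
```

The paradox described above is exactly the gap between these two runs: a user who treats the second stream of examples as if it were the first will feel like `p` while epistemically being `q`.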

The experiment

To test this idea, the researchers used a variation of Peter Wason's classic 2-4-6 task. Participants had to discover the hidden rule behind triples of numbers (such as 2-4-6) while interacting with an AI agent. The AI provided different kinds of feedback: confirming, disconfirming, random, or simply the default behaviour of a chatbot.
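The logic of the task can be sketched in a few lines of Python (an illustrative reconstruction, not the authors' code). In the classic version the hidden rule is simply "any strictly increasing triple", while people typically entertain a narrower hypothesis like "increasing by twos". Examples generated to fit that hypothesis can never falsify it; unfiltered examples often can.

```python
import random

# Hidden rule in the classic 2-4-6 task: strictly increasing.
def true_rule(triple):
    a, b, c = triple
    return a < b < c

# A typical participant's narrower hypothesis: steps of exactly two.
def hypothesis(triple):
    a, b, c = triple
    return b - a == 2 and c - b == 2

# A "sycophantic" source only proposes triples that fit the hypothesis.
def sycophantic_examples(n):
    return [(k, k + 2, k + 4) for k in random.sample(range(1, 100), n)]

# A "random" source proposes arbitrary triples.
def random_examples(n):
    return [tuple(random.randint(1, 100) for _ in range(3)) for _ in range(n)]

def count_falsifications(triples):
    # A triple is informative only if the hypothesis and the hidden rule
    # disagree on it, e.g. (1, 3, 7): increasing, but not by twos.
    return sum(hypothesis(t) != true_rule(t) for t in triples)

random.seed(0)
print(count_falsifications(sycophantic_examples(50)))  # -> 0, always
print(count_falsifications(random_examples(50)))       # typically well above 0
```

Every triple of the form (k, k+2, k+4) satisfies both the hypothesis and the hidden rule, so the sycophantic stream is structurally incapable of producing the disagreement a participant would need to notice the rule is broader than they think.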

The results were striking.

When participants received random examples that were not aligned with their hypothesis, about 29.5% discovered the correct rule. With a standard AI chatbot, that number dropped to about 5.9%. At the same time, participants became more confident in their own hypothesis when the AI responded affirmatively. In short: fewer correct answers, but greater confidence that those answers were correct.

AI as an echo

What makes this study interesting is that the problem does not necessarily lie with the user. Even a fairly rational reasoner can be misled if the information source systematically generates examples that fit the existing hypothesis.

In that situation, the AI effectively becomes an epistemic echo.

That fits rather well with the point I made in my earlier blog post: when technology filters out the countervoice, thinking becomes more comfortable, but also more fragile.

Perhaps good AI should sometimes disagree

None of this means that AI is useless for thinking or learning. But it does suggest that the design of these systems matters. An AI that always confirms is a pleasant conversational partner (although…). But an AI that sometimes offers counterexamples or challenges your hypothesis is probably a better thinking partner.

And perhaps that reflects a fairly old lesson: if all your conversation partners agree with you, you usually don’t learn very much.
