Reinforcement learning and working memory work hand-in-hand as people learn new tasks

I found this study via Paul Kirschner. The study is interesting because the researchers were able to show that reinforcement learning and working memory, two distinct brain systems, work hand-in-hand as people learn new tasks.

From the press release (bold by me):

The study, published in Proceedings of the National Academy of Sciences, focused on the interplay of two very different modes of learning a new task: reinforcement learning and working memory. Reinforcement learning is an “under-the-hood” process in which people gradually learn which actions to take by processing rewards and punishments at the neural level, and then choosing the one that works best on average — even if the person is not aware of it. In contrast, working memory involves keeping previous actions and their outcomes in mind to more rapidly and flexibly improve performance.

“People have largely interpreted these systems as working independently or as competing with each other in the learning process,” said Michael Frank, a professor in Brown’s Department of Cognitive, Linguistic and Psychological Sciences and co-author of the paper. “But we show that the two work together, with neural signals underlying working memory helping to guide those that support reinforcement learning.”

Anne Collins, an assistant professor at the University of California, Berkeley, led the work when she was a postdoctoral researcher working with Frank, who directs the Initiative for Computation in Brain and Mind in the Brown Institute for Brain Science. Collins and Frank developed an experimental method designed to isolate the brain signals associated with each of the two systems.

For the study, 40 study participants were shown a series of symbols on a screen and asked, for each symbol, to press a particular button on a keyboard. They weren’t told which key was the right one for each symbol. They had to learn it. When they got it right, they were rewarded with points. Over repeated trials, the participants came to learn which keys corresponded with which symbols.
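To make the task structure concrete, here is a minimal Python sketch of a symbol-to-key learning task of this kind. It is illustrative only: the number of keys (three), the one-point reward, the random assignment of correct keys, and every function name are assumptions, not details taken from the study.

```python
import random

# Hypothetical task sketch: each symbol has one hidden "correct" key.
# N_KEYS = 3 and the 1-point reward are assumptions for illustration.
N_KEYS = 3

def make_task(n_symbols, seed=0):
    """Assign a hidden correct key to each symbol (assumed: uniformly at random)."""
    rng = random.Random(seed)
    return {symbol: rng.randrange(N_KEYS) for symbol in range(n_symbols)}

def give_feedback(task, symbol, key):
    """Return 1 point if the pressed key matches the hidden mapping, else 0."""
    return 1 if task[symbol] == key else 0

def trial_sequence(n_symbols, reps, seed=0):
    """Present each symbol `reps` times, in shuffled order."""
    rng = random.Random(seed)
    trials = [s for s in range(n_symbols) for _ in range(reps)]
    rng.shuffle(trials)
    return trials

if __name__ == "__main__":
    task = make_task(n_symbols=3)
    guesser = random.Random(42)
    # A learner that never learns: it just presses a random key every trial.
    points = sum(give_feedback(task, s, guesser.randrange(N_KEYS))
                 for s in trial_sequence(3, reps=10))
    print(f"points from random guessing over 30 trials: {points}")
```

A learner that guesses at random should earn roughly one point in three; the participants' job was to beat that by learning the hidden mapping.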

In order to distinguish the contributions from reinforcement learning and working memory, the researchers set up problems with different numbers of symbols, ranging from two to six, and participants had to learn which button to press for each of them. Generally, people can only hold three or four items in working memory at a time, and only for short periods of time. So when the number of symbols increases, or when more trials pass before a symbol is shown again, the contribution of working memory to the learning process should diminish.
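The capacity argument can be sketched numerically. The toy function below is a hypothetical illustration, not the authors' model: it assumes working memory's share of control scales as min(1, capacity / set size), a form that echoes capacity-limited mixture models in this literature, and that a stored item decays by an assumed fixed fraction for each intervening trial. The capacity of 3 reflects the "three or four items" mentioned above; the forgetting rate of 0.1 is made up.

```python
# Hypothetical sketch: how much a learner might rely on working memory (WM)
# versus reinforcement learning (RL), as a function of set size and delay.
# The functional form, capacity=3.0, and forgetting=0.1 are all assumptions.

def wm_weight(set_size, delay, capacity=3.0, forgetting=0.1):
    """Weight on WM for one symbol.

    set_size: number of symbols being learned (2 to 6 in the study).
    delay: trials since this symbol was last seen (assumed meaning of "delay").
    capacity: assumed WM span, about three or four items.
    forgetting: assumed per-trial decay of the stored item.
    """
    capacity_term = min(1.0, capacity / set_size)  # fewer symbols fit in WM
    retention = (1.0 - forgetting) ** delay        # older items fade
    return capacity_term * retention

for n in range(2, 7):
    w = wm_weight(n, delay=0)
    print(f"set size {n}: WM weight {w:.2f}, RL weight {1 - w:.2f}")
```

Under these assumptions the WM weight stays at 1.00 for two or three symbols and falls to 0.50 at six, so the RL share grows exactly where the study reports stronger reinforcement learning signals.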

As the participants performed the tasks, an EEG cap recorded signals from the brain, and the authors applied statistical methods to extract those signals related to one learning system or the other.
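The press release does not detail the statistical methods. One common trial-by-trial approach is to regress an EEG amplitude measured on each trial onto a quantity computed from a learning model, such as the reward prediction error discussed below. The sketch shows only that general idea on simulated data: the "true" effect of 2.0, the noise level, and the function name are assumptions, and the study's actual pipeline is more involved.

```python
import random

# Hypothetical sketch: estimate how strongly a simulated EEG amplitude tracks
# a model-derived regressor (an RPE) across trials, using ordinary least squares.

def ols_slope_intercept(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var = sum((xi - mean_x) ** 2 for xi in x)
    slope = cov / var
    return slope, mean_y - slope * mean_x

rng = random.Random(1)
rpe = [rng.uniform(-1.0, 1.0) for _ in range(500)]        # model regressor, one per trial
eeg = [0.5 + 2.0 * d + rng.gauss(0.0, 0.5) for d in rpe]   # simulated amplitude (arbitrary units)

slope, intercept = ols_slope_intercept(rpe, eeg)
print(f"estimated RPE effect: {slope:.2f} (simulated true value 2.0)")
```

A larger fitted slope in one condition than another would indicate that the neural signal tracks the RL regressor more strongly there, which is the kind of comparison described in the next paragraph.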

The study showed that when memory demands were high, the signals in the brain correlated to reinforcement learning actually got stronger. In other words, when the working memory system was overtaxed, the reinforcement learning system became more important in the learning process. In contrast, when participants could hold information in mind, signals associated with reinforcement learning were weaker, suggesting an increased role for working memory.

The researchers also found that they could decode from the brain signals in a particular trial whether information was likely to be in memory or not. That too traded off with the neural marker of reinforcement learning.

Those findings, the researchers say, suggest that the two systems aren’t working independently.

“If they were completely independent of each other, we’d expect the signals associated with reinforcement learning to stay the same regardless of memory demands,” Frank said. “But that’s not what we see, and that’s a sign that the two systems are interacting.”

But on its own, that finding didn’t reveal the nature of that interaction: whether it’s cooperative or competitive. Was working memory shoving reinforcement learning into the background in trials when the information was readily accessible in mind? Or could it be that working memory helps to augment reinforcement learning? To figure that out, the researchers looked at how the brain signals associated with reinforcement learning changed as the learning process unfolded from trial to trial.

The reinforcement learning system is driven by what’s known as “reward prediction error” or RPE, and it’s the signal the researchers used to track the reinforcement learning process. RPE represents the extent to which the reward that results from an action exceeds one’s expectations. Take for example a study participant trying to figure out which button to press when they see a given symbol. If they happen to guess right and get rewarded with points, that outcome is surprisingly good and produces a high RPE.
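In its simplest textbook form, the RPE is the obtained reward minus the expected reward. The sketch below uses that standard definition. The expectation of one third of a point is a hypothetical figure, chosen because a learner who guesses among an assumed three keys would be right about a third of the time.

```python
def reward_prediction_error(reward, expected_reward):
    """RPE: how much the obtained reward exceeds what was expected.

    Positive when the outcome is better than expected, negative when worse.
    """
    return reward - expected_reward

# Hypothetical: an uninformed learner choosing among 3 keys expects ~1/3 point.
expected = 1 / 3
print(reward_prediction_error(1, expected))  # lucky correct guess: positive RPE
print(reward_prediction_error(0, expected))  # wrong guess: negative RPE
```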

In the brain, the reinforcement learning system uses the neurotransmitter dopamine to encode RPE. A high RPE — meaning a surprisingly good outcome — is associated with a large release of dopamine. The reinforcement learning system uses that dopamine flood as a signal to update our understanding of what actions we should take to get a given reward. When we repeat that action subsequently, we’re less surprised by the reward and so the RPE is lower. As RPE continues to diminish, the system eventually stops updating, and in so doing, settles upon an appropriate action.
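The shrinking-surprise dynamic can be illustrated with the classic delta rule (Rescorla-Wagner style), where the expectation moves toward the reward by a fraction of each RPE. This is a generic textbook sketch, not the model fitted in the study; the learning rate of 0.5 and the constant reward of 1 are assumptions.

```python
# Minimal delta-rule sketch: expectation += alpha * RPE.
# alpha = 0.5 and a constant reward of 1.0 for the correct key are assumptions.

def rpe_sequence(n_trials, alpha=0.5, reward=1.0, initial_expectation=0.0):
    """Return the RPE on each repetition of a correct, rewarded action."""
    expectation = initial_expectation
    rpes = []
    for _ in range(n_trials):
        rpe = reward - expectation      # surprise on this trial
        expectation += alpha * rpe      # update expectation toward the reward
        rpes.append(rpe)
    return rpes

print(rpe_sequence(5))  # [1.0, 0.5, 0.25, 0.125, 0.0625]
```

Each repetition halves the surprise here, so updates become negligible and the learner settles on the rewarded action.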

One scenario for how working memory could be interacting with reinforcement learning is by attenuating reward expectations, making them more quickly come into line with actual rewards. In that way, working memory could be working cooperatively to speed the reinforcement learning process.
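The cooperative scenario can be sketched by letting the expectation used to compute the RPE blend a fast working-memory estimate with the slow RL estimate. This is a hypothetical toy under stated assumptions, not the authors' model: working memory is assumed to remember the last outcome perfectly after a single trial, and the mixing weight of 0.8 and learning rate of 0.5 are made up.

```python
# Hypothetical sketch: WM contributes an expectation that shapes the RPE
# which drives RL. wm_weight, alpha, and perfect one-trial WM storage are assumptions.

def rpe_sequence_with_wm(n_trials, wm_weight, alpha=0.5, reward=1.0):
    rl_value = 0.0      # slow, incremental RL expectation
    wm_value = 0.0      # fast WM expectation: nothing stored before trial 1
    rpes = []
    for _ in range(n_trials):
        expectation = wm_weight * wm_value + (1 - wm_weight) * rl_value
        rpe = reward - expectation     # RPE relative to the blended expectation
        rl_value += alpha * rpe        # RL is updated by the WM-informed RPE
        wm_value = reward              # WM simply remembers the last outcome
        rpes.append(rpe)
    return rpes

rl_only = rpe_sequence_with_wm(4, wm_weight=0.0)
with_wm = rpe_sequence_with_wm(4, wm_weight=0.8)
print("RL only:", [round(d, 3) for d in rl_only])
print("RL + WM:", [round(d, 3) for d in with_wm])
```

With no working-memory contribution the RPE decays gradually (1.0, 0.5, 0.25, ...). With it, the RPE falls to about 0.1 on the second repetition, which mirrors the quick drop-off after the first few trials that the study reports for small set sizes.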

The study found strong evidence for just that scenario. During repeated trials at small set sizes where working memory is active, brain signals associated with RPE started out high in the first few trials, and then quickly dropped off — a sign that cognitive processes are informing the neural signaling associated with reinforcement learning. In contrast, if working memory were merely suppressing reinforcement learning, one wouldn’t expect to see the quick drop in RPE.

The results, Frank said, provide some of the first concrete evidence for cooperation between these two systems.

“Thinking of these not as separate systems but as one big integrated system changes our understanding of the basic science of how people and animals learn,” Frank said. “It might help us make better predictions about how the overall learning process is affected in people who have deficits in either of these systems.”

And that, Frank said, could one day lead to better treatments for learning impairments.

Abstract of the study:

Learning from rewards and punishments is essential to survival and facilitates flexible human behavior. It is widely appreciated that multiple cognitive and reinforcement learning systems contribute to decision-making, but the nature of their interactions is elusive. Here, we leverage methods for extracting trial-by-trial indices of reinforcement learning (RL) and working memory (WM) in human electroencephalography to reveal single-trial computations beyond that afforded by behavior alone. Neural dynamics confirmed that increases in neural expectation were predictive of reduced neural surprise in the following feedback period, supporting central tenets of RL models. Within- and cross-trial dynamics revealed a cooperative interplay between systems for learning, in which WM contributes expectations to guide RL, despite competition between systems during choice. Together, these results provide a deeper understanding of how multiple neural systems interact for learning and decision-making and facilitate analysis of their disruption in clinical populations.
