This study has more to it than the fun fact in the title of this post, but that fact does seem like something you’d share in a conversation:
…from infancy to young adulthood, learners absorb approximately 12.5 million bits of information about language — about two bits per minute — to fully acquire linguistic knowledge. If converted into binary code, the data would fill a 1.5 MB floppy disk.
For people who don’t know what a floppy disk was: it’s the icon you click in Word to save something, and it’s a predecessor of the cloud, except that it stored everything locally. And it was small, at least in terms of storage space on the disk.
And now the more important insights from the study, taken from the press release:
The findings, published today in the Royal Society Open Science journal, challenge assumptions that human language acquisition happens effortlessly, and that robots would have an easy time mastering it.
“Ours is the first study to put a number on the amount you have to learn to acquire language,” said study senior author Steven Piantadosi, an assistant professor of psychology at UC Berkeley. “It highlights that children and teens are remarkable learners, absorbing upwards of 1,000 bits of information each day.”
For example, when presented with the word “turkey,” a young learner typically gathers bits of information by asking, “Is a turkey a bird? Yes, or no? Does a turkey fly? Yes, or no?” and so on, until grasping the full meaning of the word “turkey.”
A bit, or binary digit, is a basic unit of data in computing, and computers store information and calculate using only zeroes and ones. The study uses the standard definition of eight bits to a byte.
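To see where the floppy-disk comparison comes from, here is a small sketch of the bits-to-bytes arithmetic, using the study's 12.5 million bits and the eight-bits-to-a-byte definition mentioned above. The floppy capacity used for comparison (1,474,560 bytes, the common 3.5-inch high-density disk marketed as "1.44 MB") is our own addition, not a figure from the study or press release.

```python
BITS_PER_BYTE = 8           # standard definition used by the study
total_bits = 12_500_000     # the study's best-guess estimate

total_bytes = total_bits / BITS_PER_BYTE
decimal_mb = total_bytes / 1_000_000       # 1 MB = 10**6 bytes
binary_mib = total_bytes / (1024 * 1024)   # 1 MiB = 2**20 bytes

# Assumption for comparison: a 3.5-inch HD floppy, 2880 sectors of 512 bytes
floppy_bytes = 2880 * 512                  # 1,474,560 bytes ("1.44 MB")

print(f"{total_bytes:,.0f} bytes")
print(f"{decimal_mb:.2f} MB (decimal), {binary_mib:.2f} MiB (binary)")
print(f"share of one 1.44 MB floppy: {total_bytes / floppy_bytes:.2f}")
```

This prints about 1.56 MB (or 1.49 MiB), so "a 1.5 MB floppy disk" is a rounded comparison: the estimate would very slightly overflow a standard 1.44 MB disk.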
“When you think about a child having to remember millions of zeroes and ones (in language), that says they must have really pretty impressive learning mechanisms.”
Piantadosi and study lead author Frank Mollica, a Ph.D. candidate in cognitive science at the University of Rochester, sought to gauge the amounts and different kinds of information that English speakers need to learn their native language.
They arrived at their results by running various calculations about language semantics and syntax through computational models. Notably, the study found that linguistic knowledge focuses mostly on the meaning of words, as opposed to the grammar of language.
“A lot of research on language learning focuses on syntax, like word order,” Piantadosi said. “But our study shows that syntax represents just a tiny piece of language learning, and that the main difficulty has got to be in learning what so many words mean.”
That focus on semantics versus syntax distinguishes humans from robots, including voice-controlled digital helpers such as Alexa, Siri and Google Assistant.
“This really highlights a difference between machine learners and human learners,” Piantadosi said. “Machines know what words go together and where they go in sentences, but know very little about the meaning of words.”
As for the question of whether bilingual people must store twice as many bits of information, Piantadosi said this is unlikely in the case of word meanings, many of which are shared across languages.
“The meanings of many common nouns like ‘mother’ will be similar across languages, and so you won’t need to learn all of the bits of information about their meanings twice,” he said.
Abstract of the study:
We introduce theory-neutral estimates of the amount of information learners possess about how language works. We provide estimates at several levels of linguistic analysis: phonemes, wordforms, lexical semantics, word frequency and syntax. Our best guess is that the average English-speaking adult has learned 12.5 million bits of information, the majority of which is lexical semantics. Interestingly, very little of this information is syntactic, even in our upper bound analyses. Generally, our results suggest that learners possess remarkable inferential mechanisms capable of extracting, on average, nearly 2000 bits of information about how language works each day for 18 years.
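The abstract's "nearly 2000 bits ... each day for 18 years" can be checked with simple division. The sketch below ignores leap days, and the 16 waking hours per day is a hypothetical assumption of ours; it is included only because it shows one way the press release's "about two bits per minute" could line up with the abstract, which the press release itself does not state.

```python
total_bits = 12_500_000   # the study's best-guess estimate
years = 18
days = years * 365        # ignores leap days

bits_per_day = total_bits / days
bits_per_minute = bits_per_day / (24 * 60)        # spread over every minute

# Hypothetical assumption: roughly 16 waking hours per day
WAKING_HOURS = 16
bits_per_waking_minute = bits_per_day / (WAKING_HOURS * 60)

print(f"{bits_per_day:,.0f} bits per day")
print(f"{bits_per_minute:.2f} bits per minute (around the clock)")
print(f"{bits_per_waking_minute:.2f} bits per waking minute (assumed 16 h awake)")
```

The result is about 1,903 bits per day, which matches "nearly 2000" and is consistent with the quote's "upwards of 1,000 bits of information each day". Spread over every minute it is about 1.3 bits per minute; only when counting waking minutes does it come close to two.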
2 thoughts on “Learning a native language? That’s 1.5 MB… (But still robots won’t be able to do it that good fast)”
The following statement is pure nonsense: “The findings, published today in the Royal Society Open Science journal, challenge assumptions that human language acquisition happens effortlessly, and that robots would have an easy time mastering it.” Language acquisition happens ‘relatively effortlessly’ in humans because it is biologically/evolutionarily primary for humans, and NOT for computers. A computer doesn’t have a brain that has evolved to handle primary and secondary learning. I would have expected you (Pedro) to point this out in the blog.