What do you learn about language when you examine 114 million tweets from 2.77 million users?

A tweet is a short message of at most 140 characters (and usually fewer). Texting and tweeting seem to have affected our language (most people would probably say not for the better), but is this really the case? How does Twitter affect our (online) language? Research from the Georgia Institute of Technology shows, in short, that:

  • when tweeters use hashtags — a practice that can enable messages to reach more people — they tend to be more formal and drop the use of abbreviations and emoticons.
  • when they use the @ symbol to address smaller audiences, they’re more likely to use non-standard words such as “nah,” “cuz” and “smh.”
  • when people write to someone from the same city, they are even more likely to use non-standard language — often lingo that is specific to that geographical area.

It almost seems like normal conversation?

I do think – as location seems important – that this could be different for different languages/cultures/regions. It also makes you wonder how global we are.

From the press release:

Jacob Eisenstein, an assistant professor in Georgia Tech’s School of Interactive Computing, led the research. His team sifted through three years of tweets, a pool that included 114 million geotagged messages from 2.77 million users. He says the study helps explain a puzzle about language in social media.

“Since social media facilitates conversations between people all over the world, we were curious why we still see such a remarkable degree of geographical differentiation in online language,” said Eisenstein. “Our research shows that the most geographically differentiated language is more likely to be used in messages that will reach only a local audience, and therefore, will be less likely to spread to other locations.”

For example, while the emoticon :) is used everywhere, the alternative ;o is significantly more popular in Los Angeles. Similarly, “mayne,” a drawn-out way of pronouncing “man,” is more likely to be found in Houston than anywhere else.

“People want to show their regional identity or their tech savviness, using Twitter-specific terms, to their close social network ties,” said Umashanthi Pavalanathan, a Georgia Tech graduate research scientist who worked on the study.

Eisenstein has looked at popular Twitter word trends and their origins for the last seven years. The more he studies, the more he realizes that Twitter users are smarter than most people give them credit for.

“This research shows that for many people on Twitter, non-standard English is not a question of ability, but of reserving standard English for the right social situations,” said Eisenstein. “In this sense, heavy social media users have an especially nuanced understanding of language, since they maintain multiple linguistic systems. They know to use each system when it’s socially appropriate.”

The paper, “Audience-Modulated Variation in Online Social Media,” is published in the journal American Speech.

Abstract of the study:

Stylistic variation in online social media writing is well attested: for example, geographical analysis of the social media service Twitter has replicated isoglosses for many known lexical variables from speech, while simultaneously revealing a wealth of new geographical lexical variables, including emoticons, phonetic spellings, and phrasal abbreviations. However, less is known about the SOCIAL role of variation in online writing. This article examines online writing variation in the context of audience design, focusing on affordances offered by Twitter that allow users to modulate a message’s intended audience. We find that the frequency of nonstandard lexical variables is inversely related to the size of the intended audience: as writers target smaller audiences, the frequency of lexical variables increases. In addition, these variables are more often used in messages that are addressed to individuals who are known to be geographically local. This phenomenon holds not only for geographically differentiated lexical variables, but also for nonstandard variables that are widely used throughout the United States. These findings suggest that users of social media are attuned to both the nature of their audience and the social meaning of lexical variation and that they customize their self-presentation accordingly.
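
To make the audience idea concrete, here is a minimal Python sketch of how one might compare non-standard word use across audience sizes. It is not the method used in the paper: the word list, the `audience` labels, and the sample tweets are all made up for illustration, and the rule "hashtag means broad, leading @ means narrow" is a rough assumption based on the summary above.

```python
import re

# Hypothetical word list for illustration only; the study's lexicon is not given here.
NONSTANDARD = {"nah", "cuz", "smh", "mayne"}


def audience(tweet: str) -> str:
    """Rough audience label (an assumption, not the paper's definition)."""
    if re.search(r"(?<!\w)#\w+", tweet):
        return "broad"    # hashtags can carry a message to more people
    if tweet.lstrip().startswith("@"):
        return "narrow"   # addressed to a specific user
    return "default"


def nonstandard_rate(tweets):
    """Share of tweets per audience label that contain a listed word."""
    totals, hits = {}, {}
    for text in tweets:
        label = audience(text)
        words = set(re.findall(r"[a-z']+", text.lower()))
        totals[label] = totals.get(label, 0) + 1
        hits[label] = hits.get(label, 0) + bool(words & NONSTANDARD)
    return {label: hits[label] / totals[label] for label in totals}


# Invented sample tweets, not data from the study.
sample = [
    "@sam nah cuz i'm tired",
    "@lee see you at 5",
    "Great talk on dialects today #linguistics",
    "smh at this traffic",
]
rates = nonstandard_rate(sample)
print(rates)
```

A tweet that has both a hashtag and a leading @ is counted as "broad" here, which is a simplification. With real data you would also want far more tweets per group before reading anything into the rates.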
