Tuesday, September 24, 2013

Linguistic 'Eggheads' Can Figure Out Who You Are From Your Tweets

Let me say first, I don't do 'tweets'. I have no Twitter account (though I did finally get a Facebook page, which I hardly ever visit) and I intend to keep it that way. "Tweeting" (140 maximum characters sent at a time) is not my thing, and besides, I have too many pressing projects ongoing (including revising several books and finishing a science fiction novel) to be held hostage to tweets, whether reading them or responding. It's hard enough to keep up with emails!

Anyway, having said that, it appears a lot of eggheads in linguistics have their focus on the Tweeter-ites and, according to a recent piece in TIME (Sept. 9, p. 56):

"are using the seven year old micro-blogging platform to put millions of tweets under the microscope in an instant".

According to one linguist, Ben Zimmer, at Vocabulary.com:

"It's unprecedented. The sheer amount of text you can look at at one time and the number of people you can analyze at once."

According to the linguists, tweets are a veritable goldmine of information on the profiles of tweeters, offering a treasure trove of data never seen before. Hidden in the tweets, or so they say, are insights into how people (at least those who tweet) portray their identity in a few short sentences. Great! I can just see how delighted Gen. Keith Alexander of the NSA will be when he gets hold of this data!

Maybe even more sinister is how "campaign managers and advertisers" are taking note and fairly drooling at the prospect of "pulling signal out of the data" in order to better influence American or other brains to do the bidding of Madison Avenue or the Overclass manipulators.

Some of the finds so far from the tweet research:

- Women are more likely to use first person terms (like 'I' and 'my') and exclamation points, especially repeated ones.

- Females who tweet to a largely male audience are more likely to use features like numbers, associated with 'the boys'.

- Older tweeters tend to use emoticons with noses, e.g. :-) instead of :). This is evidently tied to "their preference for conventional language".

- Youthful 'no nose' tweeters tend to use more swear words.

- Younger tweeters are also more apt to type their words in all capitals and to use expressive lengthening, like 'Niiiiiiiiiiice'.

- Older tweeters are more likely to include well wishing, like 'Take care', and 'Good morning'. They also tend to send longer tweets and use more prepositions.
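For the curious, here is roughly how such surface features might be counted in a single tweet. This is only a minimal sketch in Python, not the method the TIME piece's researchers used: the function name `tweet_features`, the word list `FIRST_PERSON`, and the regular expressions are all my own assumptions for illustration.

```python
import re

# Hypothetical word list; the article only names 'I' and 'my' as examples.
FIRST_PERSON = {"i", "my", "me", "mine", "i'm"}

def tweet_features(text):
    """Count a few of the surface features listed above in one tweet."""
    words = re.findall(r"[A-Za-z']+", text)
    lower = [w.lower() for w in words]
    return {
        # First person terms such as 'I' and 'my'
        "first_person": sum(w in FIRST_PERSON for w in lower),
        # Runs of two or more exclamation points
        "repeated_exclamations": len(re.findall(r"!{2,}", text)),
        # Emoticons with a nose, e.g. :-) or :-(
        "nose_emoticons": len(re.findall(r":-[)(]", text)),
        # Emoticons without a nose, e.g. :) or :(
        "noseless_emoticons": len(re.findall(r":[)(]", text)),
        # Words of two or more letters typed entirely in capitals
        "all_caps_words": sum(1 for w in words if len(w) > 1 and w.isupper()),
        # Expressive lengthening: any character repeated three or more times
        "lengthened_words": sum(1 for w in lower if re.search(r"(.)\1{2,}", w)),
    }

print(tweet_features("I love my new phone!!! Niiiiice :-)"))
```

Counts like these would only be raw inputs; guessing age or sex from them would require a statistical model trained on many labeled tweets, which is well beyond this sketch.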


Geography, income and race can also be discerned from tweets:

For example, the term 'suttin' (a variant of 'something') has been associated with Boston-area tweets. Meanwhile, the acronym 'ikr' ('I know, right?') is popular in the Detroit area.

Wealthy neighborhoods are more likely to use the word 'awesome', and emoticons often appear in tweets from areas with large Hispanic populations.

Linguists point out that while all this may seem frivolous, it provides insights into how people purposefully and unwittingly use words to signal who they are. According to one computational linguist from Georgia Tech, Tweet trends also "make it possible to guess the demographics of senders when no information is explicitly provided." This is a huge asset for any advertiser who intends to sell products on Twitter.

Other researchers at the Mitre Corporation came up with an algorithm that could determine the tweeter's sex 75% of the time, just using their tweets.

Another aspect being researched is the "diffusion of words", or constructing "subway maps around the United States showing where words tend to move." The researchers have already found that race may matter as much as geography. A term coined in Jackson, MS, for example, may well turn up in Memphis (both places have high proportions of African-Americans) but not in Fairbanks, AK or Colorado Springs, CO.

Other researchers are mining tweets to discern how rumors and urban legends spread from person to person.

Linguists so far are delighted that "tweeters are generally oblivious to the possibility that their messages might be scrutinized", which is a "boon to researchers who want to analyze natural speech rather than edited text you find in the pages of a magazine", or in certain blogs (like Brane Space, with each post undergoing maybe 4-5 iterations of re-edits).

According to one linguist: "They don't feel like they are being observed by guys in white coats."

Hmmmm... maybe they should, as opposed to dumping ever more data into the data stream, sure to be used by advertisers and even the government (NSA), say to match profiles against already existing databases such as Main Core, the database that can identify and locate perceived 'enemies of the state' almost instantaneously. (See e.g. http://brane-space.blogspot.com/2013/06/between-skeleton-key-and-cog-how-close.html )

Will the tweeters take note and be more cautious? Not likely!


