By Gretchen Parker
Politics, movies, television, baseball and revolutions. As part of its emphasis on participatory culture, USC’s Annenberg Innovation Lab has developed a project that analyzes – in real time – the sentiment of conversation on a range of topics that thrive via social media.
The Twitter Sentiment Analysis index has been used to mine the positive and negative sentiment of 40 million tweets and reveal insights about the international conversation on everything from the Arab Spring revolutions to the U.S. Presidential election. With the help of IBM software called InfoSphere Streams, the Innovation Lab has worked with researchers from the Viterbi School of Engineering’s Signal Analysis and Interpretation Laboratory to develop a lexicon and advanced language algorithms that, in effect, teach a computer to understand the true sentiment behind the mini-messages broadcast by Twitter.
The trick has been helping the software “learn” the difference between enthusiasm and sarcasm, communication professor Jonathan Taplin, director of USC's Annenberg Innovation Lab, told an audience gathered for USC’s GLIMPSE Digital Technology Showcase on Jan. 29.
“Sarcasm is not something computers understand very well,” Taplin said.
Lab researchers used the tool to analyze Twitter during the entire 16-month Presidential election cycle, from the beginning of the primaries to election night. Over that time, they continued to refine the technology to make it more accurate.
“When we first started, Michele Bachmann was the flavor of the week. Someone said, ‘I’m so happy Michele Bachmann threw her tin-foil hat into the ring.’ The computer, of course, thought this was very positive for Bachmann,” Taplin said, drawing laughs from the technology journalists, USC faculty and supporters gathered for the event.
The lab then brought in more students, friends and observers to meet the challenge of parsing sarcasm. Researchers developed an online tool they could use to annotate tweets individually, to correct the computer and help it learn more about language patterns. They used Amazon Mechanical Turk, a crowdsourcing marketplace, to correct the analysis of thousands of tweets.
“We think we learned an awful lot about sarcasm,” Taplin said. For example, “When someone puts one word in quotes, it probably means just the opposite. Learning this and learning emoticons have helped us refine the work.”
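The quoted-word rule Taplin describes can be illustrated with a short sketch. The Python below is not the lab's code: the tiny lexicon, the emoticon table and the function name `score_tweet` are all hypothetical, chosen only to show how a word-list scorer might flip the polarity of a single word placed in quotation marks and count emoticons as extra signals.

```python
import re

# Hypothetical toy lexicon and emoticon table; the lab's real lexicon
# is not described in the article.
LEXICON = {"happy": 1, "great": 1, "love": 1, "sad": -1, "awful": -1, "hate": -1}
EMOTICONS = {":)": 1, ":-)": 1, ":(": -1, ":-(": -1}

# One word wrapped in straight or curly double quotes.
QUOTED_WORD = re.compile(r'["“](\w+)["”]')


def score_tweet(text):
    """Return a crude sentiment score: above 0 is positive, below 0 is negative."""
    lowered = text.lower()
    score = 0

    # A single quoted word is assumed to mean the opposite, so invert it.
    for match in QUOTED_WORD.finditer(lowered):
        score -= LEXICON.get(match.group(1), 0)

    # Drop the quoted words so they are not counted a second time.
    remainder = QUOTED_WORD.sub(" ", lowered)
    for token in re.findall(r"[a-z']+", remainder):
        score += LEXICON.get(token, 0)

    # Emoticons add their own polarity.
    for emoticon, value in EMOTICONS.items():
        score += value * text.count(emoticon)

    return score


print(score_tweet('What a great debate'))      # plain positive word
print(score_tweet('What a "great" debate'))    # quoted word is inverted
print(score_tweet('great night :)'))           # word plus emoticon
```

A rule this simple would still miss sarcasm like the tin-foil hat tweet, which is presumably why the lab also relied on human annotation.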
A screenshot of the Sentiment Analysis tool at the close of the election is available at http://politics.twittersentiment.org/streams/. (The political sentiment team was led by USC Annenberg communication professor François Bar and Shri Narayanan, both Research Fellows at the Innovation Lab.)
The lab is most excited about the real-time power of the Sentiment Analysis tool, Taplin said. During the Presidential debates, the tool analyzed 400 tweets per second.
“As soon as ‘binders full of women’ came up, we added it to the key words. You could follow it coming out of nowhere… That’s the fun thing now, to try and understand more,” Taplin said. “This is like a 10 million-person focus group. Watching these meters go up and down in real time – especially during the debates – was unbelievable.”
The tool also has been used to compare World Series conversation with TV ratings, to gauge viewer engagement during the Oscars telecast and to predict movie box office draws.
Next, researchers want to apply what they learned from analyzing the Oscars broadcast to television. Over the next few months, they’ll be watching reality TV and gauging engagement through social media. That could offer more insight than Nielsen ratings, which capture how many TVs are tuned in but not how many people are watching.
The sentiment tool, in contrast, shows you “the exact moment when people are engaged,” Taplin said.
“If the sentiment turns bad, you could look back at minute 36 and see what went wrong. We think this is a tool producers would like.”
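The idea of looking back at a specific minute can be sketched as a simple aggregation. The Python below is an illustration only, not the lab's method: it assumes each tweet has already been given a numeric sentiment score and a timestamp in seconds from the start of the broadcast, and the function names are hypothetical.

```python
from collections import defaultdict


def sentiment_by_minute(scored_tweets):
    """Average sentiment per broadcast minute.

    scored_tweets: iterable of (seconds_since_start, score) pairs.
    Returns a dict mapping minute number to mean score.
    """
    buckets = defaultdict(list)
    for seconds, score in scored_tweets:
        buckets[int(seconds // 60)].append(score)
    return {minute: sum(scores) / len(scores)
            for minute, scores in sorted(buckets.items())}


def first_negative_minute(per_minute):
    """Return the earliest minute whose mean sentiment is below zero, or None."""
    for minute in sorted(per_minute):
        if per_minute[minute] < 0:
            return minute
    return None


# Made-up sample data: (seconds since start, sentiment score).
sample = [(10, 1), (50, 1), (70, -1), (100, -1), (130, 1)]
per_minute = sentiment_by_minute(sample)
print(per_minute)
print(first_negative_minute(per_minute))
```

A producer-facing tool would likely smooth these averages and weight them by tweet volume, but the per-minute bucketing is the core of the look-back Taplin describes.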