Blog Post 3

I. Vision

Before the first sunrise of 2019, President Trump sent an all-caps tweet that included words like “haters,” “fake news,” and “derangement” to celebrate the start of the new year. (According to Michael Tauberg’s analysis of Trump’s tweets, “fake” was the most common insult word in his tweets and “haters” was the 29th most common.) This New Year’s tweet received 57,335 retweets, which was 3.1 times more than the average number of retweets he received that month. Many assume that “loaded language,” words that can elicit a strong emotional reaction from the reader, attracts more attention than neutral language that plainly delivers quality facts, such as summary reports on important bills and proposals under discussion in Congress. Even among Trump’s tweets, those with a higher dose of loaded language seem to receive more retweets than relatively neutral ones.

trump tweet

In today’s political climate, mudslinging and controversy seem to be the most effective ways to gain attention. This environment gives politicians who are active on Twitter an incentive to write explosive tweets, which are likely to get more retweets than moderate ones, in order to expand their political presence and attract wide publicity. The more retweets you get, the larger the audience you can reach and the more threads of discussion you can create.

Hypothesis

This project examines whether this strategy of writing polarizing or negative tweets is indeed effective in gaining people’s attention, using retweets as the primary metric of that attention. The goal is to examine two hypotheses:

  1. More polarizing political tweets receive more retweets than neutral tweets.
  2. More negative political tweets receive more retweets than positive tweets.

Why does this research question matter?

The nature of political tweets has changed significantly from the past. In the early stages of social media’s immersion into politics, politicians’ social media accounts were filled with relatively neutral language because social media often functioned purely as an online campaign/business channel that was mainly used to disseminate information on their political endeavors and to boost fundraising and volunteer recruitment. Rather than using emotionally eruptive language to write the politician’s seemingly raw and candid opinions, the messages on social media were carefully crafted to make the politician seem clear-headed, disciplined, and calm instead of irascible, easily-baited, and temperamental.

However, the increasing importance of social media in elections made it critical to increase a politician’s online clout, often represented by the number of followers, cumulative retweets, and likes. If the aforementioned hypotheses are true, then many politicians will now have to tactically use polarizing and loaded language, which could seem either too negative or too positive, in order to increase the number of retweets they receive, which can in turn fuel growth in their follower counts and press coverage.

Examining whether more polarizing tweets receive more retweets will help us gain better insight into:

  1. Politicians’ Twitter Strategy: Politicians can deliberately make their tweets contain more polarizing words and phrases that are strongly positive or negative in order to spur their followers to retweet them. This Twitter tactic can create a culture of writing tweets that seem refreshingly candid and emotionally powerful, but it can also create a dangerous trend of using unnecessarily loaded phrases to hide the truth, contribute to online extremism, and drive political polarization.
  2. Twitter’s Influence on Political Dialogues: The political tactics created after examining these hypotheses will help us examine how Twitter directly influences political dialogues in mainstream news outlets and political commentary. It will also be interesting to see if this affects how journalists tweet about the candidates and political issues. Will they keep a neutral tone that focuses on the facts? Or will they adjust their tone to match the level of polarization on Twitter?
  3. Nature of Twitter: Will Twitter be a thriving space only for extreme political opinions? An echo chamber? Can it ever function as a moderating force?

II. Data

We managed to find relatively robust datasets online, giving us about 1.6M political tweets by 341 Republicans and 306 Democrats along with other public figures who contribute to political discourse. We created two databases using Google Cloud SQL to store and query this data.

Tweets: {tweet_id, user_id, timestamp, content, favorite_count, retweet_count}

Accounts: {user_id, screen_name, timestamp, follower_count, following_count}

In order to analyze the tweets, we only wanted to keep the text data; therefore, we cleaned our databases by removing extraneous information such as URLs, emojis, and special characters.
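The cleaning step above could look roughly like the sketch below. The post does not say exactly which characters were removed, so the patterns and the clean_tweet helper are assumptions made for illustration, not the code we actually ran.

```python
import re

# Hypothetical patterns: the exact cleaning rules are not given in the post.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
# Keep ASCII letters, digits, whitespace and basic punctuation; anything else
# (emojis, symbols, other special characters) is replaced with a space.
NON_TEXT_PATTERN = re.compile(r"[^A-Za-z0-9\s.,!?'\"@#-]")
WHITESPACE_PATTERN = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Strip URLs, emojis and special characters, then collapse whitespace."""
    text = URL_PATTERN.sub(" ", text)
    text = NON_TEXT_PATTERN.sub(" ", text)
    return WHITESPACE_PATTERN.sub(" ", text).strip()


cleaned = clean_tweet("So SAD! 😡 https://t.co/abc123 #FakeNews")
```

One caveat of an ASCII-only filter like this one is that it also drops typographic apostrophes and accented letters, so a real pipeline would probably need a more careful allow-list.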

Additionally, we used the Twitter API to stream 1300 recent tweets by two prominent politicians: Donald Trump (700 tweets) and Elizabeth Warren (600 tweets).

III. Methodology

Intuitively, our approach rests on the idea that retweets measure how much attention a tweet receives from the public, and that more polarizing tweets should receive more of it. So, we first used our data to calculate a Retweet Index: a measure of a tweet’s retweets relative to its author’s average number of retweets. By doing so, we normalize our data and account for differences in baseline popularity and follower counts between different users.
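As a minimal sketch, the Retweet Index can be read as a simple ratio: a tweet’s retweet count divided by its author’s mean retweet count. The post does not spell out the exact formula, so the ratio form, the retweet_index function, and the toy values below are assumptions; the field names follow the Tweets schema above.

```python
from collections import defaultdict
from statistics import mean


def retweet_index(tweets):
    """Return {tweet_id: retweet_count / author's mean retweet_count}."""
    counts_by_user = defaultdict(list)
    for tweet in tweets:
        counts_by_user[tweet["user_id"]].append(tweet["retweet_count"])

    author_mean = {user: mean(counts) for user, counts in counts_by_user.items()}

    index = {}
    for tweet in tweets:
        baseline = author_mean[tweet["user_id"]]
        # An author whose tweets were never retweeted has no usable baseline,
        # so this sketch assigns 0.0 instead of dividing by zero.
        index[tweet["tweet_id"]] = tweet["retweet_count"] / baseline if baseline else 0.0
    return index


# Toy rows, made up for illustration only.
tweets = [
    {"tweet_id": 1, "user_id": "a", "retweet_count": 100},
    {"tweet_id": 2, "user_id": "a", "retweet_count": 300},
    {"tweet_id": 3, "user_id": "b", "retweet_count": 5},
    {"tweet_id": 4, "user_id": "b", "retweet_count": 15},
]
index = retweet_index(tweets)
```

With this reading, an index above 1.0 means the tweet did better than its author’s usual, regardless of how large the author’s following is: tweets 2 and 4 end up with the same index even though their raw retweet counts differ by a factor of 20.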

Second, we used the Natural Language Toolkit (NLTK) and Google Cloud Natural Language models to score the sentiment of each tweet on a scale from 0.0 (more negative sentiment) to 1.0 (more positive sentiment).

In addition to using the Natural Language Processing model above, we ran several statistical analyses to examine the relationship between the retweet index and the sentiment associated with each post. By applying pre-trained sentiment analyzers to Tweets, we were able to analyze and visualize the distribution of sentiment in relation to popularity and party.
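One way to relate the two quantities is sketched below: treat a tweet’s polarization as its distance from the neutral midpoint of the sentiment scale, then compute a Pearson correlation against the Retweet Index. Both the midpoint-distance definition and the choice of Pearson correlation are assumptions for illustration; the post does not name the specific statistics we ran.

```python
from math import sqrt


def polarization(sentiment: float) -> float:
    """Distance from the neutral midpoint 0.5, rescaled to the range 0.0 to 1.0."""
    return abs(sentiment - 0.5) * 2


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Toy values, made up for illustration only.
sentiments = [0.0, 0.25, 0.5, 0.75, 1.0]
retweet_indices = [2.0, 1.5, 1.0, 1.5, 2.0]
r = pearson([polarization(s) for s in sentiments], retweet_indices)
```

A coefficient near 1.0 would support the first hypothesis (more polarizing tweets get more retweets), while correlating the raw sentiment score with the Retweet Index would speak to the second hypothesis instead.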

IV. Results

chart1

The first visualization examines the relationship between the Tweet Sentiment and Retweet Index for Donald Trump and Elizabeth Warren. While the graph shows that Trump and Warren have mostly similar curves, Trump’s negative tweets have a Retweet Index two standard deviations higher than that of Warren’s. Additionally, Warren’s positive tweets have a Retweet Index two standard deviations higher than that of Trump’s.

chart2 chart3

These two visualizations examine the distribution of sentiment for all the tweets in our data. Interestingly, both the majority of tweets and the tweets with a high Retweet Index have mostly positive sentiment.

chart4

In this final visualization, the average tweet sentiment of Democrats and Republicans is the same. While Republicans like Trump have tweets with keywords such as ‘SAD’, ultimately both Republicans and Democrats have mostly positive sentiment.

V. Conclusion

Through our work, we can conclude that across the spectrum, politicians tend to post positive tweets. In addition, tweets that are more polarizing (closer to 0.0 or 1.0) tend to receive more retweets. And finally, both political parties have the same average tweet sentiment.

We recognize that US politics has become increasingly polarized, and social media (Twitter, for example) is one of the domains in which that polarity is displayed and furthered. We therefore thought it would be interesting to venture into the abyss and look into the political polarity of politicians’ tweets! Exploring this question was fairly smooth for us: we found good Twitter datasets online, which left us with 1.6M political tweets to analyze (labeled positive or negative), and we got Twitter’s permission to scrape tweets almost immediately, which we used to collect recent tweets by Donald Trump and Elizabeth Warren, as we thought it would be interesting to compare the two. (So yes, we did make measurable progress towards this idea!) With 1.6M tweets plus 1300 tweets from Donald Trump (700) and Elizabeth Warren (600), we had a good starting point!

The tweets that we scraped did not have any sentiment labels, and the labels on the tweets that were provided to us were not very helpful (each tweet was essentially labeled either positive or negative), so we had to do a fair amount of data cleaning and run our own sentiment analysis to put every tweet on a common positive-negative scale. We tried the Natural Language Toolkit, but it was not too useful since it was sensitive (since it classified most tweets to), and hence we ran Google Cloud’s Natural Language models instead: given a block of text (each tweet), the model gives us a score between 0 and 1, with 0 being more negative and 1 being more positive. Before putting our tweets through this sentiment analysis, we also cleaned our data by removing emojis, special characters, etc. from the tweets that we scraped!

In addition to the Natural Language Processing model above, we also did some statistical analysis to connect the number of retweets with the sentiment of each post. The number of retweets depends not only on the content of a tweet but also on the number of followers the account has, so we came up with a retweet index to normalize retweet counts, which lets us relate the normalized retweet index to the sentiment behind the tweet. We visualized the results of this analysis in three different graphs:

  • One shows how the number of retweets that Donald Trump and Elizabeth Warren receive differs with the sentiment of their tweets.

  • One plots all the tweets on a 2D axis, based on the number of retweets and the sentiment of each tweet (so every point is a single tweet).

  • One finds the accumulate number

We were also interested in comparing the polarity of tweets by Republicans with that of tweets by Democrats. To do that, we had to manipulate our data a bit by joining our tweets dataset to another dataset of the political affiliations of Twitter accounts. We then did some statistical analysis to find the connection between tweet sentiment and party affiliation as well.
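The join and the per-party comparison can be sketched as follows. This is an illustration under assumptions: the sentiment field, the party_by_user lookup, and the mean_sentiment_by_party helper are hypothetical names, and the rows are toy values rather than our data.

```python
from collections import defaultdict
from statistics import mean


def mean_sentiment_by_party(tweets, party_by_user):
    """Average sentiment per party; tweets from unlisted accounts are skipped."""
    scores = defaultdict(list)
    for tweet in tweets:
        # Join step: look up the author's party by user_id.
        party = party_by_user.get(tweet["user_id"])
        if party is not None:
            scores[party].append(tweet["sentiment"])
    return {party: mean(values) for party, values in scores.items()}


# Toy rows, made up for illustration only.
tweets = [
    {"user_id": "a", "sentiment": 0.25},
    {"user_id": "a", "sentiment": 0.75},
    {"user_id": "b", "sentiment": 0.5},
    {"user_id": "c", "sentiment": 0.9},  # no known affiliation, so it is dropped
]
party_by_user = {"a": "Democrat", "b": "Republican"}
averages = mean_sentiment_by_party(tweets, party_by_user)
```

Skipping accounts without a known affiliation behaves like an inner join, which keeps the other public figures in the dataset from being counted toward either party.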

Shortcomings

  • Our model of detecting political polarity might not be the best one. This very naive model relies heavily on positivity and negativity and misses other dimensions of political polarity, such as racial or social discrimination or the correctness of the information being delivered.

  • There could also be some information loss from re-analyzing the sentiment of the tweets in the original public dataset: each tweet already had a sentiment score assigned to it, and by replacing that score with the output of Google Cloud’s Natural Language models, we might miss out on some nuance from the original dataset.

  • We also filtered out emojis and special characters before running our tweets through Google Cloud’s Natural Language models. That is another place where our analysis of political polarity could lack nuance, as many politicians these days use emojis to communicate with the public.

  • We could look into more specific subgroups of tweets and politicians to give more pointed results. With positive and negative sentiment as our measure of political polarity, and with a dataset that includes tweets by many (if not all) politicians who constantly tweet positive things to appeal to the public and to show that they are making good progress on positive change, the overall “polarity” of Republicans and Democrats will look much the same.

Further Ideas

  • Positive and negative sentiment are important attributes of political polarity, but we could improve our measurement by adding more signals (e.g. mentions of other politicians’ names or of the other party’s name).

  • We could use the methodology we employed to analyze Trump and Warren’s tweets to analyze other politicians’ tweets in order to observe the general changes in the number of retweets in response to the polarity of tweets.

Written on May 3, 2019