Hacking Twitter Trends to Make Money

A user-friendly guide to Twitter-scraping and what you can do with it.

Graham Sahagian
The Startup

--

If you’ve ever had the urge to algorithmically gauge the general public’s opinion on social justice, Coronavirus, or Joe Rogan, or if you’ve ever wanted to build an app that tells you when to buy and sell a stock based on public sentiment, this article might be just what you’re looking for.

I’ll be showing you how to download up to thousands of tweets in a few minutes using Python 3.8, then how to quickly analyze these tweets with the VADER sentiment analysis tool. Using this data, we can build a simple trading algorithm to tell us when to buy or sell our stock. The first operation, extracting tweet data from Twitter, used to be possible with other libraries such as GetOldTweets3 or Tweepy; however, Tweepy doesn’t fit our needs in terms of depth of data, and Twitter recently deprecated the endpoint that GetOldTweets3 relied on. Thanks to a GitHub thread (linked in a section below), we’re able to use snscrape to extract this tweet data instead.

I wanted to see if a twitter-sentiment bot could predict Pfizer’s (PFE) market price, amid talk of their vaccine and news boasting 90% effectiveness in their recent trials.

Photo by MORAN on Unsplash

In this article we will:

  1. Extract tweets including keyword ‘Pfizer’ from Twitter using Python library snscrape
  2. Analyze the sentiment of each tweet using VADER sentiment analysis tool
  3. Resample the data to daily buckets
  4. Build a sentiment-based trading strategy

This twitter scraping and analysis guide will be written entirely in Python 3.8. If you use Jupyter Notebook as I did, make sure your notebook is running the Python 3.8 kernel. I personally recommend creating a new virtual environment for Python projects running on 3.8. You can check which version your IDE is running with:

from platform import python_version
print(python_version())

To start out we need to import a few libraries:

import pandas as pd
import numpy as np
import snscrape.modules.twitter as sntwitter
import csv
import matplotlib.pyplot as plt

Now we are able to use the snscrape library by JustAnotherArchivist on GitHub (https://github.com/JustAnotherArchivist/snscrape) to pull tweets into a data frame with filters of our choosing. Make sure to pip3 install snscrape before trying to import the library. Special thanks to everyone in this thread for pointing me in the right direction and providing the scraper code: https://github.com/Mottl/GetOldTweets3/issues/98

You can alter the scraper’s filters to change the period, keyword, and output file name. There are other filters, such as geolocation, that can also be fun to play with but won’t be used in this project.

NOTE: Be wary of the keyword you use, as you may be surprised by how many tweets per minute include it. I initially used ‘COVID’; however, my maxTweets = 20000 was reached before the scraper got through the second day. You might be tempted to set maxTweets upwards of 500,000 (as I was); however, when I let the scraper run that long I received a connection timeout. If you want to pull this much data, you can work around the limitation by setting maxTweets to a smaller number (say, 100,000 or so) and concatenating the resulting dataframes afterwards.
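One way to sketch that concatenation with pandas; the two small in-memory frames below stand in for the CSVs each smaller scraping run would produce:

```python
import pandas as pd

# Stand-ins for the frames each smaller scraping run would load from CSV
part1 = pd.DataFrame({'id': [1, 2], 'tweet': ['a', 'b']})
part2 = pd.DataFrame({'id': [2, 3], 'tweet': ['b', 'c']})

# Stack the runs, then drop tweets duplicated across overlapping windows
df = pd.concat([part1, part2], ignore_index=True).drop_duplicates(subset='id')
```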

Sentiment Analysis with VADER

Now we have pfe_tweets_result.csv, a file with about 12,000 tweets, and we can quickly analyze each of them with VADER (Valence Aware Dictionary and sEntiment Reasoner), a lexicon- and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media. Importing the library:

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

Initializing and trying out the sentiment analyzer:

analyzer = SentimentIntensityAnalyzer()

def sentiment_analyzer_scores(sentence):
    score = analyzer.polarity_scores(sentence)
    return "{:-<40} {}".format(sentence, str(score))

print(sentiment_analyzer_scores("PFE sucks!"))

This should print four sentiment scores: neg, neu, pos, and compound. The compound score is bounded between -1 and 1; the first three are each between 0 and 1 and together sum to 1.

Reading the csv file we just wrote back into our notebook:

df = pd.read_csv('/Users/Graham/data-works/pfe_tweets_result.csv')
df = df.set_index('date')

Creating a column for the various sentiment scores of each individual tweet:

Building a Trading Strategy on Sentiment Data

In order to build a trading strategy around this sentiment data, we need to download the ticker data for Pfizer (PFE) from Yahoo Finance and resample our data so that we have daily sentiment values. To align our sentiment data with the PFE ticker data, we also need to remove the weekend rows. Then we write the result to a new CSV file so we can fix the formatting of the date column and successfully merge our dataframes.

df.index = pd.to_datetime(df.index, errors='coerce', format='%Y-%m-%d %H:%M:%S')
df = df.resample('D').mean()
df = df.loc[df.index.to_series().dt.weekday < 5]  # Remove weekends
df.to_csv('tweets_resampled_mean_no_weekends.csv')

In Excel we remove the hours, minutes and seconds from our date column and rename it ‘Date’ so it matches the ‘Date’ column in our ticker data. To strip the time component, highlight the ‘date’ column, open ‘Text to Columns’ under the ‘Data’ tab, select ‘Space’ as the delimiter, click ‘Next’, then select ‘YMD’ as the date format.
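If you’d rather skip the Excel round-trip, the same cleanup can be sketched in pandas; the small frame below stands in for the resampled file, and the output file name matches the one read back in the next step:

```python
import pandas as pd

# Stand-in for the resampled sentiment data with a datetime index
idx = pd.to_datetime(['2020-11-09 00:00:00', '2020-11-10 00:00:00'])
df = pd.DataFrame({'compound': [0.2, -0.1]}, index=idx)

# Drop the time-of-day component and rename the index to match the ticker CSV
df.index = df.index.normalize()
df.index.name = 'Date'
df.to_csv('tweets_no_wks_corrected.csv', date_format='%Y-%m-%d')
```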

Then we read this CSV file and our PFE ticker CSV file back into our notebook and merge them on the ‘Date’ index column:

corrected_df = pd.read_csv("/Users/Graham/data-works/tweets_no_wks_corrected.csv", parse_dates=True, index_col=0)
_pfe = pd.read_csv("/Users/Graham/data-works/PFE.csv", parse_dates=True, index_col=0)
combined_df = corrected_df.merge(_pfe, on='Date', how='outer').dropna()

Then we create a column with the log returns of PFE.

combined_df['returns'] = np.log(combined_df['Close'] / combined_df['Close'].shift(1))

Describing our trading strategy logic: we go long (buy) the stock when our positive sentiment outweighs (is greater than) our negative sentiment. Then we create a column of strategy returns by multiplying our position by the log returns. Note the shift(1), which ensures we use the previous day’s sentiment rather than the current day’s, preventing look-ahead bias.
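A sketch of that logic; the column names ‘pos’ and ‘neg’ assume the VADER score columns from earlier, and the tiny frame stands in for the merged data:

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the merged sentiment + price frame
combined_df = pd.DataFrame({
    'pos': [0.20, 0.05, 0.30],
    'neg': [0.10, 0.15, 0.10],
    'returns': [0.010, -0.020, 0.015],
})

# Long (1) when the *previous* day's positive sentiment beats its negative
combined_df['position'] = np.where(
    combined_df['pos'].shift(1) > combined_df['neg'].shift(1), 1, 0)

# Strategy return: hold that position over the day's log return
combined_df['strategy'] = combined_df['position'] * combined_df['returns']
```

On the first row shift(1) yields NaN, so the comparison is False and the strategy starts flat rather than peeking at day one’s own sentiment.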

Checking to see if our strategy outperforms the benchmark of simply holding the underlying instrument:

np.exp(combined_df[['returns','strategy']].sum())

The output shows our strategy outperforms the benchmark by a small margin (0.121776). Then we can visualize the performance of our strategy against the benchmark:
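One way to sketch that visualization with matplotlib; the stand-in frame below replaces the real merged data, and the Agg backend keeps the snippet runnable as a plain script:

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use('Agg')  # headless backend for scripts; omit in a notebook
import matplotlib.pyplot as plt

# Stand-in daily log returns for the benchmark and the strategy
combined_df = pd.DataFrame({
    'returns': [0.010, -0.020, 0.015, 0.005],
    'strategy': [0.000, -0.020, 0.000, 0.005],
})

# exp(cumulative log returns) gives gross performance over time
ax = combined_df[['returns', 'strategy']].cumsum().apply(np.exp).plot()
ax.set_ylabel('Gross performance')
plt.savefig('performance.png')
```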

Performance Graph

In summary, in this article we:

  • Used snscrape to pull tweets of a certain criteria (tweets including key word ‘Pfizer’) into a data frame
  • Applied VADER sentiment analysis tool to examine the sentiment of each tweet
  • Resampled our sentiment data into daily buckets
  • Built a simple trading strategy based on our sentiment data and compared its performance to the benchmark

Pandey, Parul. “Simplifying Sentiment Analysis Using VADER in Python (On Social Media Text)”. https://medium.com/analytics-vidhya/simplifying-social-media-sentiment-analysis-using-vader-in-python-f9e6ec6fc52f.

Hilpisch, Yves. “Python for Finance: Mastering Data-Driven Finance”. O’Reilly, 2018.
