Which NLP Library Should You Use for Your Sentiment Analysis Project?

A cross-section of sentiment analysis tools within popular NLP libraries for Python

Graham Sahagian
Geek Culture


With so many Natural Language Processing (NLP) libraries, it can be difficult to choose one or be confident that your selection fits your purposes best. This article aims to take a look specifically at the sentiment analysis tools of the most popular NLP libraries to provide a practical guide in choosing the right tool for your next sentiment analysis project with Python.

By the end of this article, you’ll have the answers to the following questions:

  1. What’s the best sentiment analysis tool for social media?
  2. What library should I use for multi-lingual sentiment analysis?
  3. Which library is best if I want to train my own model?

In order to evaluate these libraries without (much) prejudice, I set out to perform a simple sentiment analysis task on two short paragraph texts. With each NLP library, I wrote a function to dissect an excerpt from an overwhelmingly negative article and an overwhelmingly positive article and return a sentiment score for each sentence in the sample texts.

The negative and positive sample texts are taken from Pitchfork album reviews. The positive text comes from a 10/10 review of Music Has the Right to Children by Boards of Canada, one of the most beloved electronic albums of the modern era. The negative text is drawn from an incredibly harsh review of Kid Cudi’s indie rock album Speedin’ Bullet 2 Heaven, infamous for its mediocrity (a 4/10, for the record).

Overview

For each of the five NLP libraries selected, I wrote a function to (1) tokenize a given paragraph into sentences and (2) generate a sentiment polarity (positive or negative) for each sentence, along with its index (which sentence is being referenced). I then (3) combined the results and compared them. The survey covers the following five NLP libraries for Python and is by no means comprehensive:

  1. TextBlob
  2. Stanza
  3. VADER (via NLTK)
  4. Pattern
  5. Flair
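
The three steps described above can be sketched in a library-agnostic way. This is only a minimal sketch, not the code used for this article: split_sentences is a naive regex stand-in for a real sentence tokenizer (such as NLTK’s sent_tokenize), and score_sentences, combine_scores, and toy_scorer are hypothetical names introduced purely for illustration.

```python
import re


def split_sentences(text):
    """Naive sentence splitter (assumption: a stand-in for a real tokenizer)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def score_sentences(text, scorer):
    """Steps 1 and 2: return (sentence index, polarity) pairs for one scorer."""
    return [(index, scorer(sentence))
            for index, sentence in enumerate(split_sentences(text))]


def combine_scores(text, scorers):
    """Step 3: build one column of scores per scorer, keyed by its label."""
    table = {"sentence": split_sentences(text)}
    for name, scorer in scorers.items():
        table[name] = [score for _, score in score_sentences(text, scorer)]
    return table


def toy_scorer(sentence):
    """Hypothetical placeholder scorer; a real library call would go here."""
    return 1.0 if "great" in sentence.lower() else -1.0
```

Calling combine_scores(text, {"toy": toy_scorer}) returns a dict of equal-length columns, which can be handed straight to pandas.DataFrame for side-by-side comparison.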

Imports

import time  # optional, to time the function calls
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import stanza
import pandas as pd
from nltk import tokenize
from textblob import TextBlob
from pattern.en import sentiment, Sentence
import flair
# loading sample texts from Pitchfork.com

1. TextBlob

One of the easier-to-use sentiment analysis tools, TextBlob exposes a sentiment property that returns a named tuple with the polarity and subjectivity of the given text. For this tutorial, I’m only referencing the polarity. As a side note, I’m using the TextBlob library directly, not the TextBlob integration for spaCy. I’m assuming they would give similar results.

Install TextBlob with pip:

$ pip install -U textblob
$ python -m textblob.download_corpora
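
A minimal sketch of what a per-sentence TextBlob function might look like, assuming the corpora above have been downloaded. The name textblob_analyzer matches the timing output later in this article, but the body is an illustration rather than the original code.

```python
def textblob_analyzer(text):
    """Return (sentence index, polarity) pairs for each sentence in `text`."""
    # Imported inside the function so this module loads even without TextBlob.
    from textblob import TextBlob

    blob = TextBlob(text)
    # blob.sentences splits the text; each sentence has its own sentiment tuple.
    return [(index, sentence.sentiment.polarity)
            for index, sentence in enumerate(blob.sentences)]
```

Polarity values fall between -1.0 (negative) and +1.0 (positive).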

2. Stanza

Stanza, the Stanford NLP Group’s Python library (which also ships a client for the Java CoreNLP toolkit), is an effective tool for labeling positive or negative sentiment. Its pipelines cover 66 languages, although sentiment models are available for only some of them, so check that your target language is supported. If you’re looking for higher granularity, however, this library is not for you. Each sentence’s .sentiment attribute holds a value of 0, 1, or 2, representing negative, neutral, and positive sentiment, respectively. There is no “slightly negative” or “overly positive” rating, as the return values are discrete classifications.

Stanza can be installed via:

$ pip install stanza
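
A minimal sketch of a per-sentence Stanza function, assuming the English models have already been fetched once with stanza.download("en"). The optional nlp parameter and the STANZA_LABELS mapping are additions made here for illustration; they are not part of Stanza itself.

```python
# Human-readable names for Stanza's three discrete sentiment classes.
STANZA_LABELS = {0: "negative", 1: "neutral", 2: "positive"}


def stanza_analyzer(text, nlp=None):
    """Return (sentence index, sentiment class) pairs, where class is 0, 1, or 2."""
    if nlp is None:
        # Building a pipeline is slow; pass one in to reuse it across calls.
        import stanza
        nlp = stanza.Pipeline(lang="en", processors="tokenize,sentiment")

    doc = nlp(text)
    return [(index, sentence.sentiment)
            for index, sentence in enumerate(doc.sentences)]
```

Reusing a single pipeline object across calls avoids reloading the models each time, which matters when timing the function.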

3. VADER (Valence Aware Dictionary sEntiment Reasoner) via NLTK

One of the best libraries included in this list, VADER is specifically attuned to social media text. Its lexicon- and rule-based approach even accounts for the sentiment of emoticons and emojis. This makes it effective at properly labeling content from some of the largest platforms in use today, such as Twitter and Facebook. As far as I know, VADER does not support non-English text.

Installing VADER via NLTK (matching the import used above), including its lexicon:

$ pip install nltk
$ python -m nltk.downloader vader_lexicon
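
A minimal sketch of a per-sentence VADER function using the NLTK import from the top of this article. It assumes NLTK’s VADER lexicon and sentence tokenizer data are already downloaded, and it uses the compound score as the polarity, which is a choice made here for illustration.

```python
def vader_analyzer(text):
    """Return (sentence index, compound score) pairs for each sentence in `text`."""
    # Imported lazily so this module loads even without NLTK installed.
    from nltk import tokenize
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    sentences = tokenize.sent_tokenize(text)
    # polarity_scores returns a dict with 'neg', 'neu', 'pos', and 'compound'.
    return [(index, analyzer.polarity_scores(sentence)["compound"])
            for index, sentence in enumerate(sentences)]
```

The compound score is a single normalized value between -1 and +1, which makes it easy to line up against TextBlob’s and Pattern’s polarity.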

4. Pattern

Pattern is a text data-mining tool (web crawling, parsing, etc.) that has a text-processing component. Its sentiment() function returns a tuple with the polarity and subjectivity of a given text, respectively, where polarity is a value between -1.0 and +1.0 and subjectivity is a value between 0.0 and 1.0.

It can be installed with:

$ pip install pattern
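
A minimal sketch of a per-sentence Pattern function. Splitting sentences with NLTK’s sent_tokenize is an assumption made here to keep it parallel to the other examples; only the first element of the returned tuple (the polarity) is kept.

```python
def pattern_analyzer(text):
    """Return (sentence index, polarity) pairs for each sentence in `text`."""
    # Imported lazily so this module loads even without Pattern or NLTK.
    from nltk import tokenize
    from pattern.en import sentiment

    results = []
    for index, sentence in enumerate(tokenize.sent_tokenize(text)):
        # sentiment() returns a (polarity, subjectivity) tuple.
        polarity = sentiment(sentence)[0]
        results.append((index, polarity))
    return results
```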

5. Flair

Built on PyTorch, Flair is another powerful NLP library that features named entity recognition (NER), part-of-speech (POS) tagging, and special support for biomedical data. It’s highly configurable and provides support to train your own text analysis models.

Installing Flair:

$ pip install flair

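It is, however, slightly less accessible out of the box. Flair’s classifier returns a label and a confidence value rather than a single polarity number, so a small helper function, senti_score, is needed to convert its output into a polarity score. This is a quick step, but one that the other sentiment analysis tools don’t require.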
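
Flair’s pre-trained sentiment classifier attaches a label (POSITIVE or NEGATIVE) and a confidence to each sentence, so a helper has to fold the two into one signed number. The sketch below shows one way senti_score might do that; the signed-confidence convention, the function signatures, and the optional classifier parameter are assumptions, not the original code.

```python
def senti_score(label_value, confidence):
    """Fold a Flair label and its confidence into one signed polarity value."""
    return confidence if label_value == "POSITIVE" else -confidence


def flair_analyzer(text, classifier=None):
    """Return (sentence index, signed polarity) pairs for each sentence in `text`."""
    # Imported lazily so this module loads even without Flair or NLTK.
    from flair.data import Sentence
    from nltk import tokenize

    if classifier is None:
        from flair.models import TextClassifier
        # Downloads the pre-trained English sentiment model on first use.
        classifier = TextClassifier.load("en-sentiment")

    results = []
    for index, raw_sentence in enumerate(tokenize.sent_tokenize(text)):
        sentence = Sentence(raw_sentence)
        classifier.predict(sentence)  # adds the predicted label in place
        label = sentence.labels[0]
        results.append((index, senti_score(label.value, label.score)))
    return results
```

Because the confidence is usually close to 1.0, the signed score tends to cluster near -1 or +1 rather than spreading across the range.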

Sentiment Analyzer Result Comparison

The first thing I noticed from the results is that TextBlob and Pattern appear to use the same sentiment analysis algorithm, as they gave identical scores for every sentence in both texts. I had to go back into my function to triple-check that I hadn’t used the same function/method for both of them. This is somewhat unsurprising, as TextBlob was built on top of Pattern and NLTK, and its default sentiment analyzer is based on Pattern’s.

Still, I am a bit surprised that Pattern and TextBlob’s sentiment analyzers are completely identical.

Figure 1: negative sentiment table

As discussed earlier, Stanza only gives a negative, neutral, or positive (0, 1, 2) rating for each sentence. Flair paints a similar picture, landing on either clearly positive or clearly negative. Both are accurate and seem to offer the same level of precision, despite Flair generating a continuous, rather than discrete, score.

Figure 2: positive sentiment table

From these cursory tests, VADER pulls ahead in its accuracy, precision, and ease of use.

Speed Comparison

I tacked a timer decorator onto each function call to get a rough picture of their speeds. The printed speed results below show clearly that TextBlob, VADER, and Pattern were the fastest with VADER being the most consistent.

func:'textblob_analyzer' took: 0.0602 sec
func:'textblob_analyzer' took: 0.0109 sec
func:'stanza_analyzer' took: 0.4384 sec
func:'stanza_analyzer' took: 0.3161 sec
func:'vader_analyzer' took: 0.0419 sec
func:'vader_analyzer' took: 0.0201 sec
func:'pattern_analyzer' took: 0.0534 sec
func:'pattern_analyzer' took: 0.0044 sec
func:'flair_analyzer' took: 0.5243 sec
func:'flair_analyzer' took: 0.3798 sec
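
A minimal sketch of a timer decorator that would print lines in the format above, assuming Python’s standard time.perf_counter clock. The name timing is hypothetical.

```python
import time
from functools import wraps


def timing(func):
    """Print how long each call to `func` takes, then return its result."""
    @wraps(func)  # keep the wrapped function's name for the printed label
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"func:{func.__name__!r} took: {elapsed:.4f} sec")
        return result
    return wrapper
```

Placing @timing above any of the analyzer functions is enough to get one timing line per call. Note that a function’s first call often includes one-time setup costs such as loading a lexicon or model.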

Key Takeaways

Overall, VADER reigns supreme (IMO). Because it’s tuned specifically to handle social media text, it produced the most accurate and precise sentiment scoring on our sample texts and would likely give similarly promising results on any (English) text in a social media context. If you’re looking for a sentiment analysis library to analyze modern colloquial texts, VADER will most likely get the job done best.

The other 4 (really 3, as Pattern and TextBlob are the same in regard to their sentiment analyzer) have their uses though. Choose Flair if you’re looking to analyze biomedical data, or if you want to train a unique model on a custom corpus. Choose Pattern if you want to detect comparatives versus superlatives and/or fact versus opinion. Choose Stanza if you’re looking to analyze sentiment in multiple languages and/or want a discretized sentiment output. Choose TextBlob because it’s fast and easy and will give you the same sentiment results as Pattern.

TL;DR

Choose VADER if you want to analyze emoji sentiment and funny memes on the internet (actual use case).
