Data Analysis of New Year’s Resolutions Using Python
Analyzing a public data set from Kaggle

February is near, and it’s the official month of failed New Year’s resolutions. That makes this a good time to see what we can learn from a New Year’s resolution data set.
The data set we will use comes from Kaggle. This is the link for downloading the data set. According to the uploader, it contains 5011 tweets about 2015 New Year’s resolutions. A quick look at the mined data suggests that the Twitter users are from the United States.
In this real-world practical exercise, we will tackle the following:
- Turn a CSV file into a data frame.
- Troubleshoot some simple errors.
- Check for missing values.
- Group data by different features.
- Find the highest number of New Year’s resolutions based on different factors.
- Get the top tweets from highest to lowest and vice versa.
- Create a line plot.
- Create a bar plot.
Important: I used Jupyter Notebook to run the code. The tutorial assumes that you know how to code in Python. If this is your first time, this link can help you build your Python foundation.
Importing the libraries
The needed libraries are the following:
- Pandas for manipulation and analysis.
- Matplotlib and Seaborn for data visualization.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
Loading the data set
We will use the read_csv function to read the CSV file and store the result in a variable named df.
df = pd.read_csv('new_year_resolutions_dataset.csv')
I hit an error right away while opening the CSV file:
ParserError: Error tokenizing data. C error: Expected 4 fields in line 18, saw 5
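An error like this means the parser found more delimiters on one row than the header led it to expect. As a minimal sketch of how you might track down such a row, the snippet below reproduces the same kind of ParserError on a small made-up sample (the column names and tweets are hypothetical, not taken from the Kaggle file) and then counts the delimiters on each raw line. The real cause in the actual file may be different, for example a separator other than a comma.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: an unquoted comma inside the
# tweet text makes the last row look like it has one extra column.
sample = (
    "id,tweet,state\n"
    "1,Lose weight,CA\n"
    "2,Read more, travel more,NY\n"
)

# Reading the sample with the default settings raises a ParserError,
# just like the one shown above.
try:
    pd.read_csv(io.StringIO(sample))
    error_message = None
except pd.errors.ParserError as exc:
    error_message = str(exc)

print(error_message)

# Count the commas on every raw line and keep the lines that disagree
# with the header. Line numbers start at 1 to match the error message.
lines = sample.splitlines()
expected_commas = lines[0].count(",")
bad_lines = [
    (number, line)
    for number, line in enumerate(lines, start=1)
    if line.count(",") != expected_commas
]

print(bad_lines)
```

With the real file, you could apply the same idea by opening it with Python's built-in open function and looking at the line number reported in the error. Seeing the raw text of that line usually makes it clear whether the problem is a stray delimiter, missing quotes, or the wrong separator.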