This is our final project for CS651.
We were asked to apply the big data tools we learned in class, combined with machine learning models, to solve a real-world problem.
We decided to predict the winner of the 2020 US presidential election between the two candidates, Trump and Biden.
Because this is a binary classification problem, we used binary classifiers, namely Logistic Regression and a Naive Bayes model, for the prediction. We also applied sentiment analysis to the tweets to gauge which candidate would win the election.
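The write-up doesn't say how tweet sentiment was turned into a prediction. One simple possibility, sketched below under stated assumptions, is to score each tweet against a small word lexicon and pick the candidate whose mentions are most often positive. The word lists, the `polarity` and `predict_winner` names, and the "highest positive share wins" rule are all hypothetical, not the project's actual method.

```python
import re
from typing import Dict, List, Optional

# Hypothetical toy lexicon; a real project would use a proper sentiment model.
POSITIVE = {"win", "great", "support", "love", "best", "good"}
NEGATIVE = {"lose", "bad", "worst", "hate", "corrupt", "fail"}
CANDIDATES = {"trump": "Trump", "biden": "Biden"}


def polarity(text: str) -> int:
    """Return +1 (positive), -1 (negative) or 0 from lexicon word counts."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return (score > 0) - (score < 0)


def predict_winner(tweets: List[str]) -> Optional[str]:
    """Return the candidate with the highest share of positive mentions (assumed rule)."""
    positive: Dict[str, int] = {name: 0 for name in CANDIDATES.values()}
    mentions: Dict[str, int] = {name: 0 for name in CANDIDATES.values()}
    for tweet in tweets:
        lowered = tweet.lower()
        is_positive = polarity(tweet) > 0
        for key, name in CANDIDATES.items():
            if key in lowered:  # a tweet mentioning both candidates counts for both
                mentions[name] += 1
                if is_positive:
                    positive[name] += 1
    shares = {n: positive[n] / mentions[n] for n in mentions if mentions[n]}
    return max(shares, key=shares.get) if shares else None
```

Using the share of positive mentions rather than the raw count keeps a candidate who is simply tweeted about more often from winning by volume alone.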
- sklearn
- pandas
- numpy
- findspark
- re
- subprocess
- sys
- pyspark
- tweepy
- socket
Install the required third-party packages with pip (re, subprocess, sys, and socket ship with the Python standard library and need no installation):
pip install pyqt5
pip install dbConnect
pip install scikit-learn
pip install pandas
pip install numpy
pip install findspark
pip install pyspark
pip install tweepy
Open two terminal windows (Terminal on macOS, Command Prompt on Windows, or a shell on Linux/UNIX).
In the first terminal, run:
python stream_producer.py
In the second terminal, run:
python stream_consumer.py
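A producer/consumer pair like this commonly hands tweets from the producer to the Spark Streaming consumer over a TCP socket, which Spark reads with `StreamingContext.socketTextStream` as UTF-8, newline-delimited records. The stdlib sketch below assumes that arrangement: the hardcoded sample tweets stand in for the tweepy stream, and `serve_tweets` and `consume` are hypothetical names, with `consume` a minimal stand-in for the Spark side.

```python
import socket
import threading
from typing import Iterable, List

HOST = "127.0.0.1"


def serve_tweets(server: socket.socket, tweets: Iterable[str]) -> None:
    """Producer side: accept one consumer and send each tweet as one line."""
    conn, _ = server.accept()
    with conn:
        for tweet in tweets:
            # socketTextStream treats each newline-terminated line as one record,
            # so newlines inside a tweet are flattened to spaces.
            conn.sendall((tweet.replace("\n", " ") + "\n").encode("utf-8"))


def consume(port: int) -> List[str]:
    """Consumer stand-in: read newline-delimited records until the producer closes."""
    with socket.create_connection((HOST, port)) as sock:
        with sock.makefile("r", encoding="utf-8") as stream:
            return [line.rstrip("\n") for line in stream]


if __name__ == "__main__":
    # Port 0 asks the OS for any free port; real scripts would agree on a fixed one.
    server = socket.create_server((HOST, 0))
    port = server.getsockname()[1]
    producer = threading.Thread(target=serve_tweets, args=(server, ["Go vote!", "Debate\nnight"]))
    producer.start()
    print(consume(port))  # ['Go vote!', 'Debate night']
    producer.join()
    server.close()
```

The producer acts as the server because Spark's socket source connects as a client, which is also why the producer script is started first.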
Our data source was tweets collected in real time from Twitter. Because the data volume was large, we used the big data framework Spark Streaming to fetch the tweets in real time, with a 15-minute time window for gathering them.
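Spark Streaming expresses a window over a DStream with `DStream.window(windowDuration, slideDuration)`, where both durations are in seconds and must be multiples of the batch interval, so a 15-minute window is 900 seconds. The text doesn't say whether the window tumbled or slid, so the dependency-free sketch below assumes non-overlapping (tumbling) 15-minute windows keyed by each tweet's timestamp, just to show how tweets are bucketed. `window_start` and `group_by_window` are hypothetical names.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

WINDOW = timedelta(minutes=15)


def window_start(ts: datetime, size: timedelta = WINDOW) -> datetime:
    """Floor a timestamp to the start of the fixed-size window containing it."""
    epoch = datetime(1970, 1, 1, tzinfo=ts.tzinfo)
    return ts - (ts - epoch) % size


def group_by_window(
    tweets: Iterable[Tuple[datetime, str]], size: timedelta = WINDOW
) -> Dict[datetime, List[str]]:
    """Bucket (timestamp, text) pairs into non-overlapping windows of `size`."""
    buckets: Dict[datetime, List[str]] = defaultdict(list)
    for ts, text in tweets:
        buckets[window_start(ts, size)].append(text)
    return dict(buckets)
```

For example, tweets stamped 20:07 and 20:14:59 land in the 20:00 window, while one stamped 20:16 starts the 20:15 window.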
In this step we look for any discrepancies in the dataset. Because tweets can contain unnecessary content, they must be cleaned and preprocessed before being passed to a model.
Because our problem dealt with text, we had only a single text feature, so we didn't need to drop any other features. The feature engineering we applied was to
remove emojis and other unwanted text from the tweets.
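The cleaning step can be sketched with the `re` module from the package list. What counted as "unwanted text" isn't specified, so the patterns below (retweet prefixes, URLs, @mentions, the `#` sign, and common emoji code-point ranges) are assumptions, and `clean_tweet` is a hypothetical name.

```python
import re

# Assumed notion of "unwanted text"; adjust the patterns to the real data.
RETWEET_RE = re.compile(r"^RT\s+@\w+:\s*")     # "RT @user: " prefix
URL_RE = re.compile(r"https?://\S+|www\.\S+")  # links
MENTION_RE = re.compile(r"@\w+")               # @mentions
EMOJI_RE = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, extended symbols
    "\U0001F1E6-\U0001F1FF"  # regional indicators (flag emoji)
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F\u200D"           # emoji variation selector and zero-width joiner
    "]+"
)  # common emoji blocks only; not an exhaustive emoji definition
SPACES_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Strip emojis and assumed noise from a tweet, then normalise whitespace."""
    text = RETWEET_RE.sub("", text)
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = EMOJI_RE.sub(" ", text)
    text = text.replace("#", "")  # keep the hashtag word, drop the symbol
    return SPACES_RE.sub(" ", text).strip()
```

Replacing matches with a space rather than an empty string keeps neighbouring words from being glued together; the final whitespace pass collapses the extra spaces.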
Because our problem was binary classification, we used Logistic Regression and a Naive Bayes model.
Of the two, the Naive Bayes model performed better.
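To make the Naive Bayes side concrete, the sketch below implements a multinomial Naive Bayes classifier over bag-of-words counts with add-one (Laplace) smoothing, written without dependencies to show the computation. scikit-learn provides this model as `sklearn.naive_bayes.MultinomialNB` and the other baseline as `sklearn.linear_model.LogisticRegression`, usually fed by `sklearn.feature_extraction.text.CountVectorizer`; which settings the project used isn't stated. The class name and the toy training data are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Sequence


def tokenize(text: str) -> List[str]:
    """Lowercase bag-of-words tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over word counts with add-alpha smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha

    def fit(self, texts: Sequence[str], labels: Sequence[str]) -> "MultinomialNaiveBayes":
        self.classes = sorted(set(labels))
        self.word_counts: Dict[str, Counter] = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        doc_counts = Counter(labels)
        # log P(class): fraction of training tweets carrying each label
        self.log_prior = {c: math.log(doc_counts[c] / len(labels)) for c in self.classes}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, word: str, c: str) -> float:
        """Smoothed log P(word | class); unseen words keep a small nonzero probability."""
        numerator = self.word_counts[c][word] + self.alpha
        denominator = self.totals[c] + self.alpha * len(self.vocab)
        return math.log(numerator / denominator)

    def predict(self, text: str) -> str:
        """Return the class with the highest log posterior (up to a constant)."""
        words = [w for w in tokenize(text) if w in self.vocab]  # drop out-of-vocabulary words
        scores = {
            c: self.log_prior[c] + sum(self._log_likelihood(w, c) for w in words)
            for c in self.classes
        }
        return max(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical toy data: each tweet labelled with the candidate it favours.
    texts = ["biden will win", "go biden go", "trump will win", "trump rally tonight"]
    labels = ["Biden", "Biden", "Trump", "Trump"]
    model = MultinomialNaiveBayes().fit(texts, labels)
    print(model.predict("biden rally"))  # Biden
```

Summing log probabilities instead of multiplying raw probabilities avoids floating-point underflow on long tweets, and smoothing keeps a single unseen word from ruling out a class.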
Because this was a school project, we weren't required to deploy the model. Had we been, we would have used the Flask framework.