An ensemble approach for the identification and classification of crime tweets in the English language

Tooba Siddiqui; Saman Hina; Raheela Asif; Saad Ahmed; Munad Ahmed

doi:10.11591/csit.v4i2.pp149-159

Authors

Tooba Siddiqui NED University of Engineering and Technology https://orcid.org/0009-0001-4917-8155
Saman Hina NED University of Engineering and Technology https://orcid.org/0000-0002-7649-4372
Raheela Asif NED University of Engineering & Technology https://orcid.org/0000-0001-5155-8205
Saad Ahmed IQRA University https://orcid.org/0000-0001-6121-8124
Munad Ahmed MSN360.pk https://orcid.org/0009-0004-9617-8758

DOI:

https://doi.org/10.11591/csit.v4i2.pp149-159

Keywords:

Classification, Crime tweets, Ensemble approach, Natural language processing, Twitter

Abstract

Twitter is a popular social media platform that supports short posts limited to 280 characters. Users tweet about many topics, such as movie reviews, customer service, meals they have just eaten, and awareness posts. Tweets that carry information about crime scenes are referred to as crime tweets. Because crime tweets are crucial and informative, they need to be classified separately. Identifying and classifying crime tweets is a challenging task that has recently attracted researchers' interest, and researchers have used different approaches to address it. This research uses an ensemble approach for the identification and classification of crime tweets. The Tweepy and Twint libraries were used to collect datasets from Twitter; the two libraries extract tweets by different methods. Several ensemble approaches were applied. The best performance was obtained by combining logistic regression (LR), support vector machine (SVM), k-nearest neighbor (KNN), decision tree (DT), and random forest (RF) classifiers, assigned weights of 1, 2, 1, 1, and 1 respectively, in a soft weighted voting classifier together with a term frequency-inverse document frequency (TF-IDF) vectorizer. This configuration achieved an accuracy of 96.2% on the testing dataset.
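
The soft weighted voting step described in the abstract can be illustrated with a minimal sketch. It assumes each of the five classifiers outputs a probability per class for a tweet; the function name `soft_weighted_vote`, the label names, and the probability values are hypothetical and chosen only for illustration. Only the weights (1, 2, 1, 1, 1 for LR, SVM, KNN, DT, RF) come from the abstract. This is not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def soft_weighted_vote(
    class_probs: Sequence[Sequence[float]], weights: Sequence[float]
) -> Tuple[int, List[float]]:
    """Average per-class probabilities across classifiers using the given weights.

    class_probs holds one probability vector per classifier, e.g.
    [p(not crime), p(crime)]. Returns the index of the winning class and
    the weighted average probability of every class.
    """
    if len(class_probs) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    total_weight = sum(weights)
    n_classes = len(class_probs[0])
    averaged = [
        sum(w * probs[c] for probs, w in zip(class_probs, weights)) / total_weight
        for c in range(n_classes)
    ]
    winner = max(range(n_classes), key=averaged.__getitem__)
    return winner, averaged


LABELS = ["not_crime", "crime"]   # hypothetical label names
WEIGHTS = [1, 2, 1, 1, 1]         # LR, SVM, KNN, DT, RF (from the abstract)

# Made-up probabilities for a single tweet, one row per classifier.
example_probs = [
    [0.65, 0.35],  # LR
    [0.20, 0.80],  # SVM
    [0.55, 0.45],  # KNN
    [0.60, 0.40],  # DT
    [0.55, 0.45],  # RF
]

winner, averaged = soft_weighted_vote(example_probs, WEIGHTS)
unweighted_winner, _ = soft_weighted_vote(example_probs, [1, 1, 1, 1, 1])
print(LABELS[winner], LABELS[unweighted_winner])
```

With these illustrative numbers, doubling the SVM weight flips the decision from `not_crime` (equal weights) to `crime`, which shows how the weighting lets one classifier count for more. In scikit-learn, `VotingClassifier` with `voting="soft"` and a `weights` list performs the same weighted averaging over fitted estimators.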
