Empirical Evaluation of Public Hate Speech Dataset

30 Pages. Posted: 8 Jul 2023

Sardar Jaf

University of Sunderland

Basel Barakat

University of Sunderland

Abstract

Despite the extensive communication benefits that social media platforms provide, many challenges must be addressed to make them safe for users. One of the most prevalent risks that users face on social media platforms is targeted hate speech. Many methods and resources have been developed to address this problem. Recent advanced approaches involve machine learning algorithms trained and evaluated on annotated datasets to automatically detect or classify hate speech. However, current public datasets have many limitations, which prevent machine learning algorithms from learning sufficiently from the information and classifying hate speech accurately. In this study, we provide a comprehensive empirical evaluation of numerous public datasets commonly employed in automated hate speech classification. Through rigorous empirical analysis, we present evidence of the limitations inherent in the current hate speech datasets used in supervised hate speech classification tasks. Furthermore, we offer a range of statistical analyses to elucidate the weaknesses and strengths of these datasets.

Keywords: hatespeech classification, hatespeech evaluation, dataset evaluation, empirical dataset evaluation

Suggested Citation

Jaf, Sardar and Barakat, Basel, Empirical Evaluation of Public Hate Speech Dataset. Available at SSRN: https://ssrn.com/abstract=4504059 or http://dx.doi.org/10.2139/ssrn.4504059

Sardar Jaf (Contact Author)

University of Sunderland

Chester Road
Sunderland SR2 7PS
United Kingdom

Basel Barakat

University of Sunderland

Chester Road
Sunderland SR2 7PS
United Kingdom
