Explainable Hate Speech Detection through Masked Rationale Prediction

Abstract

Hate speech detection matters because the spread of hate speech reinforces serious social discrimination against the targeted group, both online and in the real world. We propose Masked Rationale Prediction (MRP) to improve hate speech detection with respect to two important aspects: model bias and explainability. Detecting hate speech requires understanding its context; it cannot be identified solely by the presence of specific words considered hateful. However, existing models are easily biased toward specific expressions and therefore produce incorrect detections. Even when they predict correctly, the rationale behind the prediction is often not explained convincingly. Bias and explainability should therefore both be considered when building a hate speech detection model. MRP is the task of predicting masked human rationales (snippets of a sentence that serve as the grounds for human judgment) by referring to the surrounding tokens combined with their unmasked rationales. The human rationales are randomly masked and fed into the model in combination with each of the tokens. We pre-finetune a pre-trained model on MRP as an intermediate task and then finetune it on hate speech detection. Because the model learns rationale-based reasoning through MRP, it performs hate speech detection robustly in terms of bias and explainability. The proposed method generally achieves state-of-the-art performance across various metrics, demonstrating its effectiveness for hate speech detection.
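
The input construction described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the names `MASK_ID`, `IGNORE`, `mask_rationales`, and `combine`, the 50% mask ratio, and the use of element-wise addition to combine rationale and token embeddings are all assumptions made for illustration. Each token carries a binary human rationale label; some labels are hidden at random, and the prediction targets are the true labels at the hidden positions only.

```python
import random

MASK_ID = 2    # hypothetical id standing in for a hidden rationale label
IGNORE = -100  # hypothetical marker: no prediction target at this position


def mask_rationales(rationales, mask_ratio=0.5, rng=None):
    """Randomly hide a share of per-token rationale labels (0 or 1).

    Returns (inputs, targets). inputs carry MASK_ID at hidden positions;
    targets carry the true label there and IGNORE everywhere else.
    The mask ratio is an assumed value, not one taken from the thesis.
    """
    if rng is None:
        rng = random.Random()
    n = len(rationales)
    n_mask = max(1, round(n * mask_ratio)) if n else 0
    hidden = set(rng.sample(range(n), n_mask))
    inputs = [MASK_ID if i in hidden else r for i, r in enumerate(rationales)]
    targets = [r if i in hidden else IGNORE for i, r in enumerate(rationales)]
    return inputs, targets


def combine(token_vecs, rationale_ids, rationale_table):
    """Add a rationale embedding to each token embedding, element-wise.

    Addition is one assumed way of combining the two signals.
    """
    return [
        [t + r for t, r in zip(vec, rationale_table[rid])]
        for vec, rid in zip(token_vecs, rationale_ids)
    ]


# Placeholder labels for a four-token sentence: 1 marks a rationale token.
rationales = [0, 1, 0, 1]
inputs, targets = mask_rationales(rationales, mask_ratio=0.5, rng=random.Random(0))
```

In a real model, the combined vectors would be passed through the pre-trained encoder, and a per-token classifier would be trained only on the positions whose target is not `IGNORE`.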

Table of Contents

Chapter 1 Introduction
Chapter 2 Related Works
Section 1 Hate Speech Detection
Section 2 Pre-finetuning on an intermediate task
Section 3 Explainable NLP and rationale
Chapter 3 Method
Section 1 Task
Section 2 Masked rationale prediction
Section 3 Hate speech detection
Chapter 4 Experiments
Section 1 Dataset
Section 2 Metrics
Section 3 Models and Experimental settings
Section 4 Comparisons of results
Section 5 Qualitative results
Chapter 5 Conclusion
Chapter 6 References