A Deep Learning Framework for Sarcasm Detection through Sentiment Analysis of Spoken Audio Data

Reetu Awasthi, Vinay Chavan

Abstract

Sarcasm poses a particular challenge to sentiment-analysis pipelines because the surface polarity of a sarcastic utterance is often the opposite of the speaker's actual intent: left undetected, it can skew customer-satisfaction dashboards, confuse conversational bots, and bias downstream analytics. This paper presents a lightweight sarcasm-detection workflow that requires no heavy acoustic modelling and operates entirely on speech transcriptions. Raw audio is first transcribed by a commercial automatic-speech-recognition (ASR) service; the resulting text is then vectorised with character 3-to-5-gram TF-IDF features, which are robust to creative spellings, punctuation, and ASR noise. Binary classification is performed by a class-balanced linear support-vector classifier (LinearSVC) whose decision scores are probability-calibrated with Platt scaling. We evaluate the system on a 25-utterance pilot corpus of workplace English comprising 13 non-sarcastic and 12 sarcastic clips; an 80/20 stratified split yields 20 training and 5 test clips. On the held-out data the model achieves 80 % accuracy and an AUC of 0.83, correctly identifying every non-sarcastic case while missing a single sarcastic one. To make the system's behaviour transparent to experts and intuitive to novices alike, we provide six compact visual diagnostics: (1) a class-distribution bar chart, (2-3) per-class word-frequency clouds, (4) a confusion-matrix heat-map, (5) a row of predicted sarcasm-probability bars, and (6) an ROC curve. The results show that simple lexical cues combined with a calibrated linear margin already capture many of the markers of sarcasm in spontaneous spoken English. The framework thus offers a practical starting point for organisations that need prompt, resource-efficient sarcasm detection across large volumes of content before investing in a full audio sentiment architecture or deep multimodal representations.
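
For readers who want to see how the described stages fit together, the sketch below wires up the character 3-to-5-gram TF-IDF features, the class-balanced LinearSVC, Platt-scaled calibration, and the stratified 80/20 split using scikit-learn. It is a minimal illustration under stated assumptions: the toy transcripts and labels are hypothetical stand-ins for the 25-clip pilot corpus, and the hyper-parameters are assumptions rather than the authors' exact settings.

    # Minimal, hypothetical sketch of the pipeline described in the abstract,
    # assuming scikit-learn. The toy transcripts below are illustrative
    # stand-ins for ASR output from the pilot corpus.
    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics import accuracy_score, roc_auc_score
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.svm import LinearSVC

    # Hypothetical transcripts: label 1 = sarcastic, 0 = non-sarcastic.
    texts = [
        "oh wow, another mandatory meeting, how thrilling",
        "yeah sure, because that plan worked so well last time",
        "great, the printer is down again, just what i needed",
        "fantastic, a 7 am call, my favourite thing ever",
        "love it when the deadline moves up, so relaxing",
        "the quarterly report is ready for review",
        "thanks for sending the minutes so quickly",
        "the client approved the new design yesterday",
        "let's schedule the retrospective for friday",
        "the build passed and the release is on track",
    ]
    labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

    # Stratified 80/20 split, mirroring the paper's 20-train / 5-test protocol.
    X_tr, X_te, y_tr, y_te = train_test_split(
        texts, labels, test_size=0.2, stratify=labels, random_state=42)

    model = Pipeline([
        # Character 3-to-5-gram TF-IDF: robust to creative spellings,
        # punctuation, and ASR transcription noise.
        ("tfidf", TfidfVectorizer(analyzer="char", ngram_range=(3, 5))),
        # Class-balanced linear SVC, probability-calibrated via Platt
        # scaling (sigmoid), so calibrated sarcasm probabilities are
        # available for the probability-bar and ROC diagnostics.
        ("clf", CalibratedClassifierCV(
            LinearSVC(class_weight="balanced"), method="sigmoid", cv=3)),
    ])
    model.fit(X_tr, y_tr)

    probs = model.predict_proba(X_te)[:, 1]  # calibrated P(sarcastic)
    preds = model.predict(X_te)
    print("accuracy:", accuracy_score(y_te, preds))
    print("AUC:", roc_auc_score(y_te, probs))

On a real corpus, the calibrated probabilities from predict_proba are what would drive the per-clip sarcasm-probability bars and the ROC curve among the six diagnostics, while predict feeds the confusion-matrix heat-map.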
