International Journal of Computational Linguistics & Chinese Language Processing
                                                                                          Vol. 28, No. 1, June 2023



Title:
Intent Detection and Recognition of Long Sentences in Complex Environment Based on Speech Self-supervised Model

Author:
Kai Zhang, Tzu-Chun Yeh, Chung-Che Wang, Qiuxia Zhang, WeiRen Lan, and Jyh-Shing Roger Jang

Abstract:
Radio recordings from the fire command record center are characterized by long sentences and heavy noise. To address this, this paper proposes a method for identifying intent in long sentences. The method first uses a self-supervised learning model for speech feature extraction, and then uses two downstream models to detect and recognize intent in speech, respectively. Compared with the Whisper+BERT method, it achieves an error reduction rate (ERR) of 33.2% on the long-speech intent recognition task on the radio corpus. Compared with the region proposal network (RPN) method on the keyword spotting task, the false alarms per hour (FAH) are similar, and the ERR in false rejection rate (FRR) is 73.2%. Compared with the Whisper+BERT method on the short sentence classification task, the ERR is 4.3%. At the same time, the inference computing power requirement is 91.4% lower than that of the Whisper+BERT method. The method can be widely applied to tasks such as extracting key information from long speech and recording key information from telephone or radio communication.

Keywords:
Intent Classification, Long Speech Sentence, Self-supervised Learning, Keyword Spotting, Speech Sentence Classification


Title:
A Dialogue Collection System for One-to-One Multi-Domain Task Oriented Dialogs

Author:
Cheng-Hung Yeh, Yu-Kai Lee and Chia-Hui Chang

Abstract:
Task-oriented dialog systems require labeled corpora for model training. However, when new services are introduced, collecting dialogue corpora effectively becomes a problem that must be addressed in building dialogue systems. Existing task-oriented systems mainly focus on reservations for restaurants, hotels, and airline tickets. There is no dialogue corpus for virtual assistants that provide transactional services such as sending messages and creating events. Following the dialogue dataset collection method of CrossWOZ, this paper lets annotators simulate user and virtual assistant dialogue scenarios through a dialogue website interface, creating a dialogue dataset that covers three services: email management, calendar management, and message delivery. This corpus is expected to lay the foundation for the development of Chinese virtual assistant dialogue systems. The annotation system and dataset have been open-sourced at https://github.com/TedYeh/messageWOZ

Keywords:
Task-oriented Dialogue Systems (TOD), Dialog Corpus Construction, Wizard-of-Oz (WOZ), Msg-WOZ


Title:
Taiwanese-Mandarin Neural Machine Translation

Author:
Ting-Hsuan Chou and Chuan-Jie Lin

Abstract:
This paper proposes a Taiwanese-Mandarin neural machine translation system trained on all available Taiwanese corpora and translation datasets. The unit of translation can be either Chinese words or characters, and the embeddings are either self-trained or pre-trained. Several strategies are also proposed to handle out-of-vocabulary (OOV) output. The final Mandarin-to-Taiwanese system outperforms all known Taiwanese MT systems: its best BLEU score on news articles is 75.02, and its best score on literature articles is 38.13. We also built the first Taiwanese-to-Mandarin NMT system in the world, which achieves BLEU scores of 73.38 and 35.32 on those two genres of articles.

Keywords:
Taiwanese, Machine Translation, Neural Network, Embedding Model


Title:
A Two-Stage Learning Strategy for Fair Speech Emotion Recognition

Author:
Woan-Shiuan Chien and Chi-Chun Lee

Abstract:
Speech Emotion Recognition (SER) is a key technology within the myriad of speech solutions. A unique fairness issue in SER stems from the inherent emotional perception bias present in the data labels provided by raters. To enhance both recognition performance and fairness in SER, addressing rater bias is paramount. In this study, we propose a two-stage framework. In the first stage, we generate debiased representations using a fairness-constrained adversarial framework. Subsequently, in the second stage, following gender-wise perceptual learning, we empower users with the ability to toggle freely between specific gender-wise perceptions as needed. We utilize two significant fairness metrics to evaluate our results, demonstrating that our distributions and predictions across genders are fair. Further analysis indicates that our model effectively mitigates the influence of gender perspectives in the feature learning space.

Keywords:
Speech Emotion Recognition, Rater Bias, Fair Representation, Perceptual Fairness

