International Journal of Computational Linguistics & Chinese Language Processing                                  
                                                                                        Vol. 27, No. 2, December 2022



Title:
Aligning Sentences in a Paragraph-Paraphrased Corpus with New Embedding-based Similarity Measures

Author:
Aleksandra Smolka, Hsin-Min Wang, Jason S. Chang, and Keh-Yih Su

Abstract:
To better understand and utilize the lexical and syntactic mappings between various language expressions, it is often first necessary to perform sentence alignment on the provided data. Until now, the character trigram overlap ratio has been considered the best similarity measure on text simplification corpora. In this paper, we aim to show that newer embedding-based similarity metrics are preferable to this traditional SOTA metric on a paragraph-paraphrased corpus. We report a series of experiments designed to compare different alignment search strategies as well as various embedding- and non-embedding-based sentence similarity metrics on the paraphrased sentence alignment task. Additionally, we explore the problem of aligning and extracting sentences under imposed restrictions, such as controlling sentence complexity. For evaluation, we use paragraph pairs sampled from the Webis-CPC-11 corpus, which contains paraphrased paragraphs. Our results indicate that modern embedding-based metrics, such as those utilizing SentenceBERT or BERTScore, significantly outperform the character trigram overlap ratio on the sentence alignment task in a paragraph-paraphrased corpus.

Keywords: Sentence Alignment, Sentence Similarity, Sentence Embedding, Paragraph-paraphrased Corpus
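The character trigram overlap ratio used as the traditional baseline above can be sketched in a few lines. This is a minimal illustration with hypothetical function names, assuming normalization by the smaller trigram count; the paper's exact normalization may differ:

```python
from collections import Counter

def char_trigrams(text):
    """Return the multiset of character trigrams of a string."""
    s = text.lower()
    return Counter(s[i:i + 3] for i in range(len(s) - 2))

def trigram_overlap_ratio(a, b):
    """Shared trigram count divided by the smaller trigram count."""
    ta, tb = char_trigrams(a), char_trigrams(b)
    shared = sum((ta & tb).values())  # multiset intersection
    return shared / min(sum(ta.values()), sum(tb.values()))
```

An embedding-based metric replaces this surface overlap with, e.g., cosine similarity between sentence vectors, which is what lets it recognize paraphrases that share little character-level material.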


Title:
Investigation of Feature Processing Modules and Attention Mechanisms in Speaker Verification System

Author:
Ting-Wei Chen, Wei-Ting Lin, Chia-Ping Chen, Chung-Li Lu, Bo-Cheng Chan, Yu-Han Cheng, Hsiang-Feng Chuang, Wei-Yu Chen

Abstract:
In this paper, we use several combinations of feature front-end modules and attention mechanisms to improve the performance of our speaker verification system. An updated version of ECAPA-TDNN is chosen as the baseline. We replace and integrate different feature front-end and attention mechanism modules to compare them and find the most effective model design, which becomes our final system. We use the VoxCeleb2 dataset as our training set and test the performance of our models on several test sets. With our final proposed model, we improve performance by 16% over the baseline on the VoxSRC2022 validation set, achieving better results for our speaker verification system.

Keywords:
Speaker Verification, Frontend Module, Attention Mechanism, Time Delay Neural Network
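One attention mechanism commonly found in ECAPA-TDNN-style systems is attentive statistics pooling: a small attention network scores each frame, and the utterance embedding is the attention-weighted mean and standard deviation of the frame-level features. As a rough, dependency-free sketch (a toy one-layer attention with made-up parameter names; the actual model uses channel- and context-dependent attention), this could look like:

```python
import math

def attentive_stats_pooling(frames, w, b, v):
    """frames: list of T frame vectors, each of length D.
    A toy one-layer tanh attention net scores each frame, the scores
    are softmax-normalized over time, and the weighted mean and std
    are concatenated into a 2*D utterance-level vector."""
    D = len(frames[0])
    def score(x):
        hidden = [math.tanh(sum(x[j] * w[j][k] for j in range(D)) + b[k])
                  for k in range(len(b))]
        return sum(h * vk for h, vk in zip(hidden, v))
    scores = [score(x) for x in frames]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]          # softmax over time
    mean = [sum(a * x[j] for a, x in zip(alphas, frames)) for j in range(D)]
    var = [sum(a * (x[j] - mean[j]) ** 2 for a, x in zip(alphas, frames))
           for j in range(D)]
    return mean + [math.sqrt(vj + 1e-9) for vj in var]
```

The weighted standard deviation is what distinguishes attentive *statistics* pooling from plain attentive averaging: it keeps information about how much the speaker's features vary across the utterance.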


Title:
Development of Mandarin-English Code-switching Speech Synthesis System

Author:
Hsin-Jou Lien, Li-Yu Huang, and Chia-Ping Chen

Abstract:
In this paper, a Mandarin-English code-switching speech synthesis system is proposed. To focus on learning the content information shared between the two languages, the training dataset is a multilingual artificial dataset with a unified speaker style. Adding language embeddings into the system helps it adapt to the multilingual dataset. In addition, text preprocessing is applied in a language-dependent way. For Mandarin, the preprocessing consists of word segmentation and text-to-pinyin conversion, which not only improve fluency but also reduce the learning complexity; number normalization decides whether the Arabic numerals in a sentence should be read digit by digit. For English, the preprocessing is acronym conversion, which decides the pronunciation of acronyms.

Keywords:
Speech Synthesis, Code-switching, Text Preprocessing
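The number normalization step described above must decide whether a digit string is read as one cardinal number or spelled digit by digit. A toy heuristic sketch; the specific rule used here (long strings or a leading zero are spelled out) is an assumption for illustration, not the paper's actual rule:

```python
def normalize_number(token):
    """Toy heuristic: digit strings that look like IDs or phone numbers
    (long, or with a leading zero) are spelled digit by digit; short
    ones are kept whole for downstream cardinal-number verbalization."""
    if not token.isdigit():
        return token                 # not a number: leave untouched
    if len(token) > 4 or token.startswith("0"):
        return " ".join(token)       # e.g. "0912345678" -> "0 9 1 2 ..."
    return token                     # e.g. "42", later read as "forty-two"
```
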


Title:
Analyzing Discourse Functions with Acoustic Features and Phone Embeddings: Non-lexical Items in Taiwan Mandarin

Author:
Pin-Er Chen, Yu-Hsiang Tseng, Chi-Wei Wang, Fang-Chi Yeh, and Shu-Kai Hsieh

Abstract:
Non-lexical items are expressive devices used in conversations that are not words but are nevertheless meaningful. These items play crucial roles, such as signaling, turn-taking, or marking stances in interactions. However, as non-lexical items do not stably correspond to written or phonological forms, past studies have tended to focus on their acoustic properties, such as pitch and duration. In this paper, we investigate the discourse functions of non-lexical items through their acoustic properties and the phone embeddings extracted from a deep learning model. First, we create a non-lexical item dataset based on interpellation video clips from Taiwan's Legislative Yuan. Then, we manually identify the non-lexical items and their discourse functions in the videos. Next, we analyze the acoustic properties of those items through statistical modeling and build classifiers based on phone embeddings extracted from a phone recognition model. We show that (1) the discourse functions have significant effects on the acoustic features, and (2) the classifiers built on phone embeddings perform better than those built on conventional acoustic properties. These results suggest that phone embeddings may reflect the phonetic variations crucial to differentiating the discourse functions of non-lexical items.

Keywords:
Non-lexical Item, Discourse Function, Acoustic Property, Acoustic Representation, Pragmatics


Title:
A Chinese Dimensional Valence-Arousal-Irony Detection on Sentence-level and Context-level Using Deep Learning Model

Author:
Jheng-Long Wu, Sheng-Wei Huang, Wei-Yi Chung, Yu-Hsuan Wu, and Chen-Chia Yu

Abstract:
Chinese multi-dimensional sentiment detection is a challenging task with great influence on semantic understanding. Irony is one aspect of sentiment analysis, and the datasets established in previous studies usually only determine whether a sentence is ironic and measure its intensity. However, the lack of other sentiment features makes such datasets very limited in many applications. Since irony has a humorous effect in dialogues, useful sentiment features should be considered while constructing a dataset. Ironic sentences can be defined as sentences whose true meaning is the opposite of their literal meaning. To understand the true meaning of an ironic sentence, contextual information is needed. In summary, a dataset that includes dimensional sentiment intensities and the context of ironic sentences allows researchers to better understand ironic sentences. This paper creates an extended NTU irony corpus, which includes valence, arousal, and irony intensities at the sentence level and valence and arousal intensities at the context level, called the Chinese Dimensional Valence-Arousal-Irony (CDVAI) dataset. The paper analyzes the differences in CDVAI annotation results between annotators and uses various deep learning models to evaluate prediction performance on the CDVAI dataset.

Keywords:
Irony Annotation, Dimensional Valence-Arousal-Irony, Sentiment Analysis, Deep Learning


Title:
Taiwanese Voice Conversion based on Cascade ASR and TTS Framework

Author:
Wen-Han Hsu, Yuan-Fu Liao, Wern-Jun Wang, and Chen-Ming Pan

Abstract:
Taiwanese has been listed as an endangered language by the United Nations and is in urgent need of preservation. This study therefore aims to build a Taiwanese speech synthesis system that can synthesize any Taiwanese sentence in anyone's voice. To achieve this goal, we first (1) built the large-scale Taiwanese Across Taiwan (TAT) corpus, with a total of 204 speakers and about 140 hours of speech; among those speakers, two men and two women each recorded about 10 hours of speech specifically for speech synthesis. We then (2) established a Chinese text-to-Taiwanese speech synthesis system based on the Tacotron2 speech synthesis architecture, with a frontend sequence-to-sequence machine translation module from Chinese characters to Taiwan Minnanyu Luomazi Pinyin (shortened as Tâi-lô) and the backend WaveGlow real-time speech generator. Finally, we (3) constructed a Taiwanese voice conversion system based on a cascaded speech recognition and speech synthesis framework, in which two voice conversion functions were implemented: (1) same-language (Taiwanese-to-Taiwanese) and (2) cross-language (Chinese-to-Taiwanese) voice conversion. To evaluate the Taiwanese voice conversion system, we publicly recruited 29 subjects from the Internet to conduct two kinds of scoring tasks, same-language and cross-language voice conversion, and carried out subjective "naturalness" and "similarity" mean opinion score (MOS) evaluations, respectively. The results show that in the intra-lingual session, the average naturalness MOS is 3.45, 3.02, and 2.23 points and the average similarity MOS is 3.38, 2.99, and 2.10 points when using 10 minutes, 3 minutes, and 30 seconds of target speech, respectively; in the cross-lingual session, the average naturalness MOS is 2.90 and 2.70 points and the average similarity MOS is 2.84 and 2.54 points when using 6 minutes and 3 minutes of target speech, respectively.
These results show that our proposed system can indeed synthesize any Taiwanese sentence in anyone's voice.

Keywords:
Taiwanese Across Taiwan, Taiwanese Speech Synthesis, Taiwanese Voice Conversion

