International Journal of Computational Linguistics & Chinese Language Processing
Vol. 27, No. 2, December 2022
Title:
Aligning Sentences in a Paragraph-Paraphrased Corpus with New Embedding-based
Similarity Measures
Author:
Aleksandra Smolka, Hsin-Min Wang, Jason S. Chang, and Keh-Yih Su
Abstract:
To better understand and utilize lexical and syntactic mapping between various
language expressions, it is often first necessary to perform sentence alignment
on the provided data. Until now, the character trigram overlapping ratio has been considered the best similarity measure for the text simplification corpus. In this paper, we aim to show that newer embedding-based similarity metrics are preferable to this traditional SOTA metric on a paragraph-paraphrased corpus. We report a series of experiments
designed to compare different alignment search strategies as well as various
embedding- and non-embedding-based sentence similarity metrics in the
paraphrased sentence alignment task. Additionally, we explore the problem of
aligning and extracting sentences with imposed restrictions, such as
controlling sentence complexity. For evaluation, we use paragraph pairs sampled
from the Webis-CPC-11 corpus containing paraphrased paragraphs. Our results
indicate that modern embedding-based metrics such as those utilizing SentenceBERT or BERTScore
significantly outperform the character trigram overlapping ratio in the
sentence alignment task in the paragraph-paraphrased corpus.
Keywords: Sentence Alignment,
Sentence Similarity, Sentence Embedding, Paragraph-paraphrased Corpus
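As an illustration of the traditional baseline named in the abstract above, the character trigram overlapping ratio compares the sets of character trigrams of two sentences. This is a minimal sketch; the exact normalization used in the cited work may differ (Dice, min-, or max-based variants all appear in the literature), so the max-based form below is an assumption.

```python
def char_trigrams(text):
    """Return the set of character trigrams of a string."""
    return {text[i:i + 3] for i in range(len(text) - 2)}

def trigram_overlap_ratio(a, b):
    """Overlap ratio of shared trigrams, normalized by the larger set
    (one common variant; the cited papers may normalize differently)."""
    ta, tb = char_trigrams(a), char_trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / max(len(ta), len(tb))
```

Embedding-based alternatives such as SentenceBERT cosine similarity or BERTScore replace this set overlap with comparisons in a learned vector space.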
Title:
Investigation of Feature Processing Modules and Attention Mechanisms in Speaker
Verification System
Author:
Ting-Wei Chen, Wei-Ting Lin, Chia-Ping Chen, Chung-Li Lu, Bo-Cheng Chan, Yu-Han
Cheng, Hsiang-Feng Chuang, Wei-Yu Chen
Abstract:
In this paper, we use several combinations of feature front-end modules and
attention mechanisms to improve the performance of our speaker verification
system. An updated version of ECAPA-TDNN is chosen as the baseline. We replace and integrate different feature front-end and attention-mechanism modules to find the most effective model design, which serves as our final system. We use the VoxCeleb 2 dataset as our training set and test the performance of our models on several test sets. With our final proposed model, we improve performance by 16% over the baseline on the VoxSRC2022 validation set, achieving better results for our speaker verification system.
Keywords:
Speaker
Verification, Frontend Module, Attention Mechanism, Time Delay Neural Network
Title:
Development of Mandarin-English Code-switching Speech Synthesis System
Author:
Hsin-Jou Lien, Li-Yu Huang, and Chia-Ping Chen
Abstract:
In this paper, a Mandarin-English code-switching speech synthesis system is proposed. To focus on learning the content information shared between the two languages, the training dataset is a multilingual artificial dataset with a unified speaker style. Adding language embeddings to the system helps it adapt to the multilingual dataset. In addition, text preprocessing is applied in different ways depending on the language. Word segmentation and text-to-pinyin conversion are the preprocessing steps for Mandarin, which not only improve fluency but also reduce learning complexity. Number normalization decides how Arabic numerals in a sentence should be read. The preprocessing for English is acronym conversion, which decides the pronunciation of acronyms.
Keywords:
Speech Synthesis, Code-switching, Text Preprocessing
Title:
Analyzing Discourse Functions with Acoustic Features and Phone Embeddings:
Non-lexical Items in Taiwan Mandarin
Author:
Pin-Er Chen, Yu-Hsiang Tseng, Chi-Wei Wang, Fang-Chi Yeh, and Shu-Kai Hsieh
Abstract:
Non-lexical items are expressive devices used in
conversations that are not words but are nevertheless meaningful. These items
play crucial roles, such as signaling, turn-taking, or marking stances in
interactions. However, as non-lexical items do not stably correspond to written or phonological forms, past studies have tended to focus on their acoustic properties, such as pitch and duration. In this paper, we
investigate the discourse functions of non-lexical items through their acoustic
properties and the phone embeddings extracted from a deep learning model.
Firstly, we create a non-lexical item dataset based on the interpellation video
clips from Taiwan's Legislative Yuan. Then, we manually identify the
non-lexical items and their discourse functions in the videos. Next, we analyze
the acoustic properties of those items through statistical modeling and
building classifiers based on phone embeddings extracted from a phone
recognition model. We show that (1) the discourse functions have significant
effects on the acoustic features; and (2) the classifiers built on phone
embeddings perform better than the ones on conventional acoustic properties.
These results suggest that phone embeddings may reflect the phonetic variations
crucial in differentiating the discourse functions of non-lexical items.
Keywords:
Non-lexical Item, Discourse Function, Acoustic Property, Acoustic
Representation, Pragmatics
Title:
A Chinese Dimensional Valence-Arousal-Irony Detection on Sentence-level and
Context-level Using Deep Learning Model
Author:
Jheng-Long Wu, Sheng-Wei Huang, Wei-Yi Chung, Yu-Hsuan
Wu,
and Chen-Chia Yu
Abstract:
Chinese multi-dimensional sentiment detection is a challenging task with a great influence on semantic understanding. Irony detection is one such sentiment analysis task, and the datasets established in previous studies usually only determine whether a sentence is ironic and its intensity. However, the lack of other sentiment features makes such datasets very limited in many applications. Since irony has a humorous effect in dialogues, useful sentiment features should be considered when constructing a dataset. Ironic sentences can be defined as sentences whose true meaning is the opposite of their literal meaning; to understand the true meaning of an ironic sentence, contextual information is needed. In summary, a dataset that includes dimensional sentiment intensities and the context of ironic sentences allows researchers to better understand ironic sentences. This paper creates an extended NTU irony corpus, called the Chinese Dimensional Valence-Arousal-Irony (CDVAI) dataset, which includes valence, arousal, and irony intensities at the sentence level, as well as valence and arousal intensities at the context level. The paper analyzes the differences in CDVAI annotation results among annotators and uses various deep learning models to evaluate prediction performance on the CDVAI dataset.
Keywords:
Irony Annotation, Dimensional Valence-Arousal-Irony, Sentiment Analysis, Deep
Learning
Title:
Taiwanese Voice Conversion based on Cascade ASR and TTS Framework
Author:
Wen-Han Hsu, Yuan-Fu Liao, Wern-Jun Wang, and
Chen-Ming Pan
Abstract:
Taiwanese has been listed as an endangered language by the United Nations, and its preservation is urgent. Therefore, this study aims to build a Taiwanese speech synthesis system that can synthesize any Taiwanese sentence in anyone's voice. To achieve this goal, we first (1) built a large-scale Taiwanese Across Taiwan (TAT) corpus with a total of 204 speakers and about 140 hours of speech; among these speakers, two men and two women each recorded about 10 hours of speech specifically for the purpose of speech synthesis; we then (2) established a Chinese text-to-Taiwanese speech synthesis system based on the Tacotron2 architecture, with a front-end sequence-to-sequence Chinese-character-to-Taiwan Minnanyu Luomazi Pinyin (shortened as Tâi-lô) machine translation module and a back-end WaveGlow real-time speech generator; and
finally, (3) constructed a Taiwanese voice conversion system based on the
cascaded speech recognition and speech synthesis framework, in which two voice conversion functions were implemented: (1) same-language (Taiwanese-to-Taiwanese) voice conversion, and (2) cross-language (Chinese-to-Taiwanese) voice conversion. To evaluate the Taiwanese voice conversion system, we publicly recruited 29 subjects from the Internet to conduct two kinds of scoring tasks, same-language and cross-language voice conversion, and carried out subjective "naturalness" and "similarity" mean opinion score (MOS) evaluations, respectively. The results show that in the
intra-lingual session, the average naturalness MOS scores are 3.45, 3.02, and 2.23 points, and the average similarity MOS scores are 3.38, 2.99, and 2.10 points when using 10 minutes, 3 minutes, and 30 seconds of target speech, respectively; in the cross-lingual session, the average naturalness MOS scores are 2.90 and 2.70 points, and the average similarity MOS scores are 2.84 and 2.54 points when using 6 minutes and 3 minutes of target speech, respectively. These results show that our proposed system can indeed synthesize any Taiwanese sentence in anyone's voice.
Keywords:
Taiwanese Across Taiwan, Taiwanese Speech Synthesis, Taiwanese Voice Conversion