International Journal of Computational Linguistics & Chinese Language Processing
Vol. 24, No. 2, December 2019
Title:
A Feature-granularity Training Strategy for Chinese Spoken Question Answering
Author:
Shang-Bao Luo and Kuan-Yu Chen
Abstract:
In spoken question answering, a segment of audio is usually converted into a textual representation by an automatic speech recognition (ASR) system and then fed into a text-based question answering model to generate the answer. However, because ASR transcriptions usually contain many recognition errors, a text-based question answering system may produce imperfect results. In order to mitigate this performance gap, a feature-granularity training strategy is proposed in this study. We evaluate the proposed training strategy on a Chinese spoken machine reading comprehension task; the results not only demonstrate the capability of the proposed strategy but also yield several valuable observations.
Keywords:
Spoken Question Answering, Speech Recognition, Feature-granularity, Training Strategy
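
The abstract above describes the standard spoken-QA cascade: an ASR transcript is fed to a text-based reading comprehension model. The following is a minimal Python sketch of that cascade only; the Hugging Face pipeline and the bert-base-chinese checkpoint are illustrative stand-ins (a model fine-tuned for Chinese machine reading comprehension would be needed in practice), and the proposed feature-granularity training strategy is not reproduced here.

    # Minimal sketch of the ASR -> text-QA cascade; not the authors' model.
    from transformers import pipeline

    # In a real spoken-QA system this context would come from an ASR system
    # and would typically contain recognition errors.
    asr_transcript = "台北 一零 一 位於 台北市 信義區"  # hypothetical errorful ASR output
    question = "台北 一零 一 位於 哪裡"                # hypothetical text question

    # bert-base-chinese is a placeholder; its QA head is untrained, so a
    # checkpoint fine-tuned for Chinese machine reading comprehension
    # should be substituted for meaningful answers.
    qa = pipeline("question-answering", model="bert-base-chinese")
    result = qa(question=question, context=asr_transcript)
    print(result["answer"], result["score"])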
Title:
EBSUM: An Enhanced BERT-based Extractive Summarization Framework
Author:
Zheng-Yu Wu and Kuan-Yu Chen
Abstract:
Automatic summarization methods can be categorized into two major streams: extractive summarization and abstractive summarization. Although abstractive summarization aims to generate a short paragraph that expresses the original document, most of the generated summaries are hard to read. In contrast, the extractive summarization task is to extract sentences from the given document to construct a summary. Recently, BERT (Bidirectional Encoder Representations from Transformers), a pre-trained language representation method, has been introduced to several NLP-related tasks and has achieved remarkable results. In the context of extractive summarization, BERT is usually used to obtain representations for sentences and documents, and then a simple model is employed to select potential summary sentences based on the inferred representations. In this paper, an enhanced BERT-based extractive summarization framework (EBSUM) is proposed. The major innovations are: first, EBSUM takes sentence position information into account; second, in order to maximize the ROUGE score, the model is trained with a reinforcement learning strategy; third, to avoid redundant information, the maximal marginal relevance (MMR) criterion is incorporated into the proposed EBSUM model. In the experiments, EBSUM outperforms several state-of-the-art models on the CNN/DailyMail corpus.
Keywords:
Automatic Summarization, Extractive Summarization, BERT, Reinforcement Learning, MMR
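
Of the three innovations, the MMR criterion is the most self-contained; the following Python sketch illustrates greedy MMR sentence selection in isolation, with toy salience and similarity functions standing in for the BERT-derived scores and representations EBSUM actually uses.

    # Illustrative sketch of greedy maximal marginal relevance (MMR) selection.
    # Salience and similarity here are toy stand-ins, not EBSUM's learned scores.
    from typing import Callable, List

    def mmr_select(sentences: List[str],
                   salience: Callable[[str], float],
                   similarity: Callable[[str, str], float],
                   k: int,
                   lam: float = 0.7) -> List[str]:
        """Greedily pick k sentences, trading salience against redundancy."""
        selected: List[str] = []
        candidates = list(sentences)
        while candidates and len(selected) < k:
            def mmr_score(s: str) -> float:
                # Penalize similarity to anything already selected.
                redundancy = max((similarity(s, t) for t in selected), default=0.0)
                return lam * salience(s) - (1.0 - lam) * redundancy
            best = max(candidates, key=mmr_score)
            selected.append(best)
            candidates.remove(best)
        return selected

    # Toy usage with word-overlap similarity and length-based salience.
    def overlap(a: str, b: str) -> float:
        wa, wb = set(a.split()), set(b.split())
        return len(wa & wb) / max(1, len(wa | wb))

    doc = ["the cat sat on the mat", "a cat was sitting on a mat",
           "stocks fell sharply today"]
    print(mmr_select(doc, salience=lambda s: len(s.split()) / 10.0,
                     similarity=overlap, k=2))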
Title:
Deep Neural-Network Bandwidth Extension and Denoising Voice Conversion System for ALS Patients
Author:
Bai-Hong Huang, Yuan-Fu Liao, Guang-Feng Deng, Matúš Pleva and Daniel Hládek
Abstract:
ALS (amyotrophic lateral sclerosis) is a neurodegenerative disease. There is no cure for this disease, and ALS patients eventually lose the ability to communicate with their own voice. Therefore, personalized voice output communication aids (VOCAs) are essential for ALS patients to improve their daily lives. However, most ALS patients have not properly preserved personal recordings in the early stage of the disease. Usually, only a few low-quality speech recordings, such as compressed (distorted), narrowband (8 kHz), or noisy speech, are available for developing personalized VOCAs. In order to reconstruct high-quality synthetic speech close to the original voice of ALS patients, voice conversion with speech denoising and bandwidth extension capabilities is proposed in this paper. Here, a front-end WaveNet-based speech enhancement network and a back-end U-Net-based super-resolution network were constructed and integrated with the backbone voice conversion system. The experimental results showed that the WaveNet and U-Net models can restore noisy and narrowband speech, respectively. Therefore, the approach is promising for reconstructing high-quality personalized VOCAs for ALS patients.
Keywords:
Mandarin, Neural Network, ALS, WaveNet
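
To make the back-end concrete, here is a minimal 1-D U-Net-style network for waveform super-resolution in PyTorch. It only illustrates the encoder/decoder-with-skip-connections idea; the layer counts, kernel sizes, and residual formulation are illustrative assumptions, not the authors' architecture.

    # Tiny 1-D U-Net sketch for bandwidth extension; sizes are arbitrary.
    import torch
    import torch.nn as nn

    class TinyUNet1d(nn.Module):
        def __init__(self, ch: int = 16):
            super().__init__()
            self.down1 = nn.Sequential(nn.Conv1d(1, ch, 9, stride=2, padding=4), nn.ReLU())
            self.down2 = nn.Sequential(nn.Conv1d(ch, ch * 2, 9, stride=2, padding=4), nn.ReLU())
            self.up1 = nn.Sequential(nn.ConvTranspose1d(ch * 2, ch, 8, stride=2, padding=3), nn.ReLU())
            self.up2 = nn.ConvTranspose1d(ch * 2, 1, 8, stride=2, padding=3)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            d1 = self.down1(x)               # (B, ch,  T/2)
            d2 = self.down2(d1)              # (B, 2ch, T/4)
            u1 = self.up1(d2)                # (B, ch,  T/2)
            u1 = torch.cat([u1, d1], dim=1)  # skip connection from the encoder
            return x + self.up2(u1)          # residual: predict the missing band

    wav = torch.randn(1, 1, 8000)   # 1 s of (upsampled) narrowband speech
    print(TinyUNet1d()(wav).shape)  # torch.Size([1, 1, 8000])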
Title:
Real-Time Mandarin Speech Synthesis System
Author:
An-Chieh Cheng and Chia-Ping Chen
Abstract:
This paper studies and implements a real-time Mandarin speech synthesis system. The system uses a model that converts the text sequence into a mel-spectrogram sequence, followed by a vocoder that converts the mel-spectrogram into synthesized speech. We use Tacotron2 to implement the sequence-to-sequence conversion model and pair it with several different vocoders, including Griffin-Lim, the WORLD vocoder, and WaveGlow. The WaveGlow neural vocoder, which is built on an invertible (flow-based) transformation, is the most prominent, performing impressively in terms of both synthesis speed and speech quality. We implement the system with a 12-hour single-speaker corpus. In terms of voice quality, the mean opinion score (MOS) of speech synthesized with the WaveGlow vocoder is 4.08, slightly lower than the 4.41 of real speech and far better than the other two vocoders (average 2.93). In terms of processing speed, on a GeForce RTX 2080 Ti GPU, the system with the WaveGlow vocoder produces 10 seconds of 48 kHz speech in 1.4 seconds, so it is a real-time system.
Keywords:
TTS, Tacotron2, WaveGlow
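
The real-time claim can be checked directly from the reported numbers; the short computation below derives the real-time factor (RTF) from the abstract's figures.

    # Back-of-the-envelope check of the real-time claim in the abstract:
    # an RTF below 1.0 means audio is generated faster than it plays back.
    audio_seconds = 10.0     # duration of synthesized speech
    synthesis_seconds = 1.4  # wall-clock time on a GeForce RTX 2080 Ti
    sample_rate = 48_000     # samples per second

    rtf = synthesis_seconds / audio_seconds
    samples_per_sec = audio_seconds * sample_rate / synthesis_seconds
    print(f"RTF = {rtf:.2f}")                        # 0.14 -> ~7x faster than real time
    print(f"{samples_per_sec:,.0f} samples/second")  # ~342,857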
Title:
Chatlog Disentanglement based on Similarity Evaluation Via Reply Message Pairs Prediction Task
Author:
ZhiXian Liu and Chia-Hui Chang
Abstract:
To build a retrieval-based dialogue system, we can exploit conversation logs to extract question-answer pairs. However, these question-answer pairs are hidden in the conversation log and interleaved with one another. The task of separating different sub-topics from the interspersed messages is called conversation disentanglement. In this paper, we examined the task of judging whether two Reddit messages belong to the same topic dialogue and found that performance is worse if the training and testing data are split by time. In practice, this is a very hard task even for human beings, since only the two messages are given, without context. However, if the goal is to predict whether one message is a reply to the other, the problem becomes much easier to judge. By changing the way the data is prepared, we are able to achieve better performance with DA-LSTM (Dual Attention LSTM) and BERT-based models on the newly defined reply prediction task.
Keywords:
Chatlog Disentanglement, Reply Relation Prediction, BERT Neural Model
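
The reply prediction task maps naturally onto BERT's sentence-pair input format. The sketch below frames two chat messages as a sequence-pair classification problem using the Hugging Face transformers API; bert-base-uncased is a generic placeholder whose classification head is untrained, so it would need fine-tuning on labeled reply pairs, and the messages shown are invented examples.

    # Sketch of reply prediction as a BERT sentence-pair classifier:
    # given two chat messages, predict whether the second replies to the first.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2)  # untrained head; fine-tune on reply pairs

    msg_a = "anyone know why my kernel build fails at link time?"
    msg_b = "check that you ran make clean first"

    inputs = tok(msg_a, msg_b, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    p_reply = logits.softmax(-1)[0, 1].item()  # probability msg_b replies to msg_a
    print(f"P(reply) = {p_reply:.2f}")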