International Journal of Computational Linguistics & Chinese Language Processing
Vol. 23, No. 2, December 2018



Title:
A Study on Mandarin Speech Recognition using Long Short-Term Memory Neural Network

Author:
Chien-hung Lai and Yih-Ru Wang

Abstract:
In recent years, neural networks have been widely used in the field of speech recognition. This paper uses recurrent neural networks to train acoustic models and build a Mandarin speech recognition system. Because recurrent neural networks contain cyclic connections, they are better suited to modeling temporal signals than fully connected deep neural networks.
However, recurrent neural networks suffer from vanishing and exploding gradients during backpropagation, which can stall training and prevent them from effectively capturing long-term dependencies. The Long Short-Term Memory (LSTM) network was proposed to solve this problem. This study builds on the LSTM architecture and combines it with convolutional neural networks and deep neural networks to construct CLDNN models.
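
As an illustration of the CNN-LSTM-DNN stacking described above, here is a minimal CLDNN sketch in PyTorch; the layer sizes, kernel shapes, and number of output states are illustrative assumptions rather than the configuration used in the paper.

    import torch
    import torch.nn as nn

    class CLDNN(nn.Module):
        def __init__(self, n_mels=40, n_states=3000):
            super().__init__()
            # CNN front end: reduces spectral variation in the input.
            self.conv = nn.Sequential(
                nn.Conv2d(1, 32, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=(1, 2)),  # pool along frequency only
            )
            # LSTM stack: models the temporal dynamics.
            self.lstm = nn.LSTM(32 * (n_mels // 2), 512, num_layers=2,
                                batch_first=True)
            # DNN back end: maps LSTM outputs to tied-state scores.
            self.dnn = nn.Sequential(
                nn.Linear(512, 1024), nn.ReLU(),
                nn.Linear(1024, n_states),
            )

        def forward(self, x):                      # x: (batch, time, n_mels)
            x = self.conv(x.unsqueeze(1))          # -> (batch, 32, time, n_mels//2)
            x = x.permute(0, 2, 1, 3).flatten(2)   # -> (batch, time, features)
            x, _ = self.lstm(x)                    # -> (batch, time, 512)
            return self.dnn(x)                     # frame-level state scores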

Keywords: RNNs, LSTMs, Gradient Vanishing (Exploding), Acoustic Model, Mandarin, LVCSR, CNNs, DNNs


Title:
Leveraging Discriminative Training and Model Combination for Semi-supervised Speech Recognition

Author:
Tien-Hong Lo and Berlin Chen

Abstract:
In recent years, the so-called lattice-free MMI (LF-MMI) criterion has been applied with good success to supervised training of state-of-the-art acoustic models in various automatic speech recognition (ASR) applications. However, when moving to the scenario of semi-supervised acoustic model training, the seed models trained with LF-MMI often show inadequate competence due to the limited amount of manually labeled training data. This is because LF-MMI shares a common deficiency of discriminative training criteria: it is sensitive to the accuracy of the transcripts of the training utterances. This paper sets out to explore two novel extensions of semi-supervised training in conjunction with LF-MMI. First, we capitalize more fully on negative conditional entropy (NCE) weighting and utilize word lattices for supervision in the semi-supervised setting. The former aims to minimize the conditional entropy of a lattice, which is equivalent to a weighted average over all possible reference transcripts; minimizing the lattice entropy is a natural extension of the MMI objective for modeling uncertainty. The latter preserves more cues in the hypothesis space by using word lattices instead of one-best results for supervision, thereby increasing the possibility of recovering the reference transcripts of the training utterances. Second, we draw on notions from ensemble learning to develop two disparate combination methods, namely hypothesis-level combination and frame-level combination, which enhance the error-correcting capability of the acoustic models. Experimental results on a meeting transcription task show that the addition of NCE weighting, as well as the use of word lattices for supervision, can significantly reduce the word error rate (WER) of the ASR system, while the model combination approaches can also considerably improve performance at various stages. Finally, fusing the two kinds of extensions achieves a word recovery rate (WRR) of 60.8%.
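
For concreteness, the lattice conditional entropy minimized by NCE weighting can be written as follows, where $\mathcal{L}$ denotes the word lattice generated for the acoustic observations $O$ (a standard formulation consistent with the abstract's description, not an equation quoted from the paper):

    \mathcal{H}(W \mid O) \;=\; -\sum_{W \in \mathcal{L}} P(W \mid O)\,\log P(W \mid O)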

Keywords:
Automatic Speech Recognition, Discriminative Training, Semi-supervised Training, Model Combination, LF-MMI


Title:
Leveraging Discriminative Training and Improved Neural Network Architecture and Optimization Method

Author:
Wei-Cheng Chao, Hsiu-Jui Chang, Tien-Hong Lo, and Berlin Chen

Abstract:
This paper sets out to investigate the effect of acoustic modeling on Mandarin large vocabulary continuous speech recognition (LVCSR). In order to obtain more discriminative baseline acoustic models, we adopt the recently proposed lattice-free maximum mutual information (LF-MMI) criterion as the objective for sequential training of the component neural networks, in place of the conventional cross-entropy criterion. LF-MMI brings the benefit of efficient forward-backward statistics accumulation on the graphics processing unit (GPU) over all hypothesized word sequences, without the need for an explicit word-lattice generation process. Paired with LF-MMI, acoustic models implemented with the so-called time-delay neural network (TDNN) often achieve impressive performance. In view of the above, we explore an integration of two novel extensions of acoustic modeling. One is to apply semi-orthogonal low-rank matrix factorization to TDNN-based acoustic models with deeper network layers to increase their robustness. The other is to integrate the backstitch mechanism into the parameter update process of the acoustic models to promote generalization. Extensive experiments carried out on a Mandarin broadcast news transcription task reveal that integrating these two extensions yields considerable improvements over the LF-MMI baseline in terms of character error rate (CER) reduction.
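
For reference, the backstitch mechanism mentioned above is usually stated as a two-step SGD update, first a small step against the gradient and then an enlarged step along it (this is the published formulation of backstitch, with learning rate $\eta$ and backstitch scale $\alpha$, not a detail taken from this abstract):

    \theta' = \theta + \alpha\,\eta\,\nabla_{\theta}\mathcal{L}(\theta),
    \qquad
    \theta_{\mathrm{new}} = \theta' - (1 + \alpha)\,\eta\,\nabla_{\theta}\mathcal{L}(\theta')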

Keywords:
Mandarin Large Vocabulary Continuous Speech Recognition, Acoustic Model, Discriminative Training, Matrix Factorization, Backstitch


Title:
Supporting Evidence Retrieval for Answering Yes/No Questions

Author:
Meng-Tse Wu, Yi-Chung Lin and Keh-Yih Su

Abstract:
This paper proposes a new n-gram matching approach for retrieving supporting evidence, i.e., a question-related text passage in a given document, for answering Yes/No questions. It locates the desired passage from the question text with an efficient and simple n-gram matching algorithm. In comparison with previous approaches, this model is more efficient and easier to implement. The proposed approach was tested on the task of answering Yes/No questions from Taiwan elementary school Social Studies lessons. Experimental results showed that our proposed approach outperforms the well-known Apache Lucene search engine by 5%.
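
The core matching step can be sketched in a few lines of Python; the tokenization and the particular scoring function (a shared n-gram count over n = 1..3) are illustrative assumptions, not necessarily the paper's exact formulation.

    def ngrams(tokens, n):
        # All contiguous n-grams of a token sequence, as a set.
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    def score(question, passage, max_n=3):
        # Count the n-grams (n = 1..max_n) shared by question and passage.
        return sum(len(ngrams(question, n) & ngrams(passage, n))
                   for n in range(1, max_n + 1))

    def retrieve(question, passages):
        # Return the candidate passage that best matches the question.
        return max(passages, key=lambda p: score(question, p))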

Keywords:
Supporting Evidence Retrieval, Q&A for Yes/No Questions


Title:
An OOV Word Embedding Framework for Chinese Machine Reading Comprehension

Author:
Shang-Bao Luo, Ching-Hsien Lee, Jia-Jang Tu and Kuan-Yu Chen

Abstract:
When using deep learning methods in NLP-related tasks, a word is usually represented by a low-dimensional dense vector, called a word embedding; these word embeddings can then serve as feature vectors for various neural network-based models. However, a major challenge facing such a mechanism is how to represent out-of-vocabulary (OOV) words. There are two common strategies in practice: one is to remove these words outright; the other is to represent OOV words with zero or random vectors. To mitigate this flaw, we introduce an OOV embedding framework that aims to generate reasonable low-dimensional dense vectors for OOV words. Furthermore, in order to evaluate the impact of the OOV representations, we plug the proposed framework into a Chinese machine reading comprehension task, and a series of experiments and comparisons demonstrate the efficacy of the proposed framework.
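
As a purely hypothetical illustration of the goal, one simple way to assign a dense vector to an OOV word is to compose it from the embeddings of its characters; the sketch below is a stand-in for, not a description of, the framework proposed in the paper.

    import numpy as np

    def oov_vector(word, char_emb, dim=300):
        # Average the embeddings of the word's known characters;
        # fall back to a zero vector if none are known.
        vecs = [char_emb[c] for c in word if c in char_emb]
        return np.mean(vecs, axis=0) if vecs else np.zeros(dim)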

Keywords:
Natural Language Processing, Word Embedding, Out-of-vocabulary, Machine Reading Comprehension


Title:
Hierarchical Multi-Label Chinese Word Semantic Labeling using Deep Neural Network

Author:
Wei-Chieh Chou and Yih-Ru Wang

Abstract:
Traditionally, classification over more than 100 hierarchical labels can be handled by flat classification, but this discards the information in the taxonomy structure. This paper aims to classify the concepts of words in E-HowNet and proposes a deep neural network training method that exploits the hierarchical relationships in the E-HowNet taxonomy. The input to the neural network is a word embedding; for this, the paper proposes an order-aware 2-Bag Word2Vec. Experimental results show that hierarchical classification achieves higher accuracy than flat classification.
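
A minimal sketch of what hierarchical classification over word embeddings can look like, with one output head per taxonomy level on a shared trunk; the dimensions and level sizes are illustrative assumptions, not the paper's E-HowNet configuration.

    import torch
    import torch.nn as nn

    class HierarchicalClassifier(nn.Module):
        def __init__(self, emb_dim=300, hidden=512, level_sizes=(10, 40, 120)):
            super().__init__()
            # Shared trunk over the input word embedding.
            self.trunk = nn.Sequential(nn.Linear(emb_dim, hidden), nn.ReLU())
            # One output head per taxonomy level, coarse to fine.
            self.heads = nn.ModuleList(nn.Linear(hidden, k) for k in level_sizes)

        def forward(self, word_emb):                 # word_emb: (batch, emb_dim)
            h = self.trunk(word_emb)
            return [head(h) for head in self.heads]  # one set of logits per level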

Keywords:
Word2Vec, Neural Network, Minimum Classification Error, E-HowNet, Hierarchical Classification, Multi-label Classification