International Journal of Computational Linguistics & Chinese Language
Processing
Vol. 23, No. 2, December 2018
Title:
A Study on Mandarin Speech Recognition using Long Short-Term Memory Neural
Network
Author:
Chien-hung Lai and Yih-Ru Wang
Abstract:
In recent years, neural networks have been widely used in the field of speech recognition.
This paper uses recurrent neural networks to train acoustic models and
establish a Mandarin speech recognition system. Because recurrent neural
networks contain cyclic connections, they are better suited to modeling
temporal signals than fully connected deep neural networks. However, recurrent
neural networks suffer from vanishing and exploding gradients during
backpropagation, which can stall training and prevent the networks from
effectively capturing long-term dependencies. The Long Short-Term Memory
(LSTM) network is a model proposed to solve this problem. This study builds on
this architecture and combines it with convolutional neural networks and deep
neural networks to construct CLDNN models.
Keywords: RNNs, LSTMs, Gradient Vanishing (Exploding), Acoustic Model,
Mandarin, LVCSR, CNNs, DNNs
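
To make the CLDNN architecture concrete, here is a minimal sketch in PyTorch
(an assumption for illustration; the paper does not publish its implementation,
and the layer sizes, the 40-dimensional log-mel input, and the output state
inventory below are all hypothetical): a convolutional front end, LSTM layers
for temporal modeling, and fully connected output layers.

    # Illustrative CLDNN acoustic model: Conv -> LSTM -> DNN.
    # Every size below is hypothetical; the paper's exact topology differs.
    import torch
    import torch.nn as nn

    class CLDNN(nn.Module):
        def __init__(self, n_mels=40, n_states=3000):
            super().__init__()
            # Convolutional front end reduces spectral variation.
            self.conv = nn.Sequential(
                nn.Conv2d(1, 32, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=(1, 2)),  # pool along frequency only
            )
            # LSTM layers model the temporal structure of the signal.
            self.lstm = nn.LSTM(32 * (n_mels // 2), 512,
                                num_layers=2, batch_first=True)
            # Fully connected layers map to per-frame state posteriors.
            self.dnn = nn.Sequential(
                nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, n_states))

        def forward(self, x):              # x: (batch, frames, n_mels)
            x = self.conv(x.unsqueeze(1))  # -> (batch, 32, frames, n_mels//2)
            x = x.permute(0, 2, 1, 3).flatten(2)  # -> (batch, frames, feats)
            x, _ = self.lstm(x)
            return self.dnn(x)             # per-frame state logits

    logits = CLDNN()(torch.randn(4, 100, 40))  # shape (4, 100, 3000)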
Title:
Leveraging Discriminative Training and Model Combination for Semi-supervised
Speech Recognition
Author:
Tien-Hong Lo and Berlin Chen
Abstract:
In recent years, the so-called lattice-free MMI (LF-MMI) criterion has been
proposed with good success for supervised training of state-of-the-art acoustic
models in various automatic speech recognition (ASR) applications. However,
when moving to the scenario of semi-supervised acoustic model training, the
seed models of LF-MMI often show inadequate competence due to limited
available manually labeled training data. This is because LF-MMI shares a
common deficiency of discriminative training criteria, being sensitive to the
accuracy of the corresponding transcripts of training utterances. This paper
sets out to explore two novel extensions of semi-supervised training in
conjunction with LF-MMI. First, we capitalize more fully on negative
conditional entropy (NCE) weighting and utilize word lattices for supervision
in the semi-supervised setting. The former aims to minimize the conditional
entropy of a lattice, which is equivalent to a weighted average of all possible
reference transcripts. The minimization of the lattice entropy is a natural
extension of the MMI objective for modeling uncertainty. The latter, which
uses word lattices instead of one-best results for supervision, preserves more
cues in the hypothesis space and increases the possibility of recovering the
reference transcripts of training utterances. Second, we draw on the notion of
ensemble learning to
develop two disparate combination methods, namely hypothesis-level combination
and frame-level combination. In doing so, the error-correcting capability of
the acoustic models can be enhanced. The experimental results on a meeting
transcription task show that the addition of NCE weighting, as well as the
utilization of word lattices for supervision, can significantly reduce the word
error rate (WER) of the ASR system, while the model combination approaches can
also considerably improve the performance at various stages. Finally, fusion of
the aforementioned two kinds of extensions can achieve a word recovery rate
(WRR) of 60.8%.
Keywords:
Automatic Speech Recognition, Discriminative Training, Semi-supervised
Training, Model Combination, LF-MMI
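
The role of NCE weighting can be illustrated with a small sketch (hypothetical;
an n-best list stands in for the full word lattice, and the mapping from
entropy to weight is an assumption): an unlabeled utterance whose decoding
hypotheses have low entropy is trusted more during semi-supervised training.

    # Hypothetical NCE-style utterance weight from hypothesis posteriors.
    # Confident (low-entropy) utterances get weights near 1.
    import math

    def nce_weight(posteriors):
        """posteriors: hypothesis posteriors for one utterance, summing to 1."""
        entropy = -sum(p * math.log(p) for p in posteriors if p > 0)
        max_entropy = math.log(len(posteriors))  # entropy of a uniform n-best
        return 1.0 - entropy / max_entropy if max_entropy > 0 else 1.0

    print(nce_weight([0.9, 0.05, 0.05]))   # ~0.64, confident utterance
    print(nce_weight([0.34, 0.33, 0.33]))  # ~0.00, ambiguous utterance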
Title:
Leveraging Discriminative Training and Improved Neural Network Architecture and
Optimization Method
Author:
Wei-Cheng Chao, Hsiu-Jui Chang, Tien-Hong Lo, and
Berlin Chen
Abstract:
This paper sets out to investigate the effect of acoustic modeling on Mandarin
large vocabulary continuous speech recognition (LVCSR). In order to obtain more
discriminative baseline acoustic models, we adopt the recently proposed
lattice-free maximum mutual information (LF-MMI) criterion as the objective for
sequential training of component neural networks in place of the conventional
cross entropy criterion. LF-MMI brings the benefit of efficient
forward-backward statistics accumulation on top of the graphical processing
unit (GPU) for all hypothesized word sequences without the need of an explicit
word lattice generation process. Paired with LF-MMI, the component neural
networks of acoustic models implemented with the so-called time-delay neural
network (TDNN) often lead to impressive performance. In view of the above, we
explore an integration of two novel extensions of acoustic modeling. One is to
conduct semi-orthogonal low-rank matrix factorization on the TDNN-based
acoustic models with deeper network layers to increase their robustness. The
other is to integrate the backstitch mechanism into the update process of
acoustic models for promoting the level of generalization. Extensive
experiments carried out on a Mandarin broadcast news transcription task reveal
that the integration of these two novel extensions of acoustic modeling can
yield considerable improvements over the baseline LF-MMI in terms of character
error rate (CER) reduction.
Keywords:
Mandarin Large Vocabulary Continuous Speech Recognition, Acoustic Model,
Discriminative Training, Matrix Factorization, Backstitch
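
The backstitch mechanism mentioned above admits a compact sketch (illustrative
only; the authors' Kaldi-based implementation applies it inside natural-gradient
SGD with further details): each update first takes a small step against the
descent direction, then an enlarged step using a freshly computed gradient.

    # Minimal backstitch SGD sketch on a toy quadratic loss.
    # The alpha and lr values are arbitrary illustrations.
    import numpy as np

    def backstitch_step(w, grad_fn, lr=0.01, alpha=0.3):
        w = w + alpha * lr * grad_fn(w)        # step 1: small "undo" step uphill
        w = w - (1 + alpha) * lr * grad_fn(w)  # step 2: enlarged step downhill
        return w

    # Toy loss L(w) = ||w||^2 / 2, whose gradient is w itself.
    w = np.array([1.0, -2.0])
    for _ in range(500):
        w = backstitch_step(w, lambda v: v)
    print(w)  # approaches the minimum at the origin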
Title:
Supporting Evidence Retrieval for Answering Yes/No Questions
Author:
Meng-Tse Wu, Yi-Chung Lin and Keh-Yih Su
Abstract:
This paper proposes a new n-gram matching approach for retrieving the
supporting evidence, which is a question related text passage in the given
document, for answering Yes/No questions. It locates the desired passage
according to the question text with an efficient and simple n-gram matching
algorithm. In comparison with previous approaches, this model is more
efficient and easier to implement. The proposed approach was tested on a task of
answering Yes/No questions of Taiwan elementary school Social Studies lessons.
Experimental results showed that the performance of our proposed approach is 5%
higher than that of the well-known Apache Lucene search engine.
Keywords:
Supporting Evidence Retrieval, Q&A for Yes/No Questions
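
A minimal sketch of the n-gram matching idea (hypothetical; the paper's exact
matching and weighting scheme is not given in the abstract, and the use of
character n-grams is an assumption): each candidate passage is scored by its
n-gram overlap with the question, and the best-scoring passage is returned as
supporting evidence.

    # Hypothetical character n-gram overlap scorer for evidence retrieval.
    def ngrams(text, n):
        return {text[i:i + n] for i in range(len(text) - n + 1)}

    def score(question, passage, orders=(1, 2, 3)):
        # Sum the overlap counts across several n-gram orders.
        return sum(len(ngrams(question, n) & ngrams(passage, n))
                   for n in orders)

    def retrieve_evidence(question, passages):
        return max(passages, key=lambda p: score(question, p))

    passages = ["台灣最高的山是玉山。", "淡水河流經台北盆地。"]
    print(retrieve_evidence("玉山是台灣最高的山嗎?", passages))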
Title:
An OOV Word Embedding Framework for Chinese Machine Reading Comprehension
Author:
Shang-Bao Luo, Ching-Hsien Lee, Jia-Jang Tu and Kuan-Yu Chen
Abstract:
When using Deep Learning methods in NLP-related tasks, we usually represent a
word by using a low-dimensional dense vector, which is named the word
embedding, and these word embeddings can then be
treated as feature vectors for various neural network-based models. However, a
major challenge facing such a mechanism is how to represent OOV words. There
are two common strategies in practice: one is to remove these words directly;
the other is to represent OOV words by using zero or random vectors. To
mitigate these flaws, we introduce an OOV embedding framework, which aims at
generating reasonable low-dimensional dense vectors for OOV words. Furthermore,
in order to evaluate the impact of the OOV representations, we plug the
proposed framework into the Chinese machine reading comprehension task, and a
series of experiments and comparisons demonstrate the good efficacy of the
proposed framework.
Keywords:
Natural Language Processing, Word Embedding, Out-of-vocabulary, Machine Reading
Comprehension
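
One simple way to realize such a framework is to compose an OOV word's vector
from the embeddings of its characters, as sketched below (an assumption for
illustration; the paper's generation model may differ): instead of a zero or
random vector, the OOV word inherits a plausible position in the embedding
space.

    # Hypothetical OOV embedding by averaging character embeddings.
    import numpy as np

    rng = np.random.default_rng(0)
    dim = 50
    char_emb = {c: rng.standard_normal(dim) for c in "語音識別系統"}  # toy table

    def embed(word, word_emb, char_emb, dim=50):
        if word in word_emb:                 # in-vocabulary: use its own vector
            return word_emb[word]
        vecs = [char_emb[c] for c in word if c in char_emb]
        return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

    word_emb = {}                            # pretend "語音" is out of vocabulary
    print(embed("語音", word_emb, char_emb).shape)  # (50,)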
Title:
Hierarchical Multi-Label Chinese Word Semantic Labeling using Deep Neural
Network
Author:
Wei-Chieh Chou and Yih-Ru Wang
Abstract:
Traditionally, classification over 100 hierarchical multi-labels can be done
with flat classification, but this loses the taxonomy structure information.
This paper aims to classify word concepts in E-HowNet and proposes a deep
neural network training method that exploits the hierarchical relationships in
the E-HowNet taxonomy. The input of the neural network is a word embedding;
for this, the paper proposes an order-aware 2-Bag Word2Vec. Experimental
results show that hierarchical classification achieves higher accuracy than
flat classification.
Keywords:
Word2Vec, Neural Network, Minimum Classification Error, E-HowNet,
Hierarchical Classification, Multi-label Classification
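
The difference from flat classification can be sketched as a top-down walk over
the taxonomy (illustrative only; the taxonomy fragment and scores below are toy
assumptions, and the paper's network and loss are not detailed in the
abstract): each decision is restricted to the children of the previously chosen
node, so the prediction always respects the tree structure.

    # Hypothetical top-down hierarchical classification over a toy taxonomy.
    taxonomy = {                  # parent -> children (toy E-HowNet-like tree)
        "entity": ["object", "event"],
        "object": ["animate", "inanimate"],
        "event": ["action", "state"],
    }

    def classify(scores, root="entity"):
        """scores: label -> model score; greedily walk down the tree."""
        node, path = root, []
        while node in taxonomy:
            # Restrict the decision to the current node's children.
            node = max(taxonomy[node],
                       key=lambda c: scores.get(c, float("-inf")))
            path.append(node)
        return path

    scores = {"object": 1.2, "event": 0.3, "animate": 2.0, "inanimate": 0.5}
    print(classify(scores))       # ['object', 'animate']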