Title:
A Study on Dispersion Measures for Core Vocabulary Compilation
Author:
Ming-Hong Bai, Jian-Cheng Wu, Ying-Ni Chien, Shu-Ling Huang and Ching-Lung Lin
Abstract:
Core vocabulary is a set of words that are used stably across different text types, themes, and application scenarios. Although the core vocabulary of a natural language is relatively small, it plays an important part in language learning because it constitutes a major part of communication content. Traditional core vocabulary selection relies mainly on expert knowledge and rules of thumb. With the rise of corpus linguistics, word frequency and dispersion uniformity provide objective statistical evidence to assist the selection of core vocabulary. In this paper, we first propose a formula that integrates multi-dimensional uniformity, so that the estimation of word uniformity can take different classification dimensions into account. Secondly, we propose a word frequency normalization method to address the bias of the traditional approach. For evaluation, we propose a method of evaluating the core vocabulary with a heterogeneous corpus, which allows the advantages, disadvantages, and characteristics of various statistical formulas to be compared. In the experiments, we compare different core vocabulary selection formulas, analyze their characteristics, and verify that word frequency normalization corrects the shortcomings of the traditional formula. Finally, we also verify that the proposed method, which integrates multi-dimensional uniformity, can select vocabulary with stronger core characteristics.
Keywords:
Corpus Linguistics, Core Vocabulary, Fringe Vocabulary, Dispersion Uniformity.
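The abstract above does not give its multi-dimensional uniformity formula. As a point of reference for what a dispersion measure computes, here is a minimal sketch of a standard single-dimension measure, Gries's deviation of proportions (DP); this is an illustration of the general technique, not the paper's proposed formula:

```python
def deviation_of_proportions(freqs, part_sizes):
    """Gries's DP over the parts of one classification dimension.

    freqs[i]      -- the word's frequency in corpus part i
    part_sizes[i] -- total token count of corpus part i
    Returns a value near 0 for perfectly even dispersion and near 1
    when the word is concentrated in a single part.
    """
    total_tokens = sum(part_sizes)
    total_freq = sum(freqs)
    dp = 0.0
    for f, size in zip(freqs, part_sizes):
        expected = size / total_tokens                     # share implied by part size
        observed = f / total_freq if total_freq else 0.0   # share of the word's tokens
        dp += abs(observed - expected)
    return 0.5 * dp
```

A multi-dimensional variant in the spirit of the paper would compute such a score once per classification dimension (genre, theme, scenario) and combine the results.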
Author:
Yu-Ming Hsieh and Wei-Yun Ma
Abstract:
Rescoring approaches for parsing aim to re-rank the parse trees produced by a general parser for a given sentence. The re-ranking performance depends on whether the rescoring function can precisely estimate the quality of parse trees by using more complex features drawn from the whole parse tree. However, designing an appropriate rescoring function is challenging, since complex features usually suffer from severe data sparseness, and it is also difficult to obtain sufficient information for re-estimating tree structures because existing annotated Treebanks are generally small. To address these issues, we utilize a large amount of auto-parsed trees to learn syntactic and semantic information, and we propose a simple but effective score function that integrates the scores provided by the baseline parser with dependency association scores based on dependency-based word embeddings learned from the auto-parsed trees. The dependency association scores relieve the data sparseness problem, since they can still be calculated from word embeddings even when a dependency word pair never occurs in the corpus. Moreover, semantic role labels are also considered, to distinguish the semantic relations of word pairs. Experimental results show that our proposed model significantly improves the baseline Chinese parser.
Keywords:
Word Embedding, Parsing, Word Dependency, Rescoring
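The abstract above describes, but does not spell out, a score function combining a baseline parser score with embedding-based dependency association scores. A hypothetical sketch of such an interpolation follows; the weighting, the linear interpolation form, and the (dependent, label) context indexing are assumptions for illustration, not the paper's actual function:

```python
import numpy as np

def rescore(parser_score, dependencies, word_vecs, ctx_vecs, lam=0.7):
    """Interpolate the baseline parser score with dependency association scores.

    dependencies -- iterable of (head, dependent, semantic_role_label) triples
    word_vecs    -- head word embeddings, keyed by word
    ctx_vecs     -- context embeddings keyed by (dependent, label), so the same
                    word pair can score differently under different roles
    """
    assoc = 0.0
    for head, dep, label in dependencies:
        h = word_vecs[head]
        c = ctx_vecs[(dep, label)]
        # cosine similarity as the association score of this dependency arc
        assoc += np.dot(h, c) / (np.linalg.norm(h) * np.linalg.norm(c))
    return lam * parser_score + (1 - lam) * assoc
```

Because the association score comes from embeddings rather than raw pair counts, it remains defined for word pairs never observed together, which is the data-sparseness relief the abstract points to.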
Author:
Bi-Cheng Yan, Chin-Hong Shih, Shih-Hung Liu and Berlin Chen
Abstract:
The performance of automatic speech recognition (ASR) often degrades dramatically in noisy environments. In this paper, we present a novel dictionary learning approach that normalizes the magnitude modulation spectra of speech features so as to retain more noise-resistant and important acoustic characteristics. To this end, we employ the K-SVD method to create sparse representations over a common set of basis vectors that span the intrinsic temporal structure inherent in the modulation spectra of clean training speech features. In addition, taking into account the non-negativity of the amplitude modulation spectrum, we utilize the nonnegative K-SVD method, paired with nonnegative sparse coding, to capture more noise-robust features. All experiments were conducted on the Aurora-2 corpus and task. The empirical evidence shows that our methods offer substantial improvements over the baseline NMF method. Finally, we integrate the proposed K-SVD variants with other well-known robustness methods, such as the Advanced Front-End (AFE), Cepstral Mean and Variance Normalization (CMVN), and Histogram Equalization (HEQ), to further confirm their utility.
Keywords:
Robustness, Automatic Speech Recognition, Modulation Spectrum, Sparse Coding, Dictionary Learning.
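K-SVD alternates between a sparse-coding step, which encodes each signal over the current dictionary, and a dictionary-update step. The sparse-coding half is typically orthogonal matching pursuit (OMP); a minimal NumPy sketch of OMP is given below as background for the abstract above (a generic illustration, not the paper's implementation, which additionally enforces non-negativity):

```python
import numpy as np

def omp(D, x, k):
    """Orthogonal matching pursuit: approximate x with at most k atoms of D.

    D -- dictionary with unit-norm columns (atoms)
    x -- signal vector to encode
    k -- sparsity level (number of nonzero coefficients)
    """
    residual = x.copy()
    idx = []
    coef = np.zeros(0)
    for _ in range(k):
        # pick the atom most correlated with the current residual
        j = int(np.argmax(np.abs(D.T @ residual)))
        idx.append(j)
        # re-fit all selected atoms jointly by least squares
        coef, *_ = np.linalg.lstsq(D[:, idx], x, rcond=None)
        residual = x - D[:, idx] @ coef
    code = np.zeros(D.shape[1])
    code[idx] = coef
    return code
```

In a full K-SVD loop, the codes produced here are held fixed while each dictionary atom is updated in turn via a rank-one SVD of its residual.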
Author:
Yao-Chi Hsu, Ming-Han Yang, Hsiao-Tsung Hung, Yi-Ju Lin, Kuan-Yu Chen and Berlin Chen
Abstract:
Mispronunciation detection and diagnosis are part and parcel of a computer-assisted pronunciation training (CAPT) system, collectively helping second-language (L2) learners pinpoint erroneous pronunciations in a given utterance so as to improve their spoken proficiency. This paper continues this general line of research, and its contributions are three-fold. First, we compare the performance of different pronunciation features for mispronunciation detection. Second, we propose an effective training approach that estimates the deep neural network based acoustic models involved in the mispronunciation detection process by optimizing an objective directly linked to the ultimate evaluation metric. Third, when the F1-score is taken as the final objective function, we linearly combine two F1-scores, which effectively deals with the label imbalance problem. A series of experiments on a Mandarin mispronunciation detection task seem to show the performance merits of the proposed methods.
Keywords:
Computer Assisted Pronunciation Training, Mispronunciation Detection, Automatic Speech Recognition, Discriminative Training, Deep Neural Networks.
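The linear combination of two F1-scores mentioned in the abstract above can be sketched as follows: one F1 computed on the mispronounced class and one on the correct class, mixed with a weight. This is an illustrative reconstruction from the abstract alone; the weight `alpha` and the confusion-count interface are assumptions:

```python
def f1(tp, fp, fn):
    """Standard F1 from true positives, false positives, false negatives."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def combined_objective(conf, alpha=0.5):
    """Linear combination of per-class F1-scores from a binary confusion
    (tp, fp, fn, tn), with 'mispronounced' as the positive class.
    Scoring both classes penalizes a detector that ignores the minority
    class, which is how the combination counters label imbalance."""
    tp, fp, fn, tn = conf
    f1_pos = f1(tp, fp, fn)   # F1 on mispronunciations
    f1_neg = f1(tn, fn, fp)   # F1 on correct pronunciations (roles swapped)
    return alpha * f1_pos + (1 - alpha) * f1_neg
```

For example, a degenerate detector that flags every segment as mispronounced gets a zero F1 on the correct class, so its combined objective stays low even when mispronunciations dominate the data.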
Author:
Kuan-Hung Chen, Shu-Han Liao, Yuan-Fu Liao and Yih-Ru Wang
Abstract:
High-quality linguistic features are the key to the success of speech synthesis. Traditional linguistic feature extraction methods usually rely on a word-level natural language processing (NLP) parser. Since a good parser requires a great deal of feature engineering to build, it is usually a general-purpose one and often not specially designed for speech synthesis. To avoid these difficulties, we propose to replace the conventional NLP parser with a character embedding and a character-level recurrent neural network language model (RNNLM) module that directly converts input character sequences, character by character, into latent linguistic feature vectors. Experimental results on a Chinese-English speech synthesis system showed that the proposed approach achieved performance comparable to traditional NLP parser-based methods.
Keywords:
Speech Synthesis, Linguistic Features, Word2vec, RNNLM.
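The character-to-feature conversion described above can be pictured as an embedding lookup followed by a recurrent layer whose hidden states serve as the latent linguistic feature vectors. The toy sketch below shows only that data flow; the single-tanh-layer architecture and all parameter names are illustrative assumptions, not the paper's RNNLM:

```python
import numpy as np

def char_features(char_ids, emb, Wx, Wh, b):
    """Map a character-id sequence to latent feature vectors.

    emb -- character embedding table, one row per character id
    Wx  -- input-to-hidden weights; Wh -- hidden-to-hidden weights; b -- bias
    Returns one hidden-state vector per input character.
    """
    h = np.zeros(Wh.shape[0])
    feats = []
    for c in char_ids:
        # embed the character, then take one recurrent step
        h = np.tanh(Wx @ emb[c] + Wh @ h + b)
        feats.append(h)
    return feats
```

In the proposed system these per-character vectors would replace the parser-derived linguistic features fed to the synthesizer.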
Author:
Ming-Han Yang, Yao-Chi Hsu, Hsiao-Tsung Hung, Ying-Wen Chen, Kuan-Yu Chen, and Berlin Chen
Abstract:
This paper sets out to explore the use of multi-task learning (MTL) techniques for more accurate estimation of the parameters involved in neural network based acoustic models, so as to improve the accuracy of meeting speech recognition. Our main contributions are two-fold. First, we conduct an empirical study on leveraging various auxiliary tasks to enhance the performance of multi-task learning on meeting speech recognition. Furthermore, we also study the synergy of combining multi-task learning with disparate acoustic models, such as deep neural network (DNN) and convolutional neural network (CNN) based acoustic models, with the expectation of increasing the generalization ability of acoustic modeling. Second, since the way the contributions (weights) of different auxiliary tasks are modulated during acoustic model training is far from optimal and is actually a matter of heuristic judgment, we propose a simple model adaptation method to alleviate this problem. A series of experiments carried out on the Mandarin meeting recording (MMRC) corpora seem to reveal the effectiveness of our proposed methods in relation to several existing baselines.
Keywords:
Multi-Task Learning, Deep Learning, Neural Network, Meeting Speech Recognition.
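The multi-task setup described above trains task-specific output heads on a shared representation, with the total loss a weighted sum of per-task losses; the heuristic choice of those weights is exactly the problem the abstract's adaptation method targets. A minimal sketch of the weighted MTL loss follows (a generic illustration with assumed shapes, not the paper's acoustic model):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # shift for numerical stability
    return e / e.sum()

def mtl_loss(h, heads, targets, weights):
    """Weighted multi-task cross-entropy over a shared representation h.

    heads   -- list of (W, b) softmax output layers, one per task
    targets -- target class index for each task
    weights -- heuristic per-task weights (main task typically largest)
    """
    total = 0.0
    for (W, b), t, w in zip(heads, targets, weights):
        p = softmax(W @ h + b)
        total += w * (-np.log(p[t]))   # cross-entropy for this task's target
    return total
```

Gradients of this sum flow back into the shared layers from every head, which is what lets auxiliary tasks regularize the main recognition task.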