Author:
Shan-Shun Yang, Shih-Hung Wu, Liang-Pu Chen, Hung-Sheng Chiu, and Ren-Dar Yang
Abstract:
Recognizing Textual Entailment (RTE) is a new research issue in natural language processing (NLP) research area. RTE can be a useful component in many NLP applications. In this paper, we introduce our finding on the entailment analysis of the NTCIR-10 RITE-2 dataset, and use the observation to improve our system. In the previous works, all the input pairs are treated equally in a standard classification architecture. We find that is not suitable for some special cases. We believe that by isolating the special cases and building separated classifiers, a RTE system can perform better. After implementing modules for four special cases into our system, the result is significantly improved from 67.86% to 72.92% on the binary class classification task.
Keywords:
Chinese Recognizing Textual Entailment, Entailment Analysis
Author:
Jian-cheng Wu, Hsun-wen Chiu, and Jason S. Chang
Abstract:
Chinese spell checking is an important component of many NLP applications, including word processors, search engines, and automatic essay rating. Nevertheless, compared to spell checkers for alphabetical languages (e.g., English or French), Chinese spell checkers are more difficult to develop because there are no word boundaries in the Chinese writing system and errors may be caused by various Chinese input methods. In this paper, we propose a novel method for detecting and correcting Chinese typographical errors. Our approach involves word segmentation, detection rules, and phrase-based machine translation. The error detection module detects errors by segmenting words and checking word and phrase frequency based on compiled and Web corpora. The phonological or morphological typographical errors found then are corrected by running a decoder based on the statistical machine translation model (SMT). The results show that the proposed system achieves significantly better accuracy in error detection and more satisfactory performance in error correction than the state-of-the-art systems.
Keywords:
Chinese Spelling Detection, Chinese Spelling Correction, Chinese Similar Characters, Ngram, Language Model, Machine Translation.
Author:
Jian-cheng Wu, Jim Chang, and Jason S. Chang
Abstract:
In this paper, we present a new method based on machine translation for correcting serial grammatical errors in a given sentence in learners�� writing. In our approach, translation models are generated to translate the input into a grammatical sentence. The method involves automatically learning two translation models that are based on Web-scale n-grams. The first model translates trigrams containing serial preposition-verb errors into correct ones. The second model is a back-off model, used in the case where the trigram is not found in the training data. At run-time, the phrases in the input are matched and translated, and ranking is performed on all possible translations to produce a corrected sentence as output. Evaluation on a set of sentences in a learner corpus shows that the method corrects serial errors reasonably well. Our methodology exploits the state-of-the art in machine translation, resulting in an effective system that can deal with many error types at the same time.
Keywords:
Grammatical Error Correction, Serial Errors, Machine Translation, N-grams, Language Model
Author:
You-shan Chung and Keh-Jiann Chen
Abstract:
In this project, we have studied Chinese noun-noun compounds (NNCs) and have found that N1 and N2 are linked either by semantic roles assigned by events (complex relations) or by static relations (simple relations), including meronymy, conjunction, and the host-attribute-value relation. Using data from the FrameNet and E-HowNet, we have found that, for NNCs of either type, the major semantic relations between the two components are limited enough to allow computational implementation. Regarding simple relations, most conjunction pairs have been listed in E-HowNet,and so are host-attribute-value sets. The E-HowNet Taxonomy also makes identification of meronymy possible. As for NNCs involving complex relations, each component�䏭 semantic role, along with the events that assign these roles, can be restored through mappings to corresponding frame elements (FEs) in entity and to event frames and lexical units (LUs) in FrameNet�䏭 frames, respectively, that represent the concept the NNC conveys.
Keywords:
Noun-noun Compounds, Automatic Interpretation, Extended HowNet (E-HowNet), FrameNet
Author:
Ju-Yun Cheng, Yi-Chin Huang, and Chung-Hsien Wu
Abstract:
Fluency and continuity properties are essential in synthesizing a high quality singing voice. In order to synthesize a smooth and continuous singing voice, the Hidden Markov Model-based synthesis approach is employed in this study to construct a Mandarin singing voice synthesis system. The system is designed to generate Mandarin songs with arbitrary lyrics and melody in a certain pitch range. In this study, a singing voice database is designed and collected, considering the phonetic converge of Mandarin singing voices. Synthesis units and a question set are defined carefully and tailored the meet the minimum requirement for Mandarin singing voice synthesis. In addition, pitch-shift pseudo data extension and vibrato creation are applied to obtain more natural synthesized singing voices.
The evaluation results show that the system, based on tailored synthesis units and the question set, can improve the quality and intelligibility of the synthesized singing voice. Using pitch-shift pseudo data and vibrato creation can further improve the quality and naturalness of the synthesized singing voices.
Keywords:
Mandarin Singing Voice Synthesis, Hidden Markov Models, Vibrato
Author:
Yu-Jhe Li, Chung-Che Wang, Liang-Yu Chen, Jyh-Shing Roger Jang, and Ren-Yuan Lyu
Abstract:
This research focuses on validating a Taiwanese speech corpus by using speech recognition and assessment to automatically find the potentially problematic utterances. There are three main stages in this work: acoustic model training, speech assessment and error labeling, and performance evaluation.
In the acoustic model training stage, we use the ForSD (Formosa Speech Database) ,provided by Chang Gung University (CGU), to train hidden Markov models (HMMs) as the acoustic models. Monophone, biphone (right context dependent), and triphone HMMs are tested. The recognition net is based on free syllable decoding. The best syllable accuracies of these three types of HMMs are 27.20%, 43.28%, and 45.93% respectively.
In the speech assessment and error labeling stage, we use the trained triphone HMMs to assess the unvalidated parts of the dataset. And then we split the dataset as low-scored dataset, mid-scored dataset, and high-score dataset by different thresholds. For the low-scored dataset, we identify and label the possible cause of having such a lower score. We then extract features from these lower-scored utterances and train an SVM classifier to further examine if each of these low-scored utterances is to be removed.
In the performance evaluation stage, we evaluate the effectiveness of finding problematic utterances by using 2 subsets of ForSD, TW01, and TW02 as the training dataset and one of the following: the entire unprocessed dataset, both mid-scored and high-scored dataset, and high-scored dataset only. We use these three types of joint dataset to train and to evaluate the performance. The syllable accuracies of these three types of HMMs are 40.22%, 41.21%, 44.35% respectively.
From the previous result, the disparity of syllable accuracy between the HMMs trained by unprocessed dataset and processed dataset can be 4.13%. Obviously, it proves that the processed dataset is less problematic than unprocessed dataset. We can use speech assessment automatically to find the potential problematic utterances.
Keywords:
Taiwanese Corpus Validation, Hidden Markov Model, Speech Assessment, Support Vector Machine
Author:
Hung-Yan Gu and Jia-Wei Chang
Abstract:
Spectral over-smoothing is still observable in the converted spectral envelope when linear multivariate regression (LMR) based spectrum mapping is adopted to convert voice. Therefore, in this paper, we study to place a histogram-equalization (HEQ) module immediately before LMR based mapping and to place a target frame selection (TFS) module immediately after LMR based mapping. These two modules are intended to promote the quality of the converted voice. Here, HEQ processing includes the two steps: (a) transform discrete cepstral coefficients (DCC) into principal component analysis (PCA) coefficients; (b) transform PCA coefficients into cumulated density function (CDF) coefficients. As to TFS, an input frame is first processed to obtain its converted DCC and its segment-class number. Then, the group of target-speaker frames corresponding to the same segment-class number is searched to find a target frame whose DCC are sufficiently close to the converted DCC. Next, the converted DCC are replaced by the DCC of the target frame found. In experimental evaluation, the outside parallel sentences (not used in model-parameter training) are used to measure average cepstral distances (ACD) between the converted DCC and the target DCC. When the HEQ module is added, the value of ACD would be increased a little. Furthermore, the value of ACD would be apparently increased when the TFS module is added. Nevertheless, according to the measured VR (variance ratio) values and the scores of subjective listening tests, the quality of the converted voice will become better when HEQ is added, and become much better when TFS is added. As to the reasons for why the measured ACD values and the perceived converted-voice qualities are inconsistent, we have found one possible cause which can explain why this inconsistency may occur.
Keywords:
Voice Conversion, Linear Multivariate Regression, Histogram Equalization, Target Frame Selection, Discrete Cepstral Coefficients
Author:
Hao-teng Fan, Wen-yu Tseng, and Jeih-weih Hung
Abstract:
In this paper, we present a novel method to extract noise-robust speech feature representation in speech recognition. This method employs the algorithm of linear predictive coding (LPC) on the feature time series of mel-frequency cepstral coefficients (MFCC). The resulting linear predictive version of the feature time series, in which the linear prediction error component is removed, reveals more noise-robust than the original one, probably because the prediction error portion corresponding to the noise effect is alleviated accordingly. Experiments conducted on the Aurora-2 connected digit database shows that the presented approach can enhance the noise robustness of various types of features in terms of significant improvement in recognition performance under a wide range of noise environments. Furthermore, a low order of linear prediction for the presented method suffices to give promising performance, which implies this method can be implemented in a quite efficient manner.
Keywords:
Noise Robustness, Speech Recognition, Linear Predictive Coding, Temporal Filtering
��