Author:
Jia-Cing Ruan, Chiung-Wen Hsu, James Myers, and Jane S. Tsay
Abstract:
The usual challenges of transcribing spoken language are compounded for Southern Min (Taiwanese) because it lacks a generally accepted orthography. This study reports the development and testing of software tools for assisting such transcription. Three tools are compared, each representing a different type of interface with our corpus-based Southern Min lexicon (Tsay, 2007): our original Chinese character-based tool (Segmentor), the first version of a romanization-based lexicon entry tool called Adult-Corpus Romanization Input Program (ACRIP 1.0), and a revised version of ACRIP that accepts both character and romanization inputs and integrates them with sound files (ACRIP 2.0). In two experiments, naive native speakers of Southern Min were asked to transcribe passages from our corpus of adult spoken Southern Min (Tsay and Myers, in progress), using one or more of these tools. Experiment 1 showed no disadvantage for romanization-based compared with character-based transcription even for untrained transcribers. Experiment 2 showed significant advantages of the new mixed-system tool (ACRIP 2.0) over both Segmentor and ACRIP 1.0, in both speed and accuracy of transcription. Experiment 2 also showed that only minimal additional training brought dramatic improvements in both speed and accuracy. These results suggest that the transcription of non-Mandarin Sinitic languages benefits from flexible, integrated software tools.
Keywords:
Speech Transcription, Southern Min, Taiwanese, Romanization, Key-in Systems
Author:
Chen-Yu Chiang, Qi-Quan Huang, Yih-Ru Wang, Hsiu-Min Yu, and Sin-Horng Chen
Abstract:
This paper presents a Hidden Markov Model (HMM)-based variable speech rate Mandarin Chinese text-to-speech (TTS) system. In this system, parameters of spectrum, fundamental frequency, and state duration are generated by a context-dependent HMM (CDHMM) whose model parameters are linearly interpolated from those of three CDHMMs trained on corpora at three different speech rates (SRs), i.e., fast, medium, and slow. In addition, three decision tree (DT)-based pause break predictors trained on the three SR corpora are used to interpolate the probabilities for inserting pause breaks. The performance of the proposed TTS system was evaluated by several objective and subjective tests. Experimental results suggested that coherence between the interpolation weights for the CDHMMs and those for the DT-based pause predictors is crucial for the naturalness of the synthesized speech at variable SRs. We believe that the proposed variable speech rate Mandarin Chinese TTS system is more suitable than conventional fixed SR TTS systems for applications of human-machine interaction.
Keywords:
Text-to-Speech System, Mandarin Prosody, Speech Rate, Break Prediction
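The interpolation idea in the abstract by Chiang et al. above can be illustrated with a minimal sketch. It assumes the three speech-rate models have already been reduced to plain numbers (state durations and a pause probability for one context); the function names, the weight values, and all numbers are hypothetical and are not taken from the paper. The same weights are applied to both the duration parameters and the pause probabilities, which mirrors the coherence the abstract reports as important.

```python
def interpolate(weights, values):
    """Linear interpolation: weighted sum of one value per speech-rate model."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    return sum(w * v for w, v in zip(weights, values))


def interpolate_vectors(weights, vectors):
    """Interpolate parameter vectors element by element."""
    return [interpolate(weights, column) for column in zip(*vectors)]


# Hypothetical weights for (fast, medium, slow): halfway between fast and medium.
weights = (0.5, 0.5, 0.0)

# Hypothetical mean state durations (in frames) from the three models.
fast_durations = [3.0, 4.0]
medium_durations = [5.0, 6.0]
slow_durations = [8.0, 10.0]

durations = interpolate_vectors(
    weights, [fast_durations, medium_durations, slow_durations]
)

# Hypothetical pause-break probabilities from the three decision-tree
# predictors for the same context, combined with the SAME weights.
pause_prob = interpolate(weights, [0.25, 0.5, 0.75])

print(durations)   # [4.0, 5.0]
print(pause_prob)  # 0.375
```

Using one weight vector for both components is a design choice made here for illustration; how the actual system derives its weights from a target speech rate is not stated in the abstract.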
Author:
Ming-Shing Yu and Yih-Jeng Lin
Abstract:
This paper brings up an important issue, polysemy problems, in a Chinese to Taiwanese TTS (text-to-speech) system. Polysemy means there are words with more than one meaning or pronunciation, such as 我們 (we), 不 (no), 你 (you), 我 (I), and 要 (want). We will first show the importance of the polysemy problem in a Chinese to Taiwanese (C2T) TTS system. Then, we will propose some approaches to a difficult case of such problems by determining the pronunciation of 我們 (we) in a C2T TTS system. There are two pronunciations of the word 我們 (we) in Taiwanese, /ghun/ and /lan/. The corresponding Chinese words are 阮 (we1) and 咱 (we2). We propose two approaches and a combination of the two to solve the problem. The results show that we have a 93.1% precision in finding the correct pronunciation of the word 我們 (we). Compared to the results of the layered approach, which has been shown to work well in solving other polysemy problems, the results of the combined approach are an improvement.
Keywords:
Polysemy, Taiwanese, Chinese to Taiwanese TTS System, Layered Approach
Author:
Shih-Hsiang Lin and Berlin Chen
Abstract:
Topic modeling for information retrieval (IR) has attracted significant attention and demonstrated good performance in a wide variety of tasks over the years. In this paper, we first present a comprehensive comparison of various topic modeling approaches, including the so-called document topic models (DTM) and word topic models (WTM), for Chinese spoken document retrieval (SDR). Moreover, different granularities of index features, including words, subword units, and their combinations, are also exploited to work in conjunction with various extensions of topic modeling presented in this paper, so as to alleviate SDR performance degradation caused by speech recognition errors. All of the experiments were performed on the TDT Chinese collection.
Keywords:
Information Retrieval, Document Topic Models, Word Topic Models, Spoken Document Retrieval
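As a rough illustration of how a document topic model can rank documents for a query, the sketch below scores each document by the query log-likelihood, with P(w|D) = sum over topics of P(w|T_k) P(T_k|D). This is a generic textbook formulation, not necessarily the exact models compared by Lin and Chen; the toy vocabulary, topic distributions, and document names are all hypothetical.

```python
import math


def query_log_likelihood(query, word_given_topic, topic_given_doc):
    """Return log P(Q|D), where P(w|D) = sum_k P(w|T_k) * P(T_k|D)."""
    total = 0.0
    for word in query:
        p_word = sum(
            topic_words.get(word, 0.0) * p_topic
            for topic_words, p_topic in zip(word_given_topic, topic_given_doc)
        )
        if p_word <= 0.0:
            # A query word that no topic can generate gives zero likelihood.
            return float("-inf")
        total += math.log(p_word)
    return total


# Hypothetical topic-word distributions P(w|T_k) for two topics.
word_given_topic = [
    {"election": 0.5, "vote": 0.5},
    {"typhoon": 0.5, "rain": 0.5},
]

# Hypothetical document-topic mixtures P(T_k|D).
docs = {
    "doc_politics": [0.75, 0.25],
    "doc_weather": [0.25, 0.75],
}

query = ["election", "vote"]

# Rank documents from most to least likely to have generated the query.
ranking = sorted(
    docs,
    key=lambda name: query_log_likelihood(query, word_given_topic, docs[name]),
    reverse=True,
)
print(ranking)  # ['doc_politics', 'doc_weather']
```

In a spoken document retrieval setting, the index terms passed in as `query` could be words, subword units, or both, since the scoring function does not depend on the granularity of the features.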