International Journal of Computational Linguistics & Chinese Language Processing
Vol. 1, No. 1, August 1996

A Survey on Automatic Speech Recognition with an Illustrative Example on Continuous Speech Recognition of Mandarin
Chin-Hui Lee, Biing-Hwang Juang
Issues in Text-to-Speech Conversion for Mandarin
Chilin Shih, Richard Sproat
A Mandarin Text-to-Speech System
Sin-Horng Chen, Shaw-Hwa Hwang, Yih-Ru Wang
An Overview of Corpus-Based Statistics-Oriented (CBSO) Techniques for Natural Language Processing
Keh-Yih Su, Tung-Hui Chiang, Jing-Shin Chang
A Hybrid Approach to Machine Translation System Design
Kuang-Hua Chen, Hsin-Hsi Chen
A Model for Robust Chinese Parser
Keh-Jiann Chen
Important Issues on Chinese Information Retrieval
Lee-Feng Chien, Hsiao-Tieh Pu

Title:
A Survey on Automatic Speech Recognition with an Illustrative Example on Continuous Speech Recognition of Mandarin

Author:
Chin-Hui Lee, Biing-Hwang Juang

Abstract:
For the past two decades, research in speech recognition has been intensively carried out worldwide, spurred on by advances in signal processing, algorithms, architectures, and hardware. Speech recognition systems have been developed for a wide variety of applications, ranging from small-vocabulary keyword recognition over dial-up telephone lines, to medium-size vocabulary voice-interactive command and control systems on personal computers, to large-vocabulary speech dictation, spontaneous speech understanding, and limited-domain speech translation. In this paper we review some of the key advances in several areas of automatic speech recognition. We also illustrate, by examples, how these key advances can be used for continuous speech recognition of Mandarin. Finally, we elaborate on the requirements for designing successful real-world applications and address the technical challenges that need to be overcome in order to reach the ultimate goal of providing an easy-to-use, natural, and flexible voice interface between people and machines.

Keyword:
hidden Markov modeling, dynamic programming, speech recognition, acoustic modeling, Mandarin speech recognition, spoken language systems
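
The keywords above pair hidden Markov modeling with dynamic programming. As a rough illustration of how the two meet in decoding, here is a minimal Viterbi sketch over a discrete toy HMM. The state names, observation symbols, and probabilities are invented for illustration and are not taken from the paper; practical recognizers of the kind surveyed work on continuous acoustic features rather than two discrete symbols.

```python
import math

def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely state path and its log-probability."""
    # best[t][s]: best log-score of any state path that ends in s at time t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Dynamic programming step: extend the best path into each predecessor
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log_emit[s][observations[t]]
            back[t][s] = prev
    # Trace back from the best final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]

def logs(table):
    """Convert a dict of probabilities to log-probabilities."""
    return {k: math.log(v) for k, v in table.items()}

# Hypothetical two-state model (all numbers are assumptions)
states = ["s1", "s2"]
log_start = logs({"s1": 0.6, "s2": 0.4})
log_trans = {"s1": logs({"s1": 0.7, "s2": 0.3}),
             "s2": logs({"s1": 0.4, "s2": 0.6})}
log_emit = {"s1": logs({"x": 0.9, "y": 0.1}),
            "s2": logs({"x": 0.2, "y": 0.8})}

path, score = viterbi(["x", "x", "y", "y"], states, log_start, log_trans, log_emit)
```

Working in log-probabilities keeps long products from underflowing, which matters once utterances run to hundreds of frames.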

Title:
Issues in Text-to-Speech Conversion for Mandarin

Author:
Chilin Shih, Richard Sproat

Abstract:
Research on text-to-speech (TTS) conversion for Mandarin Chinese is a much younger enterprise than comparable research for English or other European languages. Nonetheless, impressive progress has been made over the last couple of decades, and Mandarin Chinese systems now exist which approach, or in some ways even surpass, the quality of available systems for English. This article has two goals. The first is to summarize the published literature on Mandarin synthesis, with a view to clarifying the similarities and differences among the various efforts. One property shared by a great many systems is the dependence on the syllable as the basic unit of synthesis. We shall argue that this property stems both from the accidental fact that Mandarin has a small number of syllable types, and from traditional Sinological views of the linguistic structure of Chinese. Despite the popularity of the syllable, though, there are problems with using it as the basic synthesis unit, as we shall show. The second goal is to describe in more detail some specific problems in text-to-speech conversion for Mandarin, namely text analysis, concatenative unit selection, segmental duration, and tone and intonation modeling. We illustrate these topics by describing our own work on Mandarin synthesis at Bell Laboratories. The paper starts with an introduction to some basic concepts in speech synthesis, which is intended as an aid to readers who are less familiar with this area of research.

Keyword:
Chinese speech synthesis, text analysis, concatenative units, duration, tone and intonation

Title:
A Mandarin Text-to-Speech System

Author:
Sin-Horng Chen, Shaw-Hwa Hwang, Yih-Ru Wang

Abstract:
In this paper, the implementation of a high-performance Mandarin TTS system is presented. The system is composed of four main parts: text analysis (TA), prosodic information generation (PIG), a waveform table of 411 base-syllables (WT), and PSOLA-based waveform synthesis (PSOLA). In TA, a statistical model-based method is first employed to automatically tag the input text to obtain the word sequence and the associated part-of-speech (POS) sequence. A lexicon containing about 80,000 words is used in the tagging process. Then the corresponding base-syllable sequence is found and used in WT to form the basic waveform sequence of the base-syllables. Some linguistic features used in PIG are also extracted in TA. In PIG, a four-layer recurrent neural network (RNN) is employed to generate prosodic information including the pitch contour, energy level, initial duration and final duration of syllables, as well as the inter-syllable pause duration. Lastly, in PSOLA, the basic waveform sequence is modified using the prosodic information to generate the output synthetic speech. The whole system has been implemented in software on a PC/AT 486 with a 16-bit Sound Blaster add-on card. Only 3.2 Mbyte and 5.5 Mbyte of memory are required for the two versions with sampling rates of 10 kHz and 20 kHz, respectively. It can synthesize speech in real time for any input Chinese text. Informal listening tests by many native Chinese speakers living in Taiwan have confirmed that the synthetic speech sounds very fluent and natural.

Keyword:
speech synthesis, prosodic information, recurrent neural network, PSOLA, Mandarin Text-to-Speech

Title:
An Overview of Corpus-Based Statistics-Oriented (CBSO) Techniques for Natural Language Processing

Author:
Keh-Yih Su, Tung-Hui Chiang, Jing-Shin Chang

Abstract:
A Corpus-Based Statistics-Oriented (CBSO) methodology, which is an attempt to avoid the drawbacks of traditional rule-based approaches and purely statistical approaches, is introduced in this paper. Rule-based approaches, with rules induced by human experts, had been the dominant paradigm in the natural language processing community. Such approaches, however, suffer from serious difficulties in knowledge acquisition in terms of cost and consistency. Therefore, it is very difficult for such systems to be scaled up. Statistical methods, with the capability of automatically acquiring knowledge from corpora, are becoming more and more popular, in part to amend the shortcomings of rule-based approaches. However, most simple statistical models, which adopt almost nothing from existing linguistic knowledge, often result in a large parameter space and, thus, require an unaffordably large training corpus for even well-justified linguistic phenomena. The CBSO approach is a compromise between the two extremes of the spectrum for knowledge acquisition. The CBSO approach emphasizes the use of well-justified linguistic knowledge in developing the underlying language model and the application of statistical optimization techniques on top of high-level constructs, such as annotated syntax trees, rather than on surface strings, so that only a training corpus of reasonable size is needed and long-distance dependencies between constituents can be handled. In this paper, CBSO techniques are reviewed. General techniques applicable to CBSO approaches are introduced. In particular, we shall address the following important issues: (1) general tasks in developing an NLP system; (2) why CBSO is the preferred choice among different strategies; (3) how to achieve good performance systematically using a CBSO approach; and (4) frequently used CBSO techniques. Several examples are also reviewed.

Keyword:
corpus, CBSO, knowledge acquisition, class-based language modeling, natural language processing

Title:
A Hybrid Approach to Machine Translation System Design

Author:
Kuang-Hua Chen, Hsin-Hsi Chen

Abstract:
It is difficult for pure statistics-based machine translation systems to process long sentences. In addition, domain dependence is a key issue under such a framework. Pure rule-based machine translation systems incur high human costs in formulating rules and introduce inconsistencies when the number of rules increases. Integration of these two approaches reduces the difficulties associated with both. In this paper, an integrated model for machine translation systems is proposed. A partial parsing method is adopted, and the translation process is performed chunk by chunk. In the synthesis module, the word order is locally rearranged within chunks via the Markov model. Since a chunk is much shorter than a sentence, the disadvantage of the Markov model in dealing with long-distance phenomena is greatly reduced. Structural transfer is fulfilled using a set of rules; in contrast, lexical transfer is resolved using bilingual constraints. Qualitative and quantitative knowledge are interleaved and used cooperatively, so that the advantages of these two approaches can be retained.

Keyword:
bigram language model, lexical selection, machine translation system, probabilistic chunker, predicate-argument structure, X'-theory
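
The abstract above describes rearranging word order locally within a chunk via a Markov model. A minimal sketch of that idea, assuming a bigram model stored as a dictionary of log-probabilities, is to score every ordering of the chunk's words and keep the best one. The function name `best_order`, the boundary tokens, the `unseen` penalty, and the toy probabilities are all assumptions for illustration; the paper's actual procedure may differ.

```python
import math
from itertools import permutations

def best_order(words, bigram_logprob, unseen=-10.0):
    """Return the ordering of `words` with the highest bigram log-score.

    `bigram_logprob` maps (previous, next) pairs to log-probabilities;
    pairs missing from the table get the flat `unseen` penalty.
    """
    def score(seq):
        tokens = ("<s>",) + seq + ("</s>",)
        return sum(bigram_logprob.get(pair, unseen)
                   for pair in zip(tokens, tokens[1:]))
    # Exhaustive search costs n! orderings, which is only tolerable
    # because a chunk is much shorter than a full sentence.
    return max(permutations(words), key=score)

# Hypothetical bigram table (numbers are assumptions)
toy_bigrams = {
    ("<s>", "the"): math.log(0.8),
    ("the", "red"): math.log(0.5),
    ("red", "book"): math.log(0.6),
    ("book", "</s>"): math.log(0.7),
}

ordered = best_order(["book", "the", "red"], toy_bigrams)
```

Because a bigram model sees only adjacent pairs, it cannot capture dependencies that span many words, but within a short chunk every relevant pair is adjacent or nearly so.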

Title:
A Model for Robust Chinese Parser

Author:
Keh-Jiann Chen

Abstract:
The Chinese language has many special characteristics which are substantially different from western languages, causing conventional methods of language processing to fail on Chinese. For example, Chinese sentences are composed of strings of characters without word boundaries that are marked by spaces. Therefore, word segmentation and unknown word identification techniques must be used in order to identify words in Chinese. In addition, Chinese has very few inflectional or grammatical markers, making purely syntactic approaches to parsing almost impossible. Hence, a unified approach which involves both syntactic and semantic information must be used. Therefore, a lexical feature-based grammar formalism, called Information-based Case Grammar, is adopted for the parsing model proposed here. This grammar formalism stipulates that a lexical entry for a word contains both semantic and syntactic feature structures. By relaxing the constraints on lexical feature structures, even ill-formed input can be accepted, broadening the coverage of the grammar. A model of a priority controlled chart parser is proposed which, in conjunction with a mechanism of dynamic grammar extension, addresses the problems of: (1) syntactic ambiguities, (2) under-specification and limited coverage of grammars, and (3) ill-formed sentences. The model does this without causing inefficient parsing of sentences that do not require relaxation of constraints or dynamic extension of the grammar.

Keyword:
Chinese Parser, Robust Parser, Information-based Case Grammar, Branch-and-Bound Algorithm
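
The abstract above notes that Chinese text has no spaces between words, so words must be identified before parsing. As a simple stand-in for that step, and not the method proposed in the paper, the sketch below uses greedy forward maximum matching against a lexicon. The lexicon contents, the `max_len` limit, and the single-character fallback for unknown words are assumptions for illustration.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Segment `text` greedily, preferring the longest lexicon word at each position."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until one matches
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted, so unknown
            # characters pass through as one-character words
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words

# Hypothetical lexicon
toy_lexicon = {"我", "喜歡", "自然", "語言", "自然語言"}

segmented = forward_max_match("我喜歡自然語言", toy_lexicon)
```

Greedy matching handles neither segmentation ambiguity nor unknown multi-character words, which is part of why the abstract calls for dedicated word segmentation and unknown word identification techniques.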

Title:
Important Issues on Chinese Information Retrieval

Author:
Lee-Feng Chien, Hsiao-Tieh Pu

Abstract:
In this paper, we emphasize the significance of Chinese information retrieval in the age of the Internet and raise several important research issues which are fundamental and require further investigation. We also point out some problems and requirements which have often been neglected in designing general Chinese IR systems. Furthermore, experience gained from the design of the Csmart system is described.

Keyword:
information retrieval, full-text searching, Chinese information processing