Author:
Lung-Hao Lee, Liang-Chih Yu, and Li-Ping Chang
Abstract:
This introductory paper describes research trends in Chinese as a second/foreign language, along with related studies. We also give an overview of the research papers included in this special issue. Finally, we summarize the findings and offer suggestions.
Keywords:
Computer-Assisted Language Learning, Second Language Acquisition, Learner Corpora, Interlanguage, Mandarin Chinese
Author:
Jinhua Xiong, Qiao Zhang, Shuiyuan Zhang, Jianpeng Hou and Xueqi Cheng
Abstract:
The number of people learning Chinese as a Foreign Language (CFL) has been booming in recent decades, and spelling error correction for CFL learners is becoming increasingly important. Compared with the regular text spelling check task, more error types need to be considered in CFL cases. In this paper, we propose a unified framework for Chinese spelling correction. Unlike conventional methods, which rely on rules or statistics separately, our approach is based on extended HMM and ranker-based models, together with a rule-based model for further polishing. A final decision-making step then decides whether or not to output the corrections. Experimental results on the test data of essays written by foreign learners of Chinese, provided by the SIGHAN 2014 Bake-off, illustrate the performance of our approach.
Keywords:
Chinese Spelling Correction, HMM, Ranker-based Model, Rule-based Model, Decision-making
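The following Python sketch is a rough illustration of the HMM idea only. It treats the intended characters as hidden states and the written characters as observations, then decodes with the Viterbi algorithm. Several pieces are hypothetical toy values chosen for this sketch:

- the confusion set
- the bigram probabilities
- the emission probabilities
- the function name `viterbi_correct`

It is not the authors' extended HMM, and it omits the ranker, rule-based, and decision-making components.

```python
import math

# Hypothetical toy confusion set: written character -> intended characters
# it may stand for (the character itself is always a candidate).
CONFUSION = {"在": ["在", "再"], "再": ["再", "在"]}

# Hypothetical character-bigram log-probabilities (HMM transitions).
BIGRAM = {
    ("<s>", "在"): math.log(0.3),
    ("<s>", "再"): math.log(0.2),
    ("在", "见"): math.log(0.01),
    ("再", "见"): math.log(0.5),
}
UNSEEN = math.log(1e-6)  # score for bigrams missing from the table


def emission(observed: str, intended: str) -> float:
    """Log-probability of writing `observed` when `intended` was meant."""
    return math.log(0.9) if observed == intended else math.log(0.1)


def viterbi_correct(sentence: str) -> str:
    """Return the most likely intended string for the observed string."""
    # best maps each hidden state to (log score, best path ending there).
    best = {"<s>": (0.0, [])}
    for ch in sentence:
        new_best = {}
        for cand in CONFUSION.get(ch, [ch]):
            score, path = max(
                (prev_score + BIGRAM.get((prev, cand), UNSEEN)
                 + emission(ch, cand), prev_path)
                for prev, (prev_score, prev_path) in best.items()
            )
            new_best[cand] = (score, path + [cand])
        best = new_best
    return "".join(max(best.values())[1])


print(viterbi_correct("在见"))  # 再见
```

On these toy values, the misspelling 在见 is decoded as 再见. The strong bigram 再见 outweighs the emission penalty for substituting a character.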
Author:
Chuan-Jie Lin and Wei-Cheng Chu
Abstract:
This paper proposes an automatic method for building a Chinese spelling check system. Confusion sets were expanded using two language resources, Shuowen Jiezi and the Four-Corner codes, which improved the coverage of the confusion sets. Nine scoring functions that utilize frequency data from the Google Ngram Datasets were proposed, and the idea of smoothing was also adopted. Thresholds were also determined automatically. The final system performed far better than our baseline system in the CSC 2013 Evaluation Task.
Keywords:
Chinese spelling check, confusion set expansion, Google Ngram scoring function
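The general idea of confusion-set substitution scored by n-gram frequency can be sketched as follows. This is a minimal sketch that rests on several assumptions:

- The bigram counts are hypothetical stand-ins for Google Ngram frequencies.
- The confusion set is a two-character toy.
- Add-one smoothing is used as one simple form of smoothing.
- The fixed `threshold` parameter takes the place of the automatically determined thresholds.

It implements a single hypothetical scoring function, not any of the paper's nine.

```python
import math

# Hypothetical bigram counts standing in for Google Ngram frequencies.
NGRAM_COUNTS = {
    "我们": 5000, "们在": 800, "在学": 1200, "学校": 3000,
    "们再": 40, "再学": 300,
}

# Hypothetical confusion set (e.g. after expansion with similar characters).
CONFUSION = {"在": {"再"}, "再": {"在"}}


def window_score(text: str, i: int, n: int = 2) -> float:
    """Sum of add-one-smoothed log counts of all n-grams covering position i."""
    total = 0.0
    for start in range(max(0, i - n + 1), min(i, len(text) - n) + 1):
        gram = text[start:start + n]
        total += math.log(NGRAM_COUNTS.get(gram, 0) + 1)
    return total


def correct(text: str, threshold: float = 1.0) -> str:
    """Replace a character by a confusable one if the score gain beats threshold."""
    chars = list(text)
    for i, ch in enumerate(chars):
        current = "".join(chars)
        base = window_score(current, i)
        best_cand, best_gain = None, threshold
        for cand in CONFUSION.get(ch, ()):
            trial = current[:i] + cand + current[i + 1:]
            gain = window_score(trial, i) - base
            if gain > best_gain:
                best_cand, best_gain = cand, gain
        if best_cand is not None:
            chars[i] = best_cand
    return "".join(chars)


print(correct("我们再学校"))  # 我们在学校
```

With these counts, 再 is replaced by 在. The smoothed log counts of 们在 and 在学 exceed those of 们再 and 再学 by more than the threshold, while an already correct sentence is left unchanged.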
Author:
Tao-Hsing CHANG, Yao-Ting SUNG and Jia-Fei HONG
Abstract:
This paper proposes a method that automatically detects syntax errors in Chinese sentences. The proposed algorithm for identifying syntax errors, known as KNGED, uses a large database of rules to determine whether a sentence contains syntax errors. The rules were generated either manually or automatically. This paper further proposes an algorithm for identifying the type of error that a sentence contains. Experimental results show that the false positive rate and F1-measure of the proposed method for detecting syntax errors in Chinese sentences are 0.90 and 0.65, respectively.
Keywords:
Syntactic Errors, Chinese Grammar, Chinese Written Corpus
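A rule database of the kind described above can be pictured, in a much simplified form, as a list of patterns paired with error types. A sentence is flagged if any rule matches, and the matching rules give the error type. The sketch below assumes regular-expression rules. The two rules, the type labels, and the function names `detect` and `has_error` are hypothetical illustrations, not drawn from KNGED.

```python
import re
from typing import List, Tuple

# Hypothetical rules: (regex pattern, error type). Illustrative only;
# these are not rules from the KNGED rule database.
RULES: List[Tuple[str, str]] = [
    (r"很非常|非常很", "Redundant"),   # stacked degree adverbs
    (r"[一两三几]个书", "Selection"),  # wrong classifier before 书 (should be 本)
]


def detect(sentence: str) -> List[str]:
    """Return the error types of all rules that match the sentence."""
    return [etype for pattern, etype in RULES if re.search(pattern, sentence)]


def has_error(sentence: str) -> bool:
    """A sentence is flagged as erroneous if at least one rule matches."""
    return bool(detect(sentence))


print(detect("我买了一个书"))  # ['Selection']
```

Keeping detection (`has_error`) and type identification (`detect`) on the same rule list mirrors the two-stage task in the abstract. A real system would need far more rules and some way to limit false positives.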
Author:
Jui-Feng Yeh and Chan-Kun Yeh
Abstract:
This paper proposes a rule induction algorithm for classifying the word usage of "de" in Chinese as a second language. Learning Chinese characters and adapting to tones are both essential and difficult tasks for non-native speakers. The frequent morphosyntactic particle "de", which is written with one of three characters {的, 得, 地}, is hard for foreign learners to master because of their similar pronunciation and meaning. This investigation presents a data-driven algorithm for classifying the usages of the morphosyntactic particle "de" in Chinese learning. Rule induction is one of the most important techniques for learning knowledge from data. Since regularities hidden in data are frequently expressed in terms of rules, rule induction is one of the fundamental tools for natural language processing and yields a significant improvement in character selection. Through the automatic rule induction process, 32 rules are adopted here to classify the character usage of the morphosyntactic particle "de". The experimental results show that the proposed method performs well enough to classify the character usages of the morphosyntactic particle "de".
Keywords:
Rule Induction, Natural Language Processing, Second Language Learning, Classifier, Word Usage
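The sketch below illustrates what rule induction for 的/得/地 can look like. It induces a simple decision list from labeled examples, using the part of speech before and after "de" as features. Several choices are assumptions made for this sketch: the feature set, the training examples, the `min_precision` cut-off, and the default label. The paper's own induction procedure and its 32 rules are not reproduced here.

```python
from collections import Counter, defaultdict

# Hypothetical training examples: (POS before "de", POS after "de", character).
TRAIN = [
    ("ADJ", "NOUN", "的"), ("PRON", "NOUN", "的"), ("NOUN", "NOUN", "的"),
    ("ADJ", "VERB", "地"), ("ADV", "VERB", "地"),
    ("VERB", "ADJ", "得"), ("VERB", "ADV", "得"),
]


def induce_rules(examples, min_precision=1.0):
    """Induce a decision list of (feature index, value) -> label rules."""
    stats = defaultdict(Counter)
    for *features, label in examples:
        for idx, value in enumerate(features):
            stats[(idx, value)][label] += 1
    rules = []
    for (idx, value), counts in stats.items():
        label, hits = counts.most_common(1)[0]
        total = sum(counts.values())
        precision = hits / total
        if precision >= min_precision:  # keep only sufficiently reliable rules
            rules.append((precision, total, idx, value, label))
    # Most precise rules first; ties broken by how many examples they cover.
    rules.sort(key=lambda r: (-r[0], -r[1]))
    return [(idx, value, label) for _, _, idx, value, label in rules]


def classify(rules, features, default="的"):
    """Apply the first matching rule; fall back to a default character."""
    for idx, value, label in rules:
        if features[idx] == value:
            return label
    return default


rules = induce_rules(TRAIN)
print(classify(rules, ("ADJ", "VERB")))  # 地
```

On the toy data, the induced list assigns 地 between an adjective and a verb (as in 高兴地跑) and 得 after a verb (as in 跑得快). It assigns 的 before a noun. The ambiguous feature "ADJ before de" is discarded because its precision is only 0.5.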
Author:
Tzu-Yun Tung, Howard Hao-Jan Chen and Hui-Mei Yang
Abstract:
The function word le in Chinese serves as both a sentence-final particle (le1) and an aspect marker (le2). As an aspect marker indicating the completion of an action, le has been observed to be frequently misused by learners of Chinese. The most glaring misuse is the overgeneralization of "le" as a past-tense marker. Based on the NTNU Chinese Learners' Written Corpus, we analyzed the usage and error types of le produced by English-speaking learners at the beginning (A2) and intermediate (B1) levels. The results show that both A2 and B1 learners acquire le2 before le1. In terms of error analysis, le1 is the most commonly spotted error type, and there are a large number of redundant uses of le2 and le(1+2). Therefore, in line with Teng (1999), the current study supports the proposition that le2 and its associated sentence patterns should be taught before le1. Pedagogical implications and suggestions for editing CFL textbooks are also provided.
Keywords:
Le, Error Analysis, Chinese Learner Written Corpus, Chinese Teaching
Author:
Keiko MOCHIZUKI, Hiroshi SANO, Ya-Ming SHEN and Chia-Hou WU
Abstract:
This paper presents an empirical study of the difficulties in learning Chinese as a second language. It is based on learners' corpora written by native English speakers and native Japanese speakers at CEFR-based A2 and B1 levels. The first part of this paper discusses the procedures for collecting learners' corpora, proofreading, establishing an error tag system, and annotating errors. It then focuses on a significant difference in the production of "一 + Classifier" between the corpora of native English speakers and native Japanese speakers. The corpus of native English speakers displays an overuse of "一 + Classifier", even in atelic contexts such as negative or conditional constructions, where "一 + Classifier" should not occur. In contrast, the corpus of native Japanese speakers displays a lack of "一 + Classifier". This striking contrast is due to whether or not each language has a determiner position. English has a determiner position that accommodates an article or determiner such as "a/an", "the", "this/that", "my/your", or "~'s". English-native learners therefore tend to treat "一 + Classifier" as an article, although it does not appear in an atelic event structure. Japanese, on the other hand, has no determiner position before a noun phrase. It is therefore assumed that Japanese learners find it difficult to learn the conditions under which "一 + Classifier" is necessary.
Keywords:
Learner's Corpus, Annotation System, Error Analysis, Online Dictionary of Misused Chinese based on Learners' Corpora, Interference of Mother Tongues