International Journal of Computational Linguistics & Chinese Language Processing
Vol. 19, No. 3, September 2014


Title:
BCCWJ-TimeBank: Temporal and Event Information Annotation on Japanese Text.

Author:
Masayuki ASAHARA, Sachi KATO, Hikari KONISHI, Mizuho IMADA, and Kikuo MAEKAWA

Abstract:
Temporal information extraction can be divided into three tasks: temporal expression extraction, time normalisation, and temporal ordering relation resolution. The first task is a subtask of named entity and numeral expression extraction. The second task is often performed by rewriting systems. The third task consists of event anchoring. This paper proposes a Japanese temporal ordering annotation scheme and applies it to expressions in the ‘Balanced Corpus of Contemporary Written Japanese’ (BCCWJ). We extracted verbal and adjectival event expressions from a subset of BCCWJ and annotated temporal ordering relations on pairs of these event expressions and time expressions obtained from a previous study. Language recipients' recognition of temporal ordering tends to disagree with the normalisation of time expressions. In such a situation, we should not strive for a unique gold-standard annotation; rather, we should evaluate the degree of inter-annotator discrepancy among subjects in an experiment. This study analysed inter-annotator discrepancies among three annotators performing temporal ordering annotation. The results show that the annotators agree little on time segment boundaries but agree closely on relative temporal ordering tendencies.

Keywords: Temporal Information Processing, Event Semantics, Corpus Annotation
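The abstract reports inter-annotator agreement among three annotators labelling temporal ordering relations. As a rough illustration of how such agreement can be quantified, the sketch below computes mean pairwise agreement and Fleiss' kappa over categorical relation labels. The label set (BEFORE, AFTER, OVERLAP) and the sample data are hypothetical and not taken from BCCWJ-TimeBank, and the paper's own agreement measure may differ.

```python
from collections import Counter
from itertools import combinations


def pairwise_agreement(labels_per_item):
    """Fraction of annotator pairs that agree on an item, averaged over items."""
    scores = []
    for labels in labels_per_item:
        pairs = list(combinations(labels, 2))
        scores.append(sum(a == b for a, b in pairs) / len(pairs))
    return sum(scores) / len(scores)


def fleiss_kappa(labels_per_item):
    """Fleiss' kappa; assumes every item has the same number (>= 2) of annotators."""
    n = len(labels_per_item[0])  # annotators per item
    N = len(labels_per_item)     # number of items
    counts = [Counter(labels) for labels in labels_per_item]
    categories = set().union(*counts)
    # Per-item agreement P_i.
    p_i = [(sum(c[j] ** 2 for j in categories) - n) / (n * (n - 1)) for c in counts]
    p_bar = sum(p_i) / N
    # Overall proportion of each category, then chance agreement P_e.
    p_j = [sum(c[j] for c in counts) / (N * n) for j in categories]
    p_e = sum(p ** 2 for p in p_j)
    if p_e == 1.0:
        return 1.0  # only one category ever used: treated here as perfect agreement
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical labels from three annotators on four event-time pairs.
    items = [
        ("BEFORE", "BEFORE", "BEFORE"),
        ("AFTER", "AFTER", "OVERLAP"),
        ("OVERLAP", "OVERLAP", "OVERLAP"),
        ("BEFORE", "AFTER", "BEFORE"),
    ]
    print(round(pairwise_agreement(items), 3))  # 0.667
    print(round(fleiss_kappa(items), 3))        # 0.489
```

Kappa is lower than raw agreement because it discounts the agreement expected by chance from the overall label distribution.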


Title:
Transliteration Extraction from Classical Chinese Buddhist Literature Using Conditional Random Fields with Language Models

Author:
Yu-Chun Wang, Karol Chia-Tien Chang, Richard Tzong-Han Tsai, and Jieh Hsiang

Abstract:
Extracting plausible transliterations from historical literature is a key issue in historical linguistics and other research fields. In Chinese historical literature, the characters used to transliterate the same loanword may vary because of different translation eras or different Chinese language preferences among translators. To assist historical linguists and digital humanities researchers, this paper proposes a transliteration extraction method based on conditional random fields (CRFs), using features derived from language models and from the characteristics of the Chinese characters used in transliterations, which help identify transliteration characters. To evaluate our method, we compiled an evaluation set from two Buddhist texts, the Samyuktagama and the Lotus Sutra. We also constructed a baseline approach that combines suffix-array-based extraction with phonetic similarity measurement. Our method significantly outperforms the baseline, achieving a recall of 0.9561 and a precision of 0.9444. The results show that our method is very effective at extracting transliterations from classical Chinese texts.

Keywords:
Transliteration Extraction, Classical Chinese, Buddhist Literature, Language Model, Conditional Random Fields, CRF
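The abstract describes a CRF tagger whose features come from language models and from the characteristics of characters typically used in transliterations. The sketch below shows one plausible way to build such per-character features, assuming an add-one smoothed character bigram model trained on ordinary (non-transliteration) text, so that transliterated strings tend to receive low log-probabilities, plus membership in a small character set. The set TRANSLIT_CHARS, the helper names, and the toy corpus are hypothetical; this is not the paper's feature set, and CRF training itself is omitted.

```python
import math
from collections import Counter

# Hypothetical sample of characters often seen in Buddhist transliterations
# (illustrative only; not the paper's character list).
TRANSLIT_CHARS = set("阿難舍利弗那羅摩訶迦葉須菩提陀尼")


def train_bigram_lm(corpus):
    """Add-one smoothed character bigram model; '^' marks the start of a line."""
    history, bigrams, vocab = Counter(), Counter(), set()
    for line in corpus:
        padded = "^" + line
        vocab.update(padded)
        history.update(padded[:-1])  # characters that have a successor
        bigrams.update(zip(padded, padded[1:]))
    V = len(vocab) + 1  # +1 reserves probability mass for unseen characters

    def logprob(prev, cur):
        return math.log((bigrams[(prev, cur)] + 1) / (history[prev] + V))

    return logprob


def char_features(sentence, logprob):
    """One feature dict per character, the token-dict style many CRF toolkits accept."""
    feats = []
    for i, ch in enumerate(sentence):
        prev = sentence[i - 1] if i > 0 else "^"
        feats.append({
            "char": ch,
            "prev_char": prev,
            "next_char": sentence[i + 1] if i + 1 < len(sentence) else "$",
            "is_translit_char": ch in TRANSLIT_CHARS,
            # A low score under an ordinary-text model hints at a transliteration.
            "lm_logprob": round(logprob(prev, ch), 3),
        })
    return feats


if __name__ == "__main__":
    lm = train_bigram_lm(["如是我聞", "一時佛住"])
    for f in char_features("阿難言", lm):
        print(f)
```

In a full system these dictionaries would be paired with per-character B/I/O labels and passed to a CRF trainer.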


Title:
Modeling Human Inference Process for Textual Entailment Recognition

Author:
Hen-Hsen Huang, Kai-Chun Chang, and Hsin-Hsi Chen

Abstract:
To prepare an evaluation dataset for textual entailment (TE) recognition, human annotators label a rich variety of linguistic phenomena in text and hypothesis expressions. These phenomena illustrate the implicit human inference process used to determine the relations of given text-hypothesis pairs. This paper aims to understand what humans think during TE recognition and to model this thinking process to address the problem. First, we analyze the RTE-5 test set, which Mark Sammons et al. annotated with 39 linguistic phenomena across 5 aspects, and find that negative entailment phenomena are very effective features for TE recognition. Then, we propose a rule-based method and a machine learning method to extract these phenomena from text-hypothesis pairs automatically. Although systems using machine-extracted knowledge do not match those using human-labelled knowledge, they offer a new direction for thinking about TE problems. We further annotate negative entailment phenomena on the Chinese text-hypothesis pairs of the NTCIR-9 RITE-1 task and reach the same conclusions as on the English RTE-5 datasets.

Keywords:
Textual Entailment Recognition, Chinese Processing, Semantic
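The abstract reports that negative entailment phenomena are effective features and that a rule-based method can extract them automatically. The sketch below illustrates the general idea with two hypothetical rules, a negation mismatch and a number appearing only in the hypothesis, over naive English tokens. The rule names, the NEGATORS list, and the YES/NO decision are illustrative assumptions; they are not the rules or labels used in the paper.

```python
import re

# Hypothetical negation cue list (not from the paper).
NEGATORS = {"not", "no", "never", "none", "nobody", "nothing"}
NUMBER = r"\d+(?:\.\d+)?"


def tokens(sentence):
    # Expand "n't" so contractions such as "didn't" yield a separate "not" token.
    return re.findall(r"\w+", sentence.lower().replace("n't", " not"))


def negation_mismatch(text, hypothesis):
    """True when exactly one of the two sentences contains a negation cue."""
    t_neg = any(w in NEGATORS for w in tokens(text))
    h_neg = any(w in NEGATORS for w in tokens(hypothesis))
    return t_neg != h_neg


def number_mismatch(text, hypothesis):
    """True when the hypothesis mentions a number that never appears in the text."""
    return bool(set(re.findall(NUMBER, hypothesis)) - set(re.findall(NUMBER, text)))


RULES = {"negation_mismatch": negation_mismatch, "number_mismatch": number_mismatch}


def negative_phenomena(text, hypothesis):
    """Names of the rules that fire on a text-hypothesis pair."""
    return [name for name, rule in RULES.items() if rule(text, hypothesis)]


def predict(text, hypothesis):
    # Any detected negative phenomenon is treated as evidence against entailment.
    return "NO" if negative_phenomena(text, hypothesis) else "YES"


if __name__ == "__main__":
    t = "The company hired 300 workers in 2010."
    for h in ["The company didn't hire workers.",
              "The company hired 500 workers.",
              "The company hired workers in 2010."]:
        print(h, negative_phenomena(t, h), predict(t, h))
```

A learned classifier would more likely consume the detected phenomena as features rather than apply this hard decision rule.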