International Journal of Computational Linguistics & Chinese Language Processing
Vol. 13, No. 1, March 2008

Exploring Shallow Answer Ranking Features in Cross-Lingual and Monolingual Factoid Question Answering
Cheng-Wei Lee, Yi-Hsun Lee, and Wen-Lian Hsu

Two Approaches for Multilingual Question Answering: Merging Passages vs. Merging Answers
Rita M. Aceves-Pérez, Manuel Montes-y-Gómez, Luis Villaseñor-Pineda, and L. Alfonso Ureña-López

Cross-Lingual News Group Recommendation Using Cluster-Based Cross-Training
Cheng-Zen Yang, Ing-Xiang Chen, and Ping-Jung Wu

Web-Based Query Translation for English-Chinese CLIR
Chengye Lu, Yue Xu, and Shlomo Geva

Improving Translation of Queries with Infrequent Unknown Abbreviations and Proper Names
Wen-Hsiang Lu, Jiun-Hung Lin, and Yao-Sheng Chang

Analyzing Information Retrieval Results With a Focus on Named Entities
Thomas Mandl and Christa Womser-Hacker

Title:
Exploring Shallow Answer Ranking Features in Cross-Lingual and Monolingual Factoid Question Answering

Author:
Cheng-Wei Lee, Yi-Hsun Lee, and Wen-Lian Hsu

Abstract:
Answer ranking is critical to a QA (Question Answering) system because it determines the final system performance. In this paper, we explore the behavior of shallow ranking features under different conditions. The features are easy to implement and are also suitable when complex NLP techniques or resources are not available for monolingual or cross-lingual tasks. We analyze six shallow ranking features, namely, SCO-QAT, keyword overlap, density, IR score, mutual information score, and answer frequency. SCO-QAT (Sum of Co-occurrence of Question and Answer Terms) is a new feature that we proposed and that performed well in NTCIR CLQA. It is a co-occurrence-based feature that does not need extra knowledge, word-ignoring heuristic rules, or special tools. Instead, for the whole corpus, SCO-QAT calculates co-occurrence scores based solely on the passage retrieval results. Our experiments show that there is no perfect shallow ranking feature for every condition. SCO-QAT performs the best in C-C (Chinese-Chinese) QA, but it is not a good choice in E-C (English-Chinese) QA. Overall, answer frequency is the best choice for E-C QA, but its performance is impaired when translation noise is present. We also found that passage depth has little impact on shallow ranking features, and that a proper answer filter with fine-grained answer types is important for E-C QA. We measured the performance of answer ranking in terms of a newly proposed metric, EAA (Expected Answer Accuracy), to cope with cases in which answers have the same score after ranking.

Keywords:
Answer Ranking, Co-occurrence, CLQA, Question Answering, Shallow Method, SCO-QAT
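
The abstract above does not define EAA, so the following is only a minimal sketch of one plausible reading: when several candidate answers share the top score, the system is assumed to pick one of them uniformly at random, and the expected top-1 accuracy for that question is the fraction of tied top answers that are correct. The function names `expected_answer_accuracy` and `mean_eaa`, and the input layout, are hypothetical and are not taken from the paper.

```python
from typing import Dict, Iterable, Set, Tuple


def expected_answer_accuracy(scored: Dict[str, float], correct: Set[str]) -> float:
    """Expected top-1 accuracy for one question.

    Assumption (not from the paper): ties at the top score are broken
    uniformly at random, so the expectation is the share of top-scored
    candidates that are correct.
    """
    if not scored:
        return 0.0
    top_score = max(scored.values())
    tied = [answer for answer, score in scored.items() if score == top_score]
    hits = sum(1 for answer in tied if answer in correct)
    return hits / len(tied)


def mean_eaa(questions: Iterable[Tuple[Dict[str, float], Set[str]]]) -> float:
    """Average the per-question expectation over a question set."""
    items = list(questions)
    if not items:
        return 0.0
    return sum(expected_answer_accuracy(s, c) for s, c in items) / len(items)


if __name__ == "__main__":
    # Two candidates tie at the top score; only one of them is correct.
    scores = {"Taipei": 3.0, "Tainan": 3.0, "Kaohsiung": 1.0}
    print(expected_answer_accuracy(scores, {"Taipei"}))
```

Under this reading, a ranking feature that produces many ties (answer frequency, for example, often does) is neither rewarded nor penalized by an arbitrary tie-breaking order.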

Title:
Two Approaches for Multilingual Question Answering: Merging Passages vs. Merging Answers

Author:
Rita M. Aceves-Pérez, Manuel Montes-y-Gómez, Luis Villaseñor-Pineda, and L. Alfonso Ureña-López

Abstract:
One major problem in multilingual Question Answering (QA) is the integration of information obtained from different languages into one single ranked list. This paper proposes two different architectures to overcome this problem. The first one performs the information merging at the passage level, whereas the second does it at the answer level. In both cases, we applied a set of traditional merging strategies from cross-lingual information retrieval. Experimental results demonstrate the appropriateness of these merging strategies for the task of multilingual QA, as well as the advantages of multilingual QA over the traditional monolingual approach.

Keywords:
Multilingual Question Answering, Cross-Lingual Information Retrieval, Information Merging
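
The abstract above does not name the merging strategies it applies. As an illustration only, the sketch below shows round-robin interleaving, one well-known way of merging ranked lists in cross-lingual information retrieval; it is not claimed to be one of the strategies the authors evaluated. The function name `round_robin_merge` and the sample identifiers are hypothetical. The same routine applies whether the items are passages or candidate answers.

```python
from itertools import zip_longest
from typing import Hashable, List, Sequence

# Sentinel that marks exhausted lists, so that None can still be a real item.
_MISSING = object()


def round_robin_merge(ranked_lists: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Interleave per-language ranked lists into one ranked list.

    Takes the rank-1 item of every list, then every rank-2 item, and so on.
    Items that were already emitted are skipped, so each item appears once.
    """
    merged: List[Hashable] = []
    seen = set()
    for tier in zip_longest(*ranked_lists, fillvalue=_MISSING):
        for item in tier:
            if item is _MISSING or item in seen:
                continue
            seen.add(item)
            merged.append(item)
    return merged


if __name__ == "__main__":
    spanish = ["es-p1", "es-p2", "es-p3"]
    english = ["en-p1"]
    italian = ["it-p1", "es-p2"]
    print(round_robin_merge([spanish, english, italian]))
```

Round-robin ignores the scores themselves, which makes it usable when scores from different languages are not comparable; score-based strategies would need some form of normalization first.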

Title:
Cross-Lingual News Group Recommendation Using Cluster-Based Cross-Training

Author:
Cheng-Zen Yang, Ing-Xiang Chen, and Ping-Jung Wu

Abstract:
Many Web news portals provide clustered news categories so that readers can browse related news articles. However, to the best of our knowledge, they provide only monolingual services. For readers who want to find related news articles in different languages, the search process is very cumbersome. In this paper, we propose a cross-lingual news group recommendation framework using the cross-training technique to help readers find related cross-lingual news groups. The framework is studied with different implementations of SVM and Maximum Entropy models. We conducted several experiments using news articles from Google News as the experimental data sets. The experimental results show that the proposed cross-training framework improves accuracy in most cases.

Keywords:
Cross-Lingual News Group Mapping, Cross-Training, Semantic Overlapping, Mapping Recommendation

Title:
Web-Based Query Translation for English-Chinese CLIR

Author:
Chengye Lu, Yue Xu, and Shlomo Geva

Abstract:
Dictionary-based translation is a traditional approach used by cross-language information retrieval systems. However, significant performance degradation is often observed when queries contain words that do not appear in the dictionary. This is called the Out of Vocabulary (OOV) problem. In recent years, Web mining has been shown to be an effective approach for solving this problem. However, how to extract Multiword Lexical Units (MLUs) from Web content and how to select the correct translations from the extracted candidate MLUs are still two difficult problems in Web-mining-based automated translation approaches. Most statistical approaches to MLU extraction rely on statistical information extracted from huge corpora. When Web mining techniques are used for automated translation, these approaches do not perform well because the corpus is usually too small, and statistical approaches that rely on a large sample can become unreliable. In this paper, we present a new Chinese term measurement and a new Chinese MLU extraction process that work well on small corpora. We also present our approach to selecting MLUs more accurately. Our experiments show marked improvement in translation accuracy over other commonly used approaches.

Keywords:
Cross-Language Information Retrieval, CLIR, Query Translation, Web Mining, OOV Problem, Term Extraction

Title:
Improving Translation of Queries with Infrequent Unknown Abbreviations and Proper Names

Author:
Wen-Hsiang Lu, Jiun-Hung Lin, and Yao-Sheng Chang

Abstract:
Unknown term translation is important to CLIR and MT systems, but it is still an unsolved problem. Recently, a few researchers have proposed several effective search-result-based term translation extraction methods, which explore Web search results to discover translations of frequent unknown terms. However, translations of many infrequent unknown terms, such as abbreviations and proper names (or named entities), are still difficult to obtain using these methods. Therefore, in this paper, we present a new search-result-based abbreviation translation method and a new two-stage hybrid translation extraction method to solve the problem of extracting translations of infrequent unknown abbreviations and proper names from Web search results. In addition, to efficiently apply name transliteration techniques to mitigate the problems of proper name translation, we propose a mixed-syllable-mapping transliteration model and a Web-based unsupervised learning algorithm for dealing with online English-Chinese name transliteration. Our experimental results show that the proposed methods achieve great improvements over previous search-result-based term translation extraction methods.

Keywords:
CLIR, Transliteration, Unknown Term Translation, Web Search Result, Machine Translation

Title:
Analyzing Information Retrieval Results With a Focus on Named Entities

Author:
Thomas Mandl and Christa Womser-Hacker

Abstract:
Experiments carried out within evaluation initiatives for information retrieval have built up a substantial resource for further detailed research. In this study, we present a comprehensive analysis of the data of the Cross Language Evaluation Forum (CLEF) from the years 2000 to 2004. Features of the topics are related to the detailed results of more than 100 runs. The analysis considers the performance of the systems for each individual topic. Named entities in topics proved to be a major influencing factor on retrieval performance. They lead to a significant improvement in retrieval quality in general and also for most systems and tasks. This knowledge, gained by data mining on the evaluation results, can be exploited for the improvement of retrieval systems as well as for the design of topics for future CLEF campaigns.

Keywords:
Cross-Lingual Information Retrieval, Evaluation Issues, Named Entities (NEs)