International Journal of Computational Linguistics and Chinese Language Processing
Vol. 13, No. 1, 2008


Title:
Exploring Shallow Answer Ranking Features in Cross-Lingual and Monolingual Factoid Question Answering

Authors:
Cheng-Wei Lee, Yi-Hsun Lee, and Wen-Lian Hsu

Abstract:
Answer ranking is critical to a QA (Question Answering) system because it determines the final system performance. In this paper, we explore the behavior of shallow ranking features under different conditions. The features are easy to implement and are also suitable when complex NLP techniques or resources are not available for monolingual or cross-lingual tasks. We analyze six shallow ranking features, namely, SCO-QAT, keyword overlap, density, IR score, mutual information score, and answer frequency. SCO-QAT (Sum of Co-occurrence of Question and Answer Terms) is a new feature we propose that performed well in NTCIR CLQA. It is a co-occurrence-based feature that does not need extra knowledge, word-ignoring heuristic rules, or special tools. Instead, SCO-QAT calculates co-occurrence scores for the whole corpus based solely on the passage retrieval results. Our experiments show that there is no perfect shallow ranking feature for every condition. SCO-QAT performs the best in C-C (Chinese-Chinese) QA, but it is not a good choice in E-C (English-Chinese) QA. Overall, answer frequency is the best choice for E-C QA, but its performance is impaired when translation noise is present. We also found that passage depth has little impact on shallow ranking features, and that a proper answer filter with fine-grained answer types is important for E-C QA. We measured the performance of answer ranking in terms of a newly proposed metric, EAA (Expected Answer Accuracy), to cope with cases of answers that have the same score after ranking.
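The abstract defines SCO-QAT only by name: a sum, over question terms, of co-occurrence scores computed from the passage retrieval results. The sketch below illustrates a score of that general shape; the function name `sco_qat` and the conditional-ratio normalization are our illustrative assumptions, not the authors' exact formula.

```python
def sco_qat(question_terms, candidate, passages):
    """Score an answer candidate by summing, over all question terms,
    a passage-level co-occurrence ratio between the term and the
    candidate (a sketch; the paper's exact formula may differ)."""
    score = 0.0
    for term in question_terms:
        # Passages containing both the question term and the candidate.
        both = sum(1 for p in passages if term in p and candidate in p)
        with_term = sum(1 for p in passages if term in p)
        if with_term:
            score += both / with_term  # conditional co-occurrence ratio
    return score

passages = ["Mount Everest is in Nepal",
            "Nepal borders China",
            "Everest climbers"]
print(sco_qat(["Everest"], "Nepal", passages))  # 0.5
```

Because the counts come only from the retrieved passages, such a score needs no external knowledge source or special tools, matching the property the abstract emphasizes.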

Keywords:
Answer Ranking, Co-occurrence, CLQA, Question Answering, Shallow Method, SCO-QAT

 


Title:
Two Approaches for Multilingual Question Answering: Merging Passages vs. Merging Answers

Authors:
Rita M. Aceves-Pérez, Manuel Montes-y-Gómez, Luis Villaseñor-Pineda, and L. Alfonso Ureña-López

Abstract:
One major problem in multilingual Question Answering (QA) is the integration of information obtained from different languages into one single ranked list. This paper proposes two different architectures to overcome this problem. The first performs the information merging at the passage level, whereas the second does so at the answer level. In both cases, we applied a set of traditional merging strategies from cross-lingual information retrieval. Experimental results demonstrate the appropriateness of these merging strategies for the task of multilingual QA, as well as the advantages of multilingual QA over the traditional monolingual approach.
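The abstract refers to "traditional merging strategies from cross-lingual information retrieval" without listing them; one classic example is min-max score normalization within each per-language list, followed by a global sort on the normalized scores. The sketch below shows that one strategy (function names are illustrative, and scores from real retrieval engines are not always this directly comparable).

```python
def normalize(scores):
    """Min-max normalize a list of retrieval scores into [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def merge_lists(ranked_lists):
    """Merge several per-language (item, score) lists into one ranking:
    normalize scores within each list, then sort all items globally."""
    merged = []
    for lst in ranked_lists:
        norm = normalize([score for _, score in lst])
        merged.extend((item, n) for (item, _), n in zip(lst, norm))
    return sorted(merged, key=lambda pair: pair[1], reverse=True)

english = [("Paris", 12.0), ("Lyon", 6.0)]
spanish = [("París", 0.9), ("Madrid", 0.3)]
print(merge_lists([english, spanish]))
```

The same merge can be applied either to retrieved passages or to extracted answers, which is exactly the architectural choice the paper compares.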

Keywords:
Multilingual Question Answering, Cross-Lingual Information Retrieval, Information Merging
 


Title:
Cross-Lingual News Group Recommendation Using Cluster-Based Cross-Training

Authors:
楊正仁, 陳英祥, and 吳秉蓉

Abstract:
Many news portals now provide news group services that help readers browse rich collections of related news. However, as far as we have observed, these portals offer only monolingual news environments. Consequently, when readers want to read news reports in a different language, they must go through tedious search steps to obtain the relevant information. In this paper, we propose a cross-lingual news group recommendation framework that applies cross-training techniques to help readers find related news groups in different languages. Using Google News as the dataset, we conducted experiments with two classifiers: Support Vector Machines and the Maximum Entropy model. The experimental results show that the proposed cross-training recommendation mechanism improves the recommendation accuracy of the classifiers in most cases.
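The abstract names the cross-training technique but does not spell out the loop; the sketch below shows the generic idea under our own assumptions (a toy keyword-vote classifier stands in for the SVM and Maximum Entropy models the paper actually uses, and all names are illustrative), namely that each classifier's confident labels on unlabeled data augment the other classifier's training set.

```python
from collections import Counter

def make_keyword_clf(train):
    """Toy bag-of-words voter standing in for a real classifier."""
    votes = {}
    for text, label in train:
        for w in text.lower().split():
            votes.setdefault(w, Counter())[label] += 1
    def predict(text):
        tally = Counter()
        for w in text.lower().split():
            tally += votes.get(w, Counter())
        if not tally:
            return None, 0.0
        label, count = tally.most_common(1)[0]
        return label, count / sum(tally.values())
    return predict

def co_train(clf_a, clf_b, labeled, unlabeled, rounds=3, threshold=0.8):
    """Generic cross-training loop (a sketch, not the paper's exact
    algorithm): in each round, each classifier's confident predictions
    on the unlabeled pool are added to the other's training data."""
    train_a, train_b = list(labeled), list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        model_a, model_b = clf_a(train_a), clf_b(train_b)
        remaining = []
        for x in pool:
            la, ca = model_a(x)
            lb, cb = model_b(x)
            if la is not None and ca >= threshold:
                train_b.append((x, la))   # A teaches B
            elif lb is not None and cb >= threshold:
                train_a.append((x, lb))   # B teaches A
            else:
                remaining.append(x)
        pool = remaining
    return clf_a(train_a), clf_b(train_b)

labeled = [("stocks fall on wall street", "finance"),
           ("team wins the championship game", "sports")]
unlabeled = ["stocks rise after the report", "coach praises the team"]
model_a, model_b = co_train(make_keyword_clf, make_keyword_clf,
                            labeled, unlabeled)
print(model_a("stocks rise again"))
```

In the cross-lingual setting, the two "views" would be the two language sides of the aligned news groups, which is what lets each side's classifier supply training signal for the other.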

Keywords:
Cross-Lingual News Group Alignment, Cross-Training, Semantic Overlap, Group Alignment Recommendation


Title:
Web-Based Query Translation for English-Chinese CLIR

Authors:
Chengye Lu, Yue Xu, and Shlomo Geva 

Abstract:
Dictionary-based translation is a traditional approach in use by cross-language information retrieval systems. However, significant performance degradation is often observed when queries contain words that do not appear in the dictionary. This is called the Out of Vocabulary (OOV) problem. In recent years, Web mining has been shown to be an effective approach to solving this problem. However, how to extract Multiword Lexical Units (MLUs) from Web content and how to select the correct translations from the extracted candidate MLUs remain two difficult problems in Web mining based automated translation approaches. Most statistical approaches to MLU extraction rely on statistical information extracted from huge corpora. When Web mining techniques are used for automated translation, these approaches do not perform well because the corpus is usually too small, and statistical approaches that rely on a large sample can become unreliable. In this paper, we present a new Chinese term measurement and a new Chinese MLU extraction process that work well on small corpora. We also present our approach to selecting MLUs more accurately. Our experiments show marked improvement in translation accuracy over other commonly used approaches.
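The paper's new term measurement is not reproduced in the abstract; the standard statistical baseline that such measurements are typically compared against is pointwise mutual information (PMI) over corpus counts, sketched below (the counts are invented for illustration). As the abstract notes, ratios like this become unreliable when the snippet corpus mined from the Web is small, which is precisely the limitation the paper targets.

```python
import math

def pmi(bigram_count, w1_count, w2_count, total):
    """Pointwise mutual information of a two-word candidate term:
    PMI(w1, w2) = log2( P(w1 w2) / (P(w1) * P(w2)) ).
    A high PMI suggests the words form a lexical unit rather than
    co-occurring by chance."""
    p_xy = bigram_count / total
    p_x = w1_count / total
    p_y = w2_count / total
    return math.log2(p_xy / (p_x * p_y))

# Invented example: in a 10,000-token snippet corpus, word A occurs
# 50 times, word B occurs 40 times, and the pair A B occurs 38 times.
print(pmi(38, 50, 40, 10000))
```

With such sparse counts, a single extra or missing occurrence shifts the score substantially, which is why a measurement designed for small corpora is needed.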

Keywords:
Cross-Language Information Retrieval, CLIR, Query Translation, Web Mining, OOV Problem, Term Extraction


Title:
Improving Query Translation of Low-Frequency Abbreviations and Proper Nouns

Authors:
盧文祥, 林浚宏, and 張耀升

Abstract:
Unknown term translation plays an important role in cross-language information retrieval and machine translation systems, yet it has not been solved effectively to date. In recent years, some researchers have proposed Web mining techniques that exploit the rich multilingual resources on the World Wide Web to translate unknown terms. However, for low-frequency unknown terms, such as low-frequency abbreviations and proper nouns, these methods still fail to extract translations effectively. This paper therefore proposes a novel Web-based abbreviation translation extraction method and a hybrid two-stage translation extraction method to extract translations of low-frequency abbreviations and proper nouns from Web search results. In addition, to apply transliteration techniques effectively and thereby reduce the difficulty of proper noun translation, we further propose a statistical method that combines a syllable-alignment transliteration model with a Web-based unsupervised learning algorithm to handle online, real-time English-to-Chinese proper noun transliteration. Finally, the experimental results show that our proposed translation extraction methods significantly outperform previous search-result-based translation extraction approaches.

Keywords:
Cross-Language Information Retrieval, Transliteration, Unknown Word Translation, Web Search Results, Machine Translation
 


Title:
Analyzing Information Retrieval Results With a Focus on Named Entities

Authors:
Thomas Mandl and Christa Womser-Hacker

Abstract:
Experiments carried out within evaluation initiatives for information retrieval have built up a substantial resource for further detailed research. In this study, we present a comprehensive analysis of data from the Cross Language Evaluation Forum (CLEF) for the years 2000 to 2004. Features of the topics are related to the detailed results of more than 100 runs. The analysis considers the performance of the systems on each individual topic. Named entities in topics turned out to be a major factor influencing retrieval performance: they lead to a significant improvement in retrieval quality in general, as well as for most systems and tasks. This knowledge, gained by data mining on the evaluation results, can be exploited to improve retrieval systems and to design topics for future CLEF campaigns.

Keywords:
Cross-Lingual Information Retrieval, Evaluation Issues, Named Entities (NEs)