International Journal of Computational Linguistics & Chinese Language Processing
Vol. 4, No. 2, August 1999


Title:
A Model for Word Sense Disambiguation

Author:
Li Juanzi, Huang Changning

Abstract:
Word sense disambiguation is one of the most difficult problems in natural language processing. This paper puts forward a model that maps a structural semantic space from a thesaurus into a multi-dimensional, real-valued vector space, and gives a word sense disambiguation method based on this mapping. The model, which uses an unsupervised learning method to acquire the disambiguation knowledge, not only saves extensive manual work but also makes it possible to sense-tag a large number of content words. First, the Chinese thesaurus Cilin and a very large-scale corpus are used to construct the structure of the semantic space. Then, a dynamic disambiguation model is developed to disambiguate an ambiguous word according to the vectors of the monosemous words in each of its possible categories. To address the problem of data sparseness, a method is proposed to make the model more robust. Test results show that the model performs relatively well and can also be used for other languages.

Keyword:
Natural language processing, Word sense disambiguation, Unsupervised learning, Vector space, Language modeling
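
The disambiguation step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the windowed co-occurrence counts, the cosine similarity, and the toy English data standing in for Cilin categories are all assumptions made for illustration, since the abstract does not specify them. Each category vector is built only from the contexts of its monosemous words, and an ambiguous word is assigned the candidate category whose vector is closest to its context.

```python
from collections import Counter
from math import sqrt

def category_vector(corpus, monosemous_words, window=2):
    """Sum the words seen within `window` tokens of any monosemous word."""
    vec = Counter()
    for sent in corpus:
        for i, word in enumerate(sent):
            if word in monosemous_words:
                lo, hi = max(0, i - window), min(len(sent), i + window + 1)
                for j in range(lo, hi):
                    if j != i:
                        vec[sent[j]] += 1
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (an assumed measure)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def disambiguate(context, candidates, vectors):
    """Pick the candidate category whose vector best matches the context."""
    ctx = Counter(context)
    return max(candidates, key=lambda c: cosine(ctx, vectors[c]))

# Hypothetical toy thesaurus: each category lists only its monosemous words.
thesaurus = {"FINANCE": {"loan", "deposit"}, "RIVER": {"shore", "stream"}}
corpus = [
    ["the", "loan", "needs", "money"],
    ["a", "deposit", "of", "money"],
    ["the", "shore", "near", "water"],
    ["a", "stream", "of", "water"],
]
vectors = {c: category_vector(corpus, ws) for c, ws in thesaurus.items()}

# "bank" is ambiguous between FINANCE and RIVER; its context decides.
sense = disambiguate(["money", "in", "the"], ["FINANCE", "RIVER"], vectors)
```

Because no sense-tagged training data is used, the only supervision comes from the thesaurus itself, which is what makes the approach unsupervised in the sense the abstract describes.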


Title:
Resolving Translation Ambiguity and Target Polysemy in Cross-Language Information Retrieval

Author:
Hsin-Hsi Chen, Guo-Wei Bian and Wen-Cheng Lin

Abstract:
This paper deals with the translation ambiguity and target polysemy problems together. Two monolingual balanced corpora are employed to learn word co-occurrence for translation ambiguity resolution and augmented translation restrictions for target polysemy resolution. Experiments show that the model achieves 62.92% of the performance of monolingual information retrieval, which is 40.80% better than the select-all model. When target polysemy resolution is added, retrieval performance increases by approximately 10.11% over the model that resolves translation ambiguity only.

Keyword:
Cross-language information retrieval, Query translation, Translation ambiguity, Target polysemy, Augmented translation restriction
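
As a rough illustration of co-occurrence-based translation selection, the sketch below picks, for each query word, the translation candidate that co-occurs most often with the candidates of the other query words. It is a simplified sketch under stated assumptions: raw sentence-level co-occurrence counts, the toy lexicon, and the function names are hypothetical, and the abstract does not say which association measure the authors use. The augmented translation restrictions for target polysemy are not modeled here.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(corpus):
    """Count how often two distinct words appear in the same sentence."""
    counts = Counter()
    for sent in corpus:
        for a, b in combinations(sorted(set(sent)), 2):
            counts[(a, b)] += 1
    return counts

def score(a, b, counts):
    """Look up the symmetric co-occurrence count of two words."""
    return counts[tuple(sorted((a, b)))]

def select_translations(query, lexicon, counts):
    """For each source word, keep the candidate best supported by the others."""
    chosen = []
    for i, word in enumerate(query):
        others = [t for j, w in enumerate(query) if j != i for t in lexicon[w]]
        best = max(lexicon[word],
                   key=lambda t: sum(score(t, o, counts) for o in others))
        chosen.append(best)
    return chosen

# Hypothetical bilingual lexicon and target-language corpus.
lexicon = {"src_bank": ["bank", "shore"], "src_interest": ["interest", "hobby"]}
target_corpus = [
    ["bank", "interest", "rate"],
    ["bank", "pays", "interest"],
    ["shore", "sand"],
    ["hobby", "club"],
]
counts = cooccurrence_counts(target_corpus)
translated = select_translations(["src_bank", "src_interest"], lexicon, counts)
```

A select-all baseline would instead keep every candidate in `lexicon` for each query word, which is the comparison point the abstract mentions.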


Title:
General Knowledge Annotation Based on How-net

Author:
Gan Kok Wee, Tham Wai Mun

Abstract:
(Abstract in Chinese; the text is unreadable in this copy.)

Keyword:
Machine Translation, Mandarin, Speech Synthesis, Taiwanese, Min Nan, Tone Sandhi.


Title:
Project Report: Sinica Treebank

Author:
Feng-Yi Chen, Pi-Fang Tsai, Keh-Jiann Chen, Chu-Ren Huang

Abstract:
(Abstract in Chinese; the text is unreadable in this copy. Legible terms: Sinica Treebank, Sinica Corpus, Information-based Case Grammar (ICG).)