Foreword
The 5th Chinese Lexical Semantics Workshop (CLSW-5) was held in Singapore from June 21 to 24, 2004. The workshop provided a successful forum for researchers from various areas of Chinese lexical semantics, including lexical resources, representation and inference, interfaces, and applications.
The workshop received approximately sixty submissions. The program committee accepted thirty-five as oral presentations and twenty as posters after blind review. From the oral presentations, we further selected fourteen papers to be included in this special issue of CLCLP on Chinese lexical semantics based on a second round of review.
These papers can be divided into the following four categories.
i) Design and representation: Huang et al. present the Sinica sense management system; Liu et al. discuss collocational asymmetry in Mandarin verbs; Li et al. look at temporal adverbs and aspects; and Chen et al. discuss a thesaurus of Chinese function words. These papers show that research on Chinese lexical semantics has developed to the stage where knowledge bases can be built and systematic accounts can be proposed.
ii) Meaning categorization and disambiguation: Chen et al. present a disambiguation approach based on Hownet senses; Tsai et al. provide a disambiguation scheme based on syntactic construction; Wang et al. address the use of construction theory in lexical semantics; and Yu et al. focus on dummy verbs in contemporary Chinese. These papers attest to the fact that polysemy and disambiguation will continue to be foci of lexical semantic studies.
iii) Corpus study of word senses: Kwong et al. discuss the usage and perception of judgment terms in Pan-Chinese contexts; Xia and Yu give a categorization of Chinese named entities; and Wu et al. present a bidirectional investigation of the semantic relations in both CCD and coordinate structures. These papers continue to build on the corpus-based pillar of Chinese lexical semantics.
iv) Metaphors: Chung et al. discuss source domains as concept domains in metaphorical expressions; Li et al. explore semantic projection in Chinese metaphorical idioms; and Wang et al. focus on the evaluation of lexical emotions based on a grammatical knowledge base of contemporary Chinese. The compactness of knowledge and the mapping between two knowledge domains have made metaphor one of the most exciting topics in Chinese lexical semantics.
Last but not least, we are grateful for the great efforts of the program committee members, especially Professor Cheng Chin-Chuan, Professor Benjamin Tsou and Professor Yu Shiwen, for their first-round reviews of these papers. Thanks also go to Professor Feng Zhiwei, Professor Xiao Guozheng, Professor He Tingting, Professor Iwen Su, Professor Kathleen Ahrens, Dr. Zhan Weidong, Dr. Zhao Jun, Dr. Zhang Min, Dr. Zhou Guodong, and Dr. Gao Hong for their second-round reviews of these papers. Finally, CLSW-5 could not have been held without the generous support of COLIPS and Dr. Kim Teng Lua. Publication of this special volume is supported by IJCLCLP, with great help from Professor Chung-Hsien Wu, the chief editor, and Ms. Abbey Ho, the editorial assistant.
Guest Editors: Donghong Ji and Chu-Ren Huang
December 22, 2005
Author:
Chu-Ren Huang, Chun-Ling Chen, Cui-Xia Weng, Hsiang-Ping Lee, Yong-Xiang Chen and Keh-Jiann Chen
Abstract:
A sense-based lexical knowledgebase is a core foundation for language engineering. Two important criteria must be satisfied when constructing a knowledgebase: linguistic felicity and data cohesion. In this paper, we discuss how data cohesion of the sense information collected using the Sinica Sense Management System (SSMS) can be achieved. SSMS manages both lexical entries and word senses, and has been designed and implemented by the Chinese Wordnet Team at Academia Sinica. SSMS contains all the basic information that can be merged with the future Chinese Wordnet. In addition to senses and meaning facets, SSMS also includes the following information: POS, example sentences, corresponding English synset(s) from Princeton WordNet, and lexical semantic relations, such as synonym/antonym and hypernym/hyponym. Moreover, the overarching structure of the system is managed by using a sense serial number, and an inter-entry structure is established by means of cross-references among synsets and homographs. SSMS is not only a versatile development tool and management system for a sense-based lexical knowledgebase; it can also serve as the database backend for both Chinese Wordnet and any sense-based applications for Chinese language processing.
Keyword:
Lexical Knowledgebase, Word Senses, Chinese Wordnet, Lexical Semantic Relation
Author:
Mei-Chun Liu and Chun Edison Chang
Abstract:
This paper examines the collocational patterns of Mandarin verbs of conversation and proposes that a finer classification scheme than the flat structure of ‘frames’ [cf. Fillmore and Atkins 1992; Baker et al. 2003] is needed to capture the semantic granularity of verb types. The notion of a ‘subframe’ is introduced and utilized to explain the syntactic-semantic interdependencies among different groups of verbs in the Conversation Frame. The paper aims to provide detailed linguistic motivations for distinguishing subframes within a frame as a semantic anchor for further defining near-synonym sets.
Keyword:
Mandarin verbs of conversation, Semantic Frame, Subframe, Collocational Association
Author:
Shih-Min Li, Su-Chu Lin and Keh-Jiann Chen
Abstract:
In this paper, we propose clear-cut definitions of distinct temporal adverbs and provide descriptive features for each class of temporal adverbs. By measuring time points on the temporal axis, we revise and reclassify the temporal adverbs listed in [Lu and Ma 1999] into four classes of semantic roles, namely, time, frequency, duration, and time manner. The descriptive features enable us to distinguish temporal relations and predict logical compatibility between temporal adverbs and aspects.
Keyword:
Temporal Adverbs, Aspects, Temporal Schema, Speaking Time, Reference Time, Event Time
Author:
陳群秀(Qunxiu Chen)
Abstract:
At present, scholars worldwide attach great importance to the construction of knowledge resources for language information processing. Such knowledge includes lexicology, syntax, semantics, pragmatics and even general knowledge. Semantic knowledge is the core, of which the most basic and most important part is the semantic knowledge of vocabulary. During the last 10 years, scholars at Tsinghua University and their collaborators have conducted systematic and complete studies on the lexical semantic knowledge of the main categories of Chinese content words (including verbs, adjectives, nouns, etc.), and have developed and studied “the Machine Tractable Dictionary of Contemporary Chinese Predicate Verbs”, “the Machine Tractable Dictionary of Contemporary Chinese Predicate Adjectives”, “the System of Relations of Slots Centering on Nouns for Contemporary Chinese”, and “the Thesaurus Dictionary of Contemporary Chinese for Information Processing”. However, research on lexical semantic knowledge for information processing, in particular that of Chinese functional words, has so far been limited. Firstly, this paper discusses the value of functional words in Chinese Information Processing. Typologically, Chinese is an isolating, root-word-based analytic language. The vast majority of Chinese words cannot by themselves clearly express grammatical sense. Chinese syntax mainly depends on the use of functional words and word order. In addition, Chinese functional words play an important role in expressing the meaning of sentences and the structure of texts, e.g. voice, tense & aspect, mood, modality, positive and negative, relation of sentences, degree, range, introduction of objects, etc. Therefore, the study of the lexical semantics of Chinese functional words is of great importance to Chinese Information Processing. Secondly, this paper introduces our research on the Machine Tractable Thesaurus Dictionary of Contemporary Chinese Functional Words.
The results of our initial research are presented. The Chinese functional words are divided into ten main-level categories, namely Voice Category, Tense & Aspect Category, Mood Category, Modality Category, Positive and Negative Category, Relation Category of Sentences, Degree Category, Range Category, Object Category, and Other Category. Each main-level category is further divided, repeatedly, into a number of middle-level and minor-level categories according to the meanings expressed. This paper also presents some ideas for designing information terms for dictionary entries. Finally, we propose the concept of constructing a base platform for the Contemporary Chinese Semantic Knowledge Bank.
Keywords:
The Machine Tractable Thesaurus Dictionary of Contemporary Chinese Functional Words, The Modality Representation System of Chinese Sentences for Information Processing, Voice Category, Tense & Aspect Category, Mood Category, Modality Category, Positive and Negative Category, Relation Category of Sentences, Degree Category, Range Category, Object Category, Other Category, Dictionary Entries, Information Terms
Author:
Hao Chen, Tingting He, Donghong Ji and Changqin Quan
Abstract:
The research on word sense disambiguation (WSD) has great theoretical and practical significance in many fields of natural language processing (NLP). This paper presents an unsupervised approach to Chinese word sense disambiguation based on Hownet (an electronic Chinese lexical resource). In our approach, contexts that include ambiguous words are converted into vectors by means of a second-order context method, and these context vectors are then clustered by the k-means clustering algorithm. Lastly, the ambiguous words can be disambiguated after a similarity calculation process is completed. Our experiments involved extraction of terms, and an 82.62% average accuracy rate was achieved.
Keyword:
Word Sense Disambiguation, Hownet, Second-Order Context, K-Means Clustering
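The clustering stage described in the abstract above (second-order context vectors followed by k-means) can be illustrated with a minimal sketch. It is not the authors' implementation: the toy English tokens, the function names, the sentence-level co-occurrence counts, and the farthest-point initialisation are all assumptions made for illustration, and the final step of mapping clusters to Hownet senses by similarity is omitted.

```python
# Illustrative sketch only: toy data and all names are hypothetical.

def cooccurrence_vectors(sentences, vocab):
    """First-order vectors: for each word, counts of the words sharing a sentence with it."""
    index = {w: i for i, w in enumerate(vocab)}
    vectors = {w: [0.0] * len(vocab) for w in vocab}
    for sent in sentences:
        for w in sent:
            for other in sent:
                if other != w:
                    vectors[w][index[other]] += 1.0
    return vectors

def second_order_vector(context, word_vectors, dim):
    """Second-order vector: average of the first-order vectors of the context words."""
    known = [w for w in context if w in word_vectors]
    total = [0.0] * dim
    for w in known:
        for i, v in enumerate(word_vectors[w]):
            total[i] += v
    return [v / len(known) for v in total] if known else total

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=10):
    """Plain k-means with deterministic farthest-point initialisation (an assumption)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy contexts of one ambiguous word (the target word itself is removed).
contexts = [
    ["river", "water", "shore"],
    ["money", "loan", "deposit"],
    ["water", "shore", "fish"],
    ["loan", "deposit", "interest"],
]
vocab = sorted({w for c in contexts for w in c})
word_vectors = cooccurrence_vectors(contexts, vocab)
points = [second_order_vector(c, word_vectors, len(vocab)) for c in contexts]
labels = kmeans(points, k=2)
print(labels)
```

In this toy run the two water-related contexts receive one cluster label and the two finance-related contexts receive the other; a real system would build the first-order vectors from a large corpus rather than from the contexts themselves.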
Abstract:
This paper explores the correlation between lexical semantics and syntactic construction, with particular attention paid to delimiting the lexical semantic distinctions between the multiple senses of the polysemous verb song4 ‘to send’. Different types of syntactic constructions are first categorized according to the distribution of arguments, such as direct objects, indirect objects, and locatives. Four senses and two meaning facets are then identified and formulated.
Keyword:
Polysemy, Word Sense Disambiguation, Corpus, Meaning Facet
Author:
王惠(Hui Wang)
Abstract:
This paper, set within the theoretical framework of Construction Grammar, aims to pave the way for a comprehensive investigation of the behavior of Chinese lexical meaning with a focus on word sense and structure sense. Taking a top-down view, we discuss observed nuances in meaning based on different syntactic distributions, and describe the related semantic features, distributions and observed dynamic change. In addition, taking a bottom-up view, we use the analytical methods of the semantic system to describe and explain many kinds of semantic relations between different structures, such as polysemy, synonymy, antonymy, and homonymy.
Keyword:
Construction Grammar, Form-Meaning, Structure sense, Lexical Sense
Author:
俞士汶(Shiwen Yu)、朱學鋒(Xuefeng Zhu)、段慧明(Huiming Duan)
Abstract:
In contemporary Chinese, there is a subclass of verbs called Dummy Verbs. After briefly introducing the lexical meanings of two typical dummy verbs, ‘Jiayi’ and ‘Jinxing’, this paper discusses the grammatical attributes of ‘Jiayi’ and ‘Jinxing’ in detail and further explores their functions as markers of syntactic constituents and semantic roles.
Keyword:
Dummy Verb, Lexical Meaning, Grammatical Attribute, Semantic Role
Author:
Oi Yee Kwong and Benjamin K. Tsou
Abstract:
This paper reports on a synchronous corpus-based study of the everyday usage of a set of Chinese judgement terms. An earlier study on Hong Kong data found that these terms were more polysemous than their English counterparts within the legal domain, and were even more fuzzily used in general news reportage. The current study further compares their usage in general texts from other Chinese speech communities (Beijing, Taiwan, and Singapore) to explore the regional differences in lexicalisation and perception of the relevant legal concepts. Corpus data revealed the distinctiveness of the Singapore data, and that the contrasting frequency distributions of the terms and senses could be a result of the varied focus in reportage or the use of alternative expressions for the same concepts in individual communities. The analysis will contribute to the construction and enrichment of Pan-Chinese lexico-semantic resources, which will be useful for many natural language processing applications, such as machine translation.
Keyword:
Synchronous Corpus-Based Study, Legal Concept and Terminology, Regional Differences, Pan-Chinese Lexico-Semantic Resources
Author:
夏迎炬(YingJu Xia)、于浩(Hao Yu)、西野文人(Fumihito Nishino)
Abstract:
Named entity recognition is a very important part of information retrieval and information extraction. Classification is also very important. This paper investigates the sub-classification of named entities from the point of view of information retrieval and information extraction. This paper also presents multi-classification and gives detailed information about each sub-class. We have manually annotated the People’s Daily corpus (1998) and conducted a series of experiments using the statistical model of named entity recognition. The experimental results show that the sub-classes presented in this paper can enhance the recognition system’s performance and aid information retrieval and information extraction.
Keyword:
Named Entity, Classification, Corpus, Natural Language Processing
Author:
吳云芳(Yunfang Wu)、李素建(Sujian Li)、李芸(Yun Li)、俞士汶(Shiwen Yu)
Abstract:
This paper presents a bidirectional investigation of linguistic form and meaning. On the one hand, based on the Chinese Concept Dictionary (CCD), this paper examines the semantic relations between the heads of the conjuncts of nominal coordination. Most of the conjuncts show semantic similarity, a few show semantic association, and a few others show semantic opposition. On the other hand, the semantic relations between the conjuncts provide a new perspective on the noun taxonomy and suggest ways to improve the taxonomy.
Keyword:
Coordinate Structure, Semantic Relation, Semantic Similarity, Taxonomy
Author:
Siaw-Fong Chung, Kathleen Ahrens and Chu-Ren Huang
Abstract:
The use of lexical resources in linguistic analysis has expanded rapidly in recent years. However, most lexical resources, such as WordNet or online dictionaries, at this point do not usually indicate figurative meanings, such as conceptual metaphors, as part of a lexical entry. Studies that attempt to establish the relationships between literal and figurative language by detecting the connectivity between WordNet relations usually do not deal with linguistic data directly. However, the present study demonstrates that SUMO definitions can be used to identify the source domains used in conceptual metaphors. This is achieved by identifying the relationships between metaphorical expressions and their corresponding ontological nodes. Such links are important because they show which lexical items are mapped under which concepts. This, in turn, helps specify which lexical items in electronic resources involve conceptual mappings. Looking specifically at the concept of PERSON, this work also establishes connectivity between lexical items which are related to “Organism.” Therefore, the methodology reported herein not only aids the categorizing of lexical items according to their conceptual domains but also can establish links between these items. Such bottom-up and top-down analyses of lexical items may provide a means of representing metaphorical entries in lexical resources.
Keyword:
Concept, Ontology, Conceptual Metaphor, Source Domain, WordNet, SUMO
Author:
李芸(Yun Li)、李素建(Sujian Li)、王治敏(Zhimin Wang)、吳云芳(Yunfang Wu)
Abstract:
Metaphors in idioms are universal in languages. The progression from conceptual meaning to metaphorical meaning is a cognitive activity of mapping from one thing to another. This paper, based on the analysis of 2,347 metaphorical idioms, their linguistic features, grammatical structures, grammatical functions, and semantic categories, tries to identify categories of metaphorical idioms. Most of the metaphorical idioms in Chinese are found to be composed of projective semantic elements and descriptive semantic elements, among which the former are the basis for mapping from the explicit to the implicit meaning, while the latter are the key to understanding the relevance of the two parts.
Keyword:
Idiom, Metaphorical Idiom, Semantic Mapping
Author:
王治敏(Zhimin Wang)、朱學鋒(Xuefeng Zhu)、俞士汶(Shiwen Yu)
Abstract:
This paper introduces the attributes of emotional evaluation in the Grammatical Knowledge-base of Contemporary Chinese. Lexical emotion tagging is studied by means of both qualitative and quantitative approaches. Based on the statistical results from the People’s Daily tagging corpus, lexical emotional trends are described and formulated in our Knowledge-base. Lastly, we also discuss potential applications of emotion tagging in related tasks such as text filtering, information retrieval and web page evaluation.
Keyword:
Evaluation of Lexical Emotion, Semantic Prosody, Collocation Regulation