An NLP Day

Information Sciences Institute
University of Southern California

Date: Tuesday, March 13, 2012 (afternoon)
Venue: Auditorium 106, New Building, Institute of Information Science, Academia Sinica

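Organizers: Institute of Information Science, Academia Sinica; The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)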

Time: 14:00-15:30
Title: A New Semantics: Merging Propositional and Distributional Information

Abstract:

Despite hundreds of years of study on semantics, theories and representations of semantic content (the actual meaning of the symbols used in semantic propositions) remain impoverished. The traditional extensional and intensional models of semantics are difficult to flesh out in practice, and no large-scale models of this kind exist. Recently, researchers in Natural Language Processing (NLP) have increasingly treated topic signature word distributions (also called ‘context vectors’, ‘topic models’, ‘language models’, etc.) as a de facto placeholder for semantics at various levels of granularity. This talk argues for a new kind of semantics that combines traditional symbolic, logic-based, proposition-style semantics (of the kind used in older NLP) with computation-based statistical word distribution information (what modern NLP calls Distributional Semantics). The core resource is a single lexico-semantic ‘lexicon’ that can be used for a variety of tasks. I show how to define such a lexicon, how to build and format it, and how to use it for various tasks. Combining the two views of semantics raises many fascinating questions that invite study, including how logical operators such as negation and modalities act over word (sense) distributions, the nature of the ontological facets required to define concepts, and how compositionality operates over statistical concepts.

Time: 16:00-17:30
Title: Text Harvesting and Ontology Construction using a Powerful New Method

Abstract:

People build databases and metadata structures/ontologies to collect and systematize knowledge and to make it available to users in a consistent and, hopefully, trustworthy form. But the largest data collection today, the web, is not systematic, consistent, or trustworthy, and the access techniques we use are provably inadequate. Over the past decade, various researchers have developed web harvesting methods to extract information from the web and organize it in different ways. Many methods have been tried, but none has had much success: inconsistencies, knowledge gaps, the need for manual intervention, the lack of gold-standard material to evaluate against, and other problems plague automated harvesting methods. Focusing on unstructured text, I describe a method to extract information from the web, organize it, and form both a knowledge base and its taxonomic term ontology/metadata. The method is competitive with or outperforms existing large-scale approaches to harvesting information from the web, and it is very simple to implement. In the talk, I also describe some of the deep problems fundamental to ontology building, as they are made apparent in this work.

This is joint work with Dr. Zornitsa Kozareva (USC Information Sciences Institute).

Notes:

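1. Prof. Eduard Hovy will also give a Distinguished Lecture, hosted by the Institute of Information Science, Academia Sinica, at 10:00 a.m. on the same day in the same venue. Everyone is welcome to attend. For details, please see the Institute of Information Science, Academia Sinica website; an introduction to the lecture is given on the next page.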

2. Admission is free; please register online.

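3. The Association offers a boxed-lunch ordering service (NT$80). If you would like a lunch box, please indicate this when registering online; payment is collected at check-in.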

4. Registration deadline: Thursday, March 8, 2012.

5. Inquiries: Ms. Huang Chi, tel. 02-27883799 ext. 1502, e-mail: [email protected].