International Journal of Computational Linguistics & Chinese Language Processing
Vol. 22, No. 1, June 2017


Title:
An Empirical Comparison of Contemporary Unsupervised Approaches for Extractive Speech Summarization

Author:
Shih-Hung Liu, Kuan-Yu Chen, Kai-Wun Shih, Berlin Chen, Hsin-Min Wang and Wen-Lian Hsu

Abstract:
With the rapid development of the Internet and the arrival of the big data era, automatic summarization has emerged as a popular research topic. The aim of automatic summarization is to select important text or spoken sentences that represent the topic (theme) of the original text or spoken document, according to a predefined summarization ratio. In this study, we frame the automatic summarization task as an ad-hoc information retrieval (IR) problem and employ the mathematically sound language modeling (LM) framework for extractive speech summarization, which can perform important sentence selection in an unsupervised manner and has shown preliminary success. The main contribution of this paper is three-fold. First, by virtue of relevance modeling, we explore several effective sentence modeling formulations to enhance the sentence models involved in the LM-based summarization framework, and we present the first use of a tri-mixture model to improve the performance of extractive speech summarization. Second, since language modeling suffers from the data sparseness problem, for which the common solution is to adopt smoothing techniques, we investigate three different smoothing approaches to evaluate how they influence summarization performance. Third, we further apply the well-studied ranking model BM25 and its variants from the IR community to rank important sentences in extractive speech summarization. Experiments conducted on a publicly available dataset (MATBN) show that the applied methods achieve effective summarization performance when compared with other well-practiced and state-of-the-art unsupervised methods.

Keywords:
BM25, Language Modeling, Pseudo-Relevance Feedback, Relevance Modeling, Extractive Automatic Summarization


Title:
The Asymmetric Occurrences of Dou1 and Shao3 in the [Numeral + Measure Word/Classifier + Noun] Construction: A Corpus-based Analysis

Author:
Wei-Yu Chen and Siaw-Fong Chung

Abstract:
As two words with opposite meanings, dou1 and shao3 are expected to be similar in some environments and different in others. In this work, we looked into the construction [numeral + measure word/classifier + noun] (hereafter [Num + MW/CL + N]) and explored the ways in which the two words behave asymmetrically. We found that dou1 carries a numeral meaning while shao3 lacks this use. Based on an analysis of these two words in the Sinica Corpus, this paper argues that dou1 is better categorized as ‘Neu’ and suggests that dou1 in [Num + MW/CL + N] serves two functions: one is counting numbers; the other is expressing quantities. These findings can be related to the use of dou1 as a complement and as a numeral.

Keywords:
Dou1 and Shao3, [Num + MW/CL + N], Antonyms/opposites, Numeral Concept


Title:
An Approach to Extract Product Features from Chinese Consumer Reviews and Establish Product Feature Structure Tree

Author:
Xinsheng Xu, Jing Lin, Ying Xiao and Jianzhe Yu

Abstract:
With the progress of e-commerce and web technology, a large volume of consumer reviews of products is continually generated, and these reviews contain rich information about consumer requirements and preferences. Although China has the largest e-commerce market in the world, few researchers have investigated how to extract product features from Chinese consumer reviews effectively, let alone analyzed the relations among product features, which are very significant for implementing comprehensive applications. In this research, a framework is proposed to extract product features from Chinese consumer reviews and construct a product feature structure tree. Through three filtering algorithms and a two-stage word segmentation optimization process, phrases are identified from consumer reviews. An expanded rule template, which consists of the elements phrase, POS, dependency relation, governing word, and opinion, is then constructed to train a conditional random field (CRF) model, and the product features are extracted based on the CRF. In addition, two indices, frequency and sentiment score, are defined to describe product features quantitatively. Based on these, the product feature structure tree is established through a potential parent node searching process. Furthermore, several categories of extensive experiments are conducted on 5,806 experimental corpora from taobao.com, suning.com, and zhongguancun.com. The results of these experiments provide evidence to guide the product feature extraction process. Finally, an application that analyzes the influences among product features is conducted based on the product feature structure tree. It provides valuable managerial implications for designers, manufacturers, and retailers.

Keywords:
Chinese Consumer Review, Product Feature Extraction, Rule Template, Sentiment Analysis, Product Feature Structure Tree