Author:
Yanping Chen, Qinghua Zheng, Feng Tian, Deli Zheng
Abstract:
Chinese Segmentation Ambiguity (CSA) is a fundamental problem in Chinese language processing, in which a sentence can yield more than one segmentation path. Two techniques are commonly used to identify CSA: Omni-segmentation and Bi-directional Maximum Matching (BiMM). Because of its high computational complexity, Omni-segmentation is difficult to apply to big data. BiMM is easier to implement and faster, but its recall is much lower. In this paper, a Segmentation Matrix (SM) method is presented, which encodes each sentence as a matrix and then maps string operations into set operations. To identify CSA, only specific areas of the matrix are checked instead of scanning the whole sentence. SM has a computational complexity close to that of BiMM, with the same recall as Omni-segmentation. In addition to CSA identification, SM also supports lexicon-based Chinese word segmentation. In our experiments, several issues concerning CSA are explored on the basis of SM. The results show that SM is useful for CSA analysis.
Keywords:
Segmentation Matrix, Segmentation Ambiguity
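For readers unfamiliar with the BiMM baseline named in the abstract above, the following is a minimal sketch of bi-directional maximum matching used as an ambiguity detector. It is not the authors' Segmentation Matrix method. The toy lexicon, the function names, and the single-character fallback for out-of-lexicon characters are assumptions made for illustration only.

```python
# Hypothetical toy lexicon; a real system would load a full dictionary.
LEXICON = {"研究", "研究生", "生命", "命", "起源"}


def forward_max_match(sentence, lexicon, max_len):
    """Greedy left-to-right segmentation, longest lexicon word first."""
    words, i = [], 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            # Assumption: an unknown single character becomes its own token.
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


def backward_max_match(sentence, lexicon, max_len):
    """Greedy right-to-left segmentation, longest lexicon word first."""
    words, j = [], len(sentence)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            candidate = sentence[j - size:j]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                j -= size
                break
    words.reverse()  # tokens were collected from the end of the sentence
    return words


def bimm_is_ambiguous(sentence, lexicon):
    """Flag a sentence when the two greedy segmentations disagree."""
    max_len = max(map(len, lexicon), default=1)
    forward = forward_max_match(sentence, lexicon, max_len)
    backward = backward_max_match(sentence, lexicon, max_len)
    return forward != backward


# Forward pass yields 研究生 / 命 / 起源; backward pass yields 研究 / 生命 / 起源.
AMBIGUOUS = bimm_is_ambiguous("研究生命起源", LEXICON)
```

Because the check compares only two greedy paths, a sentence whose forward and backward segmentations happen to agree is never flagged even if other valid segmentations exist, which is consistent with the lower recall the abstract attributes to BiMM.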
Author:
Yung-Chun Chang, Chun-Han Chu, Chien Chin Chen, and Wen-Lian Hsu
Abstract:
Previous studies on emotion classification mainly focus on the emotional state of the writer. By contrast, our research emphasizes emotion detection from the readers' perspective. The classification of documents into reader-emotion categories can be applied in several ways. One application is to retain only the documents that trigger desired emotions, so that users can retrieve documents that contain relevant content and at the same time instill the proper emotions. However, current information retrieval (IR) systems lack the ability to discern emotions within texts, and the detection of readers' emotions has yet to achieve comparable performance. Moreover, previous machine learning-based approaches generally use statistical models that are not in a human-readable form. It is therefore difficult to pinpoint the reason for recognition failures and to understand the types of emotions that the articles inspire in their readers. In this paper, we propose a flexible emotion template-based approach (TBA) for reader-emotion detection that simulates such a process in a human-perceptive manner. TBA is a highly automated process that incorporates various knowledge sources to learn emotion templates from raw text; the templates characterize an emotion and are comprehensible to humans. The generated templates are used to predict readers' emotions through an alignment-based matching algorithm that allows an emotion template to be partially matched through a statistical scoring scheme. Experimental results demonstrate that our approach can effectively detect readers' emotions by exploiting the syntactic structures and semantic associations in the context, while outperforming currently well-known statistical text classification methods and the state-of-the-art reader-emotion detection method.
Keywords:
Reader-Emotion Detection, Emotion Template, Template-based Approach, Text Classification, Sentiment Analysis
Author:
Yu-Yang Huang, Rui Yan, Tsung-Ting Kuo, and Shou-De Lin
Abstract:
Personalized language models are useful in many applications, such as personalized search and personalized recommendation. Nevertheless, it is challenging to build a personalized language model for cold-start users, whose training corpus is too small to yield a reasonably accurate and representative model. We introduce a generalized framework to enrich the personalized language models of cold-start users. The cold-start problem is addressed with content written by friends on social network services. Our framework consists of a mixture language model whose mixture weights are estimated with a factor graph. The factor graph is used to incorporate prior knowledge and heuristics to identify the most appropriate weights. Intrinsic and extrinsic experiments show significant improvement for cold-start users.
Keywords:
Language Model, Factor Graph, Social Network Analysis, Smoothing, Cold-Start Problem
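The mixture language model mentioned in the abstract above can be illustrated with a small sketch that linearly interpolates a cold-start user's unigram model with models built from friends' content. This is only an illustration under stated assumptions: the token lists, the function names, and the hand-picked weights are hypothetical, and the factor-graph estimation of the weights described by the authors is not implemented here.

```python
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram probabilities from a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}


def mixture_probability(word, models, weights):
    """P(word) = sum_k weight_k * P_k(word), with weights summing to 1."""
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(w * m.get(word, 0.0) for w, m in zip(weights, models))


# Hypothetical data: a cold-start user with two posts' worth of tokens,
# plus content written by two friends.
USER_TOKENS = ["coffee", "morning"]
FRIEND_A_TOKENS = ["coffee", "coffee", "tea", "latte"]
FRIEND_B_TOKENS = ["tea", "tea", "movie", "night"]

MODELS = [
    unigram_model(USER_TOKENS),
    unigram_model(FRIEND_A_TOKENS),
    unigram_model(FRIEND_B_TOKENS),
]

# Assumption: weights fixed by hand; the paper estimates them with a factor graph.
WEIGHTS = [0.5, 0.3, 0.2]

# "latte" never appears in the user's own text, yet the mixture assigns it
# nonzero probability because one friend used it.
P_LATTE = mixture_probability("latte", MODELS, WEIGHTS)
```

In this toy setup the friends' models act as a form of smoothing for the sparse user model, which matches the role the abstract assigns to social-network content.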
Author:
Ting-Xuan Wang and Wen-Hsiang Lu
Abstract:
Conventional search engines usually assume that a search query corresponds to a single simple task. Nevertheless, due to the explosive growth of web usage in recent years, more and more queries are driven by complex tasks. A complex task may consist of multiple sub-tasks. To accomplish a complex task, users may need to obtain information about various task-related entities corresponding to the sub-tasks, and they usually have to issue a series of queries for each entity while searching for a complex task. For example, the complex task "Travel to Beijing" may involve several task-related entities, such as "hotel room," "flight tickets," and "maps." Understanding complex tasks together with their task-related entities allows a search engine to suggest integrated search results for each sub-task simultaneously. To understand and improve user behavior when searching for a complex task, we propose an entity-driven complex task model (ECTM) that exploits microblogs and query logs. Experimental results show that our ECTM is effective in identifying the comprehensive task-related entities for a complex task and generates good-quality complex task names based on the identified task-related entities.
Keywords:
Complex Search Task, Task Name Identification, Task-related Entity