International Journal of Computational Linguistics & Chinese Language Processing
Vol. 24, No. 1, June 2019
Title:
Leveraging Memory Enhanced Conditional Random Fields with Gated CNN and Automatic BAPS Features for Chinese Named Entity Recognition
Author:
Kuo-Chun Chien and Chia-Hui Chang
Abstract:
Named Entity Recognition (NER) is an essential task in Natural Language Processing. Memory Enhanced CRF (MECRF) integrates external memory to extend the Conditional Random Field (CRF) and capture long-range dependencies with an attention mechanism. However, pure MECRF alone performs poorly on Chinese NER. In this paper, we enhance MECRF with stacked CNNs and a gating mechanism to capture better word and sentence representations for Chinese NER. Meanwhile, we combine both character and word information to improve the performance. We further improve the performance by importing common before and common after vocabularies of named entities, as well as entity prefixes and suffixes, via feature mining. These BAPS features are then combined with character embedding features so that their weights are adjusted automatically. The model proposed in this research achieves 91.67% tagging accuracy on online social media data for Chinese person name recognition, and reaches the highest F1-score of 92.45% for location name recognition and a 90.95% overall recall rate on the SIGHAN-MSRA dataset.
Keywords: Machine Learning, Named Entity Recognition, Memory Network, Feature Mining
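
The abstract above describes a gated, stacked CNN over character embeddings whose outputs serve as features for CRF decoding. As a purely illustrative sketch (not the authors' implementation; the layer sizes, kernel width, and stacking depth are assumptions), a single gated convolution layer in PyTorch might look like this:

import torch
import torch.nn as nn

class GatedCNNBlock(nn.Module):
    """One gated convolution layer (GLU-style): conv(x) * sigmoid(gate(x))."""
    def __init__(self, dim, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.conv = nn.Conv1d(dim, dim, kernel_size, padding=pad)
        self.gate = nn.Conv1d(dim, dim, kernel_size, padding=pad)

    def forward(self, x):              # x: (batch, seq_len, dim)
        x = x.transpose(1, 2)          # Conv1d expects (batch, dim, seq_len)
        out = self.conv(x) * torch.sigmoid(self.gate(x))
        return out.transpose(1, 2)     # back to (batch, seq_len, dim)

# Character embeddings for two toy sentences -> gated CNN features that a CRF layer could consume.
chars = torch.randint(0, 5000, (2, 20))        # 2 sentences, 20 characters each
emb = nn.Embedding(5000, 128)(chars)           # (2, 20, 128)
feats = GatedCNNBlock(128)(emb)                # (2, 20, 128)

Stacking several such blocks widens the receptive field over the character sequence, which is one way the gating and stacking described above can capture longer-range context before CRF decoding.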
Title:
Discovering the Latent Writing Style from Articles: A Contextualized Feature Extraction Approach
Author:
Yen-Hao Huang, Ting-Wei Liu, Ssu-Rui Lee, Ya-Wen Yu, Wan-Hsuan Lee, Fernando Henrique Calderon Alvarado and Yi-Shin Chen
Abstract:
With the growth of the Internet, the ready accessibility and generation of online information have created the issue of determining how accurate or truthful that information is. The rapid speed of information generation makes manual filtering infeasible; hence, mechanisms are needed to automatically recognize and filter unreliable data. This research aimed to create a method for distinguishing vendor-sponsored reviews from customer product reviews using real-world online forum datasets. However, the lack of labeled sponsored reviews makes end-to-end training difficult, and many existing approaches rely on lexicon-based features that can easily be manipulated by changing word usage. To avoid such word manipulation, we derived a graph-based method for extracting latent writing style patterns. This work thus proposes a Contextualized Affect Representation for Implicit Style Recognition framework, namely CARISR. A transfer learning architecture was also adapted to improve the model's learning process with weakly labeled data. The proposed approach demonstrated the ability to recognize sponsored reviews with 70% accuracy in comprehensive experiments using the limited available data.
Keywords:
Reliability, Transfer Learning, Writing Style, Text Classification, Natural Language Processing
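
As a hedged illustration of the two-stage transfer-learning idea mentioned in the abstract (the encoder, feature dimensions, and optimizer below are assumptions for the sketch, not details of CARISR), the second stage might freeze a representation learned from plentiful weakly labeled reviews and fit only a small classifier head on the scarce hand-labeled sponsored/customer data:

import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(300, 128), nn.ReLU())  # stands in for a pretrained style encoder
head = nn.Linear(128, 2)                                  # sponsored vs. customer review

# Stage 2: freeze the pretrained encoder, train only the head on the small labeled set.
for p in encoder.parameters():
    p.requires_grad = False
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

x, y = torch.randn(16, 300), torch.randint(0, 2, (16,))   # toy labeled batch
optimizer.zero_grad()
loss = nn.CrossEntropyLoss()(head(encoder(x)), y)
loss.backward()
optimizer.step()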
Title:
An Investigation of Hybrid CTC-Attention Modeling in Mandarin Speech Recognition
Author:
Hsiu-Jui Chang, Wei-Cheng Chao, Tien-Hong Lo, Berlin Chen
Abstract:
The recent emergence of end-to-end automatic speech recognition (ASR) frameworks has streamlined the complicated modeling procedures of ASR systems in contrast to conventional deep neural network-hidden Markov model (DNN-HMM) ASR systems. Among the most popular end-to-end ASR approaches are connectionist temporal classification (CTC) and the attention-based encoder-decoder model (Attention Model). In this paper, we explore the utility of combining CTC and the attention model in an attempt to yield better ASR performance. We also analyze the impact of the combination weight and the performance of the resulting CTC-Attention hybrid system on recognizing short utterances. Experiments on a Mandarin Chinese meeting corpus demonstrate that the CTC-Attention hybrid system delivers better performance on short utterance recognition in comparison to one of the state-of-the-art DNN-HMM setups, namely, the so-called TDNN-LFMMI system.
Keywords: CTC, Attention-based Encoder-Decoder, End-to-End Mandarin Chinese Speech Recognition, Short Utterance Recognition
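
The combination weight discussed in the abstract typically enters hybrid training as a multi-task objective that interpolates the CTC loss and the attention-decoder loss, L = w * L_CTC + (1 - w) * L_attention. The following toy PyTorch snippet illustrates only that weighted sum; the tensor shapes, vocabulary size, and weight value are assumptions, not the paper's configuration:

import torch
import torch.nn as nn

T, N, V = 50, 4, 30                            # encoder frames, batch size, vocabulary size
log_probs = torch.randn(T, N, V).log_softmax(-1)          # CTC branch outputs
targets = torch.randint(1, V, (N, 10))                     # reference token ids (blank = 0 excluded)
ctc_loss = nn.CTCLoss(blank=0)(log_probs,
                               targets,
                               input_lengths=torch.full((N,), T, dtype=torch.long),
                               target_lengths=torch.full((N,), 10, dtype=torch.long))

dec_logits = torch.randn(N, 10, V)                         # attention-decoder outputs
att_loss = nn.CrossEntropyLoss()(dec_logits.reshape(-1, V), targets.reshape(-1))

w = 0.3                                                    # CTC combination weight (assumed value)
hybrid_loss = w * ctc_loss + (1 - w) * att_loss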