International Journal of Computational Linguistics & Chinese Language Processing
Vol. 25, No. 1, June 2020
Title:
Chinese Spelling Check based on Neural Machine Translation
Author:
Jhih-Jie Chen, Hai-Lun Tu, Ching-Yu Yang, Chiao-Wen Li and Jason S. Chang
Abstract:
We present a method for Chinese spelling check that automatically learns to correct a sentence with potential spelling errors. In our approach, a character-based neural machine translation (NMT) model is trained to translate a potentially misspelled sentence into a correct one, using right-and-wrong sentence pairs from newspaper edit logs and artificially generated data. The method involves extracting sentences containing spelling corrections from edit logs, using commonly confused right-and-wrong word pairs to generate artificial right-and-wrong sentence pairs to expand our training data, and training the NMT model. The evaluation on the United Daily News (UDN) Edit Logs and the SIGHAN-7 Shared Task shows that adding artificial error data can significantly improve the performance of a Chinese spelling check system.
Keywords: Chinese Spelling Check, Artificial Error Generation, Neural Machine Translation, Edit Log
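The artificial-error-generation step above lends itself to a short illustration. The Python sketch below corrupts correct sentences with commonly confused right-and-wrong word pairs to produce (wrong, right) training pairs for a character-based NMT model; the confusion pairs and sentences here are hypothetical stand-ins, not the paper's actual data.

import random

# Hypothetical commonly confused word pairs (correct -> typo); the paper
# derives its confusion sets from newspaper edit logs.
CONFUSION_PAIRS = {
    "已經": "以經",
    "需要": "須要",
    "反映": "反應",
}

def corrupt(sentence, pairs, p=0.5):
    """Replace correct words with confusable wrong ones, yielding a
    (misspelled source, correct target) pair for NMT training."""
    corrupted = sentence
    for right, wrong in pairs.items():
        if right in corrupted and random.random() < p:
            corrupted = corrupted.replace(right, wrong, 1)
    return corrupted, sentence

if __name__ == "__main__":
    random.seed(7)
    for sent in ["他已經需要休息了", "這反映出問題的嚴重性"]:
        src, tgt = corrupt(sent, CONFUSION_PAIRS, p=1.0)
        print(src, "->", tgt)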
Title:
Spoken Document Summarization Using End-to-End Modeling Techniques
Author:
Tzu-En Liu, Shih-Hung Liu, Kuo-Wei Chang, and Berlin Chen
Abstract:
This paper sets out to explore novel and effective end-to-end extractive methods for spoken document summarization. To this end, we propose a neural summarization approach that leverages a hierarchical modeling structure with an attention mechanism to understand a document deeply and, in turn, to select representative sentences as its summary. Meanwhile, to alleviate the negative effect of speech recognition errors, we make use of acoustic features and subword-level input representations in the proposed approach. Finally, we conduct a series of experiments on the Mandarin Broadcast News (MATBN) Corpus. The experimental results confirm the utility of our approach, which outperforms state-of-the-art methods.
Keywords:
Spoken Documents, Extractive Summarization, Deep Neural Networks, Hierarchical Semantic Representations, Acoustic Features
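To make the hierarchical structure concrete, here is a minimal PyTorch sketch of an extractive scorer in the spirit of the approach above: a word-level encoder builds sentence vectors, a sentence-level encoder contextualizes them within the document, and attention produces a document vector that informs per-sentence selection scores. Layer sizes, mean pooling, and the toy input are assumptions, not the authors' configuration.

import torch
import torch.nn as nn

class HierarchicalExtractor(nn.Module):
    def __init__(self, vocab_size=5000, emb=64, hid=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb)
        self.word_rnn = nn.GRU(emb, hid, batch_first=True, bidirectional=True)
        self.sent_rnn = nn.GRU(2 * hid, hid, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hid, 1)   # attention over sentences
        self.score = nn.Linear(2 * hid, 1)  # extractive selection score

    def forward(self, doc):                  # doc: (n_sents, n_words) word ids
        w, _ = self.word_rnn(self.emb(doc))  # word-level encoding
        sents = w.mean(dim=1).unsqueeze(0)   # pool words into sentence vectors
        s, _ = self.sent_rnn(sents)          # contextualize within the document
        a = torch.softmax(self.attn(s), dim=1)
        ctx = (a * s).sum(dim=1, keepdim=True)  # attention-weighted document vector
        return torch.sigmoid(self.score(s + ctx)).squeeze(-1)

doc = torch.randint(0, 5000, (6, 12))  # toy document: 6 sentences, 12 words each
print(HierarchicalExtractor()(doc))    # select the top-k sentences as the summary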
Title:
Rumor Detection Using Deep Attention Networks With Multimodal Feature Fusion
Author:
Jenq-Haur Wang and Chin-Wei Huang
Abstract:
With the rapid growth of information, browsing social media on the Internet is becoming a part of people's daily lives. Social platforms give us the latest information in real time, for example, sharing personal life and commenting on social events. However, with the vigorous development of social platforms, many rumors and fake messages are appearing on the Internet. Most social platforms use manual reporting or statistics to identify rumors, which is very inefficient. In this paper, we propose a multimodal feature fusion approach to rumor detection by combining an image captioning model with deep attention networks. First, for images extracted from tweets, we apply an image captioning model to generate captions with Convolutional Neural Networks (CNNs) and a Sequence-to-Sequence (Seq2Seq) model. Second, words in captions and text contents from tweets are represented as vectors by word embedding models and combined with social features in tweets using early and late fusion strategies. Finally, we design Multi-layer and Multi-cell Bi-directional Recurrent Neural Networks (BRNNs) with an attention mechanism to find word dependencies and learn the most important features for classification. From the experimental results, the best F-measure of 0.89 is obtained by our proposed Multi-cell BRNN based on Gated Recurrent Units (GRUs) with attention, using early fusion of all features except user features. This shows the potential of our proposed approach to rumor detection. Further investigation is needed for larger-scale data.
Keywords:
Rumor Detection, Bi-directional Recurrent Neural Networks, Gated Recurrent Unit, Self-attention Mechanism, Multimodal Feature Fusion
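As a concrete reading of the early-fusion variant described above, the PyTorch sketch below appends per-tweet social features to each word vector (caption and text embeddings) before a bidirectional GRU with attention classifies the tweet; the feature dimensions and this particular fusion layout are illustrative assumptions, not the authors' exact architecture.

import torch
import torch.nn as nn

class FusionBiGRU(nn.Module):
    def __init__(self, word_dim=100, social_dim=8, hid=64):
        super().__init__()
        # Early fusion: social features are concatenated to every word vector.
        self.rnn = nn.GRU(word_dim + social_dim, hid,
                          batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hid, 1)
        self.clf = nn.Linear(2 * hid, 2)  # rumor vs. non-rumor

    def forward(self, words, social):
        # words: (batch, seq, word_dim) caption + text embeddings
        # social: (batch, social_dim) per-tweet social features
        s = social.unsqueeze(1).expand(-1, words.size(1), -1)
        h, _ = self.rnn(torch.cat([words, s], dim=-1))
        a = torch.softmax(self.attn(h), dim=1)  # attention over time steps
        return self.clf((a * h).sum(dim=1))     # classify the fused representation

logits = FusionBiGRU()(torch.randn(4, 20, 100), torch.randn(4, 8))
print(logits.shape)  # (4, 2)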
Title:
Linguistic Input and Child Vocalization of 7 Children from 5 to 30 Months: A Longitudinal Study with LENA Automatic Analysis
Author:
Chia-Cheng Lee, Li-mei Chen, and D. Kimbrough Oller
Abstract:
This study examined longitudinal changes in linguistic input, conversational turns, and child vocalizations in Chinese-speaking families using the computerized LENA (Language Environment Analysis) software, a system that captures audio data in children's natural environment and parses out speech data automatically. All-day home recordings (11-16 hours) from seven typically developing Chinese-learning children (two males and five females) at the ages of 5, 10, 14, 21, and 30 months were analyzed. Adult word count (AWC), conversational turn count (CT), and child vocalization count (CV) of 70 recordings (i.e., 7 children × 5 ages × 2 recordings) were retrieved from the LENA software. These recordings included times when families were asleep. As a result, the present study also compared the results with and without LENA-determined silence time (i.e., quiet and sleep time). The results showed that the percentage of silence in the recordings decreased with age, indicating that the children's awake time increased as they aged. When the children were awake, they listened to an average of 1734 adult words, engaged in 39 conversational turns, and produced 150 vocalizations per hour from 5 to 30 months of age. The CV and CT increased with age, while the AWC did not show a clear pattern, which was similar to English normative estimates from Gilkerson and Richards (2008). The CT was also found to be a more effective contributor to the number of CV than the AWC, indicating that speech produced in temporal proximity to children's vocalizations or directed to children played an important role in eliciting child vocalizations.
Keywords:
LENA, Adult Word, Conversational Turn, Child Vocalization, Longitudinal Study, Cross-language Comparison
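The per-hour figures above follow from dividing raw counts by awake time (total recording time minus LENA-determined silence). A small worked example in Python, with hypothetical totals chosen only so the rates match those reported in the abstract:

def hourly_rates(total_hours, silence_hours, awc, ct, cv):
    """Rates per awake hour, excluding LENA-determined silence."""
    awake = total_hours - silence_hours
    return {k: round(v / awake, 1)
            for k, v in {"AWC/h": awc, "CT/h": ct, "CV/h": cv}.items()}

# e.g. a 14-hour recording with 4 hours of quiet/sleep time
print(hourly_rates(14, 4, awc=17340, ct=390, cv=1500))
# -> {'AWC/h': 1734.0, 'CT/h': 39.0, 'CV/h': 150.0}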
Title:
A Research of Applying Multi-hop Attention and Memory Relations on Memory Networks
Author:
Jing-Han Zhan, Alan Liu, and Chiung-Hon Lee
Abstract:
With the rapid advancement of machine learning and deep learning, great breakthroughs have been achieved in many areas of natural language processing in recent years. Complex language tasks, such as article classification, abstract extraction, question answering, machine translation, and image description generation, have been addressed by neural networks. In this paper, we propose a new model based on memory networks that includes a multi-hop mechanism to process a small set of sentences, with question answering used as the verification application. The model first saves knowledge in memory, then finds the relevant memories through the attention mechanism, and the output module reasons out the final answer. All experiments use the bAbI dataset provided by Facebook, which contains 20 different kinds of Q&A tasks for evaluating the model from different aspects. Our approach reduces the number of memory associations by calculating the relations between memories. In addition to reducing computation by 26.8%, it also improves the accuracy of the model by about 9.2% in our experiments. The experiments also used a smaller amount of data to verify the system's behavior in the case of an insufficient data set.
Keywords:
Memory Networks, Multi-hop Networks, Relation Networks, Attention Mechanism
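The core operation the paper builds on, one attention hop over stored sentence memories, can be sketched in a few lines of NumPy; the residual query update follows end-to-end memory networks, while the dimensions and random inputs are toy assumptions (the paper's contribution, relation calculation between memories, is not shown here).

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def hop(query, memories):
    """One attention hop: match the query against the memories and
    return the attention-weighted memory plus the weights."""
    p = softmax(memories @ query)  # attention over memories
    return p @ memories, p

rng = np.random.default_rng(0)
memories = rng.normal(size=(5, 16))  # 5 encoded story sentences
q = rng.normal(size=16)              # encoded question
for k in range(3):                   # multi-hop: refine the query each hop
    o, p = hop(q, memories)
    q = q + o                        # residual update between hops
    print(f"hop {k}: strongest memory = sentence {p.argmax()}")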