International Journal of Computational Linguistics & Chinese Language Processing
Vol. 25, No. 1, June 2020
Title:
Chinese Spelling Check based on Neural Machine Translation
Author:
Jhih-Jie Chen, Hai-Lun Tu, Ching-Yu Yang, Chiao-Wen Li and Jason S. Chang
Abstract:
We present a method for Chinese spelling check that automatically learns to correct a sentence with potential spelling errors. In our approach, a character-based neural machine translation (NMT) model is trained to translate a potentially misspelled sentence into a correct one, using right-and-wrong sentence pairs from newspaper edit logs and artificially generated data. The method involves extracting sentences containing spelling corrections from edit logs, using commonly confused right-and-wrong word pairs to generate artificial right-and-wrong sentence pairs to expand our training data, and training the NMT model. The evaluation on the United Daily News (UDN) Edit Logs and the SIGHAN-7 Shared Task shows that adding artificial error data can significantly improve the performance of a Chinese spelling check system.
Keywords: Chinese Spelling Check, Artificial Error Generation, Neural Machine Translation, Edit Log
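The artificial-error-generation step above lends itself to a short illustration. The Python sketch below corrupts correct sentences with commonly confused right-and-wrong word pairs to produce (wrong, right) training pairs for a character-based NMT model; the confusion pairs and sentences here are hypothetical stand-ins, not the paper's actual data.

import random

# Hypothetical commonly confused word pairs (correct -> typo); the paper
# derives its confusion sets from newspaper edit logs.
CONFUSION_PAIRS = {
    "已經": "以經",
    "需要": "須要",
    "反映": "反應",
}

def corrupt(sentence, pairs, p=0.5):
    """Replace correct words with confusable wrong ones, yielding a
    (misspelled source, correct target) pair for NMT training."""
    corrupted = sentence
    for right, wrong in pairs.items():
        if right in corrupted and random.random() < p:
            corrupted = corrupted.replace(right, wrong, 1)
    return corrupted, sentence

if __name__ == "__main__":
    random.seed(7)
    for sent in ["他已經需要休息了", "這反映出問題的嚴重性"]:
        src, tgt = corrupt(sent, CONFUSION_PAIRS, p=1.0)
        print(src, "->", tgt)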
Title:
Spoken Document Summarization Using End-to-End Modeling Techniques
Author:
Tzu-En Liu, Shih-Hung Liu, Kuo-Wei Chang, and Berlin Chen
Abstract:
This paper sets out to explore novel and effective end-to-end extractive methods for spoken document summarization. To this end, we propose a neural summarization approach that leverages a hierarchical modeling structure with an attention mechanism to understand a document deeply and, in turn, to select representative sentences as its summary. Meanwhile, to alleviate the negative effect of speech recognition errors, we make use of acoustic features and subword-level input representations in the proposed approach. Finally, we conduct a series of experiments on the Mandarin Broadcast News (MATBN) Corpus. The experimental results confirm the utility of our approach, which outperforms state-of-the-art methods.
Keywords:
Spoken Documents, Extractive Summarization, Deep Neural Networks, Hierarchical Semantic Representations, Acoustic Features
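To make the hierarchical structure concrete, here is a minimal PyTorch sketch of an extractive scorer in the spirit of the approach above: a word-level encoder builds sentence vectors, a sentence-level encoder contextualizes them within the document, and attention produces a document vector that informs per-sentence selection scores. Layer sizes, mean pooling, and the toy input are assumptions, not the authors' configuration.

import torch
import torch.nn as nn

class HierarchicalExtractor(nn.Module):
    def __init__(self, vocab_size=5000, emb=64, hid=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb)
        self.word_rnn = nn.GRU(emb, hid, batch_first=True, bidirectional=True)
        self.sent_rnn = nn.GRU(2 * hid, hid, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hid, 1)   # attention over sentences
        self.score = nn.Linear(2 * hid, 1)  # extractive selection score

    def forward(self, doc):                  # doc: (n_sents, n_words) word ids
        w, _ = self.word_rnn(self.emb(doc))  # word-level encoding
        sents = w.mean(dim=1).unsqueeze(0)   # pool words into sentence vectors
        s, _ = self.sent_rnn(sents)          # contextualize within the document
        a = torch.softmax(self.attn(s), dim=1)
        ctx = (a * s).sum(dim=1, keepdim=True)  # attention-weighted document vector
        return torch.sigmoid(self.score(s + ctx)).squeeze(-1)

doc = torch.randint(0, 5000, (6, 12))  # toy document: 6 sentences, 12 words each
print(HierarchicalExtractor()(doc))    # select the top-k sentences as the summary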
Title:
Rumor Detection Using Deep Attention Networks With Multimodal Feature Fusion
Author:
Jenq-Haur Wang and Chin-Wei Huang
Abstract:
With the rapid growth of information, browsing social media on the Internet is becoming a part of people's daily lives. Social platforms give us the latest information in real time, for example, sharing personal life and commenting on social events. However, with the vigorous development of social platforms, many rumors and fake messages are appearing on the Internet. Most social platforms use manual reporting or statistics to identify rumors, which is very inefficient. In this paper, we propose a multimodal feature fusion approach to rumor detection by combining an image captioning model with deep attention networks. First, for images extracted from tweets, we apply an image captioning model to generate captions with Convolutional Neural Networks (CNNs) and a Sequence-to-Sequence (Seq2Seq) model. Second, words in captions and text contents from tweets are represented as vectors by word embedding models and combined with social features in tweets using early and late fusion strategies. Finally, we design Multi-layer and Multi-cell Bi-directional Recurrent Neural Networks (BRNNs) with an attention mechanism to find word dependencies and learn the most important features for classification. From the experimental results, the best F-measure of 0.89 is obtained by our proposed Multi-cell BRNN based on Gated Recurrent Units (GRUs) with attention, using early fusion of all features except user features. This shows the potential of our proposed approach to rumor detection. Further investigation is needed for larger-scale data.
Keywords:
Rumor Detection, Bi-directional Recurrent Neural Networks, Gated Recurrent Unit, Self-attention Mechanism, Multimodal Feature Fusion
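As a concrete reading of the early-fusion variant described above, the PyTorch sketch below appends per-tweet social features to each word vector (caption and text embeddings) before a bidirectional GRU with attention classifies the tweet; the feature dimensions and this particular fusion layout are illustrative assumptions, not the authors' exact architecture.

import torch
import torch.nn as nn

class FusionBiGRU(nn.Module):
    def __init__(self, word_dim=100, social_dim=8, hid=64):
        super().__init__()
        # Early fusion: social features are concatenated to every word vector.
        self.rnn = nn.GRU(word_dim + social_dim, hid,
                          batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hid, 1)
        self.clf = nn.Linear(2 * hid, 2)  # rumor vs. non-rumor

    def forward(self, words, social):
        # words: (batch, seq, word_dim) caption + text embeddings
        # social: (batch, social_dim) per-tweet social features
        s = social.unsqueeze(1).expand(-1, words.size(1), -1)
        h, _ = self.rnn(torch.cat([words, s], dim=-1))
        a = torch.softmax(self.attn(h), dim=1)  # attention over time steps
        return self.clf((a * h).sum(dim=1))     # classify the fused representation

logits = FusionBiGRU()(torch.randn(4, 20, 100), torch.randn(4, 8))
print(logits.shape)  # (4, 2)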
Title:
Linguistic Input and Child Vocalization of 7 Children from 5 to 30 Months: A Longitudinal Study with LENA Automatic Analysis
Author:
Chia-Cheng Lee, Li-mei Chen, and D. Kimbrough Oller
Abstract:
This study examined longitudinal changes in linguistic input, conversational turns, and child vocalizations in Chinese-speaking families using the computerized LENA (Language Environment Analysis) software, a system that captures audio data in children's natural environment and parses out speech data automatically. All-day home recordings (11-16 hours) from seven typically developing Chinese-learning children (two males and five females) at the ages of 5, 10, 14, 21, and 30 months were analyzed. Adult word count (AWC), conversational turn count (CT), and child vocalization count (CV) of 70 recordings (i.e., 7 children × 5 ages × 2 recordings) were retrieved from the LENA software. These recordings included times when families were asleep. As a result, the present study also compared the results with and without LENA-determined silence time (i.e., quiet and sleep time). The results showed that the percentage of silence in the recordings decreased with age, indicating that the children's awake time increased as they aged. When the children were awake, they listened to an average of 1734 adult words, engaged in 39 conversational turns, and produced 150 vocalizations per hour from 5 to 30 months of age. The CV and CT increased with age, while the AWC did not show a clear pattern, which was similar to English normative estimates from Gilkerson and Richards (2008). The CT was also found to be a more effective contributor to the number of CV than the AWC, indicating that speech produced in temporal proximity to children's vocalizations or directed to children played an important role in eliciting child vocalizations.
Keywords:
LENA, Adult Word, Conversational Turn, Child Vocalization, Longitudinal Study, Cross-language Comparison
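The per-hour figures above follow from dividing raw counts by awake time (total recording time minus LENA-determined silence). A small worked example in Python, with hypothetical totals chosen only so the rates match those reported in the abstract:

def hourly_rates(total_hours, silence_hours, awc, ct, cv):
    """Rates per awake hour, excluding LENA-determined silence."""
    awake = total_hours - silence_hours
    return {k: round(v / awake, 1)
            for k, v in {"AWC/h": awc, "CT/h": ct, "CV/h": cv}.items()}

# e.g. a 14-hour recording with 4 hours of quiet/sleep time
print(hourly_rates(14, 4, awc=17340, ct=390, cv=1500))
# -> {'AWC/h': 1734.0, 'CT/h': 39.0, 'CV/h': 150.0}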
Title:
A Research of Applying Multi-hop Attention and Memory Relations on Memory Networks
Author:
Jing-Han Zhan, Alan Liu, and Chiung-Hon Lee
Abstract:
With the rapid advancement of machine learning and deep learning, great breakthroughs have been achieved in many areas of natural language processing in recent years. Complex language tasks, such as article classification, abstract extraction, question answering, machine translation, and image description generation, have been addressed by neural networks. In this paper, we propose a new model based on memory networks that includes a multi-hop mechanism to process a small set of sentences, with question answering used as the verification application. The model first saves knowledge in memory, then finds the relevant memories through the attention mechanism, and the output module reasons out the final answer. All experiments use the bAbI dataset provided by Facebook, which contains 20 different kinds of Q&A tasks for evaluating the model from different aspects. Our approach reduces the number of memory associations by calculating the relations between memories. In addition to reducing computation by 26.8%, it also improves the accuracy of the model by about 9.2% in our experiments. The experiments also used a smaller amount of data to verify the system's behavior in the case of an insufficient data set.
Keywords:
Memory Networks, Multi-hop Networks, Relation Networks, Attention Mechanism
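The core operation the paper builds on, one attention hop over stored sentence memories, can be sketched in a few lines of NumPy; the residual query update follows end-to-end memory networks, while the dimensions and random inputs are toy assumptions (the paper's contribution, relation calculation between memories, is not shown here).

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def hop(query, memories):
    """One attention hop: match the query against the memories and
    return the attention-weighted memory plus the weights."""
    p = softmax(memories @ query)  # attention over memories
    return p @ memories, p

rng = np.random.default_rng(0)
memories = rng.normal(size=(5, 16))  # 5 encoded story sentences
q = rng.normal(size=16)              # encoded question
for k in range(3):                   # multi-hop: refine the query each hop
    o, p = hop(q, memories)
    q = q + o                        # residual update between hops
    print(f"hop {k}: strongest memory = sentence {p.argmax()}")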