International Journal of Computational Linguistics & Chinese Language Processing
Vol. 22, No. 2, December 2017

On the Use of Neural Network Modeling Techniques for Spoken Document Retrieval
Tien-Hong Lo, Ying-Wen Chen, Kuan-Yu Chen, Hsin-Min Wang and Berlin Chen
Question Retrieval with Distributed Representations and Participant Reputation in Community Question Answering
Sam Weng, Chun-Kai Wu, Yu-Chun Wang and Richard Tzong-Han Tsai
Exploring the Use of Neural Network based Features for Text Readability Classification
Hou-Chiang Tseng, Berlin Chen and Yao-Ting Sung
Acoustic Echo Cancellation Using an Improved Vector-Space-Based Adaptive Filtering Algorithm
Jin Li-You, Yu Tsao and Ying-Ren Chien
A Replay Spoofing Detection System Based on Discriminative Autoencoders
Chia-Lung Wu, Hsiang-Ping Hsu, Yu-Ding Lu, Yu Tsao, Hung-Shin Lee and Hsin-Min Wang
A Knowledge Representation Method to Implement A Taiwanese Tone Group Parser
Yu-Chu Chang
A Novel Trajectory-based Spatial-Temporal Spectral Features for Speech Emotion Recognition
Chun-Min Chang, Wei-Cheng Lin and Chi-Chun Lee

Title:
On the Use of Neural Network Modeling Techniques for Spoken Document Retrieval

Author:
Tien-Hong Lo, Ying-Wen Chen, Kuan-Yu Chen, Hsin-Min Wang and Berlin Chen

Abstract:
Due to the ever-increasing amount of publicly available multimedia content carrying speech information, spoken document retrieval (SDR) has become an active area of research that attracts significant interest from both the academic and industrial communities. Beyond the continuing effort to develop robust indexing and effective retrieval methods that quantify the degree of relevance between a query and a spoken document, accurately and efficiently modeling the query content plays a vital role in improving SDR performance. In view of this, we present in this paper a novel neural relevance-aware model (NRM) that infers an enhanced query representation, dispensing with the conventional, time-consuming pseudo-relevance feedback (PRF) process. In addition, we incorporate the notion of query intent classification into the proposed NRM framework to obtain more sophisticated query representations. Preliminary experiments conducted on the TDT-2 collection confirm the utility of our methods in relation to a few state-of-the-art ones.

Keywords:
Spoken Document Retrieval, Query Intent, Neural Network, Pseudo-Relevance Feedback

Title:
Question Retrieval with Distributed Representations and Participant Reputation in Community Question Answering

Author:
Sam Weng, Chun-Kai Wu, Yu-Chun Wang and Richard Tzong-Han Tsai

Abstract:
In recent years, community-based question and answer (CQA) sites have grown rapidly in number and size. These sites represent a valuable source of online knowledge; however, they often suffer from the problem of duplicate questions. The task of question retrieval (QR) aims to find previously answered, semantically similar questions in CQA archives. Nevertheless, synonymous lexical variations pose a major challenge for question retrieval. Some QR approaches address this issue by calculating the probability of correlation between new questions and archived questions. Much recent research has also focused on surface string similarity among questions. In this paper, we propose a method that first builds a continuous bag-of-words (CBoW) model with data from Asus's Republic of Gamers (ROG) forum and then determines the similarity between a given new question and the Q&As in our database. Unlike most other methods, we calculate the similarity between the given question and the archived questions and descriptions separately, using two different features. In addition, we factor user reputation into our ranking model. Our experimental results on the ROG forum dataset show that our CBoW model with reputation features outperforms other top methods.

Keywords:
Question Retrieval, QR, Community-based Question and Answer, CQA
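
The ranking idea described in the abstract above can be illustrated with a minimal sketch. It assumes word embeddings have already been trained (the toy two-dimensional vectors below stand in for a CBoW model), that a question is represented by the average of its word vectors, and that reputation is normalized to [0, 1]. The function names, the weights, and the linear combination are hypothetical choices for illustration; the abstract does not specify the actual features or ranking model.

```python
import math

def average_vector(tokens, embeddings):
    """Average the embeddings of known tokens; return None if no token is known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity; 0.0 when either vector is missing or has zero norm."""
    if u is None or v is None:
        return 0.0
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def score(query_tokens, archived, embeddings, weights=(0.5, 0.3, 0.2)):
    """Combine title similarity, description similarity, and reputation.

    The weights are hypothetical; title and description are scored separately,
    mirroring the two separate similarity features mentioned in the abstract.
    """
    q = average_vector(query_tokens, embeddings)
    title_sim = cosine(q, average_vector(archived["title"], embeddings))
    desc_sim = cosine(q, average_vector(archived["description"], embeddings))
    w_title, w_desc, w_rep = weights
    return w_title * title_sim + w_desc * desc_sim + w_rep * archived["reputation"]

# Toy stand-in for trained CBoW embeddings (assumption, not real data).
embeddings = {"fan": [1.0, 0.0], "loud": [0.9, 0.1],
              "bios": [0.0, 1.0], "update": [0.1, 0.9]}
archive = [
    {"id": "q1", "title": ["loud", "fan"], "description": ["fan"], "reputation": 0.2},
    {"id": "q2", "title": ["bios", "update"], "description": ["update"], "reputation": 0.9},
]
query = ["fan", "loud"]
ranked = sorted(archive, key=lambda a: score(query, a, embeddings), reverse=True)
print([a["id"] for a in ranked])
```

In this toy setup the semantically closer question ranks first even though the other one has a higher reputation, because reputation carries the smallest weight.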

Title:
Exploring the Use of Neural Network based Features for Text Readability Classification

Author:
Hou-Chiang Tseng, Berlin Chen and Yao-Ting Sung

Abstract:
Text readability refers to the degree to which a text can be understood by its readers: the higher the readability of a text, the better the comprehension and learning retention that can be achieved. To help readers digest and comprehend documents, researchers have long been developing readability models that can automatically and accurately estimate text readability. Conventional approaches to readability classification infer a readability model from a set of handcrafted features defined a priori and computed from the training documents, along with the readability levels of these documents. However, handcrafted features require special expertise, and their applicability is limited. With recent advances in representation learning techniques, we can efficiently extract salient features from documents without recourse to specialized expertise, which offers a promising avenue of research on readability classification. In view of this, we propose in this paper two novel readability models built on top of a convolutional neural network based representation and the so-called fast text representation, respectively, which are capable of effectively analyzing documents belonging to different domains and covering a wide variety of topics. A series of empirical experiments seems to demonstrate the utility of the proposed models in relation to several existing methods.

Keywords:
Readability, Word Vector, Convolutional Neural Network, Representation Learning, Fast Text.

Title:
Acoustic Echo Cancellation Using an Improved Vector-Space-Based Adaptive Filtering Algorithm

Author:
Jin Li-You, Yu Tsao and Ying-Ren Chien

Abstract:
To eliminate acoustic echo, a fast convergence rate and low residual echo are very important for adaptive echo cancelers. Meanwhile, affordable computational complexity has to be considered as well. In this paper, we propose the improved vector-space adaptive filter (IVAF) and the improved vector-space affine projection sign algorithm (IVAPSA). The proposed method is divided into two phases: offline and online. In the offline phase, IVAF constructs a vector space that incorporates prior knowledge of adaptive filter coefficients from a wide range of channel characteristics. Then, in the online phase, IVAF combines the conventional affine projection sign algorithm (APSA) and IVAPSA, where IVAPSA computes the filter coefficients based on the vector space obtained in the offline phase. By leveraging the constructed vector space, the proposed IVAF converges quickly and achieves better echo return loss enhancement performance. Moreover, its computational complexity is lower than that of a comparable work.

Keywords:
Acoustic echo cancellation, Adaptive Filter, Vector-space Adaptive Filter, Machine Learning, Combined Algorithm, Affine Projection Sign Algorithm
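
For readers unfamiliar with the echo return loss enhancement (ERLE) metric mentioned in the abstract above, the following is a minimal sketch of its usual definition: the ratio, in decibels, of the power of the echo-bearing microphone signal to the power of the residual left after cancellation. The function name and the toy signals are illustrative assumptions, and the abstract does not state how the authors measured ERLE (for example, over which window).

```python
import math

def erle_db(mic_signal, residual):
    """ERLE in dB: 10 * log10(mean power of mic signal / mean power of residual)."""
    if not mic_signal or len(mic_signal) != len(residual):
        raise ValueError("signals must be non-empty and of equal length")
    mic_power = sum(x * x for x in mic_signal) / len(mic_signal)
    res_power = sum(e * e for e in residual) / len(residual)
    if mic_power == 0.0:
        raise ValueError("microphone signal has zero power")
    if res_power == 0.0:
        return math.inf  # perfect cancellation
    return 10.0 * math.log10(mic_power / res_power)

# Toy example (made-up samples): the canceler reduces the echo amplitude tenfold,
# which corresponds to a hundredfold power reduction.
echo = [1.0, -1.0, 1.0, -1.0]
leftover = [0.1, -0.1, 0.1, -0.1]
print(round(erle_db(echo, leftover), 6))
```

A higher ERLE means less residual echo, so "better ERLE performance" in the abstract corresponds to larger values of this quantity.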

Title:
A Replay Spoofing Detection System Based on Discriminative Autoencoders

Author:
Chia-Lung Wu, Hsiang-Ping Hsu, Yu-Ding Lu, Yu Tsao, Hung-Shin Lee and Hsin-Min Wang

Abstract:
In this paper, we propose a discriminative autoencoder (DcAE) neural network model for the replay spoofing detection task, in which the system has to tell whether a given utterance comes directly from the mouth of a speaker or indirectly through a playback. The proposed DcAE model focuses on the midmost (code) layer, where a speech utterance is factorized into distinct components with respect to its true label (genuine or spoofed) and metadata (speaker, playback and recording devices, etc.). Moreover, a modified hinge loss is introduced to formulate the cost function of the DcAE model, which ensures that utterances with the same speech type or meta information share similar identity codes (i-codes) and obtain higher similarity scores computed from their i-codes. Tested on the development set provided by ASVspoof 2017, our system achieved a much better result, with up to 42% relative improvement in equal error rate (EER) over the official baseline based on the standard GMM classifier.

Keywords:
Speaker Verification, Speaker Verification Attack, Spoofing Attack, Discriminative Autoencoder, Deep Neural Network
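
The two evaluation quantities named in the abstract above, equal error rate and relative improvement, can be sketched as follows. This is a simple approximation that sweeps thresholds over the observed scores and picks the point where the false acceptance and false rejection rates are closest; it is not the official ASVspoof scoring tool. The function names and the toy scores are assumptions made for illustration only.

```python
def error_rates(genuine_scores, spoof_scores, threshold):
    """Return (FAR, FRR) when a score >= threshold is accepted as genuine."""
    far = sum(s >= threshold for s in spoof_scores) / len(spoof_scores)
    frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    return far, frr

def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate the EER at the threshold where |FAR - FRR| is smallest."""
    best_gap, best_eer = None, None
    for t in sorted(set(genuine_scores) | set(spoof_scores)):
        far, frr = error_rates(genuine_scores, spoof_scores, t)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer

def relative_improvement(baseline_eer, system_eer):
    """Fraction by which the system lowers the baseline EER."""
    return (baseline_eer - system_eer) / baseline_eer

# Made-up detector scores (higher means "more likely genuine").
genuine = [0.9, 0.8, 0.7, 0.4]
spoofed = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(genuine, spoofed))  # 0.25 for these toy scores
```

Under this definition, a relative improvement of 42% means the system's EER is 58% of the baseline's EER; it is not a difference of 42 percentage points.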

Title:
A Knowledge Representation Method to Implement A Taiwanese Tone Group Parser

Author:
Yu-Chu Chang

Abstract:
A tone group parser could be one of the most important components of a Taiwanese text-to-speech system. In this paper, we propose the hypothesis of tonal government to emphasize the idea that if the allotone can be selected for each word in a sentence, then the tone groups within the sentence will be separated, and we support this viewpoint with the implementation of a Taiwanese tone group parser. We describe how a symbol system is used to convert language expertise and heuristic knowledge into a knowledge base that works with a frame-based corpus and a tone sandhi processor, and we discuss the procedure for connecting the inference engine to the knowledge base to make allotone selections. In the current version of the tone group parser, the average accuracy is 98.5% on the inside test and 94% on the outside test. The experimental data also reveal an important clue: the marking of the symbol system contributes more to tone sandhi accuracy than rule inference does.

Keywords:
Taiwanese, Tone Sandhi, Tone Group Parser, Knowledge Representation, Simulation

Title:
A Novel Trajectory-based Spatial-Temporal Spectral Features for Speech Emotion Recognition

Author:
Chun-Min Chang, Wei-Cheng Lin and Chi-Chun Lee

Abstract:
Speech is one of the most natural forms of human communication. Recognizing emotion from speech continues to be an important research avenue for advancing human-machine interface design and human behavior understanding. In this work, we propose a novel set of features, termed trajectory-based spatial-temporal spectral features, to recognize emotions from speech. The core idea centers on deriving descriptors both spatially and temporally on speech spectrograms over a sub-utterance frame (e.g., 250 ms), an idea inspired by dense trajectory-based video descriptors. We conduct categorical and dimensional emotion recognition experiments and compare our proposed features to both the well-established set of prosodic and spectral features and the state-of-the-art exhaustive feature extraction. Our experiments demonstrate that our features by themselves achieve comparable accuracies in the 4-class emotion recognition and valence detection tasks, and obtain a significant improvement in activation detection. We additionally show that our proposed features carry information complementary to the existing acoustic feature set, which can be used to obtain improved emotion recognition accuracy.

Keywords:
Emotion Recognition, Speech Processing, Spatial-Temporal Descriptors, Mel-Filter Bank Energy
