International Journal of Computational Linguistics & Chinese Language Processing
Vol. 12, No. 1, March 2007


Title:
Differences in the Speaking Styles of a Japanese Male According to Interlocutor; Showing the Effects of Affect in Conversational Speech

Author:
Nick Campbell

Abstract:
There has been considerable interest recently in the processing of affect in spoken interactions. This paper presents an analysis of conversational speech corpus data showing that four prosodic characteristics (duration, pitch, power, and voicing) all vary significantly both with differences in interlocutor and with differences in familiarity with the same interlocutor over a fixed period of time.
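For readers who want to reproduce this kind of measurement, the following is a minimal sketch, not the author's actual pipeline: it assumes utterances are available as WAV files and uses librosa's pYIN pitch tracker and RMS energy as rough stand-ins for the paper's duration, pitch, power, and voicing measures.

```python
# Illustrative stand-ins for the four prosodic characteristics named in the
# abstract (duration, pitch, power, voicing); not the author's actual method.
import numpy as np
import librosa

def prosodic_summary(wav_path):
    y, sr = librosa.load(wav_path, sr=None)                    # hypothetical utterance file
    f0, voiced, _ = librosa.pyin(y, fmin=60, fmax=400, sr=sr)  # pYIN pitch track
    rms = librosa.feature.rms(y=y)[0]                          # frame-level power (RMS)
    return {
        "duration_s": len(y) / sr,
        "mean_f0_hz": float(np.nanmean(f0)),                   # unvoiced frames are NaN
        "mean_power": float(rms.mean()),
        "voicing_rate": float(np.mean(voiced)),                # fraction of voiced frames
    }
```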

Keyword:
Conversational Speech Corpus, Expression of Affect, Prosodic Characteristics, Voice Quality Analysis.


Title:
The Breath Segment in Expressive Speech

Author:
Chu Yuan, and Aijun Li

Abstract:
This paper, based on one selected hour of expressive speech, is a pilot study on how breath segments can be used to obtain more natural and expressive speech. It mainly examines where breath segments occur and how their acoustic features are affected by the speaker's emotional state in terms of valence and activation. A statistical analysis is conducted to investigate the relationship between the length and intensity of the breath segments and the two state parameters. Finally, a perceptual experiment is conducted by applying the analysis results to synthesized speech; the results demonstrate that breath segment insertion can help improve the expressiveness and naturalness of the synthesized speech.
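As a rough illustration of the statistical step described above, the sketch below correlates hypothetical per-segment measurements with valence and activation annotations; the file name, column names, and the choice of Pearson correlation are assumptions, not details taken from the paper.

```python
# Illustrative only: relate breath-segment length/intensity to valence and
# activation. The CSV file and its column names are hypothetical.
import pandas as pd
from scipy.stats import pearsonr

segments = pd.read_csv("breath_segments.csv")    # hypothetical annotation table

for measure in ("length_s", "intensity_db"):
    for state in ("valence", "activation"):
        r, p = pearsonr(segments[measure], segments[state])
        print(f"{measure} vs {state}: r={r:.2f}, p={p:.3f}")
```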

Keyword:
Breath Segment, Expressive Speech, Emotion, Valence, Activation.


Title:
Affective Intonation-Modeling for Mandarin Based on PCA

Author:
Zhuangluan Su, and Zengfu Wang

Abstract:
The speech fundamental frequency (henceforth F0) contour plays an important role in expressing the affective information of an utterance. The most popular F0 modeling approaches mainly use the concept of separating the F0 contour into a global trend and local variation. For Mandarin, the global trend of the F0 contour is caused by the speaker's mood and emotion. In this paper, the authors address the problem of affective intonation. For modeling affective intonation, an affective corpus has been designed and established, and all intonations are extracted with an iterative algorithm. Then, the concept of eigen-intonation is proposed, based on applying Principal Component Analysis to the affective corpus, and all the intonations are transformed into the lower-dimensional eigen sub-space spanned by the eigen-intonations. A model of affective intonation is established in this sub-space. As a result, the corresponding emotion (possibly a mixed emotion) can be expressed by speech whose intonation is modified according to the above model. The experiments are performed with the affective Mandarin corpus, and the experimental results show that the intonation modeling approach proposed in this paper is effective for both intonation representation and speech synthesis.
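To make the eigen-intonation idea concrete, the following is a minimal sketch assuming F0 contours have already been extracted and resampled to a fixed length; the iterative extraction algorithm, the corpus, and the actual number of components are not reproduced here.

```python
# Illustrative sketch of the eigen-intonation idea: PCA over fixed-length F0
# contours, projection into the eigen sub-space, and reconstruction.
# The contour file and the number of components are assumptions.
import numpy as np
from sklearn.decomposition import PCA

contours = np.load("f0_contours.npy")    # hypothetical (n_utterances, n_points) matrix
pca = PCA(n_components=5)                # eigen-intonations (count is illustrative)
weights = pca.fit_transform(contours)    # low-dimensional weights per utterance

# A contour can be reconstructed (or modified toward a target emotion)
# from its weights in the eigen sub-space.
reconstructed = pca.inverse_transform(weights[:1])[0]
```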

Keyword:
Eigen-Intonation, Affective Speech, Mixed Emotion, F0 Contour, Speech Synthesis.


Title:
Manifolds Based Emotion Recognition in Speech

Author:
Mingyu You, Chun Chen, Jiajun Bu, Jia Liu, and Jianhua Tao

Abstract:
The paper presents an emotional speech recognition system based on an analysis of the manifolds of speech. Working with large volumes of high-dimensional acoustic features, the researchers confront the problem of dimensionality reduction. Unlike classical techniques such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), a new approach, named Enhanced Lipschitz Embedding (ELE), is proposed in the paper to discover the nonlinear degrees of freedom that underlie the emotional speech corpus. ELE adopts geodesic distance to preserve the intrinsic geometry of the speech corpus at all scales. Based on geodesic distance estimation, ELE embeds the 64-dimensional acoustic features into a six-dimensional space in which speech data with the same emotional state generally cluster around one plane, a distribution that is beneficial to emotion classification. The compressed testing data are classified into six emotional states (neutral, anger, fear, happiness, sadness, and surprise) by a trained linear Support Vector Machine (SVM) system. Considering the perceptual constancy of humans, ELE is also investigated in terms of its ability to detect the intrinsic geometry of emotional speech corrupted by noise. The performance of the new approach is compared with feature selection by Sequential Forward Selection (SFS), PCA, LDA, Isomap, and Locally Linear Embedding (LLE). Experimental results demonstrate that, compared with the other methods, the proposed system gives a 9%-26% relative improvement in speaker-independent emotion recognition and a 5%-20% improvement in speaker-dependent recognition. Meanwhile, the proposed system shows robustness, with an improvement of approximately 10% in emotion recognition accuracy when speech is corrupted by increasing noise.
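As a loose illustration of the embedding idea, the sketch below builds a Lipschitz-style embedding on geodesic distances (a k-NN graph plus shortest paths) and trains a linear SVM on the result. The random data, reference-set choice, and parameters are assumptions; this is not the ELE algorithm as published.

```python
# Illustrative sketch only: Lipschitz-style embedding over geodesic distances,
# followed by a linear SVM. Not the published ELE algorithm.
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph
from sklearn.svm import LinearSVC

def geodesic_distance_matrix(X, n_neighbors=10):
    # k-NN graph with Euclidean edge weights; geodesics via shortest paths.
    knn = kneighbors_graph(X, n_neighbors=n_neighbors, mode="distance")
    return shortest_path(knn, method="D", directed=False)

def lipschitz_embedding(D, reference_sets):
    # One coordinate per reference set: minimum geodesic distance to that set.
    return np.stack([D[:, idx].min(axis=1) for idx in reference_sets], axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 64))      # placeholder 64-dimensional acoustic features
y = rng.integers(0, 6, size=300)    # placeholder labels for six emotional states

D = geodesic_distance_matrix(X)
refs = [rng.choice(len(X), size=20, replace=False) for _ in range(6)]  # 6-dim target space
Z = lipschitz_embedding(D, refs)
clf = LinearSVC().fit(Z, y)         # linear SVM on the embedded features
```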

Keywords:
Enhanced Lipschitz Embedding (ELE), Dimensionality Reduction, Emotional Speech Analysis, Emotion Recognition.


Title:
Emotion Recognition from Speech Using IG-Based Feature Compensation

Author:
Chung-Hsien Wu, and Ze-Jing Chuang

Abstract:
This paper presents an approach to feature compensation for emotion recognition from speech signals. In this approach, the intonation groups (IGs) of the input speech signals are extracted first. The speech features in each selected intonation group are then extracted. With the assumption of a linear mapping between feature spaces in different emotional states, a feature compensation approach is proposed to characterize the feature space with better discriminability among emotional states. The compensation vector with respect to each emotional state is estimated using the Minimum Classification Error (MCE) algorithm. For the final emotional state decision, the compensated IG-based feature vectors are used to train Gaussian Mixture Models (GMMs) and Continuous Support Vector Machines (CSVMs) for each emotional state. For GMMs, the emotional state whose GMM has the maximal likelihood ratio is chosen as the final output. For CSVMs, the emotional state is determined according to the probability outputs from the CSVMs. The kernel function in the CSVM is experimentally determined to be a radial basis function. A comparison in the experiments shows that the proposed IG-based feature compensation achieves encouraging performance for emotion recognition.
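As a rough sketch of the decision stage only, the code below assumes the per-emotion compensation vectors have already been estimated (MCE training and intonation-group extraction are not reproduced) and that IG-level feature vectors are available as NumPy arrays.

```python
# Illustrative sketch of the GMM decision stage with per-emotion feature
# compensation; compensation vectors and feature data are assumed to exist.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_models(features_by_emotion, n_components=4):
    # One GMM per emotional state, trained on compensated IG-based features.
    return {emotion: GaussianMixture(n_components=n_components).fit(feats)
            for emotion, feats in features_by_emotion.items()}

def classify(ig_features, models, compensation):
    # Shift the IG-based feature vectors by each emotion's compensation vector,
    # then pick the emotion whose GMM yields the highest average log-likelihood.
    scores = {emotion: gmm.score(ig_features + compensation[emotion])
              for emotion, gmm in models.items()}
    return max(scores, key=scores.get)
```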

Keyword:
Emotional Speech, Emotion Recognition, Intonation Group, Feature Compensation.


Title:
Emotional Recognition Using a Compensation Transformation in Speech Signal

Author:
Cairong Zou, Yan Zhao, Li Zhao, Wenming Zhen, and Yongqiang Bao

Abstract:
An effective GMM-based method for speech emotion recognition is proposed in this paper; a compensation transformation is introduced in the recognition stage to reduce the influence of variations in speech characteristics and of noise. The extracted emotional features include global features, time-series structure features, LPCC, MFCC, and PLP. Five human emotions (happiness, anger, surprise, sadness, and neutral) are investigated. The results show that the proposed method achieves a higher recognition rate than a standard GMM, and that it is effective and robust.
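The sketch below shows one way to compute the kind of spectral and global features the abstract lists, using MFCCs as a stand-in; LPCC and PLP extraction, the compensation transformation, and the GMM recognizer itself are not reproduced here.

```python
# Illustrative only: MFCC frames plus simple global statistics as stand-ins
# for the feature set listed in the abstract.
import numpy as np
import librosa

def emotion_features(wav_path, n_mfcc=13):
    y, sr = librosa.load(wav_path, sr=None)                      # hypothetical utterance file
    frames = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T   # (n_frames, n_mfcc)
    # Frame-level vectors for GMM training, plus per-utterance global statistics.
    global_stats = np.concatenate([frames.mean(axis=0), frames.std(axis=0)])
    return frames, global_stats
```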

Keyword:
Speech Emotional Recognition (SER), GMM, Emotion Recognition, Compensation Transformation.


Title:
The Influence of Reading Styles on Accent Assignment in Mandarin

Author:
Mingzhen Bao, Min Chu, and Yunjia Wang

Abstract:
This paper investigates the influence of three different reading styles (Lyric, Critical, and Explanatory) on the distribution tendency of sentential accents (classified as rhythmic accents and semantic accents). The comparison among the styles is performed in three research domains: high-level constructions, low-level phrases, and disyllabic prosodic words. One finds that the assignment of semantic accents shows some differences across reading styles, while the assignment of rhythmic accents does not. Furthermore, the larger the speech unit studied, the stronger the observed influence: most differences in the assignment of semantic accents are found in high-level constructions, some are found in low-level phrases, and none are found in prosodic words across the three reading styles. Compared with previous studies, the allocation scheme of semantic accents in the Explanatory style is close to that in the neutral style, i.e., in high-level constructions, it has a final-accented tendency in theme + rheme (TR), predicate + object (PO), and subject + predicate (SP) constructions, and a uniform distribution in adjunct + head constructions. In low-level phrases, the Explanatory style exhibits an initial-accented tendency in adjunct + head phrases, but a final-accented tendency in subject + predicate (SP) phrases and predicate + object (PO) phrases. The Critical style is adopted to make comments, where semantic focal points are normally on the core subjects and their actions. As a result, more accents are allocated to the subject part in the AS constructions and to the predicate part in the PO constructions. Accordingly, in low-level phrases, more accents go to the heads in AN phrases and to the predicates in SP phrases. The Lyric style helps to express personal emotions in a rhythmic way [Wang 2000]. Such poetry-like rhythm weakens the effect of syntactic constraints and, in many cases, leads to an even distribution of semantic accents in high-level constructions and a dense distribution near prosodic boundaries.

Keyword:
Reading Style, Sentential Accent, Distribution Tendency, Mandarin.

