International Journal of Computational Linguistics & Chinese Language Processing
Vol. 27, No. 1, June 2022

Preface: Corpus Linguistics and Discourse Annotations
Siaw-Fong Chung, Rafal Rzepka and Shih-ping Wang
The Uniqueness in Speech: Prosodic Highlights-prompted Information Content Projection in Continuous Speech
Helen Kai-yun Chen and Chiu-yu Tseng
Topic Development and Boundary Cues in Hakka Conversational Discourse
Shu-Chuan Tseng and Hsiao-chien Liu
A Move Analysis of Communicative Acts in Petition Text on the Public Policy Participation Network Platform
Wei-Ting Yang, Chen-Yu Chester Hsieh, and Siaw-Fong Chung
An N-gram Approach to Identifying the Chinese Linguistic Signals for the Problem-Solution Pattern in Annotated Online Health News
Chen-Yu Chester Hsieh and Yu-Yun Chang
Let Me Finish! – Speech Patterns of Interruptions in Chinese: A Corpus-based Study on Parliamentary Interpellations on Taiwan
Christian Schmidt and Chia-Rung Lu
Constructing a Deep Learning Model Using Language in Social Media: The Case Study of "Retrospective Adjustment"
Ren-feng Duann, Shu-I Chiu, and Hui-Wen Liu

Title:
Preface: Corpus Linguistics and Discourse Annotation

Author:
Siaw-Fong Chung, Rafal Rzepka and Shih-ping Wang

Title:
The Uniqueness in Speech: Prosodic Highlights-prompted Information Content Projection in Continuous Speech

Author:
Helen Kai-yun Chen and Chiu-yu Tseng

Abstract:
Recently, it has been identified that perceived prosodic highlights in continuous speech can function alternatively as the projector of key/focal information allocation. This view offers a novel interpretation of the long-held claim that prominence is used predominantly to mark key information, and it points to the significance of information content planning prompted by perceived prominence. To explore further how prosodic highlights prompt information content planning and allocation, this study focused on the information content planning unit, namely the "projector" (PJR) and its respective "projection" (PJN) (henceforth PJR-PJN units), across four diverse Mandarin speech genres. Using a corpus linguistic approach and quantitative analyses, the study analyzed the acoustic correlates of F0 realization and pause duration and calculated emphasis-attributed weighting scores based on emphasis levels consistently annotated in the speech data. While the main goal of the study was to profile consistent acoustic realizations across the PJR-PJN units, the analyses also confirmed the patterned deployment of information content in continuous speech. Ultimately, the results foreground the underlying mechanism of information prosody and features unique to speech.

Keywords:
Continuous Speech and Discourse, Spoken Corpora and Annotations, Information Content Planning and Allocation, Prosodic Highlights-prompted Projection, Emphasis-attributed Weighting Scores, Information-attributed Weighting Scores

Title:
Topic Development and Boundary Cues in Hakka Conversational Discourse

Author:
Shu-Chuan Tseng and Hsiao-chien Liu

Abstract:
The structure of conversational discourse is context-dependent, and the organization of discourse segments and preferences for signaling discourse boundaries are language-specific characteristics. Participating speakers, speaking scenarios, and communication purposes instantaneously affect the conduct of social interaction and verbal exchanges during a conversation. For example, topic maintenance is sustained by the overt exchange of coherent information, and lexical preferences at the boundaries of related discourse segmentation can help construct the course of topic development. Moreover, form-based discourse units are used to represent the content of spoken utterances and to describe the interaction of speakers in conversations. This study investigated topic-specific Hakka conversations using a top-down two-level discourse segmentation approach to examine the development and production of topics. Typical cues and expressions used to initiate topics and subtopics and their respective discourse functions in the Hakka conversations were analyzed. In the Hakka conversational data, noun phrases were preferred at the topic and subtopic transition boundaries, and complete forms such as clausal constructions were also favored, although the spontaneous speech was expected to be fragmentary in terms of syntactic structure.

Keywords:
Conversation, Discourse Units, Topic Development, Boundary Cues, Hakka

Title:
A Move Analysis of Communicative Acts in Petition Text on the Public Policy Participation Network Platform

Author:
Wei-Ting Yang, Chen-Yu Chester Hsieh, and Siaw-Fong Chung

Abstract:
With the rapid development of information technology, the Taiwanese government has launched the Public Policy Network Participation Platform (Join Platform), which allows citizens to start and support a petition online and voice their opinions regarding public issues. The aim of this study was to apply the method of move analysis to investigate the text structure and linguistic features of the online petition genre. In total, 40 online petition texts were collected from the website and compiled into a corpus using the AntConc application. The collected texts were then annotated with reference to the four moves of the Situation, Problem, Solution, and Evaluation textual pattern and the communicative acts in each move. The results showed that the distribution of the moves varied across the articles and that the communicative acts in each move were represented by high-frequency words. The findings of this research will thus serve as a basis for future applications, such as computerized data collection, automatic annotation of rhetorical moves, and judgment of communicative acts in texts.

Keywords:
Move Analysis, SPSE, Policy Argumentation, Communicative Act, Join Platform

Title:
An N-gram Approach to Identifying the Chinese Linguistic Signals for the Problem-Solution Pattern in Annotated Online Health News

Author:
Chen-Yu Chester Hsieh and Yu-Yun Chang

Abstract:
This article will report the results of an exploratory project that combined the annotation of the Problem-Solution (PS) textual pattern in online health news and the quantitative and qualitative methods of corpus linguistics to investigate the linguistic features of particular rhetorical moves. A total of 120 journalistic texts written in Chinese were collected from a Taiwan-based journalistic website that focused on providing news related to health and medicine and were annotated with the four components of the PS pattern. To identify signals in the genre for the elements of the PS move structure, an n-gram approach was then implemented to extract frequent lexicogrammatical sequences from the corpus in general and from the Problem and Response moves in particular. The results showed that the linguistic features found in the retrieved sequences tended to fall within a range of categories, such as abstract nouns, medical terms, and modal verbs, which not only served as functions relevant to the rhetorical move in which they were used but also reflected characteristics specific to the health news genre and the Chinese language. The findings and annotated data generated from the current project will thus provide a solid foundation for future research and applications.

Keywords:
N-gram, Problem-Solution Pattern, Health News, Annotation, Journalistic Discourse
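
As a rough illustration of the n-gram step described in the abstract above, the following minimal sketch counts contiguous word sequences in pre-segmented text and keeps the frequent ones. It is not the authors' implementation: the function names, the `min_count` threshold, and the three sample sentences are all hypothetical, and Chinese word segmentation is assumed to have been done beforehand.

```python
from collections import Counter

def extract_ngrams(tokens, n):
    """Return all contiguous n-token sequences from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def frequent_ngrams(documents, n, min_count=2):
    """Count n-grams over pre-segmented documents and keep the frequent ones."""
    counts = Counter()
    for tokens in documents:
        counts.update(extract_ngrams(tokens, n))
    return [(gram, c) for gram, c in counts.most_common() if c >= min_count]

# Hypothetical, pre-segmented sentences standing in for text annotated as a Problem move.
problem_move = [
    ["可能", "導致", "感染"],   # "may cause infection"
    ["可能", "導致", "發炎"],   # "may cause inflammation"
    ["醫師", "建議", "就醫"],   # "doctors advise seeking treatment"
]
top_bigrams = frequent_ngrams(problem_move, n=2)
```

Running the same function separately on the text of each annotated move, and on the corpus as a whole, would give the per-move and general sequence lists that the abstract contrasts.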

Title:
Let Me Finish! – Speech Patterns of Interruptions in Chinese: A Corpus-based Study on Parliamentary Interpellations on Taiwan

Author:
Christian Schmidt and Chia-Rung Lu

Abstract:
This corpus-based study investigated verbal interruptions during parliamentary interpellations based on official and publicly accessible transcriptions provided by the Legislative Yuan of the Republic of China (Taiwan). While interruptions have previously been understood as organizing turn-taking, as well as cues and speech markers, the results of this study suggest that interruptions have a dual nature. Interruption is incentivised by confrontational discourse strategies and realized by linguistic expressions, some of which are statistically significant and can be called keywords. Using open-source data to explore the linguistic features in the speech patterns of interruptions in institutional discourse, we first identified the word classes and keywords with significant frequency shifts between interrupted, interrupting, and regular sentences. Then, we associated the meanings of the keywords with offensive and defensive discourse strategies. The findings of this study indicate that interrupted sentences were more reflective of defensive discourse strategies, while interrupting sentences were associated with offensive ones. Moreover, conjunctions, adverbs, and pronouns played a more important role in the speech patterns of interruptions compared with their respective footprint in the lexicon. Conversely, nouns and verbs, with some exceptions, as well as adjectives, played a lesser role. We argue that the confrontational incentive structure in institutional debates creates certain linguistic patterns, mostly statistically significant frequency shifts of keywords in interrupted and interrupting sentences, and that these patterns might be useful in explaining interruption.

Keywords:
Speech Patterns, Spoken Chinese, Interruptions, Institutional Discourse
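
The abstract above does not name the statistic used to detect significant frequency shifts. One common choice in corpus linguistics is the log-likelihood (G2) keyness score, sketched below purely as an assumption; the function name and the word counts are hypothetical and are not taken from the study.

```python
import math

def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood (G2) keyness of a word in a target subcorpus versus a reference subcorpus."""
    total_freq = freq_target + freq_ref
    total_size = size_target + size_ref
    # Expected frequencies if the word were equally likely in both subcorpora.
    expected_target = size_target * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_target, expected_target), (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2

# Hypothetical counts: a word seen 30 times in 1,000 tokens of interrupting
# sentences and 10 times in 1,000 tokens of regular sentences.
score = log_likelihood(30, 1000, 10, 1000)
is_keyword = score > 3.84  # chi-square critical value for p < 0.05 with 1 degree of freedom
```

A score of zero means the word is equally frequent in both sets of sentences. Larger scores indicate a stronger shift, but not its direction, so the relative frequencies still have to be compared to tell over-use from under-use.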

Title:
Constructing a Deep Learning Model Using Language in Social Media: The Case Study of "Retrospective Adjustment"

Author:
Ren-feng Duann, Shu-I Chiu, and Hui-Wen Liu

Abstract:
This research, which used Facebook posts related to the term "retrospective adjustment" in Taiwan as the corpus, manually coded the sentiments of 6,917 posts. Randomly dividing the dataset into two subsets for training (70%) and testing (30%) and using the Chinese pre-trained BERT model as the foundation, we trained and fine-tuned the model with the training dataset and ran the fine-tuned model to predict the sentiments in the test dataset. We then compared the results of the manual coding and model prediction to explain the differences from the perspective of linguistic features. The results indicated that the model performed better for the posts manually coded as "neutral," with an accuracy of 0.81, while the accuracies of model prediction were only 0.64 and 0.63 for the posts manually coded as "positive" and "negative," respectively. Regarding inaccuracy, the posts manually coded as "negative" but predicted by the model as "positive" and those manually coded as "positive" but predicted by the model as "neutral" ranked the highest (0.23) and the second highest (0.22), respectively. Examining the linguistic features of the two groups of posts, we identified seven categories of linguistic features that, we claim, led to "negative" coding and four categories that led to "positive" coding. Moreover, both groups contained posts that could not be coded accurately without knowledge of the news and the Facebook account owners' political/social inclinations, which was attributed to the posts' high relatedness to the general public and the politics of Taiwan. Considering that the language used in social media is different from the language employed to train current models, and that Facebook users frequently use punctuation marks and emoticons to express their moods, we argue that there is a need to develop a model for social media.

Keywords:
Social Media, Deep Learning, Retrospective Adjustment, Sentiment Analysis, Natural Language Processing
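
The evaluation described in the abstract above (a random 70/30 split, then rates broken down by manually coded class) can be sketched as follows. This is a minimal illustration and not the authors' code: it leaves out the BERT fine-tuning step entirely, and the function names, the fixed seed, and the toy labels are hypothetical.

```python
import random
from collections import Counter, defaultdict

def split_dataset(items, train_fraction=0.7, seed=0):
    """Shuffle a copy of the items and split it into training and test subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def per_class_rates(gold_labels, predicted_labels):
    """Row-normalised confusion matrix: rates[gold][pred] is the share of
    posts manually coded as `gold` that the model labelled as `pred`."""
    counts = defaultdict(Counter)
    for gold, pred in zip(gold_labels, predicted_labels):
        counts[gold][pred] += 1
    return {
        gold: {pred: n / sum(row.values()) for pred, n in row.items()}
        for gold, row in counts.items()
    }

# Hypothetical manual codes and model predictions for four test posts.
gold = ["neutral", "neutral", "negative", "negative"]
pred = ["neutral", "neutral", "positive", "negative"]
rates = per_class_rates(gold, pred)
```

In this layout the diagonal entries, such as `rates["neutral"]["neutral"]`, correspond to the per-class accuracies the abstract reports. The off-diagonal entries, such as `rates["negative"]["positive"]`, correspond to the misclassification rates it ranks.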
