ACLCLP


Application for Use of Sinica Balanced Corpus


The Sinica Balanced Corpus (Sinica Corpus) is the first balanced Chinese corpus with part-of-speech tagging. The corpus (Sinica 4.0) is open to the research community on the web (http://www.sinica.edu.tw/SinicaCorpus/). It contains ten million words. Each text in the corpus is classified and marked according to five criteria: genre, style, mode, topic, and source. The feature values of these classifications are organized in a hierarchy. Subcorpora can be defined by a specific set of attributes to serve different research purposes. Texts in the corpus are segmented according to the word segmentation standard proposed by the ROC Computational Linguistic Society, and each segmented word is tagged with its part of speech. Linguistic patterns and language structures can be extracted from the tagged corpus with a corpus inspection program that can filter the data, generate statistics, sort results, and identify collocations.
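
As a rough illustration of the subcorpus and inspection ideas described above, the following Python sketch selects texts by attribute and then counts tags and adjacent word pairs. It is not the actual corpus inspection program or the distributed file format: the Text class, the function names, the attribute values, and the single-letter words and tags are all hypothetical placeholders, and the hierarchy of feature values is not modeled.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Text:
    """One corpus text (hypothetical in-memory representation)."""
    attrs: dict   # classification attributes, e.g. {"genre": "news", "mode": "written"}
    tokens: list  # segmented words as (word, pos_tag) pairs


def select_subcorpus(texts, **wanted):
    """Keep only the texts whose attributes match every requested value."""
    return [t for t in texts
            if all(t.attrs.get(key) == value for key, value in wanted.items())]


def tag_counts(texts):
    """Count how often each part-of-speech tag occurs."""
    return Counter(tag for t in texts for _, tag in t.tokens)


def bigram_counts(texts):
    """Count adjacent word pairs within each text, a crude stand-in for collocations."""
    counts = Counter()
    for t in texts:
        words = [word for word, _ in t.tokens]
        counts.update(zip(words, words[1:]))
    return counts


# Made-up sample data; words and tags are placeholders, not real corpus entries.
corpus = [
    Text({"genre": "news", "mode": "written"}, [("A", "N"), ("B", "V"), ("C", "N")]),
    Text({"genre": "essay", "mode": "written"}, [("A", "N"), ("B", "V")]),
    Text({"genre": "news", "mode": "spoken"}, [("D", "V")]),
]

news_written = select_subcorpus(corpus, genre="news", mode="written")
written = select_subcorpus(corpus, mode="written")

# Sorted statistics: most frequent tags and word pairs first.
print(tag_counts(news_written).most_common())
print(bigram_counts(written).most_common(1))
```

Selecting on more attributes narrows the subcorpus, and selecting on none returns every text.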

Please complete the required documents listed below and send them to ACLCLP at the following address:

The Association for Computational Linguistics and Chinese Language Processing
1F., No. 34, Ln. 3, Sec. 1, Jiuzhuang St., Nankang Dist., Taipei City, 115022, Taiwan


Required documents:

The license fee:



Payment: Please fill in the payment form.


Address: 1F., No. 34, Ln. 3, Sec. 1, Jiuzhuang St., Nankang Dist., Taipei City, 115022, Taiwan
Tel: 886-2-27881638, Fax: 886-2-26519386, E-mail: [email protected]