ACLCLP

Application for Use of Sinica Balanced Corpus

The Sinica Balanced Corpus (Sinica Corpus) is the first balanced Chinese corpus with part-of-speech tagging. The corpus (Sinica 4.0) is open to the research community through the WWW (http://www.sinica.edu.tw/SinicaCorpus/). The corpus contains ten million words. Each text in the corpus is classified and marked according to five criteria: genre, style, mode, topic, and source. The feature values of these classifications are assigned in a hierarchy. Subcorpora can be defined with a specific set of attributes to serve different research purposes. Texts in the corpus are segmented according to the word segmentation standard proposed by the ROC Computational Linguistic Society. Each segmented word is tagged with its part of speech. Linguistic patterns and language structures can be extracted from the tagged corpus via a corpus inspection program that can filter the data, generate statistics, sort, and identify collocations.

Please complete the required documents listed below and send them to ACLCLP at the following address:

The Association for Computational Linguistics and Chinese Language Processing
1F., No. 34, Ln. 3, Sec. 1, Jiuzhuang St., Nankang Dist., Taipei City, 115022, Taiwan

Required documents:

An official statement from the applicant's affiliated institution certifying his/her status at this institution.
A written statement from the applicant or his/her affiliated institution affirming that the corpus will be used for research only, and not for any commercial purpose.
The original copy of the Agreement. (Please send two copies of the Agreement, one for you and the other for our records.)

The license fee:

Individuals: US$200.-
Nonprofit Institutions (for 2-10 users): US$1,000.-
Nonprofit Institutions (for 11 or more users): US$2,500.-

Payment: Please fill in the payment form.
Address: 1F., No. 34, Ln. 3, Sec. 1, Jiuzhuang St., Nankang Dist., Taipei City, 115022, Taiwan
Tel: 886-2-27881638, Fax: 886-2-26519386, E-mail: [email protected]