ACLCLP


Application for Use of Speech Database





MAT-160

  • Database Name: MAT-160
  • Speech File Editing Program: VEDITOR 3.0
  • Database Brief (PDF)
The MAT Speech Database, including the speech file editing program, is stored on one DVD-ROM.

The MAT Speech Database (MATDB) is the result of a research program subsidized by the National Science Council of the Executive Yuan, and ACLCLP is authorized to release it. Applicants should apply by signing the license agreement and complying with its terms. 

Documents required:

  1. A certificate from the applicant's affiliated institution indicating his/her status at this institution
  2. Two (2) original copies of the License Agreement (one kept by ACLCLP and the other by the applicant).
  3. Price: US$20.-


Please send the documents to:

The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
1F., No. 34, Ln. 3, Sec. 1, Jiuzhuang St., Nankang Dist., Taipei City, 115022, Taiwan
Tel: +886-2-2788-1638
Fax: +886-2-26519386
E-Mail: [email protected] 

Payment: please fill in the payment form  




MAT-400

  • Database Name: MAT-400
  • Speech File Editing Program: VEDITOR 4.0
  • Database Brief (PDF)
The MAT Speech Database, including the speech file editing program, is stored on one DVD-ROM.

The MAT Speech Database (MATDB) is the result of a research program subsidized by the National Science Council of the Executive Yuan, and ACLCLP is authorized to release it. Applicants should apply by signing the license agreement and complying with its terms. 

Documents required:

  1. A certificate from the applicant's affiliated institution indicating his/her status at this institution
  2. Two (2) original copies of the License Agreement (one kept by ACLCLP and the other by the applicant).
  3. Price: US$30.-


Please send the documents to:

The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
1F., No. 34, Ln. 3, Sec. 1, Jiuzhuang St., Nankang Dist., Taipei City, 115022, Taiwan
Tel: +886-2-2788-1638
Fax: +886-2-26519386
E-Mail: [email protected] 

Payment: please fill in the payment form  




MAT-2000Edu

  • Database Name: MAT-2000Edu
  • Speech File Editing Program: VEDITOR 4.1p
  • Database Brief (PDF)
The MAT Speech Database, including the speech file editing program, is stored on two DVDs.

The MAT Speech Database (MATDB) is the result of a research program subsidized by the National Science Council of the Executive Yuan, and ACLCLP is authorized to release it. Applicants should apply by signing the license agreement and complying with its terms. 

Documents required:

  1. A certificate from the applicant's affiliated institution indicating his/her status at this institution
  2. Two (2) original copies of the License Agreement (one kept by ACLCLP and the other by the applicant).
  3. Price: US$700.-


Please send the documents to:

The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
1F., No. 34, Ln. 3, Sec. 1, Jiuzhuang St., Nankang Dist., Taipei City, 115022, Taiwan
Tel: +886-2-2788-1638
Fax: +886-2-26519386
E-Mail: [email protected] 

Payment: please fill in the payment form  




MAT-2000Com

  • Database Name: MAT-2000Com
  • Speech File Editing Program: VEDITOR 4.1p
  • Database Brief (PDF)
The MAT Speech Database, including the speech file editing program, is stored on two DVDs.

The MAT Speech Database (MATDB) is the result of a research program subsidized by the National Science Council of the Executive Yuan, and ACLCLP is authorized to release it. Applicants should apply by signing the license agreement and complying with its terms. 

Documents required:

  1. A certificate from the applicant's affiliated institution indicating his/her status at this institution
  2. Two (2) original copies of the License Agreement (one kept by ACLCLP and the other by the applicant).
  3. Price: US$3,500.-


Please send the documents to:

The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
1F., No. 34, Ln. 3, Sec. 1, Jiuzhuang St., Nankang Dist., Taipei City, 115022, Taiwan
Tel: +886-2-2788-1638
Fax: +886-2-26519386
E-Mail: [email protected] 

Payment: please fill in the payment form  




MAT-2500ExtV-Edu

  • Database Name: MAT-2500ExtV-Edu
  • Speech File Editing Programs: VEDITOR, VAT2WAV
  • Database Brief (PDF)
The MAT Speech Database, including the speech file editing programs, is stored on two DVDs.

The MAT Speech Database (MATDB) is the result of a research program subsidized by the National Science Council of the Executive Yuan, and ACLCLP is authorized to release it. Applicants should apply by signing the license agreement and complying with its terms. 

Documents required:

  1. A certificate from the applicant's affiliated institution indicating his/her status at this institution
  2. Two (2) original copies of the License Agreement (one kept by ACLCLP and the other by the applicant).
  3. Price: US$350.-


Please send the documents to:

The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
1F., No. 34, Ln. 3, Sec. 1, Jiuzhuang St., Nankang Dist., Taipei City, 115022, Taiwan
Tel: +886-2-2788-1638
Fax: +886-2-26519386
E-Mail: [email protected] 

Payment: please fill in the payment form  




MAT-2500ExtV-Com

  • Database Name: MAT-2500ExtV-Com
  • Speech File Editing Programs: VEDITOR, VAT2WAV
  • Database Brief (PDF)
The MAT Speech Database, including the speech file editing programs, is stored on one DVD.

The MAT Speech Database (MATDB) is the result of a research program subsidized by the National Science Council of the Executive Yuan, and ACLCLP is authorized to release it. Applicants should apply by signing the license agreement and complying with its terms. 

Documents required:

  1. A certificate from the applicant's affiliated institution indicating his/her status at this institution
  2. Two (2) original copies of the License Agreement (one kept by ACLCLP and the other by the applicant).
  3. Price: US$3,500.-


Please send the documents to:

The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
1F., No. 34, Ln. 3, Sec. 1, Jiuzhuang St., Nankang Dist., Taipei City, 115022, Taiwan
Tel: +886-2-2788-1638
Fax: +886-2-26519386
E-Mail: [email protected] 

Payment: please fill in the payment form  




TCC-300Edu

  • Database Name: TCC-300Edu
  • Speech File Editing Program: VEDITOR 5.0p
  • Database Brief (PDF)
The Microphone Speech Database, including the speech file editing program, is stored on one DVD.

This is a collection of microphone speech databases produced by National Taiwan University, National Cheng Kung University, and National Chiao Tung University. ACLCLP is authorized to release it. Applicants should apply by signing the license agreement and complying with its terms. 

Documents required:

  1. A certificate from the applicant's affiliated institution indicating his/her status at this institution
  2. Two (2) original copies of the License Agreement (one kept by ACLCLP and the other by the applicant).
  3. Price: US$50.-


Please send the documents to:

The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
1F., No. 34, Ln. 3, Sec. 1, Jiuzhuang St., Nankang Dist., Taipei City, 115022, Taiwan
Tel: +886-2-2788-1638
Fax: +886-2-26519386
E-Mail: [email protected] 

Payment: please fill in the payment form  




TCC-300Com

  • Database Name: TCC-300Com
  • Speech File Editing Program: VEDITOR 5.0p
  • Database Brief (PDF)
The Microphone Speech Database, including the speech file editing program, is stored on one DVD.

This is a collection of microphone speech databases produced by National Taiwan University, National Cheng Kung University, and National Chiao Tung University. ACLCLP is authorized to release it. Applicants should apply by signing the license agreement and complying with its terms. 

Documents required:

  1. A certificate from the applicant's affiliated institution indicating his/her status at this institution
  2. Two (2) original copies of the License Agreement (one kept by ACLCLP and the other by the applicant).
  3. Price: US$3,500.-


Please send the documents to:

The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
1F., No. 34, Ln. 3, Sec. 1, Jiuzhuang St., Nankang Dist., Taipei City, 115022, Taiwan
Tel: +886-2-2788-1638
Fax: +886-2-26519386
E-Mail: [email protected] 

Payment: please fill in the payment form  




EAT-ALL

The EAT corpus, which contains three groups of channels (PSTN, MIC16K, and GSM), is stored on three DVDs. The PSTN and GSM corpora share one disc, labeled “PSTN + GSM”. Because the MIC16K speech data has a high sampling rate and therefore a large storage requirement, it is stored on two discs labeled “Mic16K English” and “Mic16K NonEnglish”, for the English Department and non-English Department recordings, respectively.
The English Across Taiwan (EAT) corpus was developed jointly by the Association for Computational Linguistics and Chinese Language Processing. Applicants should apply by signing the license agreement and complying with its terms. 

Documents required:

  1. A certificate from the applicant's affiliated institution indicating his/her status at this institution
  2. Two (2) original copies of the license agreement (Non-profit Version or Commercial Version), one kept by ACLCLP and the other by the applicant.
  3. Price:
    • Non-profit organizations: US$1,350.-
    • Commercial organizations: US$13,500.-


Please send the documents to:

The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
1F., No. 34, Ln. 3, Sec. 1, Jiuzhuang St., Nankang Dist., Taipei City, 115022, Taiwan
Tel: +886-2-2788-1638
Fax: +886-2-26519386
E-Mail: [email protected] 

Payment: please fill in the payment form  




EAT-200

The EAT corpus, which contains three groups of channels (PSTN, MIC16K, and GSM), is stored on one DVD. The PSTN and GSM corpora are stored on the same disc, labeled “PSTN + GSM”. Because the MIC16K speech data has a high sampling rate and therefore a large storage requirement, it is stored on two discs labeled “Mic16K English” and “Mic16K NonEnglish”, for the English Department and non-English Department recordings, respectively.
The English Across Taiwan (EAT) corpus was developed jointly by the Association for Computational Linguistics and Chinese Language Processing. Applicants should apply by signing the license agreement and complying with its terms. 

Documents required:

  1. A certificate from the applicant's affiliated institution indicating his/her status at this institution
  2. Two (2) original copies of the license agreement (Non-profit Version or Commercial Version), one kept by ACLCLP and the other by the applicant.
  3. Price:
    • Non-profit organizations: US$350.-
    • Commercial organizations: US$3,500.-


Please send the documents to:

The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
1F., No. 34, Ln. 3, Sec. 1, Jiuzhuang St., Nankang Dist., Taipei City, 115022, Taiwan
Tel: +886-2-2788-1638
Fax: +886-2-26519386
E-Mail: [email protected] 

Payment: please fill in the payment form  




MATBN

The MATBN Mandarin Chinese broadcast news corpus is the product of a joint project sponsored by the National Science Council, Taiwan. It contains a total of 198 one-hour news shows from the Public Television Service Foundation, Taiwan, together with their transcripts. The primary purpose of this collection is to provide training and test data for evaluating continuous speech recognition in the broadcast news domain. 

Documents required:

  1. A certificate from the applicant's affiliated institution indicating his/her status at this institution
  2. Two (2) original copies of the license agreement (Download agreement), one kept by ACLCLP and the other by the applicant.
  3. Price: US$1,350.-


Please send the documents to:

The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
1F., No. 34, Ln. 3, Sec. 1, Jiuzhuang St., Nankang Dist., Taipei City, 115022, Taiwan
Tel: +886-2-2788-1638
Fax: +886-2-26519386
E-Mail: [email protected] 

Payment: please fill in the payment form  




COSPRO & Toolkit

The Sinica COSPRO (Mandarin Continuous Speech Prosody Corpora) and Toolkit was designed, collected, and annotated by Dr. Chiu-yu Tseng and her research group at the Phonetics Lab, Institute of Linguistics, Academia Sinica, Taipei, Taiwan. The package of four DVDs contains 10.5 GB (7.7 GB annotated) of speech corpora, together with the Toolkit. Funding for corpus collection and toolkit development came exclusively from Academia Sinica, mainly through three Academia Sinica interdisciplinary Theme Projects: “Collaborating Researches on Chinese Information Processing-Subproject on Mandarin Chinese Speech Database” (1994.7-1999.7), “Knowledge Representation and Language Engineering for Mandarin Chinese - Man-machine Voice Interface Environment and Its Tools” (1997.7-2002.6), and “New Directions for Mandarin Speech Synthesis: From Prosodic Organization to More Natural Output” (January 2003-December 2005). ACLCLP is authorized to release it. Applicants should apply by signing the license agreement and complying with its terms. 

Documents required:

  1. A certificate from the applicant's affiliated institution indicating his/her status at this institution
  2. Two (2) original copies of the Licensing Agreement (one kept by ACLCLP and the other by the applicant).
  3. Price: Nonprofit overseas institutions: US$100.-


Please send the documents to:

The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
1F., No. 34, Ln. 3, Sec. 1, Jiuzhuang St., Nankang Dist., Taipei City, 115022, Taiwan
Tel: +886-2-2788-1638
Fax: +886-2-26519386
E-Mail: [email protected] 

Payment: please fill in the payment form  




AESOP-ILAS (Asian English Speech cOrpus Project - Institute of Linguistics, Academia Sinica) Corpora

  • Database Name: AESOP-ILAS (Asian English Speech cOrpus Project - Institute of Linguistics, Academia Sinica) Corpora
  • Database Brief

The AESOP-ILAS speech corpus was designed specifically for the Taiwan division of the multinational research project AESOP (Asian English Speech Corpus Project) and features L2 English speech by native speakers of Taiwan Mandarin. The principal investigator of this project is Dr. Chiu-yu TSENG, Distinguished Research Fellow and Director of the Institute of Linguistics, Academia Sinica. The project aims to build a corpus of the English spoken in Taiwan as an open resource and to investigate a wide range of communicative phonetic and prosodic features of Taiwan L2 English at the segmental, lexical, phrasal, and discourse levels, rather than focusing on specific, individual phenomena. The corpus is intended to support research and development in language teaching, language modeling, phonetics, and speech synthesis and recognition.

AESOP-ILAS was released in April 2015 for non-commercial academic research use only. ACLCLP is authorized to release it. Applicants should apply by signing the license agreement and complying with its terms. 

Documents required:

  1. A certificate from the applicant's affiliated institution indicating his/her status at this institution
  2. Two (2) original copies of the Licensing Agreement (one kept by ACLCLP and the other by the applicant).
  3. Price: Nonprofit overseas institutions: US$100.-


Please send the documents to:

The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
1F., No. 34, Ln. 3, Sec. 1, Jiuzhuang St., Nankang Dist., Taipei City, 115022, Taiwan
Tel: +886-2-2788-1638
Fax: +886-2-26519386
E-Mail: [email protected] 

Payment: please fill in the payment form  




Sinica MCDC8

Sinica MCDC8 includes the sound files (.wav) and transcripts of eight Mandarin Chinese conversations in .TextGrid format (PRAAT), with signal-aligned time information. For details, please visit the Spoken Mandarin Resource and Research website (http://mmc.sinica.edu.tw/). Sinica MCDC8 is the result of several research projects funded by Academia Sinica, and ACLCLP is authorized to release it. Applicants should apply by signing the license agreement and complying with its terms. 

Documents required:

  1. A certificate from the applicant's affiliated institution indicating his/her status at this institution.
  2. Three (3) original copies of the Licensing Agreement.
  3. Price:
    • Nonprofit overseas academic institutions
      • ACLCLP members US$2,000.-
      • Non-members US$2,100.-
    • Other overseas organizations
      • ACLCLP members US$6,000.-
      • Non-members US$6,400.-


Please send the documents to:

The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
1F., No. 34, Ln. 3, Sec. 1, Jiuzhuang St., Nankang Dist., Taipei City, 115022, Taiwan
Tel: +886-2-2788-1638
Fax: +886-2-26519386
E-Mail: [email protected] 

Payment: please fill in the payment form  




Sinica Phone-aligned Chinese Conversational Speech Database

Sinica Phone-aligned Chinese Conversational Speech Database consists of 3.5 hours of Chinese conversational speech produced by 16 speakers, totalling 1 GB of speech data. This database is part of the Sinica MCDC8 Corpus. Its alignment information covers the SYLLABLE and PHONE levels in .TextGrid format (PRAAT) and has been verified by professional phonetic labellers. For details, please visit the Spoken Mandarin Resource and Research website (http://mmc.sinica.edu.tw/).
Sinica Phone-aligned Chinese Conversational Speech Database is the result of several research projects funded by Academia Sinica. The ACLCLP is authorized to release it. Applicants should apply by signing the license agreement and complying with the terms on the license agreement. 

Documents required:

  1. A certificate from the applicant's affiliated institution indicating his/her status at this institution.
  2. Three (3) original copies of the Licensing Agreement.
  3. Price:
    • Nonprofit overseas academic institutions
      • ACLCLP members US$1,500.-
      • Non-members US$1,550.-
    • Other overseas organizations
      • ACLCLP members US$15,000.-
      • Non-members US$15,100.-


Please send the documents to:

The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
1F., No. 34, Ln. 3, Sec. 1, Jiuzhuang St., Nankang Dist., Taipei City, 115022, Taiwan
Tel: +886-2-2788-1638
Fax: +886-2-26519386
E-Mail: [email protected] 

Payment: please fill in the payment form  




National Education Radio Corpus (NER)

  • Database Name: National Education Radio Corpus (NER)
  • Database Brief (PDF)
Taiwanese Mandarin differs notably from Putonghua in China in its writing system, pronunciation, accent, wording, and vocabulary. Many of these differences can be attributed to the influence of Taiwanese, Hakka, the Formosan languages, Dutch, and Japanese. A Taiwanese Mandarin-specific automatic speech recognition (ASR) system is therefore needed for better speech-enabled human-computer interaction in the daily lives of people in Taiwan.
We therefore built the National Education Radio (NER) corpus, a real-life, multi-genre, spontaneous Taiwanese Mandarin broadcast speech corpus with manual transcriptions, drawn from the digital archive of Taiwan’s National Education Radio. NER is the largest Taiwanese Mandarin spoken corpus, with 21 volumes and 3,200 hours of speech data. It is also the largest Chinese spoken-text (as opposed to written-text) database, with about 60 million traditional Chinese characters. 

Documents required:

  1. A certificate from the applicant's affiliated institution indicating his/her status at this institution
  2. Two (2) original copies of the license agreement (Download agreement), one kept by ACLCLP and the other by the applicant.
  3. Price: US$35.-


Please send the documents to:

The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
1F., No. 34, Ln. 3, Sec. 1, Jiuzhuang St., Nankang Dist., Taipei City, 115022, Taiwan
Tel: +886-2-2788-1638
Fax: +886-2-26519386
E-Mail: [email protected] 

Payment: please fill in the payment form  




Taiwanese Across Taiwan Corpus (TAT)

  • Database Name: Taiwanese Across Taiwan Corpus (TAT)
  • Database Brief (PDF)
The TAT (Taiwanese Across Taiwan) corpus is a large-scale, multi-channel read-speech corpus recorded across Taiwan with 6 different microphones in quiet, office-like environments. It contains 300 hours × 6 channels of speech produced by 600 speakers. The first two volumes, TAT-Vol1 and TAT-Vol2 (200 speakers and about 100 hours in total), have been thoroughly transcribed and are publicly released. 

Documents required:

  1. A certificate from the applicant's affiliated institution indicating his/her status at this institution
  2. Two (2) original copies of the license agreement (Non-profit Version or Commercial Version), one kept by ACLCLP and the other by the applicant.
  3. Price:
    • Non-profit organizations:
      • TAT-Vol1 US$1,500.-
      • TAT-Vol2 US$1,500.-
      • TAT-TTS-M1 US$1,500.-
      • TAT-TTS-M2 US$1,500.-
      • TAT-TTS-F1 US$1,500.-
      • TAT-TTS-F2 US$1,500.-
    • Commercial organizations: (SUSPENDED)
      • TAT-Vol1
      • TAT-Vol2
      • TAT-TTS-M1
      • TAT-TTS-M2
      • TAT-TTS-F1
      • TAT-TTS-F2


Please send the documents to:

The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
1F., No. 34, Ln. 3, Sec. 1, Jiuzhuang St., Nankang Dist., Taipei City, 115022, Taiwan
Tel: +886-2-2788-1638
Fax: +886-2-26519386
E-Mail: [email protected] 

Payment: please fill in the payment form  

