Speech Corpus Tools (SCT) is an application for working with speech datasets, with a focus on large-scale speech corpora. It uses PolyglotDB for its underlying data storage, which allows the same queries to be run consistently across a range of input formats. This presentation describes the motivation and design of SCT, as well as its application in a case study of speech from 12 languages.

This site consists of two parts:

  1. Tutorial: a stand-alone page with full installation instructions for SCT and worked examples on a sample dataset that introduce SCT’s basic functionality. We recommend the tutorial for first-time users of SCT.
  2. Documentation: the rest of the site, beginning at Navigation Tour, documents SCT’s functionality. The documentation is still in progress as of July 2016, and questions not addressed here should be sent to