Introduction

General Background

Speech Corpus Tools is an application for interacting with large-scale speech datasets. It uses PolyglotDB as its underlying data store, so corpora imported from a wide range of input formats can all be queried in the same, consistent way.

Speech Corpus Tools is written in Python and can be scripted through its Python API. Advanced users can therefore write their own queries in Python rather than in SQL or Cypher, the query languages of the underlying databases.

In addition, Speech Corpus Tools provides a graphical user interface for displaying the annotations and speech stored in the database, as well as the results of queries.