Annotating audio and developing a writing system with Speech Analyzer

Many languages don’t have an alphabet. When speakers of those languages want to write things down, they need a writing system that suits the specific features of their language and takes wider social and political factors into account.

Linguistic analysis is a key part of developing an unwritten language and coming up with a suitable writing system. Literacy initiatives then engage with speakers throughout the language community as they begin to interact with the writing system, and their feedback informs the iterative process of orthography development.

A small team of language experts may take years to collect and analyze data from speakers of the language. This is why many tools are being developed to help linguists extract useful linguistic information from speech data more easily.

What are some ways a computer can help?

Computers can help us study a language and figure out the best way to write the sounds and words of that language. One significant task is identifying the major phonemes, each of which requires a symbol in the orthography. Another is researching whether and how tonal variation should be represented in the alphabet.

By augmenting some of the analysis steps, we hope to make it easier for linguists and language community members to use speech and other language data to inform their decisions.

Learn how to analyze phonetic data with Speech Analyzer:

After downloading Speech Analyzer, you can import audio files in the following formats:

  • WAV (.wav)
  • Other audio (.mp3, .wma)
  • Speech Analyzer (.saxml)
  • Speech Analyzer Workbench (.wb)
  • ELAN (.eaf)
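
When preparing a folder of field recordings, it can help to check in advance which files match the formats listed above. The following is a minimal sketch in Python, not part of Speech Analyzer itself; the function name `classify_recordings` and the example file names are hypothetical, and the extension table simply mirrors the list above.

```python
from pathlib import Path

# Extensions taken from the list above; the labels are the article's own.
SPEECH_ANALYZER_FORMATS = {
    ".wav": "WAV",
    ".mp3": "Other audio",
    ".wma": "Other audio",
    ".saxml": "Speech Analyzer",
    ".wb": "Speech Analyzer Workbench",
    ".eaf": "ELAN",
}


def classify_recordings(filenames):
    """Group file names by import format.

    Files whose extension is not in the table are grouped under the key None.
    (Hypothetical helper for illustration only.)
    """
    groups = {}
    for name in filenames:
        ext = Path(name).suffix.lower()  # compare extensions case-insensitively
        label = SPEECH_ANALYZER_FORMATS.get(ext)
        groups.setdefault(label, []).append(name)
    return groups


if __name__ == "__main__":
    # Hypothetical file names.
    files = ["story1.WAV", "wordlist.eaf", "notes.txt", "song.mp3"]
    for label, names in classify_recordings(files).items():
        print(label or "Not importable", names)
```

Anything grouped under `None` would need converting to one of the listed formats before import.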

You can then use Speech Analyzer’s audio analysis tools to analyze consonants, vowels, tone and more.
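
To give a rough idea of what tone analysis involves, the sketch below estimates the pitch (fundamental frequency) of a short voiced frame using simple autocorrelation. This is a generic textbook technique written in plain Python, not a description of how Speech Analyzer computes pitch. The function name `estimate_pitch`, the default search range of 75 to 400 Hz, and the synthetic test tone are all assumptions made for illustration.

```python
import math


def estimate_pitch(samples, sample_rate, fmin=75.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) of one voiced frame.

    Uses unnormalized autocorrelation: the lag at which the frame best
    matches a shifted copy of itself is taken as the pitch period.
    Returns None if no positive correlation peak is found.
    """
    min_lag = int(sample_rate / fmax)  # shortest period considered
    max_lag = int(sample_rate / fmin)  # longest period considered
    mean = sum(samples) / len(samples)
    centered = [s - mean for s in samples]  # remove any DC offset

    best_lag, best_score = None, 0.0
    for lag in range(min_lag, min(max_lag, len(centered) - 1) + 1):
        score = sum(
            centered[i] * centered[i + lag]
            for i in range(len(centered) - lag)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None


if __name__ == "__main__":
    # Synthetic 200 Hz tone, 40 ms long, sampled at 8000 Hz (assumed values).
    rate = 8000
    frame = [math.sin(2 * math.pi * 200 * n / rate) for n in range(320)]
    print(estimate_pitch(frame, rate))
```

Running this over successive frames of a recording would give a rough pitch contour, which is the kind of evidence used when deciding whether tone needs to be marked. Real speech is noisier than a pure tone, so dedicated tools use more robust methods.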

After analyzing the phonetic sounds of a language, another important step is to research how those sounds relate to each other phonologically.
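
One common way to explore how sounds relate to each other is to look for minimal pairs: words that differ in exactly one segment, which suggests the two differing sounds may be separate phonemes. The sketch below is a minimal illustration in Python, assuming words have already been transcribed and split into segments. The function name `find_minimal_pairs` and the word list are hypothetical and are not drawn from any real language data or from Phonology Assistant.

```python
from itertools import combinations


def find_minimal_pairs(words):
    """Return (word_a, word_b, seg_a, seg_b) for each minimal pair.

    Each word is a tuple of phonetic segments, e.g. ("p", "a", "t").
    Two words form a minimal pair here if they have the same number of
    segments and differ in exactly one position.
    """
    pairs = []
    for a, b in combinations(words, 2):
        if len(a) != len(b):
            continue
        diffs = [(x, y) for x, y in zip(a, b) if x != y]
        if len(diffs) == 1:
            pairs.append((a, b, diffs[0][0], diffs[0][1]))
    return pairs


if __name__ == "__main__":
    # Hypothetical transcriptions for illustration only.
    word_list = [
        ("p", "a", "t"),
        ("b", "a", "t"),
        ("p", "i", "t"),
        ("m", "a", "p", "a"),
    ]
    for a, b, x, y in find_minimal_pairs(word_list):
        print("".join(a), "~", "".join(b), ":", x, "vs", y)
```

A contrast such as `p` vs `b` in otherwise identical words is evidence that both sounds may need their own symbol in the orthography, subject to checking by a linguist and by speakers of the language.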

Learn how to analyze phonology with Phonology Assistant:

Data can be imported from FieldWorks, or in any of the following formats:

  • Toolbox (.db)
  • Interlinear Toolbox (.itx)
  • Phonology Assistant XML (.paxml)
  • Audio (.wav, .mp3 or .wma)

Note that if you import plain audio files, Phonology Assistant requires that you specify a mapping for the phonetic field.

What opportunities exist for advanced technology?

Linguistic analysis and orthography development typically take many years of collaboration and research between expert linguists and stakeholders within the language community. With the help of advanced technologies, it could be possible to take speech data and greatly speed up the analysis steps, providing the speech community with clear and concise information on which to base their language development decisions.