[Lingtyp] Praat script for automatically segmenting vowels?

Mon Apr 3 09:48:52 UTC 2023

Hello everyone,

Thank you so much to everyone who has reached out so far. I'll check out 
everything you've suggested.

Ian: I don't fully understand what you mean by marking the vowel center 
vs. complete segmentation, so if it's ok, I'll just answer your 
questions and see what you have to say afterwards. Thank you for taking 
the time.

I'm working with continuous speech and currently have no fixed number of 
tokens per vowel that I aim to collect (as many as possible, of course, 
and what's possible depends on the resources I find). The vowel space of 
Dalkalaen is extremely broad and diffuse and you will find instances of 
pretty much every vowel the IPA has a symbol for and then some in a few 
minutes of natural speech. Because of this, I'm having trouble 
determining the number of phonemes, which is one of several reasons I 
want to employ some statistics in coming up with a meaningful 
description of the behaviour of vowels. So e.g. for starters, I'd like 
to see whether the clustering in a formant plot of phonetic front vowels 
will suggest two or three phonemic vowel qualities (or possibly four). 
Eventually, I want to get to a point where I can describe as many as
possible of the factors conditioning phonetic variation of vowels:
coarticulation, morphosyntactic context, probability, etc.
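
To make the clustering question concrete, here is a minimal sketch of one way to compare two, three, and four clusters in F1/F2 space. Everything in it is an assumption for illustration: the formant values are synthetic rather than measured, k-means with a mean silhouette score is only one of several possible methods, and the distances are taken in raw Hz, whereas real data would usually be normalised first.

```python
import math
import random

def farthest_point_init(points, k):
    """Deterministic seeding: start at the first token, then repeatedly add
    the token that is farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids

def kmeans(points, k, max_iter=100):
    """Plain k-means on (F1, F2) pairs; returns one cluster label per token."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assign each token to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Move each centroid to the mean of the tokens assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels

def silhouette(points, labels):
    """Mean silhouette width: close to 1 for compact, well-separated clusters,
    close to 0 for overlapping ones."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0
    widths = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            widths.append(0.0)
            continue
        # a: mean distance to the other tokens in the same cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the tokens of the nearest other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        widths.append((b - a) / max(a, b))
    return sum(widths) / len(widths)

# Synthetic (F1, F2) values in Hz for three hypothetical front-vowel targets.
# These numbers are invented for the example and are not Dalkalaen data.
rng = random.Random(0)
centers = [(300, 2300), (420, 2050), (580, 1800)]
tokens = [(rng.gauss(f1, 20), rng.gauss(f2, 20)) for f1, f2 in centers for _ in range(40)]

scores = {k: silhouette(tokens, kmeans(tokens, k)) for k in (2, 3, 4)}
best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(f"k={k}: mean silhouette {s:.2f}")
print("best k:", best_k)
```

A score like this only says which number of clusters fits the tokens best geometrically. Whether those clusters correspond to phonemes still has to be argued from distribution and contrast.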

So for now, it would save me a lot of work if a script could just 
separate the vowels from everything else in the sound signal, so that I 
could then just go through them, label them to several different degrees 
of detail, and do some statistics and see what happens.
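
As a rough sketch of that filtering step, assuming a forced aligner or similar tool has already produced phone-level intervals: the `Interval` class, the `VOWEL_LABELS` set, and the example tier below are all hypothetical and would need to be adapted to whatever the tool actually outputs. The midpoint column shows the kind of single time point at which formants could be measured when only the vowel center is marked.

```python
from dataclasses import dataclass

# Hypothetical label set; a real project would list the vowel symbols
# that its own aligner or transcription scheme uses.
VOWEL_LABELS = {"i", "e", "ɛ", "a", "o", "u"}

@dataclass
class Interval:
    label: str
    start: float  # seconds
    end: float    # seconds

def vowel_tokens(intervals, vowels=VOWEL_LABELS):
    """Keep only the vowel intervals and pair each with its temporal midpoint."""
    return [(iv, (iv.start + iv.end) / 2) for iv in intervals if iv.label in vowels]

# Invented phone tier, for illustration only.
phones = [
    Interval("k", 0.00, 0.08),
    Interval("a", 0.08, 0.20),
    Interval("t", 0.20, 0.27),
    Interval("i", 0.27, 0.41),
    Interval("", 0.41, 0.60),  # silence
]

tokens = vowel_tokens(phones)
for iv, mid in tokens:
    print(f"{iv.label}\t{iv.start:.2f}\t{iv.end:.2f}\t{mid:.2f}")
```

The resulting list can then be hand-checked and relabelled at whatever level of detail is needed before any measurements are taken.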

Best,
---
Cat Butz (she/they)
HHU Düsseldorf
General Linguistics

Am 2023-04-01 02:59, schrieb Ian Maddieson:
> Dear Cat,
> 
> Reading your description of what you want to do, I’m wondering if it is
> necessary to use an aligner, as it would seem that a simple marking of
> the vowel center is all you need (although a script to extract formant
> values would no doubt be useful). There is no need to do a complete
> segmentation. Are you working with isolated words or is all your data
> continuous speech? How many tokens of each vowel do you aim to measure?
> And how dense is the vowel space in this language?
> 
> I’d add that it is also necessary to check that the formant extraction is
> reliable, which requires a pretty hands-on examination of the data.
> 
> Ian
> 
>> On Mar 31, 2023, at 16:11, Pegi Bakula <pegi at buffalo.edu> wrote:
>> 
>> Hi Cat,
>> 
>> Without knowing the specifics of your situation, there are plenty of
>> options available for speech recognition. Which one to choose will
>> depend on a number of factors, of course. Two programs that quickly
>> come to mind are the Praat plug-in EasyAlign and the Montreal Forced
>> Aligner. Either one of these can autosegment down to the phone
>> level. I've never tried EasyAlign, but I'd guess as a plug-in it
>> would be fairly straightforward. I have used the MFA, and will admit
>> there is some amount of work that goes into using it... at least for
>> what I'm using it for.
>> 
>> I hope that helps, though I'm sure there are plenty of people here
>> who can offer better suggestions.
>> 
>> Cheers,
>> Pegi
>> 
>> On Fri, Mar 31, 2023 at 12:13 PM Cat Butz <Cat.Butz at hhu.de> wrote:
>> 
>>> Dear community,
>>> 
>>> The title should be pretty straightforward: I'm writing a grammar and
>>> need to label and plot a bunch of vowel tokens from raw sound files,
>>> and I'm trying to find ways of getting there besides segmenting and
>>> labelling everything by hand, specifically the segmenting part. I'll
>>> be thankful for any advice, no matter how tangential.
>>> 
>>> Wishing everyone a beautiful weekend.
>>> 
>>> Best,
>>> --
>>> Cat Butz (she/they)
>>> HHU Düsseldorf
>>> General Linguistics
>>> _______________________________________________
>>> Lingtyp mailing list
>>> Lingtyp at listserv.linguistlist.org
>>> https://listserv.linguistlist.org/cgi-bin/mailman/listinfo/lingtyp
> 
> Ian Maddieson
> 
> Department of Linguistics
> University of New Mexico
> MSC03-2130
> Albuquerque NM 87131-0001