[Lingtyp] Praat script for automatically segmenting vowels?

Volker Gast volker.gast at uni-jena.de
Mon Apr 3 12:36:16 UTC 2023


Hi Cat,
I think I know what you mean. I had similar ideas in my work on the 
Papuan language Idi, but I ended up realizing that computers can only 
help us understand vowels, not do the job for us.

Clustering doesn't work in my experience because the formant 
measurements of a single vowel (token) are extremely spread out, and 
it's very hard (perhaps impossible) to control for all the covariates, 
at least in an unsupervised manner. (Or perhaps that's doable with huge 
amounts of data, which I did not have.)
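
A minimal sketch of what unsupervised clustering on formants amounts to, assuming plain k-means on raw F1/F2 values in Hz with fixed starting centroids. The helper name kmeans_labels and all numbers are invented for illustration; they are not Idi data and not the procedure used at the time. When the tokens of two annotated vowels overlap, as they do here, the cluster indices need not line up with the annotated labels.

```python
import math

def kmeans_labels(points, centroids, iterations=20):
    """Assign each (F1, F2) point to its nearest centroid, update, repeat."""
    centroids = list(centroids)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the closest centroid (Euclidean, raw Hz).
        labels = [
            min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for i in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centroids[i] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels

# Hypothetical tokens: (annotated vowel, (F1, F2)); values are made up.
tokens = [
    ("i", (280, 2300)), ("i", (350, 2050)), ("i", (420, 1900)),
    ("e", (400, 2100)), ("e", (450, 1950)), ("e", (520, 1800)),
]
labels = kmeans_labels([p for _, p in tokens], [(300, 2200), (500, 1850)])
for (vowel, point), cluster in zip(tokens, labels):
    print(vowel, point, "-> cluster", cluster)
```

Raw Hz distances weight F2 more heavily than F1, and nothing here controls for speaker, context, or duration, which is exactly the kind of covariate problem described above.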

What I did find very helpful, though, was visualization. I extracted vowel 
measurements from my data and plotted them, time-aligned with the word 
they were taken from. The result is a set of little video clips. I 
attach an example (MP4). The colours of the measurements represent the 
vowel symbols of my annotations. So this is essentially a way of 
inspecting my own (manual) analyses.
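
The data preparation behind such a plot can be sketched roughly as follows, assuming the formant track is available as (time, F1, F2) frames and the vowel annotations as (label, start, end) intervals in seconds. The function name vowel_frames and all values are hypothetical; this is not the original 2015 script. Each frame that falls inside an annotated vowel is kept, tagged with the vowel label (which would determine the colour), and re-timed relative to the start of the word.

```python
def vowel_frames(word_start, vowels, frames):
    """Keep formant frames inside annotated vowels, timed from word onset."""
    rows = []
    for t, f1, f2 in frames:
        for label, start, end in vowels:
            if start <= t < end:
                # Time relative to the word, rounded to milliseconds.
                rows.append((round(t - word_start, 3), label, f1, f2))
                break
    return rows

# Hypothetical word starting at 1.0 s with two annotated vowels.
vowels = [("o", 1.05, 1.15), ("a", 1.30, 1.40)]
frames = [
    (1.00, 500, 1500),  # before the first vowel: dropped
    (1.10, 450, 900),   # inside "o"
    (1.20, 400, 1400),  # between vowels: dropped
    (1.35, 750, 1300),  # inside "a"
]
rows = vowel_frames(1.0, vowels, frames)
for row in rows:
    print(row)
```

The resulting rows can be passed to any plotting tool, one frame per time step, to produce the kind of animated view described above.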

I did this in 2015, and I don't even remember what exactly I did. I 
think I used MAUS to align the audio signal with the transcriptions (now 
I prefer the Montreal Forced Aligner). The data was processed with Praat.

I think if you want to automate vowel segmentation you will (minimally) 
have to train your own model (which means you need annotated data). If 
you use a pre-trained model (as I in fact did at the time) you may 
introduce a bias that represents the training data, rather than the 
target language.

I'd be interested in learning about similar (and better) ways of 
visualizing vowels (and data-driven approaches to vowel phonology more 
generally).

Best,
Volker


On 03/04/2023 11:48, Cat Butz wrote:
> Hello everyone,
>
> Thank you so much to everyone who has reached out so far. I'll check 
> out everything you've suggested.
>
> Ian: I don't fully understand what you mean by marking the vowel 
> center vs. complete segmentation, so if it's ok, I'll just answer your 
> questions and see what you have to say afterwards. Thank you for 
> taking the time.
>
> I'm working with continuous speech and currently have no fixed number 
> of tokens per vowel that I aim to collect (as many as possible, of 
> course, and what's possible depends on the resources I find). The 
> vowel space of Dalkalaen is extremely broad and diffuse and you will 
> find instances of pretty much every vowel the IPA has a symbol for and 
> then some in a few minutes of natural speech. Because of this, I'm 
> having trouble determining the number of phonemes, which is one of 
> several reasons I want to employ some statistics in coming up with a 
> meaningful description of the behaviour of vowels. So e.g. for 
> starters, I'd like to see whether the clustering in a formant plot of 
> phonetic front vowels will suggest two or three phonemic vowel 
> qualities (or possibly four). Eventually, I want to get to a point 
> where I can describe as many as possible of the factors conditioning 
> phonetic variation of vowels; coarticulation, morphosyntactic context, 
> probability, etc.
>
> So for now, it would save me a lot of work if a script could just 
> separate the vowels from everything else in the sound signal, so that 
> I could then just go through them, label them to several different 
> degrees of detail, and do some statistics and see what happens.
>
> Best,
> ---
> Cat Butz (she/they)
> HHU Düsseldorf
> General Linguistics
>
>
> Am 2023-04-01 02:59, schrieb Ian Maddieson:
>> Dear Cat,
>>
>> Reading your description of what you want to do, I’m wondering if it
>> is necessary to use an aligner,
>> as it would seem that a simple marking of the vowel center is all you
>> need (although a script to
>> extract formant values would no doubt be useful). There is no need to
>> do a complete segmentation.
>> Are you working with isolated words or is all your data continuous
>> speech? How many tokens of
>> each vowel do you aim to measure? And how dense is the vowel space in
>> this language?
>>
>> I’d add that it is also necessary to check that the formant
>> extraction is reliable, which requires
>> a pretty hands-on examination of the data.
>>
>> Ian
>>
>>> On Mar 31, 2023, at 16:11, Pegi Bakula <pegi at buffalo.edu> wrote:
>>>
>>> Hi Cat,
>>>
>>> Without knowing the specifics of your situation, there are plenty of
>>> options available for speech recognition. Which one to choose will
>>> depend on a number of factors, of course. Two programs that quickly
>>> come to mind are the Praat plug-in EasyAlign and the Montreal Forced
>>> Aligner. Either one of these can autosegment down to the phone
>>> level. I've never tried EasyAlign, but I'd guess as a plug-in it
>>> would be fairly straightforward. I have used the MFA, and will admit
>>> there is some amount of work that goes into using it... at least for
>>> what I'm using it for.
>>>
>>> I hope that helps, though I'm sure there are plenty of people here
>>> who can offer better suggestions.
>>>
>>> Cheers,
>>> Pegi
>>>
>>> On Fri, Mar 31, 2023 at 12:13 PM Cat Butz <Cat.Butz at hhu.de> wrote:
>>>
>>>> Dear community,
>>>>
>>>> The title should be pretty straightforward: I'm writing a grammar and
>>>> need to label and plot a bunch of vowel tokens from raw sound files, and
>>>> I'm trying to find ways of getting there besides segmenting and
>>>> labelling everything by hand, specifically the segmenting part. I'll be
>>>> thankful for any advice, no matter how tangential.
>>>>
>>>> Wishing everyone a beautiful weekend.
>>>>
>>>> Best,
>>>> -- 
>>>> Cat Butz (she/they)
>>>> HHU Düsseldorf
>>>> General Linguistics
>>>> _______________________________________________
>>>> Lingtyp mailing list
>>>> Lingtyp at listserv.linguistlist.org
>>>>
>>>> https://listserv.linguistlist.org/cgi-bin/mailman/listinfo/lingtyp
>>
>> Ian Maddieson
>>
>> Department of Linguistics
>> University of New Mexico
>> MSC03-2130
>> Albuquerque NM 87131-0001
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bombeag.mp4
Type: video/mp4
Size: 133422 bytes
Desc: not available
URL: <http://listserv.linguistlist.org/pipermail/lingtyp/attachments/20230403/80538c9e/attachment.mp4>