[Lingtyp] Praat script for automatically segmenting vowels?

Cat Butz Cat.Butz at hhu.de
Thu Apr 6 12:48:39 UTC 2023


Dear Ian, dear everyone,

Thank you again for your time. First off, some clarifications: The 
person who wrote the Daakaka grammar is a woman, and her name is Kilu. 
She is also the one who collected the Dalkalaen data that I'm currently 
working with.

In the meantime, someone has privately suggested a script to me that 
does exactly what I was looking for: 
https://github.com/parantes/vowel-detector
So far it hasn't been working very well, and I'll spend some time trying 
to improve its performance. In any case, it seems that if I want to do a 
statistical analysis, I won't be able to avoid lots of manual data 
processing, which has me questioning again whether this in-depth 
approach is worth it at the moment or whether it would make more sense 
to focus more on other tasks at hand and eventually come back to it 
later.

The question of a phonemic /e ε/ and /o ɔ/ contrast (which is just one 
of many questions) in Dalkalaen arises specifically because, while most 
of the time vowels vary all over the place, there are some lexemes which 
are comparatively consistently articulated with [e] and [o], 
respectively. I'm not sure about the details of the situation in 
Daakaka, though I think it's similar. Thinking about it, it might 
actually be worth having a look at that.
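
One rough way to make "comparatively consistently articulated"
measurable is to compare the spread of a formant across tokens of each
lexeme. The following is only a minimal sketch under assumptions that
are not from this thread: it assumes F1 has already been measured per
token, the function name f1_spread_by_lexeme and the min_tokens
threshold are hypothetical, and the numbers are made-up placeholders,
not Dalkalaen data.

```python
from collections import defaultdict
from statistics import mean, stdev


def f1_spread_by_lexeme(tokens, min_tokens=3):
    """Group (lexeme, F1 in Hz) pairs; return {lexeme: (n, mean, sd)}.

    Lexemes with fewer than min_tokens tokens are skipped, because a
    standard deviation over one or two tokens says very little.
    """
    by_lexeme = defaultdict(list)
    for lexeme, f1 in tokens:
        by_lexeme[lexeme].append(f1)
    return {
        lexeme: (len(values), mean(values), stdev(values))
        for lexeme, values in by_lexeme.items()
        if len(values) >= min_tokens
    }


# Made-up placeholder values for illustration only (hypothetical lexemes).
toy_tokens = [
    ("word_a", 400.0), ("word_a", 410.0), ("word_a", 390.0),
    ("word_b", 350.0), ("word_b", 500.0), ("word_b", 650.0),
    ("word_c", 420.0),  # only one token, so it is skipped
]

# List lexemes from most to least consistent (smallest F1 spread first).
spreads = f1_spread_by_lexeme(toy_tokens)
for lexeme, (n, m, sd) in sorted(spreads.items(), key=lambda kv: kv[1][2]):
    print(f"{lexeme}: n={n} mean={m:.0f} Hz sd={sd:.0f} Hz")
```

A small spread alone would not show a phonemic contrast, since it
ignores speaker, context, and measurement error; it would only flag
lexemes worth listening to again.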

Also thank you, Ian, for clearing up the time point vs. segment 
question.
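
The time-point idea can be illustrated with a minimal sketch: given a
formant track (frame times plus one value per frame) and a marked time
point, a value can be read off without any segment boundaries. This
assumes the track has already been exported from some tool; the
function name value_at_time and the numbers are hypothetical and only
serve as an illustration.

```python
from bisect import bisect_left


def value_at_time(times, values, t):
    """Linearly interpolate a formant track at time t (in seconds).

    times must be sorted ascending; values[i] is the formant value in
    Hz at times[i]. Times outside the track are clamped to its ends.
    """
    if not times or len(times) != len(values):
        raise ValueError("times and values must be non-empty and equally long")
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    i = bisect_left(times, t)  # index of the first frame at or after t
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


# Made-up track and marked vowel time points, for illustration only.
frame_times = [0.00, 0.01, 0.02, 0.03]
f1_track = [500.0, 520.0, 540.0, 530.0]
vowel_points = [0.015, 0.03]
print([value_at_time(frame_times, f1_track, t) for t in vowel_points])
```

Only the marked time points enter the lookup; no labels or boundaries
of the surrounding segments are needed.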

Best,
---
Cat Butz (she/they)
HHU Düsseldorf
General Linguistics


On 2023-04-06 02:42, Ian Maddieson wrote:
> Hi Cat,
> 
> I would agree with the caution suggested by Volker. I doubt that you
> will get a solution to questions like
> how many contrastive vowels there are by any kind of clustering approach,
> particularly if, as you say, the
> vowels of Dalkalaen are very dispersed. I think you have to have a
> definite idea of how many contrastive
> vowels there are, then you can test for support for this hypothesis.
> From where it’s spoken a first hypothesis
> might be that there are five vowels, though I note that Kilo von
> Prince in his Daakaka grammar and the dictionary
> suggests there are 7 vowels (plus a length contrast) with the e/ɛ and
> o/ɔ contrast only found after alveolars.
> This strikes me as a very odd distribution, but I’ve not made any
> attempt to check it in the recordings in the
> DOBeS archive and have only seen a snippet of the Grammar. I presume
> you were making transcriptions as
> you collected data in the field —  how many symbols did you use in
> these; which words were you confident
> in transcribing; what characteristics did these words have? Asking
> questions like these might help get to an
> analysis of the vowel system better than a lot of acoustic measurements.
> 
> My comment about not needing complete segmentation was simply to point
> out that to extract formant measurements
> at a given time point all you need is the location of that time point,
> not the boundaries of the segment in which it is
> contained, or labels for adjoining segments.
> 
> Ian
> 
>> On Apr 3, 2023, at 06:36, Volker Gast <volker.gast at uni-jena.de>
>> wrote:
>> 
>> Hi Cat,
>> I think I know what you mean. I had similar ideas in my work on the
>> Papuan language Idi, but I ended up realizing that computers can
>> only help us understand vowels, not do the job for us.
>> 
>> Clustering doesn't work in my experience because the formant
>> measurements of a single vowel (token) are extremely spread out, and
>> it's very hard (perhaps impossible) to control for all the
>> covariates, at least in an unsupervised manner. (Or perhaps that's
>> doable with huge amounts of data, which I did not have.)
>> 
>> What I found very helpful though was visualizations. I extracted
>> vowel measurements from my data and plotted them, time-aligned with
>> the word they were taken from. The result is a set of little video
>> clips. I attach an example (MP4). The colours of the measurements
>> represent the vowel symbols of my annotations. So this is
>> essentially a way of inspecting my own (manual) analyses.
>> 
>> I did this in 2015, and I don't even remember what exactly I did. I
>> think I used MAUS to align the audio signal with the transcriptions
>> (now I prefer the Montreal Forced Aligner). The data was processed
>> with Praat.
>> 
>> I think if you want to automate vowel segmentation you will
>> (minimally) have to train your own model (which means you need
>> annotated data). If you use a pre-trained model (as I in fact did at
>> the time) you may introduce a bias that represents the training
>> data, rather than the target language.
>> 
>> I'd be interested in learning about similar (and better) ways of
>> visualizing vowels (and data-driven approaches to vowel phonology
>> more generally).
>> 
>> Best,
>> Volker
>> 
>> On 03/04/2023 11:48, Cat Butz wrote:
>> Hello everyone,
>> 
>> Thank you so much to everyone who has reached out so far. I'll check
>> out everything you've suggested.
>> 
>> Ian: I don't fully understand what you mean by marking the vowel
>> center vs. complete segmentation, so if it's ok, I'll just answer
>> your questions and see what you have to say afterwards. Thank you
>> for taking the time.
>> 
>> I'm working with continuous speech and currently have no fixed
>> number of tokens per vowel that I aim to collect (as many as
>> possible, of course, and what's possible depends on the resources I
>> find). The vowel space of Dalkalaen is extremely broad and diffuse
>> and you will find instances of pretty much every vowel the IPA has a
>> symbol for and then some in a few minutes of natural speech. Because
>> of this, I'm having trouble determining the number of phonemes,
>> which is one of several reasons I want to employ some statistics in
>> coming up with a meaningful description of the behaviour of vowels.
>> So e.g. for starters, I'd like to see whether the clustering in a
>> formant plot of phonetic front vowels will suggest two or three
>> phonemic vowel qualities (or possibly four). Eventually, I want to
>> get to a point where I can describe as many as possible of the
>> factors conditioning phonetic variation of vowels; coarticulation,
>> morphosyntactic context, probability, etc.
>> 
>> So for now, it would save me a lot of work if a script could just
>> separate the vowels from everything else in the sound signal, so
>> that I could then just go through them, label them to several
>> different degrees of detail, and do some statistics and see what
>> happens.
>> 
>> Best,
>> ---
>> Cat Butz (she/they)
>> HHU Düsseldorf
>> General Linguistics
>> 
>> On 2023-04-01 02:59, Ian Maddieson wrote:
>> Dear Cat,
>> 
>> Reading your description of what you want to do, I’m wondering if it
>> is necessary to use an aligner, as it would seem that a simple
>> marking of the vowel center is all you need (although a script to
>> extract formant values would no doubt be useful). There is no need
>> to do a complete segmentation. Are you working with isolated words
>> or is all your data continuous speech? How many tokens of each vowel
>> do you aim to measure? And how dense is the vowel space in this
>> language?
>> 
>> I’d add that it is also necessary to check that the formant
>> extraction is reliable, which requires
>> a pretty hands-on examination of the data.
>> 
>> Ian
>> 
>> On Mar 31, 2023, at 16:11, Pegi Bakula <pegi at buffalo.edu> wrote:
>> 
>> Hi Cat,
>> 
>> Without knowing the specifics of your situation, there are plenty of
>> options available for speech recognition. Which one to choose will
>> depend on a number of factors, of course. Two programs that quickly
>> come to mind are the Praat plug-in EasyAlign and the Montreal Forced
>> Aligner. Either one of these can autosegment down to the phone
>> level. I've never tried EasyAlign, but I'd guess as a plug-in it
>> would be fairly straightforward. I have used the MFA, and will admit
>> there is some amount of work that goes into using it... at least for
>> what I'm using it for.
>> 
>> I hope that helps, though I'm sure there are plenty of people here
>> who can offer better suggestions.
>> 
>> Cheers,
>> Pegi
>> 
>> On Fri, Mar 31, 2023 at 12:13 PM Cat Butz <Cat.Butz at hhu.de> wrote:
>> 
>> Dear community,
>> 
>> The title should be pretty straightforward: I'm writing a grammar
>> and need to label and plot a bunch of vowel tokens from raw sound
>> files, and I'm trying to find ways of getting there besides
>> segmenting and labelling everything by hand, specifically the
>> segmenting part. I'll be thankful for any advice, no matter how
>> tangential.
>> 
>> Wishing everyone a beautiful weekend.
>> 
>> Best,
>> --
>> Cat Butz (she/they)
>> HHU Düsseldorf
>> General Linguistics
>> _______________________________________________
>> Lingtyp mailing list
>> Lingtyp at listserv.linguistlist.org
>> https://listserv.linguistlist.org/cgi-bin/mailman/listinfo/lingtyp
> 
> Ian Maddieson
> 
> Department of Linguistics
> University of New Mexico
> MSC03-2130
> Albuquerque NM 87131-0001

