GEM with all main and dependent tiers + Bangla transcripts
Wei, Ran
ran_wei at g.harvard.edu
Wed Mar 1 16:35:51 UTC 2023
Thank you so much, Leonid!
The German case was an analogy, and obviously not a very helpful one :) I
don't speak Bangla, so I asked my RAs to describe the issue in more detail:
In Bangla, certain letters combine to form consonant
clusters. For anyone who speaks German, think of the ß. It is different
from ss, but the meaning gets across. Just be prepared for a stink eye from
any Germans. Similarly, in Bangla the meaning gets across, but the equivalent of
the ß breaks up into s+s. Below is a list of some of the most common consonant
clusters in Bengali. Clusters of two consonants are the most common, but it
is also possible to have three. They are presented in Bengali alphabetical
order. In CLAN, all of the combined clusters break up into their individual
components:
ক + ট = ক্ট
ক + ত = ক্ত
ক + র = ক্র
ক + ল = ক্ল
ক + ষ = ক্ষ
ক + স = ক্স
গ + ধ = গ্ধ
গ + ন = গ্ন
গ + ল = গ্ল
ঙ + ক = ঙ্ক
ঙ + গ = ঙ্গ
চ + চ = চ্চ
চ + ছ = চ্ছ
জ + জ = জ্জ
জ + ঞ = জ্ঞ
জ + ব = জ্ব
ঞ + চ = ঞ্চ
ঞ + ছ = ঞ্ছ
ঞ + জ = ঞ্জ
ঞ + ঝ = ঞ্ঝ
ট + ট = ট্ট
ড + ড = ড্ড
ণ + ট = ণ্ট
ণ + ঠ = ণ্ঠ
ণ + ড = ণ্ড
ত + ত = ত্ত
ত + থ = ত্থ
ত + ব = ত্ব
ত + ম = ত্ম
ত + র = ত্র
দ + দ = দ্দ
দ + ধ = দ্ধ
দ + ব = দ্ব
দ + ম = দ্ম
ধ + ব = ধ্ব
ন + ত = ন্ত
ন + থ = ন্থ
ন + দ = ন্দ
ন + ধ = ন্ধ
ন + ন = ন্ন
ন + ম = ন্ম
প + ত = প্ত
প + প = প্প
প + ল = প্ল
ফ + ল = ফ্ল
ব + জ = ব্জ
ব + দ = ব্দ
ব + ধ = ব্ধ
ব + ব = ব্ব
ব + ল = ব্ল
ভ + র = ভ্র
ম + প = ম্প
ম + ব = ম্ব
ম + ভ = ম্ভ
ম + ম = ম্ম
ম + ল = ম্ল
ল + ক = ল্ক
ল + প = ল্প
ল + ল = ল্ল
শ + চ = শ্চ
শ + ব = শ্ব
ষ + ক = ষ্ক
ষ + ট = ষ্ট
ষ + ঠ = ষ্ঠ
ষ + প = ষ্প
স + ক = স্ক
স + ট = স্ট
স + ত = স্ত
স + থ = স্থ
স + ন = স্ন
স + প = স্প
স + ব = স্ব
হ + ণ = হ্ণ
হ + ম = হ্ম
জ + জ + ব = জ্জ্ব
ত + ত + ব = ত্ত্ব
ন + ত + ব = ন্ত্ব
স + থ + য = স্থ্য
ন + ত + র = ন্ত্র
ন + দ + র = ন্দ্র
স + ত + র = স্ত্র
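In Unicode, each cluster in the list above is normally stored not as a single character but as its consonants joined by the hasanta (virama, U+09CD). Whether the sequence is drawn as one conjunct glyph or as separate letters with a visible hasanta is decided by the font and text-shaping engine. The following is a minimal Python sketch, assuming standard Unicode text; the helper names conjunct and describe are hypothetical and not part of CLAN:

```python
import unicodedata

HASANTA = "\u09CD"  # BENGALI SIGN VIRAMA (hasanta)


def conjunct(*consonants: str) -> str:
    """Join consonants with hasanta, e.g. ক + ট -> ক্ট (hypothetical helper)."""
    return HASANTA.join(consonants)


def describe(text: str) -> list[str]:
    """Return the code point and Unicode name of every character in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


if __name__ == "__main__":
    # A few clusters from the list: ক + ট, ক + ষ, ন + ত + র.
    for parts in [("\u0995", "\u099F"), ("\u0995", "\u09B7"), ("\u09A8", "\u09A4", "\u09B0")]:
        cluster = conjunct(*parts)
        print(" + ".join(parts), "=", cluster)
        for line in describe(cluster):
            print("   ", line)
```

Running it prints each cluster followed by its code points; for example, ক্ষ comes out as BENGALI LETTER KA, BENGALI SIGN VIRAMA, BENGALI LETTER SSA.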
Interestingly, CLAN always breaks up these consonant clusters into their
correct individual components, so it seems like there's an understanding in
the software that these clusters exist and what they're made up of. It just
fails to keep the cluster together instead of breaking it apart.
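Because CLAN recovers the correct components, one way to check whether the stored characters are intact (Leonid suggests that CLAN does not change character codes) is to inspect a saved .cha file outside CLAN. Below is a minimal Python sketch, assuming the transcript is saved as UTF-8. It counts consonant + hasanta + consonant sequences and any zero-width non-joiner (U+200C) or joiner (U+200D); a ZWNJ after the hasanta explicitly requests the split form. If the clusters are present and no ZWNJ shows up, the text itself is probably fine and the split appearance is more likely a font or rendering issue. The function name scan is hypothetical:

```python
import re
import sys
from collections import Counter

# Bengali consonants: U+0995..U+09B9 (with unassigned gaps) plus U+09DC..U+09DF.
CONSONANT = "[\u0995-\u09B9\u09DC-\u09DF]"
HASANTA = "\u09CD"
ZWNJ, ZWJ = "\u200C", "\u200D"

# A cluster is a consonant followed by one or more (hasanta + consonant) pairs.
CLUSTER_RE = re.compile(f"{CONSONANT}(?:{HASANTA}{CONSONANT})+")


def scan(text: str) -> tuple[Counter, int, int]:
    """Count cluster sequences and zero-width (non-)joiners in text."""
    clusters = Counter(CLUSTER_RE.findall(text))
    return clusters, text.count(ZWNJ), text.count(ZWJ)


if __name__ == "__main__":
    # Pass a .cha file path on the command line; otherwise scan a small sample.
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as f:
            text = f.read()
    else:
        text = "\u0995\u09CD\u09B7 \u09A8\u09CD\u09A4\u09CD\u09B0"  # ক্ষ ন্ত্র
    clusters, zwnj, zwj = scan(text)
    for cluster, n in clusters.most_common(10):
        print(cluster, n)
    print("ZWNJ:", zwnj, "ZWJ:", zwj)
```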
Here are 3 example words that have probably come up in every Bangla
transcription so far, multiple times in each. The 3 words with their
English meanings are as follows:
[image: Screen Shot 2023-02-28 at 2.28.00 AM.png]
Thank you so much for looking into this, and let me know if you need
further information!
Thanks,
Ran
On Mon, Feb 27, 2023 at 9:32 AM Leonid Spektor <spektor at andrew.cmu.edu>
wrote:
> Hi Ran,
>
> 1. To get all speakers and all dependent tiers from GEM, use this command:
>
> gem +stoy +g +d1 +f +t%: @
>
>
> 2. For testing, I selected the German ABC-QWERTZ keyboard. When entering the
> ß character in CLAN, I do not see it splitting. All I get is the letter ß. I
> am using a Mac and the latest version of CLAN. I would say that CLAN does not
> change the character codes that it gets from the system. The splitting must
> occur in the OS. What computer and OS are you using?
>
>
> Leonid.
>
>
> On Feb 27, 2023, at 08:41, ran... at g.harvard.edu <ran_wei at g.harvard.edu>
> wrote:
>
> Hello,
>
> Happy Monday! Two questions:
>
> 1. I'm using GEM to segment my transcripts (with different episodes marked
> using @Bg: and @Eg:). I've been typing in the main and dependent tiers to
> keep in the GEM-generated transcripts:
> gem +stoy +g +d1 +f +t*MOT +t%act: @
> It works, but I am working with transcripts with many dependent tiers and
> I worry I might miss a few. I wonder if there's a way to include all main
> tiers and dependent tiers, basically keeping the transcript intact and only
> segmenting it into different GEM episodes.
>
> 2. My research assistants have been transcribing Bangla in CLAN, and they
> noticed that the letters get split up as they type. Similar to when German ß
> gets turned into ss, you can still understand the word meaning, but the
> spelling seems off. We tried typing the words in a Word document and copying
> and pasting them into CLAN, but the letters still get split automatically. Is
> there a way we could fix this? It's OK if not; this issue does not affect
> VOCD anyway, so we can work with what we have.
>
> Thanks and have a great week,
> Ran
>
> --
> You received this message because you are subscribed to the Google Groups
> "chibolts" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to chibolts+unsubscribe at googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/chibolts/81a58cfd-3c6c-4dcf-98e4-d721fb707348n%40googlegroups.com
> <https://groups.google.com/d/msgid/chibolts/81a58cfd-3c6c-4dcf-98e4-d721fb707348n%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
>
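On the first question: Leonid's command above is meant to keep all speakers and all dependent tiers. To double-check that nothing was dropped, one could compare the tiers present in an original transcript with those in the GEM output. Below is a minimal Python sketch, assuming standard CHAT line prefixes (* for main tiers such as *MOT:, % for dependent tiers such as %act:, @ for headers and gem markers). It is not a CLAN tool, the function name tier_codes is hypothetical, and the built-in sample lines are made up for illustration:

```python
import re
import sys
from collections import Counter

# CHAT main tiers look like "*MOT:" and dependent tiers like "%act:".
# Header lines (@...) and tab-indented continuation lines are ignored.
TIER_RE = re.compile(r"^([*%]\w+):")


def tier_codes(lines) -> Counter:
    """Count how often each main (*XXX) and dependent (%xxx) tier occurs."""
    counts = Counter()
    for line in lines:
        m = TIER_RE.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts


if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as f:
            lines = f.readlines()
    else:
        # Made-up sample lines, for illustration only.
        lines = ["@Bg:\ttoy", "*MOT:\tlook .", "%act:\tpoints", "*CHI:\tball .", "@Eg:\ttoy"]
    for code, n in sorted(tier_codes(lines).items()):
        print(code, n)
```

Running it on an original transcript and on the matching GEM output, then comparing the two lists, would show whether any tier was dropped.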
--
Ran Wei, Ph.D.
Postdoctoral Research Fellow
Laboratories of Cognitive Neuroscience
Boston Children's Hospital, Harvard Medical School
ran.wei at childrens.harvard.edu
www.ranweiresearch.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2023-02-28 at 2.28.00 AM.png
Type: image/png
Size: 54026 bytes
Desc: not available
URL: <http://listserv.linguistlist.org/pipermail/chibolts/attachments/20230301/cd6a7ab6/attachment-0001.png>