GEM with all main and dependent tiers + Bangla transcripts

Leonid Spektor spektor at andrew.cmu.edu
Wed Mar 1 18:16:13 UTC 2023


Ran,

	Are you using a Mac or a PC, and which operating system? Have you tried typing the same characters in MS Word? Do they look correct in other editors? I need to be able to replicate the problem in order to offer you a solution.
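
	One check that may help narrow this down: in Unicode a Bangla cluster is not a single character. It is stored as consonant + hasanta (U+09CD, BENGALI SIGN VIRAMA) + consonant, and the font and text-shaping engine draw the combined form. The Python sketch below is only an illustration and is not part of CLAN; the helper names describe, is_consonant and has_cluster are made up for this example, and the consonant range is a simplification. It lists the code points of a string so you can see whether the hasanta is still in the text. If it is, that would suggest the text is stored intact and only the display is split.

```python
import unicodedata

HASANTA = "\u09cd"  # BENGALI SIGN VIRAMA, joins consonants into a cluster


def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}" for ch in text]


def is_consonant(ch):
    """Rough test (an assumption): the main Bengali consonant range KA..HA."""
    return 0x0995 <= ord(ch) <= 0x09B9


def has_cluster(text):
    """True if a hasanta sits between two Bengali consonants."""
    for i in range(1, len(text) - 1):
        if (text[i] == HASANTA
                and is_consonant(text[i - 1])
                and is_consonant(text[i + 1])):
            return True
    return False


if __name__ == "__main__":
    cluster = "\u0995\u09cd\u099f"  # KA + hasanta + TTA, first row of the list
    for line in describe(cluster):
        print(line)
    print("cluster present:", has_cluster(cluster))
```

	To try it on a real transcript, paste one of the problem words in place of the sample string. Three lines of output for a two-consonant cluster (consonant, BENGALI SIGN VIRAMA, consonant) means the cluster is encoded as expected.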


Leonid.


> On Mar 1, 2023, at 11:35, Wei, Ran <ran_wei at g.harvard.edu> wrote:
> 
> Thank you so much Leonid! 
> 
> The German case was an analogy and obviously not a very helpful one :) I don't speak Bangla, so I asked my RAs to describe the issue in more detail:
> 
> In Bangla, certain sequences of letters combine to form consonant clusters. For anyone who speaks German, think of the ß: writing ss instead is different, but the meaning gets across (just be prepared for a stink eye from any Germans). Similarly, in Bangla the meaning still gets across when a cluster is broken up, much as if ß were split into s+s.
> Below is a list of some of the most common consonant clusters in Bengali. Clusters of two consonants are the most common, but clusters of three are also possible. They are presented in Bengali alphabetical order. In CLAN, all of the combined clusters break up into their individual components.
> ক + ট = ক্ট
> ক + ত = ক্ত
> ক + র = ক্র
> ক + ল = ক্ল
> ক + ষ = ক্ষ
> ক + স = ক্স
> গ + ধ = গ্ধ
> গ + ন = গ্ন
> গ + ল = গ্ল
> ঙ + ক = ঙ্ক
> ঙ + গ = ঙ্গ
> চ + চ = চ্চ
> চ + ছ = চ্ছ
> জ + জ = জ্জ
> জ + ঞ = জ্ঞ
> জ + ব = জ্ব
> ঞ + চ = ঞ্চ
> ঞ + ছ = ঞ্ছ
> ঞ + জ = ঞ্জ
> ঞ + ঝ = ঞ্ঝ
> ট + ট = ট্ট
> ড + ড = ড্ড
> ণ + ট = ণ্ট
> ণ + ঠ = ণ্ঠ
> ণ + ড = ণ্ড
> ত + ত = ত্ত
> ত + থ = ত্থ
> ত + ব = ত্ব
> ত + ম = ত্ম
> ত + র = ত্র
> দ + দ = দ্দ
> দ + ধ = দ্ধ
> দ + ব = দ্ব
> দ + ম = দ্ম
> ধ + ব = ধ্ব
> ন + ত = ন্ত
> ন + থ = ন্থ
> ন + দ = ন্দ
> ন + ধ = ন্ধ
> ন + ন = ন্ন
> ন + ম = ন্ম
> প + ত = প্ত
> প + প = প্প
> প + ল = প্ল
> ফ + ল = ফ্ল
> ব + জ = ব্জ
> ব + দ = ব্দ
> ব + ধ = ব্ধ
> ব + ব = ব্ব
> ব + ল = ব্ল
> ভ + র = ভ্র
> ম + প = ম্প
> ম + ব = ম্ব
> ম + ভ = ম্ভ
> ম + ম = ম্ম
> ম + ল = ম্ল
> ল + ক = ল্ক
> ল + প = ল্প
> ল + ল = ল্ল
> শ + চ = শ্চ
> শ + ব = শ্ব
> ষ + ক = ষ্ক
> ষ + ট = ষ্ট
> ষ + ঠ = ষ্ঠ
> ষ + প = ষ্প
> স + ক = স্ক
> স + ট = স্ট
> স + ত = স্ত
> স + থ = স্থ
> স + ন = স্ন
> স + প = স্প
> স + ব = স্ব
> হ + ণ = হ্ণ
> হ + ম = হ্ম
> জ + জ + ব = জ্জ্ব
> ত + ত + ব = ত্ত্ব
> ন + ত + ব = ন্ত্ব
> স + থ + য = স্থ্য
> ন + ত + র = ন্ত্র
> ন + দ + র = ন্দ্র
> স + ত + র = স্ত্র
> 
> Interestingly, CLAN always breaks up these consonant clusters into their correct individual components, so it seems like there’s an understanding in the software that these clusters exist and what they’re made up of. It just fails to understand that it should keep the cluster together instead of breaking it apart.
> 
> Here are three example words that I have probably encountered in every Bangla transcription so far, multiple times in each. The three words with their English meanings are as follows:
> 
> 
> Thank you so much for looking into this, and let me know if you need further information!
> 
> Thanks,
> Ran
> 
> On Mon, Feb 27, 2023 at 9:32 AM Leonid Spektor <spektor at andrew.cmu.edu> wrote:
> Hi Ran,
> 
> 1. To get all speakers and all dependent tiers from GEM, use this command:
> 
> gem +stoy +g +d1 +f +t%: @ 
> 
> 
> 2. For testing, I selected the German ABC-QWERTZ keyboard. When I enter the ß character in CLAN, I do not see it split; all I get is the letter ß. I am using a Mac and the latest version of CLAN. I would say that CLAN does not change the character codes that it gets from the system, so the splitting must occur in the operating system. What computer and OS are you using?
> 
> 
> Leonid.
> 
> 
>> On Feb 27, 2023, at 08:41, ran... at g.harvard.edu <ran_wei at g.harvard.edu> wrote:
>> 
>> Hello,
>> 
>> Happy Monday! Two questions:
>> 
>> 1. I'm using GEM to segment my transcripts (with different episodes marked using @Bg: and @Eg:). I've been typing in the main and dependent tiers to keep in the GEM-generated transcripts: gem +stoy +g +d1 +f +t*MOT +t%act: @ 
>> It works, but I am working with transcripts that have many dependent tiers, and I worry I might miss a few. I wonder if there's a way to include all main tiers and dependent tiers, basically keeping the transcript intact and only segmenting it into different GEM episodes.
>> 
>> 2. My research assistants have been transcribing Bangla in CLAN, and they noticed that the letters get split up as they type. It is similar to when German ß gets turned into ss: you can still understand the word's meaning, but the spelling seems off. We tried typing the words in a Word document and copying and pasting them into CLAN, but the letters still get split automatically. Is there a way we could fix this? It's OK if not; this issue does not affect VOCD anyway, so we can work with what we have.
>> 
>> Thanks and have a great week,
>> Ran
>> 
>> -- 
>> You received this message because you are subscribed to the Google Groups "chibolts" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to chibolts+unsubscribe at googlegroups.com <mailto:chibolts+unsubscribe at googlegroups.com>.
>> To view this discussion on the web visit https://groups.google.com/d/msgid/chibolts/81a58cfd-3c6c-4dcf-98e4-d721fb707348n%40googlegroups.com <https://groups.google.com/d/msgid/chibolts/81a58cfd-3c6c-4dcf-98e4-d721fb707348n%40googlegroups.com?utm_medium=email&utm_source=footer>.
> 
> 
> -- 
> 
> Ran Wei, Ph.D. 
> Postdoctoral Research Fellow
> Laboratories of Cognitive Neuroscience
> Boston Children's Hospital, Harvard Medical School
> ran.wei at childrens.harvard.edu
> www.ranweiresearch.com
> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2023-02-28 at 2.28.00 AM.png
Type: image/png
Size: 54026 bytes
Desc: not available
URL: <http://listserv.linguistlist.org/pipermail/chibolts/attachments/20230301/c2fd602b/attachment-0001.png>
