chat to text conversion (accents in Spanish)

Elnaz Kia ek325 at nau.edu
Thu Oct 28 19:58:49 UTC 2021


Dear Brian,

Oh, got it! I have two questions, though:

1. How did you convert .cha to .pdf? This is the message that I get when I
try to do the same (see the image below).
2. Also, the txt file that I sent to you looked fine in terms of showing
accents. Why do you think it does not work when we convert it to
PDF?

[image: image.png]

Many thanks!
Elnaz


Elnaz Kia, Ph.D. (she, her, hers)

*Post-Doctoral Research Associate
<https://l2trec.utah.edu/about/staff-directory.php>*

Second Language Teaching and Research Center (L2TReC)
<https://l2trec.utah.edu/>

University of Utah

LinkedIn
<https://www.linkedin.com/in/elnaz-kia?lipi=urn%3Ali%3Apage%3Ad_flagship3_profile_view_base_contact_details%3B%2FResGVF2SMGMAxhocdPnRw%3D%3D>

Personal Website <https://elnazkia.weebly.com/>



On Thu, Oct 28, 2021 at 12:44 PM Brian Macwhinney <macw at andrew.cmu.edu>
wrote:

> Dear Elnaz,
>      I meant that I was able to make a fine-looking PDF from the original
> CHAT file.  If you make it from the TXT file that you created, it will
> indeed have the problem you described.
>
> — Brian MacWhinney
> Teresa Heinz Professor of Cognitive Psychology,
> Computational Linguistics,
> and Modern Languages, CMU
>
> On Oct 28, 2021, at 3:30 PM, Elnaz Kia <ek325 at nau.edu> wrote:
>
> Hi Leonid,
>
> I just forwarded your message to them. Now that you mention it, I think I
> might have had something to do with that, because another step that I took
> after creating the text files was to remove the @UTF8 and @Window lines at
> the beginning of the text files. :-(
>
> Referring to what Brian said about the pdf tool working for him, I just
> tried that again with the same txt file with the @UTF8 line intact and
> still got the incorrect results.
>
> Best,
> Elnaz
>
> On Thu, Oct 28, 2021 at 12:20 PM Leonid Spektor <spektor at andrew.cmu.edu>
> wrote:
>
>> Elnaz,
>>
>> I am not familiar with Adobe Acrobat DC, so I will defer to Brian's email.
>>
>> The problem with uploading the txt files to the university database is
>> something that the people who support that process need to clarify. It is
>> possible that they expect a BOM (byte order mark) at the beginning of the
>> txt file to explicitly indicate the text encoding. It would be best if you
>> tell them that those text files are UTF-8 encoded and let them say what
>> they believe is missing in those files for them to get the upload right.
>> Normally newer text editors can automatically detect the text file
>> encoding and adjust their display accordingly.
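As a rough illustration of the BOM point, the following Python sketch checks whether a file starts with the UTF-8 byte order mark and writes a copy that has one. The helper names and file names are hypothetical, and whether the university upload actually expects a BOM is an assumption that the database team would need to confirm.

```python
import codecs
from pathlib import Path


def has_utf8_bom(path):
    """Return True if the file starts with the UTF-8 byte order mark (EF BB BF)."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8


def write_copy_with_bom(src, dst):
    """Read src as UTF-8 (with or without a BOM) and write dst as UTF-8 with a BOM."""
    # "utf-8-sig" strips a leading BOM on read, if one is present ...
    text = Path(src).read_text(encoding="utf-8-sig")
    # ... and always writes one at the start of the file.
    Path(dst).write_text(text, encoding="utf-8-sig")
```

For example, `write_copy_with_bom("sample.txt", "sample_bom.txt")` would leave the original untouched and produce a copy that tools relying on a BOM can recognize as UTF-8.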
>>
>>
>> Leonid.
>>
>> On Oct 28, 2021, at 15:08, Brian Macwhinney <macw at andrew.cmu.edu> wrote:
>>
>> When I convert files to PDF using Adobe Acrobat DC, all the characters
>> look fine.
>> I just did this for one file.  Perhaps batch doesn’t work well?
>>
>> Why are your computer people uploading in text format? They should just
>> be uploading in CHAT format.
>>
>> — Brian MacWhinney
>> Teresa Heinz Professor of Cognitive Psychology,
>> Computational Linguistics,
>> and Modern Languages, CMU
>>
>> On Oct 28, 2021, at 3:03 PM, Elnaz Kia <ek325 at nau.edu> wrote:
>>
>> Dear Leonid and Brian,
>>
>> Thank you both so much for your detailed responses.
>>
>> @Leonid Spektor <spektor at andrew.cmu.edu> you are right. I just
>> realized that the txt files that I created using the CHSTRING and REN
>> commands on my computer were correct. However, when the database crew at
>> the university uploads the txt files to the university database, the txt
>> files do not show the correct characters. Do you have any solutions for
>> this?
>>
>> Another issue is with the PDF versions of these files. Even on my
>> computer, when I convert the correctly converted txt files to PDF, the
>> characters are not shown correctly. Are there any solutions for this
>> problem? Note: I create the PDF files in batch using the Adobe Acrobat DC
>> Create PDFs Tool.
>>
>> Many thanks for taking the time and answering my questions!
>>
>> Best,
>> Elnaz
>>
>> On Thu, Oct 28, 2021 at 11:50 AM Leonid Spektor <spektor at andrew.cmu.edu>
>> wrote:
>>
>>> Elnaz,
>>>
>>> I just want to add more specific information to what Brian wrote. Your
>>> text editor needs to be able to display Unicode UTF-8 encoded characters.
>>> If you open the .txt file with CLAN, then you will see that the characters
>>> from the .txt file are displayed correctly.
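To illustrate why the editor's handling of UTF-8 matters, the Python sketch below decodes the same bytes two ways. It assumes, purely for illustration, that the misbehaving viewer falls back to Windows-1252; the encoding actually used by any particular editor or PDF tool is not known from this thread.

```python
text = "él es el goleador"

utf8_bytes = text.encode("utf-8")       # how the .txt file is stored on disk
garbled = utf8_bytes.decode("cp1252")   # a viewer that wrongly assumes Windows-1252
correct = utf8_bytes.decode("utf-8")    # a viewer that assumes UTF-8

print(garbled)  # Ã©l es el goleador
print(correct)  # él es el goleador
```

The bytes are identical in both cases; only the interpretation differs, which is why the same file can look correct in CLAN and garbled elsewhere.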
>>>
>>> If characters in your .txt file are not displayed correctly in CLAN,
>>> then please make sure that you have the latest version of CLAN. If the
>>> problem persists, please email a sample file that shows this problem to
>>> me directly for further testing.
>>>
>>> Copying the line from the .cha transcript that you have in your email
>>> below to a test file on my computer and then running the CHSTRING and REN
>>> commands produces the correct result on my computer.
>>>
>>>
>>> Leonid.
>>>
>>> On Oct 28, 2021, at 14:06, Brian Macwhinney <macw at andrew.cmu.edu> wrote:
>>>
>>> Dear Elnaz,
>>>      In order to view non-ASCII characters such as é, as well as other
>>> diacritics, CLAN relies on the Arial Unicode font, which supports not
>>> only special European characters, but also Chinese, Sinhalese, etc.,
>>> because it covers a broad range of Unicode.  If you convert a CHAT file
>>> viewed in Unicode to .txt format, you are going to see what you are seeing
>>> now unless your editor for the .txt format allows you to load a Unicode font.
>>>
>>> — Brian MacWhinney
>>> Teresa Heinz Professor of Cognitive Psychology,
>>> Computational Linguistics,
>>> and Modern Languages, CMU
>>>
>>> On Oct 28, 2021, at 1:27 PM, Elnaz Kia <ek325 at nau.edu> wrote:
>>>
>>> Hi Everyone,
>>>
>>> I have a question about accents in Spanish. Here is what I have in a
>>> .cha transcript file:
>>>
>>> *STU: yo [:: _] me gusta ver(lo) *él* [:: _] porque él es muy bien
>>> [:: bueno] en el xxx eh porque es el goleador .
>>>
>>> And when I convert the file to text, this happens:
>>>
>>> *STU:	yo [:: _] me gusta ver(lo)* él* [:: _] porque él es muy bien
>>>
>>> [:: bueno] en el xxx eh porque es el goleador .
>>>
>>> And this is how I convert .cha files to .txt files:
>>>
>>> chstring +re +cbullets.cut *.cha
>>> ren -f +re *.chstr.cex *.txt
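One way to confirm that the converted files themselves are intact, independent of any editor or font, is to decode them strictly as UTF-8. The sketch below is in Python rather than CLAN; the function name and the directory argument are hypothetical.

```python
from pathlib import Path


def check_utf8(directory, pattern="*.txt"):
    """Map each matching file name to True if it decodes as strict UTF-8, else False."""
    results = {}
    for path in sorted(Path(directory).glob(pattern)):
        try:
            path.read_bytes().decode("utf-8")  # strict decoding raises on invalid bytes
            results[path.name] = True
        except UnicodeDecodeError:
            results[path.name] = False
    return results
```

If every file maps to True, the bytes on disk are valid UTF-8 and any garbling is more likely introduced by the program that displays or converts them.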
>>>
>>> My question is, how can I avoid this problem?
>>>
>>> Thanks,
>>> Elnaz
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "chibolts" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to chibolts+unsubscribe at googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/chibolts/CAOwOJYkD%3DarGj%2B7sDYxHzyZikHs_Jd3wHJmCALbCfJCWe69T1A%40mail.gmail.com
>>> <https://groups.google.com/d/msgid/chibolts/CAOwOJYkD%3DarGj%2B7sDYxHzyZikHs_Jd3wHJmCALbCfJCWe69T1A%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>> .
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 22847 bytes
Desc: not available
URL: <http://listserv.linguistlist.org/pipermail/chibolts/attachments/20211028/a5d96712/attachment-0002.png>

