CLAN: Text Extraction

Xiaowei Zhao xiaoweizhao at gmail.com
Sat Aug 10 03:28:31 UTC 2024


Hello,

First of all, sorry to pick up this conversation from so long ago!

I am also trying to use the "flo" command to extract "clean" text from .cha
files, and it works very well except for one small thing -- it seems to
automatically wrap long lines, breaking any line that exceeds a certain
length into several lines.

For example, for a file (060002c.cha) in the MacWhinney database, I run
flo +cr +t* 060002c.cha

and for a long line in the original .cha file
"
*MAR: no (.) it's not Mr Munsters (.) it's only the Munsters (.) what if
the monsters won't be on anymore and xxx will be with other movie (.) what
if it's at with the other program .
"

I got three lines
"
no it's not Mr Munsters it's only the Munsters what if the monsters won't
be on anymore and will be with other movie what if it's at with the other
program.
"
I am just wondering if there is any command/option/switch within CLAN to
avoid this and keep each utterance on a single line? I tried "LONGTIER", but
it did not work.
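
For reference, one possible workaround outside CLAN is to rejoin the wrapped
lines directly from the .cha file. The following is only a minimal Python
sketch, not a CLAN option. It assumes the usual CHAT layout: main tiers start
with "*", headers with "@", dependent tiers with "%", and any other non-empty
line continues the tier above it. The function names are hypothetical, and the
cleanup step removes only the two markers seen in the example above, "(.)" and
"xxx"; it does not reproduce everything that flo strips.

```python
import re


def join_main_tiers(cha_text):
    """Return one string per main-tier utterance, with wrapped lines rejoined."""
    utterances = []
    current = None  # pieces of the main tier being collected, or None

    for line in cha_text.splitlines():
        if line.startswith("*"):
            # A new main tier begins: flush the previous one, if any.
            if current is not None:
                utterances.append(" ".join(current))
            # Drop the "*SPK:" prefix and keep the utterance text.
            _, _, text = line.partition(":")
            current = [text.strip()]
        elif line.startswith(("@", "%")):
            # A header or dependent tier ends the current main tier.
            if current is not None:
                utterances.append(" ".join(current))
            current = None
        elif current is not None and line.strip():
            # Any other non-empty line is treated as a continuation line.
            current.append(line.strip())

    if current is not None:
        utterances.append(" ".join(current))
    return utterances


def strip_example_markers(utterance):
    """Remove only "(.)" and "xxx" (an assumption; not a full CHAT cleaner)."""
    text = re.sub(r"\(\.\)|\bxxx\b", " ", utterance)
    return " ".join(text.split())
```

To use it, read a file with open(path, encoding="utf-8").read(), pass the text
to join_main_tiers, and write each cleaned utterance on its own line.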

Many thanks!

Sincerely,
Xiaowei

Xiaowei Zhao, Ph.D.
Professor of Psychology
Emmanuel College
400 The Fenway | Boston | MA 02115
www.emmanuel.edu

On Tue, Feb 6, 2024 at 4:39 PM Leonid Spektor <spektor at andrew.cmu.edu>
wrote:

> Command flo +ca +t* *.cha should work.
>
>
> Leonid.
>
> On Feb 6, 2024, at 16:14, Snigdha Khanna <snkhanna at iu.edu> wrote:
>
> I want to remove all annotations like the gestures and errors. Hence, I
> would like to use the txt format of just the transcribed text without
> annotations.
>
> Any idea how to do that?
>
>
> On Tuesday, February 6, 2024 at 4:10:32 PM UTC-5 macw wrote:
>
>> CLAN’s FLO program does most of this. Alternatively, you could grab all
>> the <w> tags from the XML version of the database.
>>
>> What kind of NLP do you want to use? You could apply Universal
>> Dependencies directly.
>>
>> — Brian MacWhinney
>> Teresa Heinz Professor of Cognitive Psychology,
>> Language Technologies and Modern Languages, CMU
>>
>> > On Feb 6, 2024, at 3:08 PM, Snigdha Khanna <snkh... at iu.edu> wrote:
>> >
>> > Hello!
>> >
>> > I am trying to extract "clean" text from annotated transcripts that I
>> have. Is there any way to use CLAN to export a txt file format, or a
>> simpler method to remove annotations from the transcripts, so that I can
>> parse it using NLP?
>> >
>> > Any help is appreciated!
>> >
>> > Thanks,
>> > Snigdha
>> >
>> > --
>> > You received this message because you are subscribed to the Google
>> Groups "chibolts" group.
>> > To unsubscribe from this group and stop receiving emails from it, send
>> an email to chibolts+u... at googlegroups.com.
>> > To view this discussion on the web visit
>> https://groups.google.com/d/msgid/chibolts/237e8996-63ba-4476-859f-4b1e6841ab3an%40googlegroups.com.
>>
>>
>>
