CLAN: Text Extraction

Brian MacWhinney macw at cmu.edu
Tue Feb 6 21:10:19 UTC 2024


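CLAN’s FLO program does most of this: it produces a simplified version of each main line with the transcription codes stripped out. Alternatively, you could extract all the <w> elements from the XML version of the database.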
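For the XML route, a minimal Python sketch might look like the following. It assumes each utterance is a <u> element with a "who" attribute for the speaker and that the words sit in <w> elements beneath it. The helper names (local_name, extract_utterances) and the sample transcript are made up for illustration, and the namespace URI in the sample is an assumption; the code ignores namespaces, so it should not matter if yours differs. Markup nested inside <w>, such as replacements, is not treated specially, so check the output against a few transcripts before relying on it.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the assumed layout: <u who="..."> holding <w> children.
SAMPLE = """<CHAT xmlns="http://www.talkbank.org/ns/talkbank">
  <u who="CHI" uID="u0"><w>more</w><w>cookie</w><t type="p"/></u>
  <u who="MOT" uID="u1"><w>you</w><w>want</w><w>more</w><t type="q"/></u>
</CHAT>"""


def local_name(tag):
    """Drop a leading '{namespace}' from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def extract_utterances(xml_text):
    """Return a list of (speaker, plain_text) pairs, one per <u> element."""
    root = ET.fromstring(xml_text)
    utterances = []
    for elem in root.iter():
        if local_name(elem.tag) != "u":
            continue
        # Collect the text of every <w> under this utterance, in document order.
        words = [
            "".join(w.itertext()).strip()
            for w in elem.iter()
            if local_name(w.tag) == "w"
        ]
        words = [word for word in words if word]  # skip empty <w> elements
        utterances.append((elem.get("who", ""), " ".join(words)))
    return utterances


if __name__ == "__main__":
    for speaker, text in extract_utterances(SAMPLE):
        print(f"{speaker}\t{text}")
```

To process a real file, read it into a string and pass that to extract_utterances, then write the text column to a .txt file for your parser.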

What kind of NLP do you want to use?  You could apply Universal Dependencies directly.

— Brian MacWhinney
Teresa Heinz Professor of Cognitive Psychology, 
Language Technologies and Modern Languages, CMU 

> On Feb 6, 2024, at 3:08 PM, Snigdha Khanna <snkhanna at iu.edu> wrote:
> 
> Hello!
> 
> I am trying to extract "clean" text from annotated transcripts that I have. Is there any way to use CLAN to export a txt file format, or a simpler method to remove annotations from the transcripts, so that I can parse it using NLP?
> 
> Any help is appreciated!
> 
> Thanks,
> Snigdha
> 
> -- 
> You received this message because you are subscribed to the Google Groups "chibolts" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to chibolts+unsubscribe at googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/chibolts/237e8996-63ba-4476-859f-4b1e6841ab3an%40googlegroups.com.
