Corpora: sgml detagger
Danko Sipka
sipkadan at main.amu.edu.pl
Tue Apr 16 18:31:35 UTC 2002
Hi:
This Perl script should do the job:
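use strict;
use warnings;

# Ask for the input file; the detagged text is written to <input>.out
print "What is your input file name:\n";
chomp(my $infile = <STDIN>);
open(my $in, '<', $infile) or die "No file, no fun! Cannot read $infile: $!\n";
open(my $out, '>', "$infile.out") or die "No file, no fun! Cannot write $infile.out: $!\n";
while (my $line = <$in>) {
    # Remove every <...> tag that opens and closes on this line
    $line =~ s/<.+?>//g;
    print $out $line;
}
close($in) or die "D'oh! Cannot close $infile: $!\n";
close($out) or die "D'oh! Cannot close $infile.out: $!\n";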
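One caveat, in case it matters for your texts: the loop above handles one line at a time, so a tag that starts on one line and ends on the next is left in place. Below is a minimal sketch of a variant that also removes such tags. The helper name strip_tags and the sample string are made up for illustration, and it still treats everything between < and > as a tag, so a > inside an attribute value would confuse it, and entity references such as &amp; are left untouched.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): remove every <...>
# span from a string, including tags that continue over a line break.
sub strip_tags {
    my ($text) = @_;
    # [^>]* matches any character except '>', newlines included,
    # so a tag split across lines is removed as well.
    $text =~ s/<[^>]*>//g;
    return $text;
}

# Made-up sample: the opening <p ...> tag is split over two lines.
my $sample = "<p\n  id=\"p1\">Some <hi rend=\"it\">tagged</hi> text.</p>\n";
print strip_tags($sample);   # prints "Some tagged text." and a newline
```

To use this on a file, the whole file would have to be read in one piece (for instance by undefining $/ before the read) and passed to strip_tags, since reading line by line would split the tags again.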
Best,
Danko Sipka
sipkadan at main.amu.edu.pl | Danko.Sipka at asu.edu
http://main.amu.edu.pl/~sipkadan | http://www.public.asu.edu/~dsipka
----- Original Message -----
From: Tine & Colleen
To: CORPORA at HD.UIB.NO
Sent: Tuesday, April 16, 2002 8:13 PM
Subject: Corpora: sgml detagger
Hi
I am compiling a corpus for research purposes, and some of the texts are SGML-tagged.
Does anybody know an easy way to remove the tags and save the texts as 'raw' .txt files?
Maybe a Perl script?
Thanks in advance
Tine Lassen
Copenhagen