Corpora: sgml detagger
William H. Fletcher
fletcher at usna.edu
Wed Apr 17 12:56:47 UTC 2002
I have posted an SGML / HTML tag stripper for Windows at
http://kwicfinder.com/StripTags.zip . It removes everything between pairs of
< > , so it can fail in those rare cases in which a > is embedded within a
comment or an attribute. It also does not translate HTML entities (e.g.
&eacute; --> é); I'll be glad to add that feature and/or support for
command-line operation with wildcards if anyone requests it. Tine reports
this program "seems to do the trick".
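
For readers who want to see the approach and its limitation concretely, here is a minimal Python sketch of the same idea. It is not the code of StripTags itself (a Windows program whose source is not shown here); the names TAG_RE, strip_tags and strip_tags_and_decode are hypothetical, and the entity step relies on the html.unescape function from Python's standard library.

```python
import re
from html import unescape

# Assumption: a "tag" is anything from '<' up to the very next '>'.
# This mirrors the behaviour described above; it is not StripTags' own code.
TAG_RE = re.compile(r"<[^>]*>")


def strip_tags(text: str) -> str:
    """Delete every span that runs from '<' to the next '>'."""
    return TAG_RE.sub("", text)


def strip_tags_and_decode(text: str) -> str:
    """Strip tags first, then translate entities such as &eacute; to é."""
    return unescape(strip_tags(text))


if __name__ == "__main__":
    # Ordinary markup is removed cleanly.
    print(strip_tags_and_decode("<p>caf&eacute;</p>"))
    # A '>' inside an attribute ends the match too early and leaves debris.
    print(repr(strip_tags('<img alt="a > b">text')))
    # The same happens with a '>' inside a comment.
    print(repr(strip_tags("<!-- a > b -->x")))
```

Decoding entities only after the tags are gone keeps text such as &lt;b&gt; from being turned into a tag and then deleted. Handling an embedded > correctly would need a real parser rather than a single pattern.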
Regards,
Bill Fletcher
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
William H. Fletcher 410.293.6362 [voice]
Associate Professor, German & Spanish 410.293.2729 [fax]
Language Studies Department
US Naval Academy
589 McNair Road
Annapolis, MD 21402 - 5030
fletcher at usna.edu
http://www.usna.edu/LangStudy/
http://kwicfinder.com/
http://miniappolis.com/
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -