Corpora: sgml detagger

William H. Fletcher fletcher at usna.edu
Wed Apr 17 12:56:47 UTC 2002


I have posted an SGML/HTML tag stripper for Windows at
http://kwicfinder.com/StripTags.zip . It removes everything between each
pair of < and >, so it can fail in the rare cases where a > is embedded
within a comment or an attribute value. It also does not translate HTML
entities (e.g. &eacute; --> é); I'll be glad to add that feature and/or
support for command-line operation with wildcards if anyone requests it.
Tine reports that the program "seems to do the trick".
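For readers who want to see the approach and its limits concretely, here is a minimal Python sketch of the "drop everything from < to the next >" idea. It is not the code inside StripTags.zip, which is a Windows program whose source isn't shown here. The function names strip_tags and strip_and_unescape are hypothetical. Entity translation uses the standard library's html.unescape (Python 3.4+). The program above does not do this step.

```python
import html


def strip_tags(text: str) -> str:
    """Naively drop every span from '<' to the next '>', inclusive.

    Hypothetical illustration of the approach described above: there is no
    notion of comments or quoted attribute values, so a '>' inside either
    one ends the "tag" early and the rest leaks into the output.
    """
    out = []
    inside = False
    for ch in text:
        if ch == "<":
            inside = True        # start skipping characters
        elif ch == ">" and inside:
            inside = False       # first '>' ends the skipped span
        elif not inside:
            out.append(ch)       # keep text outside any < ... >
    return "".join(out)


def strip_and_unescape(text: str) -> str:
    """Strip tags, then translate entities such as &eacute; to é."""
    return html.unescape(strip_tags(text))


if __name__ == "__main__":
    sample = "<p>Hola <b>caf&eacute;</b></p>"
    print(strip_tags(sample))          # prints: Hola caf&eacute;
    print(strip_and_unescape(sample))  # prints: Hola café
    # Failure case: a '>' inside an attribute value closes the tag too soon.
    print(strip_tags('<a title="x > y">link</a>'))  # prints:  y">link
```

A '>' inside a comment leaks in the same way. A real markup parser, such as Python's html.parser.HTMLParser, tracks quoting and comments and avoids this. The trade-off is more code and more sensitivity to malformed input.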

Regards,
Bill Fletcher


- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

  William H. Fletcher              410.293.6362 [voice]
  Associate Professor, German & Spanish   410.293.2729 [fax]
  Language Studies Department
  US Naval Academy
  589 McNair Road
  Annapolis, MD 21402-5030

  fletcher at usna.edu
  http://www.usna.edu/LangStudy/
  http://kwicfinder.com/
  http://miniappolis.com/

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -


