[Corpora-List] WordSmith and ANC
Nancy Ide
ide at cs.vassar.edu
Thu Aug 24 01:51:56 UTC 2006
On Jul 20, 2006, at 2:06 PM, Linda Bawcom wrote:
> 1) Can the ANC be used with Wordsmith? Only a program called "Gate"
> is listed on the web site and I don't understand enough about XML,
> filenames, or markups to know if the information given means it
> can be used with Wordsmith. The ANC publisher has not gotten back
> to me (just so you know I've done my homework!).
Sorry that this response is so long in coming. I assume by the "ANC
publisher" you mean LDC, which would not have the answer to this
question. Please send inquiries to anc at cs.vassar.edu, not to LDC.
The answer is "yes" concerning WordSmith. Once you have the ANC,
download the ANCTool (go to
http://americannationalcorpus.org/tools/anctool.html; the link to it is
on the main ANC web page) and run it,
at which point you can choose the parts of the corpus you want to use
as well as the output format. One of the options in the ANCTool is
for the data to be output in a format for input to WordSmith (see
"The WordSmith tab" on the tool web page). Other options in the
downloadable version of the tool are MonoConc input format and XCES.
We have a new version of the tool which provides other formats as
well as a mechanism for specifying your own output format, is "schema
aware" to enable greater control over the output, and provides
several options for handling overlapping hierarchies. This new
version, together with another 20 million words of data, annotations
for several syntactic analyses, and some manually produced WordNet
annotations, will be made available if and when the ANC project finds
the funding to enable us to resume activity.
=======================================================
Nancy Ide
Professor and Chair
Department of Computer Science
Vassar College
Poughkeepsie, New York 12604-0520
USA
tel: (+1 845) 437 5988
fax: (+1 845) 437 7498
email: ide at cs.vassar.edu
http://www.cs.vassar.edu/~ide
=======================================================