[Corpora-List] WordSmith and ANC

Nancy Ide ide at cs.vassar.edu
Thu Aug 24 01:51:56 UTC 2006


On Jul 20, 2006, at 2:06 PM, Linda Bawcom wrote:
> 1) Can the ANC be used with Wordsmith? Only a program called "Gate"
> is listed on the web site and I don't understand enough about XML,
> filenames, or markups to know if the information given means it
> can be used with Wordsmith. The ANC publisher has not gotten back
> to me (just so you know I've done my homework!).

Sorry that this response is so long in coming. I assume by the "ANC  
publisher" you mean LDC, which would not have the answer to this  
question. Please send inquiries to anc at cs.vassar.edu, not to LDC.

The answer is "yes" concerning WordSmith. Once you have the ANC,
download the ANCTool from
http://americannationalcorpus.org/tools/anctool.html (the link to it
is on the main ANC web page) and run it, at which point you can choose
the parts of the corpus you want to use as well as the output format.
One of the options in the ANCTool is for the data to be output in a
format for input to WordSmith (see "The WordSmith tab" on the tool web
page). Other options in the downloadable version of the tool are
MonoConc input format and XCES.

We have a new version of the tool that provides other formats as well
as a mechanism for specifying your own output format. It is "schema
aware" to enable greater control over the output, and it provides
several options for handling overlapping hierarchies. This new
version, together with another 20 million words of data, annotations
for several syntactic analyses, and some manually produced WordNet
annotations, will be made available if and when the ANC project finds
the funding to enable us to resume activity.

=======================================================
Nancy Ide

Professor and Chair
Department of Computer Science
Vassar College
Poughkeepsie, New York 12604-0520
USA

tel: (+1 845) 437 5988
fax: (+1 845) 437 7498
email: ide at cs.vassar.edu
http://www.cs.vassar.edu/~ide
=======================================================

