[Corpora-List] Getting articles from newspapers to compile a corpus

Angus Grieve-Smith grvsmth at panix.com
Fri Nov 30 03:35:54 UTC 2012


On 11/29/2012 4:28 PM, Linda Bawcom wrote:
> Because so many newspapers get their information from the same news 
> services, I found a few articles that I had to discard because of an 
> over 80% similarity ratio, and of course that skews statistics.

     Good point!  Some newspapers will abridge the wire stories more 
than others, so it might be useful to find a way to choose the longest 
version.
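Here is one way to "choose the longest version" automatically. This is a minimal sketch that assumes each article is a plain-text string. It measures overlap with word-trigram containment, a swapped-in technique (not necessarily the similarity measure Linda used). Containment is the fraction of the shorter article's trigrams that also appear in the longer one. The 0.8 threshold mirrors the 80% figure above, and all function names are hypothetical.

```python
def shingles(text, n=3):
    """Return the set of lowercased word n-grams (shingles) in a text."""
    words = text.lower().split()
    if len(words) < n:
        # Very short texts yield a single shorter tuple (or nothing).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(short, long, n=3):
    """Fraction of the shorter text's shingles that also occur in the longer one."""
    s_short, s_long = shingles(short, n), shingles(long, n)
    if not s_short:
        return 0.0
    return len(s_short & s_long) / len(s_short)


def keep_longest_versions(articles, threshold=0.8, n=3):
    """Drop articles that look like abridged copies of a longer article.

    Articles are visited from longest to shortest (by character count), so
    the version kept from each near-duplicate group is the longest one.
    An article is discarded if more than `threshold` of its shingles occur
    in an article that has already been kept.
    """
    kept = []
    for text in sorted(articles, key=len, reverse=True):
        if all(containment(text, other, n) <= threshold for other in kept):
            kept.append(text)
    return kept
```

Containment is used instead of a symmetric similarity ratio because a heavily abridged wire story can consist almost entirely of text from the full version while being much shorter. A symmetric score would underrate that overlap. Note that the pairwise comparison is quadratic in the number of articles, which is fine for a modest corpus.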

-- 
				-Angus B. Grieve-Smith
				grvsmth at panix.com
