[Corpora-List] Comparing files

Lluís Padró padro at lsi.upc.es
Mon Nov 17 08:54:29 UTC 2003


>I'm doing a project that involves comparing two very large word lists (~40,000 and ~70,000 words). What I need to find out is which words are on one list and not on the other (and/or vice versa).
>Can anyone give me a hint as to how to do this? (I was thinking of maybe a Perl script.)
>
>

  sort list1 > list1.sorted
  sort list2 > list2.sorted
  join -v1 list1.sorted list2.sorted

  (if you use -v2 instead, you'll get the words that are in list2 but not in list1)
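
  For what it's worth, here is the same recipe wrapped up as a small
  sketch. It assumes one word per line with no embedded blanks (join
  compares only the first blank-separated field), and it forces the C
  locale so that sort and join agree on the ordering. The function name
  only_in_first is made up for illustration; it is not a standard tool.

```shell
#!/bin/sh
# Sketch: print the words that occur in the first file but not in the second.
# only_in_first is a hypothetical helper name, not part of any standard tool.
# Assumes one word per line, with no spaces or tabs inside a word.
only_in_first() {
    # Work in a private temporary directory so no existing file is overwritten.
    tmpdir=$(mktemp -d) || return 1
    # LC_ALL=C gives sort and join the same byte-wise ordering;
    # -u drops duplicate words within each list.
    LC_ALL=C sort -u "$1" > "$tmpdir/a.sorted"
    LC_ALL=C sort -u "$2" > "$tmpdir/b.sorted"
    # -v 1 prints the lines of the first file that have no match in the second.
    LC_ALL=C join -v 1 "$tmpdir/a.sorted" "$tmpdir/b.sorted"
    rm -rf "$tmpdir"
}
```

  Calling only_in_first list1 list2 corresponds to the -v1 case, and
  swapping the two arguments corresponds to -v2. If the lines may contain
  blanks, comm -23 on the two sorted files should do the same job, since
  it compares whole lines rather than the first field.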

       best
--
------------------------------------------------------------------------
Lluís Padró i Cirera
UNIVERSITAT POLITÈCNICA DE CATALUNYA
Departament de Llenguatges i Sistemes Informàtics <http://www.lsi.upc.es>
Centre de Recerca TALP <http://www.talp.upc.es>
Mòdul C6 - Campus Nord
Jordi Girona Salgado 1-3
08034 Barcelona
Tel: XX-34-934 015 652
Fax: XX-34-934 017 014
padro at lsi.upc.es
http://www.lsi.upc.es/~padro

------------------------------------------------------------------------


