[Corpora-List] Grep for Windows

Trond Trosterud trond.trosterud at hum.uit.no
Sun Dec 17 19:08:22 UTC 2006


Rob Malouf wrote on 15 Dec. 2006 at 19:06:
>>
>> Besides, none of the standard grep implementations that I know of  
>> handle Unicode (at least not in any useful way).
> Gnu grep 2.5.1 supports Unicode, though I guess it's debatable just  
> how useful it is.  The next version is supposed to be much better  
> on that front.

Well, it is useful for anyone working with languages not covered by  
ASCII or the ISO/IEC code tables. I work with North Sámi, and I  
would have had a hard time without a working grep command; 2.5.1  
works fine for multi-byte Unicode text, and is just what I need.
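
To illustrate what "works fine for multi-byte text" means in practice, here is a minimal Python sketch (not GNU grep itself). It assumes UTF-8 input; the function name grep_lines and the three sample words are only illustrative. Once the bytes are decoded, "." matches one whole character, whereas a byte-oriented pattern sees "á" as two bytes and misses the match.

```python
import re

def grep_lines(pattern, data):
    """Return the lines of UTF-8 encoded `data` that match `pattern`.

    Hypothetical helper for illustration only, not part of any grep.
    """
    text = data.decode("utf-8")          # bytes -> characters
    regex = re.compile(pattern)
    return [line for line in text.splitlines() if regex.search(line)]

# Illustrative sample: "sámegiella", "čáhci", "giella" (escapes keep the
# source file itself pure ASCII).
sample = "s\u00e1megiella\n\u010d\u00e1hci\ngiella\n".encode("utf-8")

# Character-aware search: "." stands for the single character "á".
char_matches = grep_lines(r"^s.m", sample)

# Byte-oriented search: "á" is two bytes in UTF-8, so "s.m" cannot match.
byte_match = re.search(rb"^s.m", "s\u00e1megiella".encode("utf-8"))
```

Here char_matches contains only the first sample word, while byte_match is None.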

Some Unix commands (rev, etc.) do not handle multibyte characters,  
but grep does. Based on my experiences on the Mac, I would recommend  
using a recent bash shell (3.00); in older shells Backspace deletes  
only one byte at a time, which is cumbersome with Unicode.
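
The difference between byte-wise and character-wise handling can be sketched in a few lines of Python. This is only an illustration under the assumption of UTF-8 text, not the actual implementation of rev or of any shell; the function names are made up for the example. Reversing (or deleting) single bytes splits the two-byte sequence for "á" and leaves invalid UTF-8, while working on decoded characters keeps it intact.

```python
def reverse_bytes(line: str) -> bytes:
    """Reverse the UTF-8 bytes of `line`, as a byte-oriented tool would."""
    return line.encode("utf-8")[::-1]

def reverse_chars(line: str) -> str:
    """Reverse `line` character by character (code point by code point)."""
    return line[::-1]

word = "S\u00e1mi"  # "Sámi": 4 characters, 5 bytes in UTF-8

broken = reverse_bytes(word)
try:
    broken.decode("utf-8")
    byte_reversal_valid = True
except UnicodeDecodeError:
    # The continuation byte of "á" now comes before its lead byte.
    byte_reversal_valid = False

good = reverse_chars(word)
```

Note that this sketch reverses code points; text that uses combining characters would need further care.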

As for the initial question of the thread, I would suggest using  
putty.exe or something similar and having the students connect to a  
Linux box.

But yes, there are situations when people will insist on using  
Windows, so if anyone has a reference to a set of DOS commands  
duplicating the top-ten-or-so Unix commands for text processing (if  
something along those lines exists), it would be nice to know.


Trond.



----------------------------------------------------------------------
Trond Trosterud                                        t +47 7764 4763
Institutt for språkvitskap, Det humanistiske fakultet  m +47 950 70140
N-9037 Universitetet i Tromsø, Noreg                   f +47 7764 5216
Trond.Trosterud (a) hum.uit.no          http://www.hum.uit.no/a/trond/
----------------------------------------------------------------------


