[Corpora-List] AntConc 3.2.2 released for Windows and Mac OS X

Laurence Anthony anthony0122 at gmail.com
Wed Apr 13 18:10:48 UTC 2011


Hi Mike,

On Thu, Apr 14, 2011 at 2:50 AM, maxwell <maxwell at umiacs.umd.edu> wrote:
> Laurence Anthony <anthony0122 at gmail.com> wrote:
>> Basically, all (pre Win 7?) windows systems had their
>> own legacy encodings, which varied from country to country.
>> So, even if you have a file saved as UTF8, the file *name*
>> is saved in the legacy encoding.
>
> Are you sure?  I thought NTFS filenames were Unicode:
> http://en.wikipedia.org/wiki/Ntfs (see "Allowed characters in filenames")
> http://msdn.microsoft.com/en-us/library/dd317748%28v=vs.85%29.aspx
> --and NTFS superseded the older FAT filesystem as of Windows NT.
>
>   Mike Maxwell
>

It's a good question. As far as I understand it, NTFS does store file
names as Unicode, but each system also has a locale setting (the "ANSI"
code page), which here in Japan is the legacy Shift JIS. Programs that
go through the non-Unicode file APIs see file names in that code page
rather than as Unicode. This is the Windows code page problem. See:
http://en.wikipedia.org/wiki/Windows_code_page
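
To make the code page issue concrete, here is a minimal Python sketch.
The file name and the helper function are hypothetical, and the two code
pages (cp932 for Japanese, cp1252 for Western European) are only
examples. It is an illustration of the effect, not how AntConc or
Windows actually handles file names:

```python
# Sketch: one Unicode file name seen through two legacy Windows code pages.
# The file name and the helper below are hypothetical, for illustration only.

name = "コーパス.txt"  # "corpus.txt" with a Japanese (katakana) stem


def encode_for_code_page(filename, code_page):
    """Return the bytes for filename in code_page, or None if the
    name contains characters that code page cannot represent."""
    try:
        return filename.encode(code_page)
    except UnicodeEncodeError:
        return None


# On a Japanese-locale system the legacy encoding is cp932 (Shift JIS).
sjis_bytes = encode_for_code_page(name, "cp932")

# On a Western European system (cp1252) the same name has no representation.
latin_bytes = encode_for_code_page(name, "cp1252")

# Reading the Shift JIS bytes back with the wrong code page gives mojibake.
round_trip = sjis_bytes.decode("cp932")
mojibake = sjis_bytes.decode("cp1252", errors="replace")

print("cp932 bytes :", sjis_bytes)
print("cp1252 bytes:", latin_bytes)
print("decoded with cp932 :", round_trip)
print("decoded with cp1252:", mojibake)
```

The name only survives if the same code page is used on both sides,
which is why the locale of the machine matters.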

So, you never know what the encoding will be when you want to open
files. If anybody has any advice on this, I would be very grateful!
Laurence.



