6.1165, Sum: Processing Japanese

The Linguist List linguist at tam2000.tamu.edu
Mon Aug 28 17:41:42 UTC 1995


---------------------------------------------------------------------------
LINGUIST List:  Vol-6-1165. Mon Aug 28 1995. ISSN: 1068-4875. Lines:  170
 
Subject: 6.1165, Sum: Processing Japanese
 
Moderators: Anthony Rodrigues Aristar: Texas A&M U. <aristar at tam2000.tamu.edu>
            Helen Dry: Eastern Michigan U. <hdry at emunix.emich.edu>
 
Associate Editor:  Ljuba Veselinova <lveselin at emunix.emich.edu>
Assistant Editors: Ron Reck <rreck at emunix.emich.edu>
                   Ann Dizdar <dizdar at tam2000.tamu.edu>
                   Annemarie Valdez <avaldez at emunix.emich.edu>
 
Software development: John H. Remmers <remmers at emunix.emich.edu>
 
Editor for this issue: dizdar at tam2000.tamu.edu (Ann Dizdar)
 
---------------------------------Directory-----------------------------------
1)
Date:  Sat, 26 Aug 1995 20:01:49 BST
From:  a2226944 at athena.rrz.uni-koeln.de ("Christian Kissing")
Subject:  SUM: Processing Japanese
 
---------------------------------Messages------------------------------------
1)
Date:  Sat, 26 Aug 1995 20:01:49 BST
From:  a2226944 at athena.rrz.uni-koeln.de ("Christian Kissing")
Subject:  SUM: Processing Japanese
 
A couple of weeks ago I asked the list for help on Japanese corpora,
displaying Japanese text in MS-WINDOWS, and last but not least processing
Japanese data in general.
 
The handful of responses showed that the usual US/European PC-environment is
not fit for handling the complexity of Japanese writing systems. Thanks for
answering:
 
Eleanor Olds Batchelder         eobgc at cunyvm.cuny.edu
Igor Gazdik                     igor.gazdik at mailbox.swipnet.se
Peter Hendriks                  phendrik at facstaff.wisc.edu
Takahiro Ioroi                  QZG03762 at niftyserve.or.jp
Hiroshi Nara                    HNARA at vms.cis.pitt.edu
Steeve Seegmiller               SEEGMILLER at apollo.montclair.edu
Carsten Steins                  steins at ling.uni-duesseldorf.de
Noriko Watanabe                 wnoriko at darkwing.uoregon.edu
 
 
 
 
PROCESSING JAPANESE DATA/DISPLAYING TEXT
========================================
The major problem with working with Japanese data on non-Japanese PCs is the
vast number of characters Japanese makes use of. While the standard Western
character sets represent each character with one byte, Japanese character
sets use two bytes per character, so each character on the screen is stored
as a pair of bytes. This means that you cannot simply install a Japanese
font, since you'd only have access to the first 256 characters. Besides using
word processors like JWP or NJSTAR that solve this problem internally (both
are shareware and available over the internet), it is possible to use a
specially adapted version of DOS, DOS/V, or an addition to DOS, TWINBRIDGE,
both of which can switch from US to Japanese mode, together with a Japanese
edition of WIN 3.1. This still lets you run (native) Japanese DOS software,
but if you feel you can make do with WINDOWS alone, you can install WIN/V on
top of your US WINDOWS without touching DOS. It has the disadvantage of
being delivered without Japanese TrueType fonts, but the Japanese edition of
MS-OFFICE, for example, comes with a set.
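 
As a rough illustration of the one-byte versus two-byte point, here is a
minimal Python sketch. Nothing in it comes from the responses: Python itself,
the helper name bytes_per_text, and the choice of Shift-JIS and EUC-JP as two
common Japanese encodings are my own assumptions. It only counts how many
bytes a piece of text needs in each encoding.

```python
# Sketch only: the encodings and the helper name are illustrative assumptions.
# Escapes are used so that this file itself stays plain ASCII.
SAMPLES = {
    "Latin letter A": "A",
    "katakana A": "\u30a2",
    "kanji for 'Japanese language'": "\u65e5\u672c\u8a9e",  # three kanji
}
ENCODINGS = ["ascii", "shift_jis", "euc_jp"]


def bytes_per_text(text, encoding):
    """Return the encoded length in bytes, or None if the encoding
    cannot represent the text at all."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    for label, text in SAMPLES.items():
        for encoding in ENCODINGS:
            print(label, encoding, bytes_per_text(text, encoding))
```

Run as is, it prints one line per sample and encoding; None marks text that
an encoding cannot represent, which is the situation a one-byte font or code
page with only 256 slots is in.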
 
WORDPERFECT for WINDOWS has Kana fonts whose characters can be inserted via
the special-characters dialogue. They are TrueType fonts and have to be
installed in the WINDOWS font setup. I have tried them in another word
processor, where they displayed on screen but, oddly enough, didn't print.
 
MACs don't seem to have any of these problems at all. Most of the
Japanologists we talked to use MACs with a JAPANESE LANGUAGE KIT.
TrueType-fonts for the MAC can be found at
http://babel.uoregon.edu/yamada/fonts/japanese.html, and on the page
http://babel.uoregon.edu/yamada/fonts.html they point to a whole variety of
multinational fonts (even some for PC, but unfortunately no Japanese ones).
 
I'm not going to go deeper into the matter here, but if you want to find out
more, here are a few addresses:
 
The most concise but still detailed information came from what was the
sci.lang.japan FAQ, which has now been superseded by a WWW version at (!NEW
ADDRESS!) http://www.mickey.ai.ac.kyutech.jp/cg-bin/japanese/. Similar, but
not as detailed:
http://www.uwtc.washington.edu/computing/Japanese/DOSWindows.html.
FAST RIVER SYSTEMS, the manufacturer of WIN/V, has its own Web page at
http://www.gol.com/winv/winvhome.html
 
There are some CD-ROMs and books out on the subject: O'Reilly & Associates have
published a book by Ken Lunde, UNDERSTANDING JAPANESE INFORMATION PROCESSING
(1993), which comes with a diskette full of tools for various computer
platforms. Walnut Creek have published a CD-ROM, EAST ASIAN TEXT PROCESSING,
containing tools etc. for Japanese, Chinese, and Korean. Finally, there seem
to be some "Nikkei CD-ROM Kensaku Tools", but we weren't able to read the
information, since it was written in Japanese script. It contained the
addresses http://tokunaga-www.cs.titech.ac.jp/Nikkei/Nikkei-home and
http://catctus.aist-nara.ac.jp/lab/resource/resource.html, but I couldn't
access these either.
 
Finally, there is a discussion list similar to LINGUIST: JTIT (Japanese
Teachers and Instructional Technology). The subscription address is
listserv at psuvm.psu.edu; the body of the message should read "sub jtit-l
yourfullname" (without quotation marks). Postings to the list go to
JTIT-L at psuvm.psu.edu.
 
 
 
WORDLISTS/CORPORA
=================
The responses on wordlists and corpora were less plentiful than those on the
technical side. First, there is one source we already knew about:
Kokuritu Kokugo Kenkyuu Jo (The National Language Research
        Institute), 1962: Gendai zassi kyuuzyuusyu no yougo
        youji (Vocabulary and Chinese Characters in Ninety
        Magazines of Today). Vol. I, Tokyo.
 
There are more volumes, but I can't say how many, and altogether it's not very
recent. More promising, but very expensive, are the yearly CD-ROM editions of
several Japanese newspapers. The costs reported to me are as high as an
average month's income in Japan:
 
> ---------------------------------
> A. Nihon Keizai Shinbun CD-ROM
> 1990, 1991, 1992, 1993, 1994: 130,000 yen for each version. Tax (=3%) not
> included.
> B. Asahi Shinbun Textfile Data Base
>  1985, 1986, 1987: 100,000 yen for each version. Tax (=3%) not included.
>  1988, 1993: 120,000 yen for each version. Tax (=3%) not included.
>
> For query:
> Sales Department of
> Electronic Media
> Kinokuniya Shoten
> 5-38-1 Sakuraoka, Setagaya-ku, Tokyo 156
>                     tel: 03-3439-0123
>                     fax: 03-3439-1093
>  ------------------------------
 
Two newspapers can be found on WWW, but here the problem is just the same: to
view their pages you need a complete Japanese installation including a
Japanese Web browser (here, too, Netscape is recommended by the maintainers).
The URLs are http://www.asahi.com and http://www.yomiuri.co.jp - if you're
just curious about recent news from Japan, they also offer English pages.
 
 
 
That's all I have to offer - I hope it is useful to someone. As mentioned
above, some of the information will remain out of our reach until we get the
Japanese software. Let me say thank you one more time to all the people who
took the time to answer.
Best
Christian Kissing
 
 
*******************************************************************
Christian Kissing
Dept. of Linguistics
Universitaet Duesseldorf                        home:
Universitaetsstrasse 1                          Neusser Strasse 17
 
D-40225 Duesseldorf                             50670 Koeln
 
Tel.: +49+211/311-4797                          0221/779061
Fax.: +49+211/311-5180
 
eMail: kissing at ling.uni-duesseldorf.de
*******************************************************************
------------------------------------------------------------------------
LINGUIST List: Vol-6-1165.


