[Corpora-List] Searching Japanese corpora
Eric J. M. Smith
eric.smith at utoronto.ca
Wed Dec 20 23:35:09 UTC 2006
Greetings,
Following up on our recent thread about grep with Unicode, I'm curious
about how people search for text in Japanese-language corpora.
My understanding of Japanese is rudimentary, but is it not possible
(in principle, at least) for the same word to be written in hiragana,
katakana, or kanji? To find all occurrences of a particular string in
a corpus, would I have to run the search three times, once for each
script? I assume that would be the case for something like grep.
But are there more sophisticated query tools which abstract away the
question of which script is actually used for data within the corpus?
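For the two kana scripts, at least, the abstraction seems simple to sketch, because the Unicode katakana letters (U+30A1 to U+30F6) sit exactly 0x60 above their hiragana counterparts. The Python below is only an illustration under that assumption; the names fold_kana and find_kana_insensitive are made up for this example and do not belong to any existing tool. It does nothing for kanji, which would need a dictionary or morphological analyzer that supplies readings.

```python
# Illustrative sketch: fold katakana to hiragana so one query matches both kana scripts.
# Kanji spellings are NOT handled; they would need a reading dictionary.

KATAKANA_START = 0x30A1  # small katakana A
KATAKANA_END = 0x30F6    # small katakana KE
KANA_OFFSET = 0x60       # distance between the katakana and hiragana letters


def fold_kana(text):
    """Return text with katakana letters replaced by the matching hiragana."""
    return "".join(
        chr(ord(ch) - KANA_OFFSET)
        if KATAKANA_START <= ord(ch) <= KATAKANA_END
        else ch
        for ch in text
    )


def find_kana_insensitive(query, lines):
    """Return the lines that contain query, ignoring the hiragana/katakana distinction."""
    folded_query = fold_kana(query)
    return [line for line in lines if folded_query in fold_kana(line)]
```

With this, a query typed in hiragana would also match lines where the word appears in katakana, while a line that spells the same word in kanji would still be missed.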
Thanks,
Eric J. M. Smith
Dept. of Linguistics
University of Toronto