[Corpora-List] Looking for a corpus management tool
Shuly Wintner
shuly at cs.haifa.ac.il
Wed Oct 14 11:08:18 UTC 2009
Hi,
We're developing a large, diverse corpus of (written) Modern Hebrew,
along with morphological processing tools. In order to facilitate
access to the corpus, and in particular to make it usable for
linguistic research, we're looking for a corpus management tool that
supports as many of the following features as possible:
- Ability to store and index millions of texts and hundreds of millions
of tokens
- Ability to process morphologically analyzed text where each token
can be associated with multiple analyses, each consisting of
complex structured information (currently in XML; can be converted to
SQL)
- Ability to handle UTF-8-encoded data and right-to-left script
- GUI for searching the corpus based on meta-information, tokens,
lemmas, morphological information, etc.
- More information retrieval options than concordance (e.g., retrieving
collocations, computing mutual information measures)
- Open source or freely available
- Easy to maintain (specifically, add texts, change morphological
annotation, add search options)
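
To make the collocation and mutual information requirement concrete, here is a minimal sketch in Python of one common measure, pointwise mutual information over adjacent token pairs. It is only an illustration of the kind of statistic meant, not the interface of any existing tool; the function name bigram_pmi and the min_count parameter are hypothetical, and it assumes the corpus is available as a plain list of tokens (or lemmas).

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=1):
    """Return {(w1, w2): PMI in bits} for adjacent token pairs.

    Hypothetical helper for illustration; probabilities are simple
    relative frequencies with no smoothing.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)  # number of adjacent pairs

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # skip rare pairs, whose PMI estimates are unreliable
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


scores = bigram_pmi(["a", "b", "a", "b"])
```

On a corpus of the size described above, such counts would presumably come from an index rather than an in-memory list, and a min_count threshold well above 1 would be advisable, since PMI tends to overrate rare pairs.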
If you reply to me directly, I'd be happy to send a summary of the
responses to the List. Many thanks for your help,
Shuly
--
Shuly Wintner
Dept. of Computer Science, University of Haifa, 31905 Haifa, Israel
Phone: +972 (4) 8288180 Fax: +972 (4) 8249331
shuly at cs.haifa.ac.il http://cs.haifa.ac.il/~shuly
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora