[Corpora-List] A tool for corpus management?

Janne Bondi Johannessen jannebj at iln.uio.no
Thu Aug 26 13:14:28 UTC 2010


The Glossa system supports Unicode and can handle corpora of 100 million words.  (It should be
mentioned that it currently uses CWB under the surface, but the
encoding is independent of the character encoding in CWB.)
Janne

2010/8/26 Mahdi Mohseni <mohseni48 at gmail.com>

> Thanks to all.
>
> Do these tools support Unicode texts?
> Another problem: the corpus has up to 100 million words. Can these
> tools handle this volume of text easily (especially in search and
> retrieval)?
>
> I appreciate your response.
> Mahdi
>
>
> On Wed, Aug 25, 2010 at 3:36 PM, Mahdi Mohseni <mohseni48 at gmail.com> wrote:
>
>> Dear Colleagues,
>>
>> I need a tool for managing a corpus with the following capabilities:
>>
>>    - Adding text files to the corpus
>>    - Editing files
>>    - Annotating words
>>    - Searching
>>    - Reporting statistics of words and tags
>>
>> Could you please recommend a suitable tool?
>>
>> Best,
>> Mahdi Mohseni
>>
>
>
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>


-- 
Janne Bondi Johannessen
Professor, The Text Laboratory, ILN, http://www.hf.uio.no/tekstlab/
President, NEALT, http://omilia.uio.no/nealt/
University of Oslo
P.O.Box 1102 Blindern, N-0317 Oslo, Norway
Tel: +47 22 85 68 14, mob.: +47 928 966 34