[Corpora-List] A tool for corpus management?

Alberto Simões albie at alfarrabio.di.uminho.pt
Thu Aug 26 14:28:18 UTC 2010


On 26/08/2010 13:25, Sérgio Matos wrote:
> I would suggest NooJ (nooj4nlp.net).
> It's based on Finite-State methods, and implemented in C, so I'd expect
> very good performance.

NooJ is implemented in C#, not C, which makes it hard to use on Unix-based machines.
Still, it is a relevant tool in any case.

Cheers
> 
> Regards,
> Sérgio
>
> On 08/26/2010 12:04 PM, Mahdi Mohseni wrote:
>> Thanks to all.
>>
>> Do these tools support Unicode text?
>> Another concern: the corpus has up to 100 million words. Can these tools
>> handle that volume of text easily (especially for search and
>> retrieval)?
>>
>> I appreciate your response.
>> Mahdi
>>
>> On Wed, Aug 25, 2010 at 3:36 PM, Mahdi Mohseni <mohseni48 at gmail.com> wrote:
>>
>>     Dear Colleagues,
>>
>>     I need a tool for managing a corpus with the following capabilities:
>>
>>         * Adding text files to the corpus
>>         * Editing files
>>         * Annotating words
>>         * Searching
>>         * Reporting statistics of words and tags
>>
>>     Could you please recommend a suitable tool?
>>
>>     Best,
>>     Mahdi Mohseni
>>       
>>
>>
>>
>> _______________________________________________
>> Corpora mailing list
>> Corpora at uib.no
>> http://mailman.uib.no/listinfo/corpora
>>   
> 

-- 
Alberto Simões
