Re: [Corpora-List] Information about content analysis software

Santos Diana Diana.Santos at sintef.no
Thu Mar 23 21:08:59 UTC 2006


Dear Flávio,

Corpógrafo (www.linguateca.pt/Corpografo), developed by the Porto node of Linguateca (Belinda Maia, Luís Sarmento, Ana Sofia Pinto, Luís Miguel Cabral and others), is a system that processes Portuguese (as well as several other languages) and has many of the functions you require.

It is more general than InXight in that it was designed to discover terms (mainly NPs with a common noun head) and not only named entities. However, it has no summarization capabilities.

Even though it was initially developed for terminology teaching purposes (and we currently have more than 600 users around the world), we are now extending it to encompass functionalities more like the ones you mention, namely making Corpógrafo helpful in developing ontologies from text and visualizing them, as well as in semi-automatically discovering definitions.

See our paper at LREC this year for more information:

Luís Sarmento, Belinda Maia, Diana Santos, Ana Pinto & Luís Cabral. "Corpógrafo V3: From Terminological Aid to Semi-automatic Knowledge Engine". To appear in Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC'2006) (Genoa, Italy, 22-28 May 2006), http://www.linguateca.pt/Diana/download/SarmentoetalLREC2006.pdf

Greetings,
Diana
---------------
Diana Santos
www.linguateca.pt
Linguateca, Oslo node, SINTEF ICT
Pb 124 Blindern, N-0314 Oslo, Norway



________________________________

From: owner-corpora at lists.uib.no on behalf of Flávio Barbosa
Sent: Fri 17.03.2006 18:14
To: corpora at uib.no
Subject: [Corpora-List] Information about content analysis software


My name is Flávio. I work at Research and Documentation Management in MULTIRIO (www.multirio.rj.gov.br), an entity created by the Municipal Government of Rio de Janeiro with the purpose of enhancing education and cultural understanding by creating, producing and broadcasting information via TV, press and the Web.
We'd like to get recommendations for content analysis software that satisfies the following needs (we've already found some options, like Tropes and Inxight, but the research report should present other possibilities):

1) it should process information in Portuguese (and other major languages) --- this is indispensable;
2) it should process multimedia material;
3) it should process files in different text editor formats, as well as PDF, HTML etc.;
4) it should automatically summarize text content;
5) it should process text spans of different sizes (not only words or expressions, but the meaning of larger stretches of text);
6) it should be possible to visualize results graphically, with varied visualization options;
7) it should be possible to freely create semantic categories for content extraction.

Thanks for your help. If you know of software that satisfies most, though not all, of the requirements above, we'd also be grateful to have this information.
-----
Flávio Barbosa (flaviobarbosa at rio.rj.gov.br), researcher
MultiRio -- Empresa Municipal de Multimeios
Research and Documentation Management
Phones: 55 21 2528-8258
              55 21 2528-8244



