Corpora: testing association strength between elements of trigrams

Gabriel Pereira Lopes gpl at di.fct.unl.pt
Wed Feb 16 01:22:24 UTC 2000


We have done this for n-grams, with better results that contrast with the ones
obtained by Dunning. See:

J.F. Silva, G. Dias, S. Guilloré, J.G.P. Lopes. 1999. "Using LocalMaxs Algorithm
for the Extraction of Contiguous and Non-contiguous Multiword Lexical Units". In:
P. Barahona (ed.) Progress in Artificial Intelligence: 9th Portuguese Conference
on AI, EPIA'99, Évora, Portugal, September 1999, Proceedings. Lecture Notes in
Artificial Intelligence, Springer-Verlag, Vol. 1695, pp. 113-132 (1999).

J.F. da Silva and J.G.P. Lopes. 1999. "Extracting Multiword Terms from Document
Collections". In: Proceedings of the VExTAL: Venezia per il Trattamento Automatico
delle Lingue, November 22-24, 1999.

J.F. da Silva, J.G.P. Lopes, M.F. Xavier, and G. Vicente. 1999. "Relevant
Expressions in Large Corpora". In: Anne Condamines, Cécile Fabre et Marie-Paule
Péry-Woodley (eds.) Actes de l'atelier "Corpus et Traitement Automatique des
Langues: Pour une réflexion méthodologique" (TALN'99), Institut d'Etudes
Scientifiques, Cargèse, Corse (France), July 12-17. pp. 86-94. Published by ATALA.

J.F. da Silva and J.G.P. Lopes. 1999. "A Local Maxima method and a Fair Dispersion
Normalization for extracting multi-word units from corpora". In: Proceedings of
the Sixth Meeting on Mathematics of Language (MOL6), Orlando, Florida, July 23-25,
1999. pp. 369-381.



Gaël Dias, Sylvie Guilloré, José Gabriel P. Lopes. 1999. "The Multilingual Aspects
of Multiword Lexical Units". In: Spela Vintar (ed.) Proceedings of the Language
Technologies Workshop, organized in the framework of the 32nd Annual Meeting of
the Societas Linguistica Europaea (SLE99), Arts Faculty, University of Ljubljana,
Ljubljana, Slovenia, July 8-11, 1999. pp. 11-21. ISBN 961-227-003-1

DIAS, Gaël; Guilloré, Sylvie; Lopes, Gabriel (2000). Normalisation of
Association Measures for Multiword Lexical Unit Extraction. In
"International Conference on Artificial and Computational Intelligence
for Decision, Control and Automation in Engineering and Industrial
Applications", Monastir, Tunisia.

DIAS, Gaël; Guilloré, Sylvie; Lopes, Gabriel (2000). Extraction
Automatique d'Associations Textuelles à partir de Corpora non Traités.
In JADT 2000 : 5es Journées Internationales d'Analyse Statistique des
Données Textuelles, Lausanne, Suisse.

DIAS, Gaël; Guilloré, Sylvie; Lopes, Gabriel (1999): "Language
Independent Automatic Acquisition of Rigid Multiword Units from
Unrestricted Text corpora", Actes Traitement Automatique des Langues
Naturelles. Institut d'Etudes Scientifiques, Cargèse, France.

DIAS, Gaël; Guilloré, Sylvie; Lopes, Gabriel (1999): "Multilingual
Aspects of Multiword Lexical Units", Actes Workshop on Language
Technologies, Ljubljana, Slovenia.

DIAS, Gaël; Vintar, Spela; Guilloré, Sylvie; Lopes, Gabriel (1999):
"Identifying and Integrating Terminologically Relevant Multiword Units
in the IJS-ELAN Slovene-English Parallel Corpus", Actes 10th CLIN,
Utrecht Institute of Linguistics OTS.

DIAS, Gaël; Guilloré, Sylvie; Lopes, Gabriel (1999): "Mutual
Expectation: a Measure for Multiword Lexical Unit Extraction", Actes
VExTAL Venezia per il Trattamento Automatico delle Lingue, Università Ca'
Foscari, Venezia.

DIAS, Gaël; Guilloré, Sylvie; Lopes, Gabriel (1999): "Multiword Lexical
Units Extraction", Actes International Symposium on Machine Translation
and Computer Language Information Processing, Beijing, China.


Best regards,

Gabriel Pereira Lopes

John Colby wrote:

> I would like to use likelihood ratios, as has been done in Dunning[1993]
> for bigrams, to test the amount of association between the elements of
> trigrams.  Dunning did this for a bigram AB by determining if the distribution
> of A given that B is present is the same as A given that B is not present.
>
> To do something similar for trigrams, is it sufficient to determine for
> a trigram ABC if the distribution of A given the presence of B and C is
> the same as the distribution of A given that both B and C are not present?
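
For reference, the bigram test described in the quoted question can be sketched
as below. This is only a minimal illustration of a binomial log-likelihood
ratio of the kind attributed to Dunning [1993] in the question; it is not the
method of the papers listed above. The function names and the toy token list
are hypothetical. The statistic compares how often A occurs as the first word
when the second word is B (k1 out of n1 bigrams) with how often it occurs when
the second word is not B (k2 out of n2 bigrams).

```python
import math


def log_binomial_likelihood(p, k, n):
    """Log of p**k * (1 - p)**(n - k), treating 0 * log(0) as 0."""
    total = 0.0
    if k > 0:
        total += k * math.log(p)
    if n - k > 0:
        total += (n - k) * math.log(1.0 - p)
    return total


def log_likelihood_ratio(k1, n1, k2, n2):
    """-2 log(lambda) for H0: p1 == p2 against H1: p1 != p2.

    k1 of n1 trials succeed under the first condition (B present),
    k2 of n2 trials succeed under the second (B absent).
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("both conditions need at least one observation")
    p1 = k1 / n1
    p2 = k2 / n2
    p = (k1 + k2) / (n1 + n2)  # pooled estimate under H0
    return 2.0 * (
        log_binomial_likelihood(p1, k1, n1)
        + log_binomial_likelihood(p2, k2, n2)
        - log_binomial_likelihood(p, k1, n1)
        - log_binomial_likelihood(p, k2, n2)
    )


def bigram_llr(tokens, a, b):
    """Association score for the bigram (a, b) in a token list."""
    pairs = list(zip(tokens, tokens[1:]))
    n1 = sum(1 for _, y in pairs if y == b)               # second word is b
    k1 = sum(1 for x, y in pairs if x == a and y == b)    # a followed by b
    n2 = len(pairs) - n1                                  # second word is not b
    k2 = sum(1 for x, y in pairs if x == a and y != b)    # a followed by other
    return log_likelihood_ratio(k1, n1, k2, n2)


# Hypothetical toy corpus, for illustration only.
tokens = "new york is big new york is old the city is new".split()
score = bigram_llr(tokens, "new", "york")
```

The function log_likelihood_ratio only sees two pairs of counts, so it does not
itself settle the trigram question; which two conditions to compare for ABC is
exactly the choice raised in the quoted message.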


