[Corpora-List] Multiple category assignment

John F Sowa sowa at bestweb.net
Mon Aug 26 19:16:33 UTC 2013


On 8/25/2013 10:55 AM, Aliabbas Petiwala wrote:
> So should such multiple categories be represented as bitstrings, such
> that for n categories there would be a whopping 2^n assignments? This
> would surely make the inter-annotator agreement (IAA) scores very low
> for minor differences.

You might consider Formal Concept Analysis (FCA), which automatically
derives lattices from such bit strings.  For references, software, and
demos, see the FCA home page:

    http://www.upriss.org.uk/fca/fca.html
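
As a rough illustration of what deriving a lattice from bit strings
involves, here is a minimal Python sketch. It is not taken from any of
the FCA tools listed on that page. The item names, the three categories,
and the helper functions are all hypothetical. The sketch relies on the
standard FCA fact that the attribute sets of the concepts are exactly
the intersections of the objects' attribute sets, plus the full set.

```python
# Hypothetical annotations: one bit string per item over three categories.
# Bit 0 = category A, bit 1 = category B, bit 2 = category C.
context = {
    "item1": 0b011,  # A, B
    "item2": 0b001,  # A
    "item3": 0b110,  # B, C
}

def concept_intents(context, n_attrs):
    """Return every attribute set (as a bit string) that labels a concept.

    These are all intersections of the items' bit strings, together with
    the full attribute set (the intersection of no items at all).
    """
    intents = {(1 << n_attrs) - 1}
    for row in context.values():
        # Intersect the new row with everything found so far.
        intents |= {row & intent for intent in intents}
    return intents

def extent(context, intent):
    """Return the items that carry every attribute in `intent`."""
    return {obj for obj, row in context.items() if row & intent == intent}

def is_more_general(a, b):
    """True if intent `a` is a subset of intent `b` (a generalizes b)."""
    return a & b == a

if __name__ == "__main__":
    for intent in sorted(concept_intents(context, 3)):
        print(format(intent, "03b"), sorted(extent(context, intent)))
```

Each printed line is one node of the lattice: an attribute set and the
items that share it. The subset test in is_more_general gives the
ordering between nodes.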

For examples, type any word into the demo for Roget's Thesaurus:

    http://www.ketlab.org.uk/roget.html

This will generate a small lattice of terms from Roget's Thesaurus
to display the "concept neighborhood" of the word you submit.

You can try submitting the same words to the WordNet demo to see the
differences in concept neighborhoods they generate:

    http://www.ketlab.org.uk/wordnet.html

If you represent annotations by bit strings that represent features or
attributes, two strings that have minor differences will represent
different "concepts", but they will have a common generalization in
the lattices.
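
The point about a common generalization can be sketched in a few lines
of Python. This is only an illustration under assumed data: the category
names and both annotations are hypothetical, and the generalization is
taken to be the set of features the two bit strings share (bitwise AND).

```python
# Hypothetical feature inventory; position in the list = bit position.
CATEGORIES = ["animate", "human", "agent", "location"]

def to_bits(labels):
    """Encode a list of category labels as a bit string (an int)."""
    bits = 0
    for label in labels:
        bits |= 1 << CATEGORIES.index(label)
    return bits

def to_labels(bits):
    """Decode a bit string back into its category labels."""
    return [c for i, c in enumerate(CATEGORIES) if (bits >> i) & 1]

def common_generalization(a, b):
    """Features shared by both annotations."""
    return a & b

def is_generalization(general, specific):
    """True if every feature of `general` also occurs in `specific`."""
    return general & specific == general

# Two annotators who disagree on one feature each.
ann1 = to_bits(["animate", "human", "agent"])
ann2 = to_bits(["animate", "human", "location"])
shared = common_generalization(ann1, ann2)

if __name__ == "__main__":
    print(to_labels(shared))
```

The two annotations are different bit strings, but both sit directly
below the shared node in the ordering, so the disagreement is visible as
a small distance in the lattice rather than as a total mismatch.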

Depending on your application, this property might be an advantage
rather than a disadvantage.

John

