[Corpora-List] querying corpora

Albretch Mueller lbrtchx at gmail.com
Sat Mar 1 16:03:21 UTC 2008


> Particularly problematic are intersecting hierarchies, i.e., tree-shaped analyses on multiple linguistic levels.
~
 I see the importance of keeping multiple layers of annotation in
sync, but I don't entirely grasp why it is so difficult to model
"intersecting hierarchies".
~
 Could you please illustrate them with some examples, so that I can
make sure I have understood what exactly makes them problematic?
~
> Due to the fact that OWL DL has been defined in the Resource Description Framework (RDF) ...
> Our proposal
> We propose that the problems introduced above can be addressed by formalising corpora in an integrated, multi-layered corpus and lexicon model in a declarative logical framework, more specifically, the description logics-based OWL DL formalism.
~
 I was put off by statements like those. But again, that is my bias, and
I should give this article a calmer reading.
~
>>  I think corpora should use agreed-on POS markers as their addressable
>> units in the same way DBMS use "columns", but they should also let
>> their users get down to the most minimal, not obviously grammatical
>> layer also.
~
> Are you saying that you don't want any recursive structure?
~
 Not exactly. I think the best way to handle this is to
transactionally keep different representations of the same data in
sync, so that, depending on the query, the "query engine" may go one
way or the other. Here are different representations of the same data:
~
 1) The linear "plain" text in its particular encoding
~
 2) The (also linear) "minimal granularity" representation, in a
lexical sense, for the particular corpus
~
 3) The hierarchical, more morphological parse tree representation
~
 The data structure underlying the query engine should keep these
representations in sync, so that any one of them can be transformed
into another and they match exactly.
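~
 To make the idea a bit more concrete, here is a minimal sketch in
Python of how the three representations could be tied together through
offsets. Everything in it is my own assumption for illustration: the
names Corpus and Node, whitespace tokenisation as the "minimal
granularity", and the toy sentence. A real engine would need much more.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A tree node covering the tokens [start, end) of the corpus."""
    label: str
    start: int
    end: int
    children: list = field(default_factory=list)


class Corpus:
    def __init__(self, text):
        # 1) the linear plain text
        self.text = text
        # 2) the minimal units, stored as (start, end) character offsets
        #    (assumption: whitespace-separated words are the minimal units)
        self.tokens = []
        pos = 0
        for word in text.split():
            start = text.index(word, pos)
            end = start + len(word)
            self.tokens.append((start, end))
            pos = end
        # 3) the hierarchical layer, expressed over token indices
        self.tree = None

    def token_text(self, i):
        start, end = self.tokens[i]
        return self.text[start:end]

    def node_text(self, node):
        # Go from the tree, via the tokens, back to the plain text.
        start = self.tokens[node.start][0]
        end = self.tokens[node.end - 1][1]
        return self.text[start:end]

    def set_tree(self, root):
        # Validate first, assign afterwards: a tree that does not match
        # the token layer never replaces the current one.
        self._check(root)
        self.tree = root

    def _check(self, node):
        if not (0 <= node.start < node.end <= len(self.tokens)):
            raise ValueError(f"{node.label}: span outside the token layer")
        for child in node.children:
            if not (node.start <= child.start and child.end <= node.end):
                raise ValueError(f"{child.label}: not inside {node.label}")
            self._check(child)


corpus = Corpus("the cat sleeps")
np = Node("NP", 0, 2, [Node("DT", 0, 1), Node("NN", 1, 2)])
vp = Node("VP", 2, 3, [Node("VBZ", 2, 3)])
corpus.set_tree(Node("S", 0, 3, [np, vp]))
```

 Since every layer only points into the layer below it, a query can
start from the tree, the tokens or the raw text and still end up on
exactly the same characters.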
~
>>  Well, why is it so hard to come up with an idea of how this "concise
>> syntax" should look like?
~
>Of course that depends on what you think of as "concise"; you have to
>be aware that many people in the field don't have a strong
>formal/mathematical background.
~
 Well, to avoid nitpicking some other of your comments in an
endless loop: I totally agree with you. Even if we can't get
"conciseness" out of linguists, at least they should give us clarity of
purpose. They should state clearly what they want, and think of
it as a wish list to Santa ;-)
~
 One of the problems I see with corpus linguistics is that "corpora"
are different things to different people, which in a sense is good.
"There are many ways to skin a cat", the English saying goes, but
that works only because we all clearly know what a "cat" is and what
"skinning" means. For corpora there is no such shared understanding:
what is natural to me may not be so to other people.
~
>>  Very interesting! Since it is based on XPath and text comprising
>> alphabetical nat. langs are naturally representable through syntax
>> trees. How exhaustive is LPath?
~
>AFAIK it covers individual tree structures fairly well (i.e. you'd be
>able to query syntax trees with it). If you want to combine syntax
>trees with something else that has hierarchical structure (like
>discourse, semantic roles, maybe even prosody), you need something
>else (which is again what we argue in our paper).
~
 Hmm! Thank you. That didn't come to mind when I read your paper.
However, I think that modeling [X|L]Paths as tree-like structures
shouldn't be that difficult.
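~
 Just to show what I mean by path queries over a syntax tree, here is
a small sketch in Python. It uses only the limited XPath subset of the
standard xml.etree.ElementTree module, not LPath itself, and the tag
names and the sentence are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A toy syntax tree encoded as XML (labels are illustrative assumptions).
tree_xml = """
<S>
  <NP><DT>the</DT><NN>cat</NN></NP>
  <VP><VBZ>sleeps</VBZ></VP>
</S>
""".strip()

root = ET.fromstring(tree_xml)

# Path query: every NN that is a direct child of an NP, anywhere below S.
nouns = [node.text for node in root.findall(".//NP/NN")]

# Walking the leaves in document order gives back the linear word sequence.
words = [leaf.text for leaf in root.iter() if len(leaf) == 0]
```

 This covers a single tree only; it says nothing yet about combining
it with a second hierarchy over the same words.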
~
 Thanks
 lbrtchx

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


