[Corpora-List] Query-By-Example (QBE) -like GUI to query corpora ...

P Resnik psresnik at gmail.com
Mon Nov 29 20:11:43 UTC 2010


Christian, thanks for mentioning the Linguist's Search Engine.  This was fun
to build, and it seems like some people found it useful for a while, but
after a number of years the main implementer (a very talented
undergraduate!) moved on to other opportunities.  So, as it turned out, did
the project's funding. :)

The "query by example" concept we came up with was one of my favorite
aspects of that project.  I have a lot of respect for more sophisticated
query interfaces, and I think they're an important component for "power
users" of linguistic search, but I'm still convinced that something like a
QBE interface is needed if we want to change the way that ordinary
non-technology-oriented, non-corpus-oriented linguists interact with what we
CORPORA folks call data.  The motto for the Linguist's Search Engine,
borrowed from the movie *Field of Dreams*, was "If you build it, they will
come."   Ok, that turned out to be a bit too ambitious, but I think it was
well worth trying.

For what it's worth, one of my main lessons learned in this project is that,
in order for something like this to truly succeed, it will be very important
for a nimble programmer to listen to feedback from users and respond quickly
with modifications to the interface and the underlying functionality.   We
didn't really have the resources to be responsive in that way, and as a
result the LSE remained relatively static once the core functionality was in
place.  If I were going to do it over again, I think I would push things in
the open source direction from the start and try to build critical mass with
an active developer community, not just an active user community.

The LSE software is available for download at the project Web site,
http://lse.umiacs.umd.edu.  It's a real mishmash of components.  And a lot
has changed -- for example, some of the "build your own corpus"
functionality piggybacked on AltaVista's advanced search capabilities, which
I believe no longer exist.  But there still may be some useful things there
to work with, or at least to inspire your thinking.  On the back end there's
some really cool infrastructure for flexibly configuring annotator
workflows, and for maintaining a constantly running collection of annotation
processes that automatically kick in, in the right order, when new data
shows up.  Thanks to a lot of self-monitoring cron jobs built into the
infrastructure, this thing kept itself alive and running for an
astonishingly long time after my programmer handed it over, with fairly
minimal intervention from me.  And of course the query-by-example UI is
there, with full tgrep2 for advanced search capabilities, along with a
clever indexing and retrieval scheme so that tgrep-style searches don't
require linear processing in the size of the corpus.  I'd be delighted if
Albretch or someone else wants to pick up these ideas and turn them into
something new and useful.
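
To illustrate the general idea of avoiding a linear scan for tgrep-style
searches, here is a minimal sketch. It is not the LSE's actual indexing
scheme, which this message does not describe. It assumes a toy corpus of
Penn Treebank-style bracketed parses and builds an inverted index from node
labels to sentence ids, so that the full tree matcher (tgrep2, in the LSE's
case) would only need to run on sentences containing every label the query
requires. The names CORPUS, build_index and candidates are hypothetical.

```python
import re
from collections import defaultdict

# Toy corpus (assumption): sentence id -> bracketed parse tree.
CORPUS = {
    0: "(S (NP (DT the) (NN dog)) (VP (VBD barked)))",
    1: "(S (NP (PRP she)) (VP (VBD gave) (NP (PRP him)) (NP (DT a) (NN book))))",
    2: "(S (NP (NNS dogs)) (VP (VBP bark) (ADVP (RB loudly))))",
}

# A node label is the token that immediately follows an opening bracket.
LABEL = re.compile(r"\(([^\s()]+)")


def build_index(corpus):
    """Map each node label to the set of sentence ids that contain it."""
    index = defaultdict(set)
    for sid, tree in corpus.items():
        for label in LABEL.findall(tree):
            index[label].add(sid)
    return index


def candidates(index, required_labels):
    """Return ids of sentences containing every required label.

    This is only a prefilter: a real structural matcher would still have
    to check the tree relations on each surviving sentence.
    """
    sets = [index.get(label, set()) for label in required_labels]
    if not sets:
        return set()
    return set.intersection(*sets)


INDEX = build_index(CORPUS)
# A query that needs a VP containing an ADVP only has to inspect sentences
# that contain both labels at all.
hits = candidates(INDEX, ["VP", "ADVP"])
```

The cost of the prefilter depends on the sizes of the posting sets rather
than on the size of the corpus, which is the property the paragraph above
refers to; how the LSE achieved it in practice may well differ.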

Best regards,

  Philip


Philip Resnik, Professor
Department of Linguistics and Institute for Advanced Computer Studies
University of Maryland
http://umiacs.umd.edu/~resnik/



On Sun, Nov 28, 2010 at 7:00 PM, Albretch Mueller <lbrtchx at gmail.com> wrote:

>  Hi Christian et al,
> ~
> > Sounds a bit like the Linguist's Search Engine (
> > http://lse.umiacs.umd.edu/) ...
> ~
>  I checked their project and it is, as far as the QBE functionality
> is concerned, what I have in mind. Do you know why they phased out
> that project?
> ~
> > Maybe somebody knows about a successor implementation?
> ~
>  I (am a professional programmer and I) could restudy it and, if that
> is better than restarting from scratch, I could take over that project
> myself (of course, if it is OK with them ...)
> ~
> > ... the TigerNavigator ... a methodology to use ontologies (OWL/RDF) ...
> ~
>  I checked out their search interface to the TIGER corpus (I found it
> odd that searching for "gehen" doesn't naturally also give you
> "gegangen" as a result), but honestly I don't believe in OWL/RDF
> descriptions of ontologies/Semantic Web things in corpora research. At
> least not in the way I think of corpora, which are essentially
> syntactic beasts.
> ~
>  What is the point of using previously "described" so-called
> "semantic" information? I think corpora should just be implemented as
> a linguistic tool to query source data/texts.
> ~
>  How on earth could you query texts semantically?
> ~
> > ... ontology-based machine
> ~
> > The main difference is that the user has to provide a whole set of
> > instances and counterinstances for the concept (s)he's looking for.
> ~
>  ;-)
> ~
>  Thanks
>  lbrtchx
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>