Christian, thanks for mentioning the Linguist's Search Engine. This was fun to build, and it seems like some people found it useful for a while, but after a number of years the main implementer (a very talented undergraduate!) moved on to other opportunities. So, as it turned out, did the project's funding. :)<br>
<br>The "query by example" concept we came up with was one of my favorite aspects of that project. I have a lot of respect for more sophisticated query interfaces, and I think they're an important component for "power users" of linguistic search, but I'm still convinced that something like a QBE interface is needed if we want to change the way that ordinary, non-technology-oriented, non-corpus-oriented linguists interact with what we CORPORA folks call data. The motto for the Linguist's Search Engine, borrowed from the movie <i>Field of Dreams</i>, was "If you build it, they will come." OK, that turned out to be a bit too ambitious, but I think it was well worth trying.<br>
<br>For what it's worth, one of my main lessons learned in this project is that, in order for something like this to truly succeed, it will be very important for a nimble programmer to listen to feedback from users and respond quickly with modifications to the interface and the underlying functionality. We didn't really have the resources to be responsive in that way, and as a result the LSE remained relatively static once the core functionality was in place. If I were going to do it over again, I think I would push things in the open source direction from the start and try to build critical mass with an active developer community, not just an active user community. <br>
<br>The LSE software is available for download at the project Web site, <a href="http://lse.umiacs.umd.edu">http://lse.umiacs.umd.edu</a>. It's a real mishmash of components. And a lot has changed -- for example, some of the "build your own corpus" functionality piggybacked on AltaVista's advanced search capabilities, which I believe no longer exist. But there may still be some useful things there to work with, or at least to inspire your thinking. On the back end there's some really cool infrastructure for flexibly configuring annotator workflows, and for maintaining a constantly running collection of annotation processes that automatically kick in, in the right order, when new data shows up. Thanks to a lot of self-monitoring cron jobs built into the infrastructure, this thing kept itself alive and running for an astonishingly long time after my programmer handed it over, with fairly minimal intervention from me. And of course the query-by-example UI is there, with full tgrep2 for advanced search capabilities, along with a clever indexing and retrieval scheme so that tgrep-style searches don't require linear processing in the size of the corpus. I'd be delighted if Albretch or someone else wants to pick up these ideas and turn them into something new and useful.<br>
<br>Best regards,<br><br> Philip<br><br><br>Philip Resnik, Professor<br>Department of Linguistics and Institute for Advanced Computer Studies<br>University of Maryland<br><a href="http://umiacs.umd.edu/~resnik/">http://umiacs.umd.edu/~resnik/</a><br>
<br><br><br><div class="gmail_quote">On Sun, Nov 28, 2010 at 7:00 PM, Albretch Mueller <span dir="ltr">&lt;<a href="mailto:lbrtchx@gmail.com">lbrtchx@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
Hi Christian et al,<br>
~<br>
> Sounds a bit like the Linguist's Search Engine (<a href="http://lse.umiacs.umd.edu/" target="_blank">http://lse.umiacs.umd.edu/</a>) ...<br>
~<br>
I checked their project and it is, in what pertains to the QBE<br>
functionality, what I have in mind. Do you know why did they phase out<br>
that project?<br>
~<br>
<div class="im">> Maybe somebody knows about a successor implementation?<br>
</div>~<br>
I (am a professional programmer and I) could restudy and if better<br>
than restarting from scratch, I could take myself over that project<br>
(Of course if it is OK with them ...)<br>
~<br>
> ... the TigerNavigator ... a methodology to use ontologies (OWL/RDF) ...<br>
~<br>
I checked out their search interface to the TIGER corpus (I found odd<br>
that searching for "gehen" doesn't naturally give you as result also<br>
"gegangen"), but honestly I don't believe in OWL/RDF description of<br>
ontologies/Semantic Web things in corpora research. At least not in<br>
the way I think of corpora which are essentially syntactic beasts<br>
~<br>
What is the point of using previously "described" so-called<br>
"semantic" information? I think corpora should just implement as a<br>
linguistic tool to query source data/texts.<br>
~<br>
How on earth could you query texts semantically?<br>
~<br>
> ... ontology-based machine<br>
~<br>
<div class="im">> The main difference is that the user has to provide a whole set of instances and counterinstances for the concept (s)he's looking for.<br>
</div>~<br>
;-)<br>
~<br>
Thanks<br>
lbrtchx<br>
<div><div></div><div class="h5"><br>
</div></div></blockquote></div><br>