web tool help

Eric Poncet [NunaSoft] tmp at NUNASOFT.COM
Mon Jul 28 14:39:26 UTC 2008


Hi Shannon,

Since xfst and lexc are tools invoked from a shell command line, here 
is a possible architecture (big picture):
- an HTML page with a form offering one (or more) field(s) where users 
input their query. There's also an "Analyze" button to run the analyzer;
- a user enters the URL of this page in his/her Web browser;
- s/he inputs data to analyze and clicks "Analyze";
- this runs a tiny program that invokes the command-line tool with the 
content of the form fields as arguments, then formats the tool's output 
as HTML;
- this output is sent back to the user's browser.
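
As a rough illustration of the "tiny program" in the steps above, here 
is a minimal sketch in Python (PHP would follow the same pattern). It 
is only a sketch under assumptions: the command name "analyze-word" is 
a hypothetical placeholder for whatever wrapper script actually calls 
your xfst/lexc network, and the function names are mine. The query is 
passed as a single argument without going through a shell, and 
everything is HTML-escaped before being sent back.

```python
import html
import subprocess

# Hypothetical placeholder: replace with the real command line that runs
# your analyzer. This is NOT an actual xfst or lexc invocation.
ANALYZER_COMMAND = ["analyze-word"]


def run_analyzer(query, command=ANALYZER_COMMAND, timeout=10):
    """Run the command-line tool with the form content as its last argument.

    Returns the non-empty lines the tool printed on standard output.
    Passing a list (no shell) keeps user input from being interpreted
    as shell syntax.
    """
    result = subprocess.run(
        [*command, query],
        capture_output=True,
        text=True,
        encoding="utf-8",
        timeout=timeout,
        check=True,  # raise CalledProcessError if the tool fails
    )
    return [line for line in result.stdout.splitlines() if line.strip()]


def render_page(query, analyses):
    """Format the tool's output as a small HTML page."""
    items = "\n".join(f"<li>{html.escape(a)}</li>" for a in analyses)
    return (
        "<html><body>\n"
        f"<p>Input: {html.escape(query)}</p>\n"
        f"<ul>\n{items}\n</ul>\n"
        "</body></html>"
    )
```

The Web server would call run_analyzer() with the submitted form field 
and return render_page() as the response body.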

To link your analyzer to your dictionaries, the program described 
above can look up each root and morpheme and embed information from 
those dictionaries in its output. It could go one step further and add 
to each root/morpheme it finds a hyperlink to its dictionary entry, a 
hyperlink to an audio recording, a picture... Of course, having a 
machine do lookups on the dictionaries requires them to have a 
record-based structure (be it a text or binary format).
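
To make the lookup idea concrete, here is a small Python sketch. It 
assumes things I don't know about your files: one record per line in 
the form "form<TAB>gloss", morphemes separated by "+" as in your 
example analysis, and a hypothetical URL pattern "/dictionary?entry=" 
for the dictionary pages. Function names are illustrative only.

```python
import html
from urllib.parse import quote


def load_dictionary(lines):
    """Build a form -> gloss mapping from tab-separated records.

    Assumed record layout: "form<TAB>gloss", one record per line.
    Lines without a tab are skipped.
    """
    entries = {}
    for line in lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue
        form, gloss = line.split("\t", 1)
        entries[form] = gloss
    return entries


def link_morphemes(analysis, entries, base_url="/dictionary?entry="):
    """Turn each morpheme found in the dictionary into a hyperlink.

    base_url is a hypothetical address scheme for dictionary entries.
    Morphemes with no dictionary record are left as plain text.
    """
    parts = []
    for morpheme in analysis.split("+"):
        text = html.escape(morpheme)
        if morpheme in entries:
            title = html.escape(entries[morpheme], quote=True)
            href = base_url + quote(morpheme)
            parts.append(f'<a href="{href}" title="{title}">{text}</a>')
        else:
            parts.append(text)
    return "+".join(parts)
```

With the dictionary file opened as text, load_dictionary(file) gives 
the mapping, and link_morphemes() can then be applied to each analysis 
line before it is placed in the HTML page.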

This is a basic solution that will do what you want... no fancy/flashy 
stuff here! (That can be added later: since the output is plain HTML, 
any graphic designer could jazz it up as you like.) The PHP scripting 
language is a good option for writing this tiny program, as its Open 
Source nature and widespread use make it a good candidate for "computer 
language preservation" ;-)

For more advanced needs, it might be interesting to get xfst and lexc 
source code and make some adaptations (I have no clue whether Xerox made 
their source code available, nor whether their license would allow any 
modification, though).

Out of curiosity: for what language(s) is this analyzer?

Eric Poncet
CTO
NunaSoft
www.nunasoft.com

s.t. bischoff wrote:
> Hi all,
>
> I'm working on a morphological analyzer and want to make it available 
> online for testing by the community, however I have no idea how or 
> where to even start. The analyzer uses Xerox's xfst and lexc finite 
> state technology. The idea is that you can put a sentence or 
> morphologically complex word into the analyzer and it will return a 
> morphological analysis, or you can input a morphological analysis and 
> it will return a sentence or complex word. For example you can input
>
> ʔɛčt'uk'ʷipmstup 
>
> and you'll get
>
> cust+on.not.part(loc)+√t'uk'ʷip+m+ct+3abs+2ergpl
>
> or vice versa...input
>
> cust+on.not.part(loc)+√t'uk'ʷip+m+ct+3abs+2ergpl
>
> and get
>
> ʔɛčt'uk'ʷipmstup
>
> What I'd like to do is have folks test it to work out any bugs and 
> decide how best to manage the morphological analysis (that is use 
> linguistic notation or English translations etc.).
> Also, I have a root dictionary and morpheme dictionary that I'd like 
> to make available in a searchable format, both are in a text format. 
> Ultimately, I'd like to have the analyzer linked to the dictionaries 
> so that once a word or sentence is analyzed a search for the root in 
> the dictionary can be done automatically as well as for the other 
> morphemes, returning a gloss with the analyzed form.
>
> Any thoughts or suggestions would be greatly welcome.
>
> thanks,
> shannon


