<html>
<head>
</head>
<body>
<div dir="ltr">It sounds like Wiktionary has a morphological generator, though I have to say I'm surprised. Is it really capable of handling complex morphology?<br></div><div class="gmail_extra"><br><br><div class="gmail_quote">
On Mon, Apr 28, 2014 at 6:13 PM, Benjamin Barrett <span dir="ltr">&lt;<a href="mailto:benjaminbarrett85@gmail.com" target="_blank">benjaminbarrett85@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word">
<br><br>
I'm not sure about the parser/generator part. As I said, Wiktionary allows you to write the rules so that when a verb or other POS is entered (with or without irregular forms), a page for each form is generated so the dictionary user can look up any form. That, of course, includes reduplicated forms as well. You can see this by entering forms like eaten, vado, 行かない, etc., at <a href="https://en.wiktionary.org/wiki" target="_blank">https://en.wiktionary.org/wiki</a>.<div>
<br></div><div>Unlike situations where a print dictionary is the object, I don't see variations in lexical categories as too critical for the Lushootseed project. The purpose of an online dictionary is widespread, easy access, and while consistency is of course desirable, access to good information is more important. 15 categories expanding to 118 is extreme, though; by monitoring new entries, we can hopefully nip issues like that in the bud by contacting editors and making changes to the entry templates.</div>
<div><br></div><div>As for inconsistencies in entries, by creating templates, those can be reduced, but the open format of Wiktionary is definitely a drawback in that respect. Again, though, I don't view missing fields or inconsistency in field order as primary in importance for this online project. </div>
<div><br></div><div>What I imagine is people learning their heritage language sitting at home and wondering how to say "travel by land," and they pull out their smartphone to get the word and hopefully they memorize the simple sample sentence provided while they're looking at the page.</div>
<div><br></div><div>Ben Barrett</div><div>La Conner, WA</div><div><br></div><div>Learn Ainu! <a href="https://sites.google.com/site/aynuitak1/videos" target="_blank">https://sites.google.com/site/aynuitak1/videos</a></div>
<div><br><div><div><div>On Apr 28, 2014, at 9:20 AM, Bill Poser &lt;<a href="mailto:billposer2@gmail.com" target="_blank">billposer2@gmail.com</a>&gt; wrote:</div><br><blockquote type="cite">
<div style>
<div><div><br></div><div dir="ltr">As a bit of data in support of Mike's point that it is desirable to validate manually created databases, when I wrote the code to produce print dictionaries from Jonathan Amith's Oapan and Ameyaltepec Nahuatl database, which was in something like the SIL SDF format but not created using Shoebox or Toolbox, I initially found something like 118 lexical categories. This was due to variations in capitalization, choice of abbreviation, and use of both English and Spanish. We ended up with 15 after merging all the variants that had crept in.<br>
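<div>The kind of merging described above (variants differing in capitalization, abbreviation, and English versus Spanish labels) could be audited with something like the following. This is only a minimal sketch: the <code>CANONICAL</code> table and every label in it are invented for illustration and are not taken from the Nahuatl database.</div>
<pre>
```python
from collections import defaultdict

# Hypothetical mapping from variant spellings to one canonical label.
# These labels are illustrative and are not taken from the Nahuatl database.
CANONICAL = {
    "n": "noun",
    "noun": "noun",
    "sust": "noun",
    "sustantivo": "noun",
    "v": "verb",
    "verb": "verb",
    "verbo": "verb",
    "adj": "adjective",
    "adjective": "adjective",
    "adjetivo": "adjective",
}


def normalize_category(label):
    """Return the canonical category for a raw label, or None if unknown."""
    key = label.strip().rstrip(".").lower()
    return CANONICAL.get(key)


def audit_categories(labels):
    """Group raw labels by canonical category; unknown labels go under None."""
    groups = defaultdict(set)
    for label in labels:
        groups[normalize_category(label)].add(label)
    return dict(groups)


raw = ["N", "n.", "Noun", "sust.", "V", "verbo", "Adj", "adjetivo", "prt"]
report = audit_categories(raw)
```
</pre>
<div>Labels that end up under the <code>None</code> key are the ones a person still has to look at, which keeps the manual review confined to the genuinely unknown variants.</div>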
</div><div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Apr 28, 2014 at 9:09 AM, Mike Maxwell <span dir="ltr">&lt;<a href="mailto:maxwell@umiacs.umd.edu" target="_blank">maxwell@umiacs.umd.edu</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid">
<div>
<div>
<div>
<div><div><br></div><div>On 4/28/2014 1:10 AM, Benjamin Barrett wrote:<br>
> For Lushootseed, I think we calculated that with various prefixes, there<br>
> should be less than 120 forms (which is about the Latin count, I think),<br>
> which is a reasonable count. It's nice to have one page for every form<br>
> so people can look up whatever form they have at hand, but if you have a<br>
> language with hundreds of forms per verb, then you might have to<br>
> consider whether you want to pare it down to keep your database small<br>
> (though obviously Wikipedia and Wiktionary are huge).<br>
<br></div>
With a count like that, you probably want a morphological parser/ <br>
generator to create the forms (otherwise you inflate the number of verbs <br>
that you need to enter by two orders of magnitude). FLEx has such a <br>
parser built in.<br>
<br>
A finite state transducer (like xfst/lexc, foma, or SFST) allows both <br>
parsing and generation from the same rule set. If you can express the <br>
rules (the morphotactics, plus the phonological rules that create <br>
allomorphs) in the xfst or SFST formalism, and export the lexical <br>
entries from your dictionary, then it's not too hard. With an <br>
appropriate interface to your web page, you can automatically call the <br>
parser on forms the user types in. Dunno if you can do that with <br>
Wiktionary.<br>
<br>
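<div>To illustrate what "parsing and generation from the same rule set" means, here is a toy sketch in Python. It is not a finite state transducer and does not use xfst, foma, or SFST: it simply enumerates the pairs of analysis and surface form that one small rule set licenses, then reads that relation in either direction. The English stems, tags, and spelling rules are invented for illustration.</div>
<pre>
```python
# Toy lexicon and suffix rules; every name and form here is illustrative.
STEMS = ["walk", "bake", "try"]
SUFFIXES = {"+Inf": "", "+Past": "ed", "+Prog": "ing"}


def attach(stem, suffix):
    """Join stem and suffix, applying two toy spelling rules."""
    if suffix and stem.endswith("e"):
        stem = stem[:-1]            # bake + ing becomes baking
    elif suffix == "ed" and stem.endswith("y"):
        stem = stem[:-1] + "i"      # try + ed becomes tried
    return stem + suffix


def build_relation():
    """Enumerate every (analysis, surface form) pair the rules license."""
    return [(stem + tag, attach(stem, suffix))
            for stem in STEMS
            for tag, suffix in SUFFIXES.items()]


RELATION = build_relation()


def generate(analysis):
    """Map an analysis such as 'bake+Past' to its surface forms."""
    return [surface for a, surface in RELATION if a == analysis]


def analyze(surface):
    """Map a surface form such as 'baked' back to its analyses."""
    return [a for a, s in RELATION if s == surface]
```
</pre>
<div>Enumerating the whole relation only works for a small, finite paradigm. A real transducer compiles the morphotactics and phonological rules into a network instead, so it also copes with productive processes that cannot be listed exhaustively.</div>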
IIRC, Lushootseed has reduplication, although perhaps you've accounted <br>
for that by listing the various reduplicated forms.<br>
<br>
FWIW, I would suggest creating some kind of test program to ferret out <br>
broken lexical entries. With free-form entry like Wiktionary (or <br>
Toolbox), erroneous entries (entries with missing fields, fields in the <br>
wrong order, etc.) are bound to arise.<br><br></div></div></div></div></blockquote></div></div></div></div></blockquote></div></div></div></div>
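<div>The test program Mike suggests for ferreting out broken entries could start as small as the sketch below. It assumes each entry has already been exported as a list of (field name, value) pairs. The field names in <code>REQUIRED_ORDER</code> are hypothetical and would need to match whatever the project's entry templates actually use.</div>
<pre>
```python
# Assumed schema: the field names and their required order are hypothetical.
REQUIRED_ORDER = ["headword", "pos", "gloss"]


def check_entry(fields):
    """Return a list of problems for one entry given as (name, value) pairs."""
    problems = []
    names = [name for name, _ in fields]
    # Report required fields that are absent altogether.
    for required in REQUIRED_ORDER:
        if required not in names:
            problems.append("missing field: " + required)
    # Report required fields that are present but blank.
    for name, value in fields:
        if name in REQUIRED_ORDER and not value.strip():
            problems.append("empty field: " + name)
    # Compare the order of the required fields with the expected order.
    present = [n for n in names if n in REQUIRED_ORDER]
    expected = [n for n in REQUIRED_ORDER if n in present]
    if present != expected:
        problems.append("fields out of order or repeated: " + ", ".join(present))
    return problems


good = [("headword", "baked"), ("pos", "verb"), ("gloss", "cooked in an oven")]
swapped = [("headword", "x"), ("gloss", "g"), ("pos", "noun")]
incomplete = [("headword", "x"), ("gloss", " ")]
```
</pre>
<div>Running <code>check_entry</code> over every exported entry and listing only those with a non-empty result gives editors a short worklist rather than a full manual proofread.</div>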
</blockquote></div><br></div>
</div> <!-- ygrp-msg -->
</body>
</html>