<div>John and Kilian,</div><div>Thanks a lot for your replies. I think for now I'll just go with the simple tab-separated txt format, but I'll definitely look into JSON for future updates (for now I'm just working on the basic n-grams, but I'm researching more advanced models).</div>
<div>Thanks,</div><div>Hans</div><div> </div><blockquote class="gmail_quote" style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex; ">
<span class="Apple-style-span" style="font-family: 'Times New Roman'; font-size: medium; ">On 5/12/2011 7:48 AM, Kilian Evang wrote:</span><span class="Apple-style-span" style="font-family: 'Times New Roman'; font-size: medium; "> <br>
</span><span class="Apple-style-span" style="font-family: 'Times New Roman'; font-size: medium; ">>></span><span class="Apple-style-span" style="font-family: 'Times New Roman'; font-size: medium; "><i> I was thinking just to give them as tab separated txt files as that <br>
</i></span><span class="Apple-style-span" style="font-family: 'Times New Roman'; font-size: medium; ">>></span><span class="Apple-style-span" style="font-family: 'Times New Roman'; font-size: medium; "><i> seems the most universal, e.g. something like: <br>
</i></span><span class="Apple-style-span" style="font-family: 'Times New Roman'; font-size: medium; ">>></span><span class="Apple-style-span" style="font-family: 'Times New Roman'; font-size: medium; "><i> <br>
</i></span><span class="Apple-style-span" style="font-family: 'Times New Roman'; font-size: medium; ">>></span><span class="Apple-style-span" style="font-family: 'Times New Roman'; font-size: medium; "><i> how[tab]are[tab]54 <br>
</i></span><span class="Apple-style-span" style="font-family: 'Times New Roman'; font-size: medium; ">></span><span class="Apple-style-span" style="font-family: 'Times New Roman'; font-size: medium; "><i> <br>
</i></span><span class="Apple-style-span" style="font-family: 'Times New Roman'; font-size: medium; ">></span><span class="Apple-style-span" style="font-family: 'Times New Roman'; font-size: medium; "><i> I think that's a good idea. Google's huge n-gram corpus is also released <br>
</i></span><span class="Apple-style-span" style="font-family: 'Times New Roman'; font-size: medium; ">></span><span class="Apple-style-span" style="font-family: 'Times New Roman'; font-size: medium; "><i> in this format (though I'm not sure if they use tabs or spaces): <br>
</i></span><span class="Apple-style-span" style="font-family: 'Times New Roman'; font-size: medium; ">CSV (Comma-Separated Values) has been a format for txt files since prehistoric times (i.e., before the Internet).<br>
</span><span class="Apple-style-span" style="font-family: 'Times New Roman'; font-size: medium; ">But context info might need more complex structures than CSV can express. A widely used format is JSON, which is the next step up from CSV. For example, a list of items would be represented as:<br>
</span><span class="Apple-style-span" style="font-family: 'Times New Roman'; font-size: medium; ">[a, b, c, d, e, f, g]<br></span><span class="Apple-style-span" style="font-family: 'Times New Roman'; font-size: medium; ">A list of tagged items would be represented as:<br>
</span><span class="Apple-style-span" style="font-family: 'Times New Roman'; font-size: medium; ">{tag1: a, tag2: b, tag3: c}<br></span><span class="Apple-style-span" style="font-family: 'Times New Roman'; font-size: medium; ">And it's possible to nest these two formats arbitrarily deep. That might be useful for contexts that contain other contexts.<br>
</span><span class="Apple-style-span" style="font-family: 'Times New Roman'; font-size: medium; ">The syntax for JSON is expressed on the first page of the JSON web site:<br></span><span class="Apple-style-span" style="font-family: 'Times New Roman'; font-size: medium; "><a href="http://www.json.org/">http://www.json.org<br>
</a></span>John</blockquote>
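<div>A minimal sketch of the two formats discussed in this thread, assuming Python and a made-up three-line sample. It reads the tab-separated layout (words first, count in the last column) and regroups bigram counts into the kind of nested JSON structure John describes. The names SAMPLE_TSV, read_ngram_counts, and to_nested are illustrative only and do not come from any existing tool.</div>

```python
import io
import json
from collections import defaultdict

# Hypothetical sample in the tab-separated layout from the thread:
# one n-gram per line, words first, count in the last column.
SAMPLE_TSV = "how\tare\t54\nhow\tis\t12\nare\tyou\t40\n"


def read_ngram_counts(handle):
    """Parse tab-separated n-gram lines into {(word, ...): count}."""
    counts = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        *words, count = line.split("\t")
        counts[tuple(words)] = int(count)
    return counts


def to_nested(counts):
    """Group bigram counts by first word: {first: {second: count}}.

    Assumes every key is a bigram (exactly two words).
    """
    nested = defaultdict(dict)
    for (first, second), count in counts.items():
        nested[first][second] = count
    return dict(nested)


counts = read_ngram_counts(io.StringIO(SAMPLE_TSV))
nested = to_nested(counts)

# Valid JSON needs quoted string keys; json.dumps takes care of that.
print(json.dumps(nested, indent=2, sort_keys=True))
```

<div>The flat file stays trivially readable by any tool, while the nested form shows where JSON might start to pay off once each n-gram carries more context than a single count.</div>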