A good starting point for writing such a parser quickly might be the UIMA (IBM's Unstructured Information Management Architecture) example code (Java) for recognizing dates, times, and room numbers in many different formats. It is based on regular expressions and patterns.
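<br><br>To give a rough idea of the regular-expression approach applied to addresses, here is a minimal sketch in Python (not taken from UIMA, whose examples are Java). The pattern, the function name parse_address, and the field names are my own assumptions for illustration; it only covers the simple two-line US layout from Nate's example, and real addresses would need many more rules.

```python
import re

# Hypothetical pattern for one common US layout (an assumption for
# illustration, not a complete grammar of US addresses).
ADDRESS_RE = re.compile(
    r"""
    ^\s*(?P<house_number>\d+)\s+            # leading house number
    (?P<street>[^#\n,]+?)                   # street name and suffix
    (?:\s*\#\s*(?P<apartment>\w+))?         # optional unit written as "#1025"
    \s*[\n,]\s*                             # line break or comma before the city
    (?P<city>[^,\n]+?),\s*                  # city, followed by a comma
    (?P<state>[A-Z]{2})\s+                  # two-letter state code
    (?P<zipcode>\d{5}(?:-\d{4})?)\s*$       # ZIP or ZIP+4
    """,
    re.VERBOSE,
)


def parse_address(text):
    """Return a dict of address parts, or None if the pattern does not match."""
    match = ADDRESS_RE.match(text)
    if match is None:
        return None
    # Fields that were absent in the input (e.g. apartment) come back as None.
    return match.groupdict()


if __name__ == "__main__":
    print(parse_address("123 Maple St. #1025\nSpringfield, IL 12345"))
```

Anything the pattern does not recognize comes back as None, which makes it easy to collect the failures and turn them into new special-case rules.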
<br><br>Thanks,<br><br>Saurav Sahay<br><br>PhD Student,<br>College of Computing,<br>Georgia Tech<br><a href="http://www.cc.gatech.edu/~ssahay">www.cc.gatech.edu/~ssahay</a><br> <br><div><span class="gmail_quote">On 10/15/07,
<b class="gmail_sendername">Kevin B. Cohen</b> &lt;<a href="mailto:kevin.cohen@gmail.com">kevin.cohen@gmail.com</a>&gt; wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Nate,<br><br>Speaking as someone who wrote one of these at MapQuest, I can tell you<br>that it's entirely possible to do.<br><br>NDAs limit what I can tell you about the specific approach, but<br>without releasing any industrial secrets I can certainly tell you that
<br>if you're not offended by writing lots of special-case rules, it's not<br>difficult to do. It will take you longer to put together a thorough<br>set of test cases than it will to write the code, but once you've put
<br>together a good set of test cases, it will probably be pretty obvious<br>to you what your code needs to do.<br><br>Kev<br><br>On 10/15/07, Nate Blaylock &lt;<a href="mailto:nblaylock@ihmc.us">nblaylock@ihmc.us</a>&gt; wrote:
<br>> Hi all,<br>><br>> I am looking for a free (as in gratis) free-form mailing address parser<br>> -- i.e., something which will take a free-form address like:<br>> 123 Maple St. #1025<br>> Springfield, IL 12345
<br>><br>> and return the constituent parts:<br>> Street: Maple St.; House number: 123; Apartment number: 1025; City:<br>> Springfield; State: IL; Zipcode: 12345<br>><br>> Something that worked for any international address would be great, but
<br>> I am most interested in coverage of US addresses.<br>><br>> thanks,<br>><br>> nate<br>><br>> _______________________________________________<br>> Corpora mailing list<br>> <a href="mailto:Corpora@uib.no">
Corpora@uib.no</a><br>> <a href="http://mailman.uib.no/listinfo/corpora">http://mailman.uib.no/listinfo/corpora</a><br>><br><br><br>--<br>K. B. Cohen<br>Biomedical Text Mining Group Lead<br>Center for Computational Pharmacology
<br>303-724-7563 (office) 303-916-2417 (cell) 303-377-9194 (home)<br><a href="http://compbio.uchsc.edu/Hunter_lab/Cohen">http://compbio.uchsc.edu/Hunter_lab/Cohen</a><br>
</blockquote></div><br><br clear="all"><br>--
<br>Go placidly amid the noise and haste, and remember<br>what peace there may be in silence. As far as possible,<br>without surrender, be on good terms with all persons.<br>— Desiderata, MAX EHRMANN, 1927