[Corpora-List] mailing address parser

Saurav Sahay sauravsahay at gmail.com
Mon Oct 15 18:59:51 UTC 2007


A good starting point for writing such a parser quickly is the UIMA (IBM's
Unstructured Information Management Architecture) example code (Java) for
recognizing dates, times, and room numbers in many different formats. It is
based on regular expressions and patterns.
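
To give a rough idea of the regular-expression approach applied to Nate's
example, here is a minimal sketch in Python (not the UIMA Java code itself).
The pattern, the field names, and the parse_address helper are my own
assumptions for illustration; it only handles simple two-line US addresses
and would need many more special-case rules and test cases for real coverage.

```python
import re

# Hypothetical pattern for a simple two-line US address.
# Illustrative only: not taken from UIMA or any production parser.
ADDRESS_RE = re.compile(
    r"""
    ^\s*(?P<house_number>\d+)\s+                      # house number, e.g. 123
    (?P<street>.+?)                                   # street name, e.g. Maple St.
    (?:\s+(?:\#|Apt\.?|Unit)\s*(?P<apartment>\w+))?   # optional apartment/unit
    \s*\n\s*                                          # line break between the two lines
    (?P<city>[^,\n]+),\s*                             # city, up to the comma
    (?P<state>[A-Z]{2})\s+                            # two-letter state code
    (?P<zipcode>\d{5}(?:-\d{4})?)\s*$                 # ZIP or ZIP+4
    """,
    re.VERBOSE,
)


def parse_address(text):
    """Return a dict of address parts, or None if the text does not match."""
    match = ADDRESS_RE.match(text)
    if match is None:
        return None
    # Optional groups that did not participate are left as None.
    return {name: (value.strip() if value else value)
            for name, value in match.groupdict().items()}


if __name__ == "__main__":
    print(parse_address("123 Maple St. #1025\nSpringfield, IL  12345"))
```

For the example address this returns house_number 123, street "Maple St.",
apartment 1025, city Springfield, state IL, and zipcode 12345. Input that
does not fit the pattern returns None, which is where the extra rules
would come in.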

Thanks,

Saurav Sahay

PhD Student,
College of Computing,
Georgia Tech
www.cc.gatech.edu/~ssahay

On 10/15/07, Kevin B. Cohen <kevin.cohen at gmail.com> wrote:
>
> Nate,
>
> Speaking as someone who wrote one of these at MapQuest, I can tell you
> that it's entirely possible to do.
>
> NDAs limit what I can tell you about the specific approach, but
> without releasing any industrial secrets I can certainly tell you that
> if you're not offended by writing lots of special-case rules, it's not
> difficult to do.  It will take you longer to put together a thorough
> set of test cases than it will to write the code, but once you've put
> together a good set of test cases, it will probably be pretty obvious
> to you what your code needs to do.
>
> Kev
>
> On 10/15/07, Nate Blaylock <nblaylock at ihmc.us> wrote:
> > Hi all,
> >
> > I am looking for a free (as in gratis) free-form mailing address parser
> > -- i.e., something which will take a free-form address like:
> >    123 Maple St. #1025
> >    Springfield, IL  12345
> >
> > and return the constituent parts:
> >   Street: Maple St.; House number: 123; Apartment number: 1025;
> >   City: Springfield; State: IL; Zipcode: 12345
> >
> > Something that worked for any international address would be great, but
> > I am most interested in coverage of US addresses.
> >
> > thanks,
> >
> > nate
> >
> > _______________________________________________
> > Corpora mailing list
> > Corpora at uib.no
> > http://mailman.uib.no/listinfo/corpora
> >
>
>
> --
> K. B. Cohen
> Biomedical Text Mining Group Lead
> Center for Computational Pharmacology
> 303-724-7563 (office) 303-916-2417 (cell) 303-377-9194 (home)
> http://compbio.uchsc.edu/Hunter_lab/Cohen
>



-- 
Go placidly amid the noise and haste, and remember
what peace there may be in silence. As far as possible,
without surrender, be on good terms with all persons.
— Desiderata, MAX EHRMANN, 1927