[Corpora-List] mailing address parser

Tony Abou-Assaleh taa at acm.org
Mon Oct 15 19:08:36 UTC 2007


Another approach is to take one of the free taggers (e.g., an HMM
part-of-speech tagger) and train it to recognize states representing
address elements (house number, street, city, and so on). You would need
a lot of labeled training data, though.
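
To make the idea concrete, here is a minimal sketch of a supervised HMM
tagger in plain Python. Everything in it is an assumption made for
illustration: the tag names (HOUSE, STREET, ...), the shape() function that
maps tokens to coarse observation symbols, the add-one smoothing, and the
three-address toy training set. It is not taken from any particular free
tagger, and a real system would need far more data.

```python
import math
from collections import defaultdict

SUFFIXES = {"st", "ave", "rd", "blvd", "dr", "ln"}  # illustrative, not exhaustive

def shape(tok):
    """Map a raw token to a coarse observation symbol (assumption of this sketch)."""
    bare = tok.strip(",.")
    low = bare.lower()
    if low.isdigit():
        return "ZIP5" if len(low) == 5 else "NUM"
    if low.startswith("#"):
        return "UNIT"
    if low in SUFFIXES:
        return "SUFFIX"
    if len(bare) == 2 and bare.isalpha() and bare.isupper():
        return "ABBR2"
    return "WORD"

class HMMTagger:
    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive smoothing constant
        self.trans = defaultdict(lambda: defaultdict(float))  # prev tag -> tag -> count
        self.emit = defaultdict(lambda: defaultdict(float))   # tag -> symbol -> count
        self.states, self.symbols = set(), set()

    def train(self, tagged_seqs):
        """Count transitions and emissions from sequences of (token, tag) pairs."""
        for seq in tagged_seqs:
            prev = "<s>"
            for tok, tag in seq:
                sym = shape(tok)
                self.trans[prev][tag] += 1
                self.emit[tag][sym] += 1
                self.states.add(tag)
                self.symbols.add(sym)
                prev = tag

    def _logp(self, table, given, x, n):
        row = table[given]
        return math.log((row[x] + self.alpha) / (sum(row.values()) + self.alpha * n))

    def tag(self, tokens):
        """Viterbi decoding: most probable tag sequence for the tokens."""
        if not tokens:
            return []
        states = sorted(self.states)
        syms = [shape(t) for t in tokens]
        n_s, n_o = len(states), len(self.symbols) + 1  # +1 for unseen symbols
        best = [{s: (self._logp(self.trans, "<s>", s, n_s)
                     + self._logp(self.emit, s, syms[0], n_o), None)
                 for s in states}]
        for sym in syms[1:]:
            col = {}
            for s in states:
                prev, score = max(
                    ((p, best[-1][p][0] + self._logp(self.trans, p, s, n_s))
                     for p in states),
                    key=lambda pair: pair[1])
                col[s] = (score + self._logp(self.emit, s, sym, n_o), prev)
            best.append(col)
        # Follow back-pointers from the best final state.
        path = [max(states, key=lambda s: best[-1][s][0])]
        for col in reversed(best[1:]):
            path.append(col[path[-1]][1])
        return path[::-1]

# Toy training data, invented for this sketch.
TRAIN = [
    [("123", "HOUSE"), ("Maple", "STREET"), ("St.", "STREET"), ("#1025", "UNIT"),
     ("Springfield,", "CITY"), ("IL", "STATE"), ("12345", "ZIP")],
    [("9", "HOUSE"), ("Oak", "STREET"), ("Ave", "STREET"),
     ("Denver,", "CITY"), ("CO", "STATE"), ("80202", "ZIP")],
    [("4500", "HOUSE"), ("Pine", "STREET"), ("Rd", "STREET"), ("#7", "UNIT"),
     ("Atlanta,", "CITY"), ("GA", "STATE"), ("30332", "ZIP")],
]

tagger = HMMTagger()
tagger.train(TRAIN)
print(tagger.tag("77 Elm Dr #3 Boulder, CO 80301".split()))
```

Mapping tokens to a handful of symbols keeps the emission table small, which
is the only reason three training addresses are enough for the demo. With raw
words as observations you are back to needing a lot of data.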

Cheers,

TAA

-----------------------------------------------------
Tony Abou-Assaleh
Email:    taa at acm.org
Web site: http://tony.abou-assaleh.net
----------------------[THE END]----------------------

On Mon, 15 Oct 2007, Saurav Sahay wrote:

> A good starting point for writing such a parser quickly is the UIMA (IBM's
> Unstructured Information Management Architecture) example code (Java) for
> recognizing dates, times, and room numbers in many different formats. It is
> based on regular expressions and patterns.
>
> Thanks,
>
> Saurav Sahay
>
> PhD Student,
> College of Computing,
> Georgia Tech
> www.cc.gatech.edu/~ssahay
>
> On 10/15/07, Kevin B. Cohen <kevin.cohen at gmail.com> wrote:
> >
> > Nate,
> >
> > Speaking as someone who wrote one of these at MapQuest, I can tell you
> > that it's entirely possible to do.
> >
> > NDAs limit what I can tell you about the specific approach, but
> > without releasing any industrial secrets I can certainly tell you that
> > if you're not offended by writing lots of special-case rules, it's not
> > difficult to do.  It will take you longer to put together a thorough
> > set of test cases than it will to write the code, but once you've put
> > together a good set of test cases, it will probably be pretty obvious
> > to you what your code needs to do.
> >
> > Kev
> >
> > On 10/15/07, Nate Blaylock <nblaylock at ihmc.us> wrote:
> > > Hi all,
> > >
> > > I am looking for a free (as in gratis) free-form mailing address parser
> > > -- i.e., something which will take a free-form address like:
> > >    123 Maple St. #1025
> > >    Springfield, IL  12345
> > >
> > > and return the constituent parts:
> > >   Street: Maple St.; House number: 123; Apartment number: 1025;
> > >   City: Springfield; State: IL; Zipcode: 12345
> > >
> > > Something that worked for any international address would be great, but
> > > I am most interested in coverage of US addresses.
> > >
> > > thanks,
> > >
> > > nate
> > >
> > > _______________________________________________
> > > Corpora mailing list
> > > Corpora at uib.no
> > > http://mailman.uib.no/listinfo/corpora
> > >
> >
> >
> > --
> > K. B. Cohen
> > Biomedical Text Mining Group Lead
> > Center for Computational Pharmacology
> > 303-724-7563 (office) 303-916-2417 (cell) 303-377-9194 (home)
> > http://compbio.uchsc.edu/Hunter_lab/Cohen
> >
>
>
>
> --
> Go placidly amid the noise and haste, and remember
> what peace there may be in silence. As far as possible,
> without surrender, be on good terms with all persons.
> — Desiderata, MAX EHRMANN, 1927
>
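
For the rule-based route described in the quoted messages, here is a minimal
sketch in Python of a regular-expression parser plus a small table of test
cases. It is not the UIMA example code and not the MapQuest approach; the
pattern, the field names, and the parse_address() and failing_cases() helpers
are all hypothetical, and the pattern only handles simple US-style addresses
such as the one in the original question.

```python
import re

# One verbose pattern for "<house> <street> [#unit] , <city>, <ST> <zip>".
# The street/city separator may be a comma or a newline.
ADDRESS_RE = re.compile(
    r"""^\s*
    (?P<house>\d+)\s+
    (?P<street>.+?)
    (?:\s+(?:\#|apt\.?\s*|unit\s+)(?P<unit>\w+))?
    \s*[,\n]\s*
    (?P<city>[^,\n]+?),?\s+
    (?P<state>[A-Z]{2})\s+
    (?P<zip>\d{5}(?:-\d{4})?)
    \s*$""",
    re.VERBOSE | re.IGNORECASE,
)

def parse_address(text):
    """Return a dict of address fields, or None if the pattern does not match."""
    m = ADDRESS_RE.match(text)
    if m is None:
        return None
    fields = {k: v.strip() for k, v in m.groupdict().items() if v is not None}
    fields["state"] = fields["state"].upper()
    return fields

# Test cases first: each entry is (input text, expected fields).
CASES = [
    ("123 Maple St. #1025\nSpringfield, IL  12345",
     {"house": "123", "street": "Maple St.", "unit": "1025",
      "city": "Springfield", "state": "IL", "zip": "12345"}),
    ("9 Oak Ave, Denver, CO 80202",
     {"house": "9", "street": "Oak Ave",
      "city": "Denver", "state": "CO", "zip": "80202"}),
]

def failing_cases():
    """Return (input, actual) for every case whose output differs from the expectation."""
    failures = []
    for text, want in CASES:
        got = parse_address(text)
        if got != want:
            failures.append((text, got))
    return failures

print(failing_cases())
```

Growing CASES is where most of the effort goes: each new address format
becomes a failing case first, and then either the pattern is widened or a
special-case rule is added.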
