[Corpora-List] number and dates normalization

Gregory Marton gremio at csail.mit.edu
Wed Jul 30 21:08:21 UTC 2008


Hi Shachar,

Also of note for dates:

datejs: http://www.datejs.com/
"An open-source JavaScript Date Library"

joda: http://joda-time.sourceforge.net/
"Joda-Time provides a quality replacement for the Java date and time 
classes. The design allows for multiple calendar systems, while still 
providing a simple API. The 'default' calendar is the ISO8601 standard 
which is used by XML. The Gregorian, Julian, Buddhist, Coptic, Ethiopic and 
Islamic systems are also included, and we welcome further additions. 
Supporting classes include time zone, duration, format and parsing."
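
For comparison, the date half of the original request ("January, 23 1987" 
to "23/1/87") can be sketched without either library.  The following uses 
Python's standard datetime module rather than datejs or Joda-Time; the 
function name and the fixed input format are assumptions for illustration, 
and month names are parsed according to the current locale (English 
assumed here).

```python
from datetime import datetime


def normalize_date(text, in_format="%B, %d %Y"):
    """Parse a date such as "January, 23 1987" and return it as d/m/yy.

    normalize_date and in_format are hypothetical names for this sketch.
    """
    parsed = datetime.strptime(text, in_format)
    # Build the output by hand: strftime's %d and %m would zero-pad,
    # giving "23/01/87" rather than the "23/1/87" asked for.
    return f"{parsed.day}/{parsed.month}/{parsed.year % 100:02d}"


print(normalize_date("January, 23 1987"))  # prints 23/1/87
```

A real normalizer would have to try several input formats; this one 
accepts only the single pattern given in in_format and raises ValueError 
on anything else.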



It's a little awkward, but for "fifteen hundred" and the like, you can use 
a service of ours.  Open a TCP connection as shown below, type "fifteen 
hundred" and hit return, and you'll get back a data structure you can read:

$ telnet malta.csail.mit.edu 8009
Trying 128.30.44.123...
Connected to malta.csail.mit.edu.
Escape character is '^]'.
fifteen hundred
((number "fifteen hundred" :span (0 15) :value 1500 :notation natural))
Connection closed by foreign host.

If you do end up using that final interface, please let me know so I can 
make it robustly available on a more sensible name like 
numeric-normalizer.csail.mit.edu or such.  The particular server is 
otherwise liable to change.  I'm also happy to make its source available.
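
For anyone who would rather script this than type into telnet, here is a 
minimal Python sketch.  It assumes the service speaks the line-oriented 
protocol seen in the transcript above (one line in, one s-expression out, 
then the server closes the connection) and that :value is always a plain 
integer.  The function names and the regular expression are illustrative 
and not part of the service.

```python
import re
import socket

# Host and port come from the telnet transcript; the server may have
# moved or gone away since.
HOST = "malta.csail.mit.edu"
PORT = 8009

# Matches one reply entry of the shape seen in the transcript:
#   (number "fifteen hundred" :span (0 15) :value 1500 :notation natural)
# Assumption: :value is an integer. Other reply shapes are not handled.
ENTRY = re.compile(
    r'\(number\s+"(?P<text>[^"]*)"\s+'
    r':span\s+\((?P<start>\d+)\s+(?P<end>\d+)\)\s+'
    r':value\s+(?P<value>-?\d+)\s+'
    r':notation\s+(?P<notation>\S+?)\)'
)


def parse_reply(reply):
    """Return a list of dicts, one per (number ...) entry in the reply."""
    results = []
    for m in ENTRY.finditer(reply):
        results.append({
            "text": m.group("text"),
            "span": (int(m.group("start")), int(m.group("end"))),
            "value": int(m.group("value")),
            "notation": m.group("notation"),
        })
    return results


def normalize(phrase, host=HOST, port=PORT, timeout=10.0):
    """Send one phrase to the service and parse whatever comes back."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # telnet sends CR LF on return, so the sketch does the same.
        sock.sendall(phrase.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return parse_reply(b"".join(chunks).decode("ascii", errors="replace"))
```

parse_reply can be tried offline on the reply line from the transcript; 
normalize("fifteen hundred") needs the server to be reachable.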

Best,
Grem


> Hi,
>
>
>
> Here's a summary of the pointers we got for the number and date
> normalization inquiry:
>
>
>
>
>
> - ICU4J (http://icu-project.org/index.html) - a set of libraries for
> globalization purposes, including number and date formatting.
>
>
>
> -  A date normalizer by Mark Greenwood found at:
> http://www.dcs.shef.ac.uk/~mark/dev/java/index.html
>
>
>
> - Unix date program, part of GNU coreutils:
> http://www.gnu.org/software/coreutils/
>
>
>
> - hCalendar (http://microformats.org/wiki/hcalendar), a microformats
> standard for calendar and event data.
>
>
>
> - TempEx: for date and time expression tagging by George Wilson:
> http://timex2.mitre.org/cgi-bin/download?file=TempEx_R1_05_03.tar
>
>
>
>
>
> Thanks to Trevor Jenkins, Michael Hawkes, Mark Greenwood and George Wilson
> for their help.
>
>
>
>
>
> Shachar
>
>
>
>  _____
>
> From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
> Shachar Mirkin
> Sent: Thursday, July 24, 2008 8:25 PM
> To: corpora at uib.no
> Subject: [Corpora-List] number and dates normalization
>
>
>
> Hi,
>
>
>
> I'm looking for an available package (preferably Java) for number and date
> normalization that, given "fifteen hundred", will return "1500" and, given
> "January, 23 1987", will return a date in some predefined schema, e.g.
> "23/1/87".
>
>
>
> Does anyone know of such a tool?
>
>
>
> Thanks,
>
>
>
> Shachar Mirkin
>
> Bar-Ilan University, Israel
>
>

-- 
------ __@   Gregory A. Marton                http://csail.mit.edu/~gremio/
--- _`\<,_                                                                .
-- (*)/ (*)    #The mouse the cat the dog the man kicked chased bit died.
~~~~~~~~~~~~~~~~-~~~~~~~~_~~~_~~~~~v~~~~^^^^~~~~~--~~~~~~~~~~~~~~~++~~~~~~~


_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


