[Corpora-List] number and dates normalization
Gregory Marton
gremio at csail.mit.edu
Wed Jul 30 21:08:21 UTC 2008
Hi Shachar,
Also of note for dates:
datejs: http://www.datejs.com/
"An open-source JavaScript Date Library"
joda: http://joda-time.sourceforge.net/
"Joda-Time provides a quality replacement for the Java date and time
classes. The design allows for multiple calendar systems, while still
providing a simple API. The 'default' calendar is the ISO8601 standard
which is used by XML. The Gregorian, Julian, Buddhist, Coptic, Ethiopic and
Islamic systems are also included, and we welcome further additions.
Supporting classes include time zone, duration, format and parsing."
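For the date half of the question, the fixed-format case is simple enough to sketch without either library. The following is a minimal Python sketch using only the standard library, not Joda-Time or datejs; the function name normalize_date and the single accepted input format are assumptions for illustration, taken from the example in the original question. It relies on English month names, which is what Python's default locale provides.

```python
from datetime import datetime


def normalize_date(text, in_format="%B, %d %Y"):
    """Parse a date written in one fixed format and render it as D/M/YY.

    in_format is a strptime pattern; the default matches strings such as
    "January, 23 1987". Anything in another format raises ValueError.
    """
    parsed = datetime.strptime(text, in_format)
    # Day and month are left unpadded, the year is reduced to two digits.
    return f"{parsed.day}/{parsed.month}/{parsed.year % 100:02d}"


if __name__ == "__main__":
    print(normalize_date("January, 23 1987"))
```

This only handles input whose format is known in advance; free-text dates are where the libraries listed in this thread earn their keep.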
It's a little awkward, but for "fifteen hundred" and the like, you can use
a service of ours. Open a TCP connection as shown below, type "fifteen
hundred" and hit return, and you'll get a data structure you can read:
$ telnet malta.csail.mit.edu 8009
Trying 128.30.44.123...
Connected to malta.csail.mit.edu.
Escape character is '^]'.
fifteen hundred
((number "fifteen hundred" :span (0 15) :value 1500 :notation natural))
Connection closed by foreign host.
If you do end up using that final interface, please let me know so I can
make it robustly available on a more sensible name like
numeric-normalizer.csail.mit.edu or such. The particular server is
otherwise liable to change. I'm also happy to make its source available.
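The telnet exchange above can also be scripted. What follows is a minimal Python sketch under stated assumptions, not the service's official client: the host and port are the ones in the session above, the line ending and the read-until-close behaviour are inferred from that one session, and the reply pattern is modelled only on the single reply shown. The names parse_reply and query are made up for illustration.

```python
import re
import socket

HOST = "malta.csail.mit.edu"  # from the telnet session; liable to change
PORT = 8009

# One (number ...) entry, modelled only on the single reply shown above.
ENTRY = re.compile(
    r'\(number\s+"(?P<text>[^"]*)"\s+'
    r':span\s+\((?P<start>\d+)\s+(?P<end>\d+)\)\s+'
    r':value\s+(?P<value>\S+)\s+'
    r':notation\s+(?P<notation>[^\s()]+)\)'
)


def parse_reply(reply):
    """Return a list of dicts, one per (number ...) entry found in reply."""
    entries = []
    for match in ENTRY.finditer(reply):
        entries.append({
            "text": match.group("text"),
            "span": (int(match.group("start")), int(match.group("end"))),
            # Kept as a string: only an integer value appears in the example.
            "value": match.group("value"),
            "notation": match.group("notation"),
        })
    return entries


def query(phrase, host=HOST, port=PORT, timeout=10.0):
    """Send one line, read until the server closes, and parse the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # Telnet-style line ending, assumed from the session above.
        sock.sendall(phrase.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closed the connection
                break
            chunks.append(data)
    return parse_reply(b"".join(chunks).decode("ascii", errors="replace"))
```

Calling query("fifteen hundred") should then give a one-element list whose "value" is "1500", provided the server is still reachable at that address.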
Best,
Grem
> Hi,
>
> Here's a summary of the pointers we got for the number and date
> normalization inquiry:
>
> - ICU4J (http://icu-project.org/index.html) - a set of libraries for
> globalization purposes, including number and date formatting.
>
> - A date normalizer by Mark Greenwood found at:
> http://www.dcs.shef.ac.uk/~mark/dev/java/index.html
>
> - Unix date program, part of GNU coreutils:
> http://www.gnu.org/software/coreutils/
>
> - hCalendar (http://microformats.org/wiki/hcalendar), a microformats
> standard for calendaring and event formats.
>
> - TempEx: for date and time expression tagging by George Wilson:
> http://timex2.mitre.org/cgi-bin/download?file=TempEx_R1_05_03.tar
>
> Thanks to Trevor Jenkins, Michael Hawkes, Mark Greenwood and George Wilson
> for their help.
>
> Shachar
>
> _____
>
> From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
> Shachar Mirkin
> Sent: Thursday, July 24, 2008 8:25 PM
> To: corpora at uib.no
> Subject: [Corpora-List] number and dates normalization
>
> Hi,
>
> I'm looking for an available package (preferably Java) for number and date
> normalization that, given "fifteen hundred", will return "1500" and, given
> "January, 23 1987", will return a date in some predefined schema, e.g.
> "23/1/87".
>
> Does anyone know of such a tool?
>
> Thanks,
>
> Shachar Mirkin
> Bar-Ilan University, Israel
>
--
------ __@ Gregory A. Marton http://csail.mit.edu/~gremio/
--- _`\<,_ .
-- (*)/ (*) #The mouse the cat the dog the man kicked chased bit died.
~~~~~~~~~~~~~~~~-~~~~~~~~_~~~_~~~~~v~~~~^^^^~~~~~--~~~~~~~~~~~~~~~++~~~~~~~
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora