[Corpora-List] Text Mining for Trend Analysis

王經篤 wangjingdoo at gmail.com
Mon Sep 20 03:14:44 UTC 2010


Dear all,

Sorry to bother you again.

I need timestamped corpora from which I can compute "pattern histories".
So far I have been blocked by copyright issues. I am therefore seeking
cooperation: if someone can offer his or her corpus, I can provide a
pattern-history web service in return.

A web site for pattern history is available at
http://120.108.115.115/TM/Search_PubMed_Simple.php
The pattern histories there were extracted from PubMed medical articles
(1990 to 2009): 3,225,549 articles containing 677,728,269 words
(over 600 million words).

Please don't hesitate to let me know if you would like the frequency
distribution of patterns over time, computed from a large corpus covering
a long time span, for trend analysis.

Jing-Doo Wang

Assistant Professor
Department of Computer Science and Information Engineering
Asia University, Taiwan.

886-4-23323456-ext 1847
http://asia.edu.tw/~jdwang
jdwang at asia.edu.tw
wangjingdoo at gmail.com

2010/9/17 王經篤 <wangjingdoo at gmail.com>

> Dear all,
>
> I am working on the extraction of maximal repeat patterns
> from text, together with computing the frequency distribution of
> these patterns over time (the "pattern history").
>
>
> A web site for pattern history is available at
> http://120.108.115.115/TM/Search_PubMed_Simple.php
> The pattern histories there were extracted from PubMed medical articles
> (1990 to 2009): 3,225,549 articles containing 677,728,269 words
> (over 600 million words).
> Note that the extracted patterns include not only
> single words but also multi-word phrases,
> e.g. "patients with squamous cell carcinoma of the head and neck".
> More specifically, any segment (a sequence of words) within a sentence
> of the corpus is extracted if that segment appears at least twice;
> the corresponding frequency distribution of that segment
> over time, defined as its "pattern history", is then computed.
>
> I am looking for more retrospective (historical, chronological)
> corpora, publications, or literature to make my experiments more robust,
> and I am seeking linguistic experts who would be willing to cooperate
> by providing timestamped text.
>
> In return, I will provide them with the pattern histories extracted from
> these corpora.
> Please let me know if you have timestamped textual data (a corpus).
>
> Yours faithfully,
>
> P.S. An abstract describing my work is attached.
>
> --
>
>
>
>
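To make the idea of a "pattern history" in the quoted message more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the method behind the web service: it counts word n-grams by brute force instead of extracting maximal repeats, it does not check maximality, and the function name `pattern_histories`, the `max_len` cap, and the naive sentence splitting are all hypothetical choices made for this example.

```python
import re
from collections import Counter, defaultdict


def pattern_histories(docs, max_len=5, min_count=2):
    """Return {segment: {year: count}} for segments seen at least min_count times.

    docs is an iterable of (year, text) pairs. Segments are word sequences of
    up to max_len words that do not cross sentence boundaries.
    """
    history = defaultdict(Counter)
    for year, text in docs:
        # Naive sentence split on '.', '!' and '?' (an assumption for this sketch).
        for sentence in re.split(r"[.!?]+", text.lower()):
            words = re.findall(r"[a-z0-9]+", sentence)
            # Enumerate every word n-gram inside the sentence.
            for n in range(1, max_len + 1):
                for i in range(len(words) - n + 1):
                    history[" ".join(words[i:i + n])][year] += 1
    # Keep only segments that repeat somewhere in the corpus.
    return {
        seg: dict(years)
        for seg, years in history.items()
        if sum(years.values()) >= min_count
    }


# Tiny made-up corpus, for illustration only.
docs = [
    (1990, "Squamous cell carcinoma was studied. Patients improved."),
    (1991, "Squamous cell carcinoma of the head and neck."),
    (1991, "Head and neck tumors."),
]
hist = pattern_histories(docs)
print(hist["squamous cell carcinoma"])  # {1990: 1, 1991: 1}
print(hist["head and neck"])            # {1991: 2}
```

Brute-force enumeration like this grows quickly with corpus size and segment length, so it would not scale to hundreds of millions of words; it is meant only to show what the per-year frequency distribution of a repeated segment looks like.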
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PatternHistoryExtraction_Abstract.pdf
Type: application/pdf
Size: 23223 bytes
Desc: not available
URL: <http://listserv.linguistlist.org/pipermail/corpora/attachments/20100920/01b920c0/attachment-0001.pdf>