[Corpora-List] Neologisms

Damien Nouvel damien at nouvels.net
Tue Jan 14 11:50:32 UTC 2014


Dear Karoline,

I took part in a project on neologism detection and automatic lexicon
enrichment for French:
https://sites.google.com/site/projetedylex/publications

Among other things, we showed that, by combining several approaches, we
could automatically find neologisms in a newsfeed and assign inflectional
categories to them.
Since the latest work has so far been published only in French (English
abstract below), don't hesitate to get in touch with us if you have
questions or feedback.

Best,
Damien

--

Lexical incompleteness is a recurring problem when dealing with natural
language and its variability. It indeed seems necessary today to regularly
validate and extend the lexica used by tools that process large amounts of
textual data. This is even more true when processing real-time text flows.
In this context, our paper introduces techniques for handling words
unknown to a lexicon. We first study neology (from a theoretical and
corpus-based point of view) and describe the modules we have developed for
detecting neologisms and inferring information about them (lemma, category,
inflectional class). We show that, using among others modules for analyzing
derived and compound neologisms, we are able to generate candidate lexical
entries in real time and with good precision.
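
As a rough illustration of the first step described in the abstract (spotting words unknown to a lexicon in a text flow), here is a minimal Python sketch. It is not the project's implementation: the toy lexicon, the sample feed, the function name neologism_candidates, the capitalisation filter and the min_count threshold are all assumptions made for the example. It does not attempt the lemma, category or inflectional-class inference mentioned above.

```python
import re
from collections import Counter

# Hypothetical toy lexicon; a real system would load a full-form lexicon.
LEXICON = {"the", "minister", "will", "speech", "on", "and", "users", "a"}

# Invented sample feed, used only to exercise the function below.
FEED = [
    "The minister will retweet the speech on Friday",
    "Users retweet and unfriend",
    "Users unfriend the minister",
    "A selfie",
]

# Runs of letters, optionally joined by hyphens or apostrophes.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:[-'][^\W\d_]+)*")


def neologism_candidates(texts, lexicon, min_count=2):
    """Return {word: count} for tokens that are absent from the lexicon.

    Capitalised tokens are skipped as a crude proper-noun filter (an
    assumption of this sketch; it also drops sentence-initial words).
    """
    counts = Counter()
    for text in texts:
        for token in TOKEN_RE.findall(text):
            if token[0].isupper():
                continue
            word = token.lower()
            if word not in lexicon:
                counts[word] += 1
    # Keep only words seen often enough to be worth manual review.
    return {w: c for w, c in counts.items() if c >= min_count}


if __name__ == "__main__":
    print(neologism_candidates(FEED, LEXICON))
```

Raising min_count trades recall for precision: typos and one-off coinages mostly occur once, so they drop out, at the cost of missing neologisms that are still rare.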



2014/1/10 Mark Davies <Mark_Davies at byu.edu>

>  (Sorry for the delay in responding)
>
>  In order to look for neologisms, you'll need a monitor corpus that
> continues to be added to every year or two, and (crucially) which has
> roughly the same composition from year to year. As far as I'm aware, the
> only publicly-accessible monitor corpus with these specifications is the
> Corpus of Contemporary American English (COCA):
> http://corpus.byu.edu/coca/ .
>
>  (See http://llc.oxfordjournals.org/content/25/4/447.abstract for a
> comparison of COCA, the Bank of English, and the Oxford English Corpus as
> monitor corpora.)
>
>  The hard part is having the corpus interface automatically find
> neologisms for you. In COCA you can have it show you, for example, all
> adjectives that occur in 2012, but not in 1990-2011. But because the CLAWS7
> tagger isn't perfect, you'll have to wade through lots of spurious data to
> find the neologisms.
>
> But if you already have words or phrases in mind, then COCA can map out
> their frequency year by year since 1990 quite well, e.g.:
>
>  morph: http://corpus.byu.edu/coca/?h=y&c=coca&q=105
> old-school: http://corpus.byu.edu/coca/?h=y&c=coca&q=106
> gift (as verb): http://corpus.byu.edu/coca/?h=y&c=coca&q=124
> think outside the box: http://corpus.byu.edu/coca/?h=y&c=coca&q=155
> throw someone under the bus: http://corpus.byu.edu/coca/?c=coca&q=15643189
>
>  There are more examples at http://corpus.byu.edu/coca/x.asp?f=changes_e
>
>  Best,
>
>  Mark Davies
>
>  ============================================
> Mark Davies
> Professor of Linguistics / Brigham Young University
> http://davies-linguistics.byu.edu/
>
> ** Corpus design and use // Linguistic databases **
> ** Historical linguistics // Language variation **
> ** English, Spanish, and Portuguese **
> ============================================
>    ------------------------------
> *From:* corpora-bounces at uib.no [corpora-bounces at uib.no] on behalf of
> kazavora at students.unibe.ch [kazavora at students.unibe.ch]
> *Sent:* Monday, January 06, 2014 7:52 AM
> *To:* corpora at uib.no
> *Subject:* [Corpora-List] Neologisms
>
>   Dear all,
>
> I am building a corpus of neologisms, looking at new words that emerged in
> the last couple of years and the word-formation processes they went through.
> I therefore need a source where I can find all the new words that emerged
> in the last couple of years or the last decade. Do you have any helpful
> links or other pointers?
>
> Thank you very much.
>
> Best wishes,
>
> Karoline Zavora
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>

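The kind of query Mark describes in the quoted message (all adjectives that occur in 2012 but not in 1990-2011) can be approximated offline on any year-stamped, POS-tagged word list. The sketch below only illustrates that set difference and is not how the COCA interface works internally. The record layout (year, word, tag), the tag names, the function name first_attested_in and the sample records are assumptions invented for the example.

```python
def first_attested_in(records, target_year, baseline_years, pos=None):
    """Return sorted (word, tag) pairs seen in target_year but in no baseline year.

    records: iterable of (year, word, tag) tuples (hypothetical layout).
    pos: optional tag to restrict the result to, e.g. "ADJ".
    """
    seen_target = set()
    seen_baseline = set()
    for year, word, tag in records:
        key = (word.lower(), tag)
        if year == target_year:
            seen_target.add(key)
        elif year in baseline_years:
            seen_baseline.add(key)
    new_items = seen_target - seen_baseline
    if pos is not None:
        new_items = {item for item in new_items if item[1] == pos}
    return sorted(new_items)


# Invented toy records, not real corpus counts.
RECORDS = [
    (1995, "old-school", "ADJ"),
    (2012, "old-school", "ADJ"),
    (2012, "hashtaggable", "ADJ"),
    (2005, "gift", "NOUN"),
    (2012, "gift", "VERB"),
]

if __name__ == "__main__":
    print(first_attested_in(RECORDS, 2012, range(1990, 2012), pos="ADJ"))
```

Because the comparison is keyed on (word, tag), a mis-tagged token in the target year shows up as a new item. This is the spurious data that the quoted message warns about, so the output still needs manual filtering.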

-- 
damien at nouvels.net
GSM: +33 (0) 6 63 56 27 17