28.2649, Qs: Detect Titles in Corpus
The LINGUIST List
linguist at listserv.linguistlist.org
Wed Jun 14 20:16:42 UTC 2017
LINGUIST List: Vol-28-2649. Wed Jun 14 2017. ISSN: 1069 - 4875.
Moderators: linguist at linguistlist.org (Damir Cavar, Malgorzata E. Cavar)
Reviews: reviews at linguistlist.org (Helen Aristar-Dry, Robert Coté,
Michael Czerniakowski)
Homepage: http://linguistlist.org
Please support the LL editors and operation with a donation at:
http://funddrive.linguistlist.org/donate/
Editor for this issue: Kenneth Steimel <ken at linguistlist.org>
================================================================
Date: Wed, 14 Jun 2017 16:16:34
From: Pilar Santolaria [mpilar.santolaria at gmail.com]
Subject: Detect Titles in Corpus
I'm trying to detect titles of books, computer games, and movies in my Spanish
corpus. Grammatical tags don't help because the corpus is not grammatically
consistent. For lack of a better solution, I'm looking for a database of
published books, movies, etc. that I could use to train a classifier.
I assume other linguists have run into the same problem, since titles are
hard to pin down: they contain words from many domains and do not follow
any single grammatical pattern.
Does anyone have a better solution for this problem? Or, failing that, does
anyone know of a good database of titles? I would very much appreciate any
kind of help!
Thank you very much for your time and attention,
Linguistic Field(s): Text/Corpus Linguistics
Subject Language(s): Spanish (spa)
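For concreteness, the database-lookup idea described in the query might be sketched as below. This is only an illustration under stated assumptions, not a tested solution: the title list is a tiny hand-written sample standing in for a real database, the function names (normalize, tokenize, build_gazetteer, find_titles) are hypothetical, and the input text is assumed to be in composed (NFC) Unicode form. The sketch matches the longest known title at each position, ignoring case and accents, and returns character offsets.

```python
import re
import unicodedata


def normalize(token):
    """Lowercase and strip diacritics (note: this also maps 'ñ' to 'n')."""
    decomposed = unicodedata.normalize("NFD", token.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def tokenize(text):
    """Return (token, start, end) triples for each run of word characters."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]


def build_gazetteer(titles):
    """Map each title's normalized token tuple to the original title string."""
    gazetteer = {}
    for title in titles:
        key = tuple(normalize(tok) for tok, _, _ in tokenize(title))
        if key:
            gazetteer[key] = title
    return gazetteer


def find_titles(text, gazetteer):
    """Greedy longest-match lookup; returns (start, end, title) triples."""
    tokens = tokenize(text)
    norm = [normalize(tok) for tok, _, _ in tokens]
    max_len = max((len(key) for key in gazetteer), default=0)
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(norm[i:i + n])
            if key in gazetteer:
                start, end = tokens[i][1], tokens[i + n - 1][2]
                matches.append((start, end, gazetteer[key]))
                i += n
                break
        else:
            i += 1  # no title starts at this token
    return matches


# Sample list only; a real run would load titles from an external database.
SAMPLE_TITLES = ["Cien años de soledad", "El laberinto del fauno", "Grim Fandango"]

if __name__ == "__main__":
    gaz = build_gazetteer(SAMPLE_TITLES)
    sentence = "Ayer vi el laberinto del fauno y luego leí cien anos de soledad."
    for start, end, title in find_titles(sentence, gaz):
        print(sentence[start:end], "->", title)
```

A plain lookup like this will over-match titles made of common words, so its hits are probably best treated as candidates or features for a classifier rather than as final decisions.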