[Corpora-List] Genre Specific Corpus Studies

Romain Loth rloth at u-paris10.fr
Thu Jun 30 13:50:29 UTC 2011


Dear Muhammad and colleagues,

I don't have much material on print press/media genres, but I gathered a 
lot of background information on textual genres in general, especially 
with regard to their links with languages for specific purposes, 
domain-specific terminologies and text structures, three areas that are 
crucial for modern text mining.

So if you're interested in that angle, I would suggest this medley 
of various articles I liked, in addition to Linda's classic references:

(About some particular genres)
-> Harris, Z.S. et al., 1989. The form of information in science: 
analysis of an immunology sublanguage, Kluwer Academic Publishers.
-> Farzindar, A. & Lapalme, G., 2004. Legal text summarization by 
exploration of the thematic structures and argumentative roles. In Text 
Summarization Branches Out Workshop held in conjunction with ACL. pp. 27–34.

(About features to consider for syntactic analysis, terminology 
extraction and ontologies)
-> Aubin, S., Nazarenko, A. & Nédellec, C., 2006. Adapting a general 
parser to a sublanguage. Arxiv preprint cs/0606118. Available at: 
http://arxiv.org/pdf/cs/0606118
-> Condamines, A., 2008. Taking genre into account when analysing 
conceptual relation patterns. Corpora, 3(2), pp.115–140. Available at: 
http://w3.erss.univ-tlse2.fr/textes/pagespersos/acondami/Corpora.pdf
-> Péry-Woodley, M.P. & Rebeyrolle, J., 1998. Domain and genre in 
sublanguage text: definitional microtexts in three corpora. In 
Proceedings of the First International Conference on Language Resources 
and Evaluation (LREC-1998). pp. 987–992.
-> Solskinnsbakk, G. & Gulla, J.A., 2008. Ontological Profiles as 
Semantic Domain Representations. In Natural language and information 
systems: 13th International Conference on Applications of Natural 
Language to Information Systems, NLDB 2008, London, UK, June 24-27, 
2008: proceedings. p. 67. Available at: 
http://www.idi.ntnu.no/~geirsols/docs/NLDB_08.pdf.

(About genre-specific corpora on the web)
-> Baroni, M. et al., 2006. WebBootCaT: instant domain-specific corpora 
to support human translators. In Proceedings of EAMT. pp. 247–252.
-> Meyer zu Eissen, S. & Stein, B., 2004. Genre Classification of Web 
Pages. In KI 2004: Advances in Artificial Intelligence. pp. 256–269. 
Available at: http://www.springerlink.com/content/t50xxamdee88gcc7 
[Accessed November 10, 2009].

(General background on genres and sublanguages)

-> Biber, D., 1992. The multi-dimensional approach to linguistic 
analyses of genre variation: An overview of methodology and findings. 
Computers and the Humanities, 26(5), pp.331–345.
-> Bowker, L. & Pearson, J., 2002. Working with specialized language: a 
practical guide to using corpora, Psychology Press.
-> Foucault, M., 1969. L’archéologie du savoir, éditions Gallimard. 
Available at: 
http://www.scribd.com/doc/2465728/Foucault-Michel-Larcheologie-Du-Savoir.
-> Grishman, R., 2001. Adaptive information extraction and sublanguage 
analysis. In Proc. of IJCAI 2001. Available at: 
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.20.9903&rep=rep1&type=pdf
-> Kittredge, R. & Lehrberger, J., 1982. Sublanguage: Studies of 
language in restricted semantic domains, Walter de Gruyter.
-> Lee, D., 2001. Genres, registers, text types, domains, and styles: 
Clarifying the concepts and navigating a path through the BNC jungle. 
Language Learning & Technology, 5(3), pp.37–72.
-> Hjørland, B., 2002. Domain analysis in information science: eleven 
approaches – traditional as well as innovative. Journal of Documentation, 
58(4), pp.422–462.
-> Sager, N., 1986. Sublanguage: Linguistic phenomenon, computational 
tool. Analyzing language in restricted domains: sublanguage description 
and processing, pp.1–18.


By the way, I feel very strongly that semantic web research has often 
overlooked the genre and text level to focus directly on meaning 
formalization. But domain semantics and textual genres have a 
deep-running relationship, which anyone can see by listing the 
types of corpora that classically motivated information extraction and 
ontology research:
- medical reports
- news articles
- military wires
- business reports
- scientific papers
- technical manuals
- job ads
- legal texts
- etc.

These are all genres that define their own sublanguages. IMHO they 
should be considered a ready-made formalization of meaning. 
Nowadays information extraction research is taking a renewed interest in 
this through the problem of semi-structured data.

Anyway... I hope that helps!

-- 
Romain Loth
Ingénieur d'études
MoDyCO, UMR 7114 - CNRS
Université Paris 10 Nanterre
tél : 01 40 97 74 31



__________________________________________________________________


 > From: Linda Bawcom <linda.bawcom at sbcglobal.net>
 > To: True Friend <true.friend2004 at gmail.com>, corpora <corpora at uib.no>
 > Sent: Wed, 29 Jun 2011 08:44:48 -0700 (PDT)
 > Subject: Re: [Corpora-List] Genre Specific Corpus Studies

 > Dear Muhammad,

 > You are probably already familiar with the following books (the
 > bibliographies of which are also helpful), but thought I'd pass the
 > info along just in case:

 > Bell, Alan (1991). The Language of News Media. Blackwell.
 > Biber, Douglas. & Conrad, Susan. (2009). Register, Genre, and Style.
 > CUP
 > Reah, Danuta. (2002). The Language of Newspapers (2nd ed.). Routledge

 > Kindest regards,
 > Linda
__________________________________________________________________

 >> From: True Friend <true.friend2004 at gmail.com>
 >> To: corpora <corpora at uib.no>
 >> Sent: Wed, June 29, 2011 12:20:10 AM
 >> Subject: [Corpora-List] Genre Specific Corpus Studies
 >> Dear Corpora Members
 >> I am looking for studies that highlight genre features, especially
 >> those of print media genres, e.g. news reports, columns,
 >> editorials, etc.
 >> I would really appreciate any pointers you can provide.
 >> Regards
 >> --
 >> Muhammad Shakir Aziz محمد شاکر عزیز
 >> Masters in Applied Linguistics
 >> Translator, Course Developer, Linguist for Urdu, Punjabi and English
 >> Urdu:- http://awaz-e-dost.blogspot.com/
 >> English:- http://linguisticslearner.blogspot.com/
 >> Facebook:- http://www.facebook.com/truefriend2004
 >> Skype:- true_friend2004



