[Corpora-List] Do we still need language corpora?
Alberto Simões
albie at alfarrabio.di.uminho.pt
Fri Feb 4 16:36:32 UTC 2011
Yes, we do!
On 04/02/2011 14:53, Janne Bondi Johannessen wrote:
> Dear all.
> I have been responsible for developing many corpora at the University of
> Oslo, and I can safely say that hardly any of them has any
> competition from the web. Leaving aside the question of user
> interface, there are many features that are not present in web
> documents, and that are important for users (linguists, text researchers
> and language technologists). Here are some:
>
> - spoken language
> - dialects
> - speech situations
> - dialogue
> - source and translated texts
> - free choice of text types and genres
> - grammatical annotation (and other linguistic annotation)
> - background information on the text producers (age, gender, mother
> tongue, place of birth, place of residence, education, etc.)
>
> For more information on our corpora, see:
> http://www.hf.uio.no/iln/english/about/organization/text-laboratory/
>
> Best wishes,
> Janne Bondi Johannessen
>
> 2011/2/4 Mark Davies <Mark_Davies at byu.edu>
>
> Martin,
>
> I would imagine that one motivation for the question is the
> availability of "corpora" like Google/Web and Google Books. Of
> course, one needs to distinguish between:
>
> corpus = textual corpus (i.e. words and sentences + metadata)
> and
> corpus = textual corpus + architecture and interface for accessing
> the information
>
> Many wonderful textual corpora are "trapped" inside an architecture
> and interface that don't allow users to do much with them. As
> everyone dealing with "Web as Corpus" knows, effectively and
> efficiently using Web/Google/Books data -- especially via the native
> Google interface -- is a real challenge.
>
> Two pages that might be relevant:
>
> http://corpus.byu.edu/coha/compare-googleBooks.asp
>
> http://corpus.byu.edu/coca/compare-google.asp
>
> Best,
>
> Mark D.
>
> ============================================
> Mark Davies
> Professor of (Corpus) Linguistics
> Brigham Young University
> (phone) 801-422-9168 / (fax) 801-422-0906
> Web: http://davies-linguistics.byu.edu
>
> ** Corpus design and use // Linguistic databases **
> ** Historical linguistics // Language variation **
> ** English, Spanish, and Portuguese **
> ============================================
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
> --
> Janne Bondi Johannessen
> Professor, The Text Laboratory, ILN, http://www.hf.uio.no/tekstlab/
> President, NEALT, http://omilia.uio.no/nealt/
> University of Oslo
> P.O.Box 1102 Blindern, N-0317 Oslo, Norway
> Tel: +47 22 85 68 14, mob.: +47 928 966 34
--
Alberto Simões