<br><font size=2 face="sans-serif">You could try the Corpus of Modern Scottish

Writing (1700-1945) which has a range of text types going back to the 18th

century. At the moment the texts can only be downloaded one by one - so

you could work on a subcorpus to start with - but a bulk download should

be made available in the not too distant future. See </font><a href=http://www.scottishcorpus.ac.uk/cmsw/><font size=2 face="sans-serif">http://www.scottishcorpus.ac.uk/cmsw/</font></a><font size=2 face="sans-serif">

You can view digital facsimiles, transcriptions and plain text and also

download plain text files.</font>

<br>


<br>

<br><font size=2 face="sans-serif">Hope this helps,</font>

<br>

<br><font size=2 face="sans-serif">John Corbett</font>

<br>

<br>

<br>

<br>

<br>

<table width=100%>

<tr valign=top>

<td><font size=1 color=#5f5f5f face="sans-serif">From:</font>

<td><font size=1 face="sans-serif">corpora-request@uib.no</font>

<tr valign=top>

<td><font size=1 color=#5f5f5f face="sans-serif">To:</font>

<td><font size=1 face="sans-serif">corpora@uib.no</font>

<tr valign=top>

<td><font size=1 color=#5f5f5f face="sans-serif">Date:</font>

<td><font size=1 face="sans-serif">05/04/2011 18:02</font>

<tr valign=top>

<td><font size=1 color=#5f5f5f face="sans-serif">Subject:</font>

<td><font size=1 face="sans-serif">Corpora Digest, Vol 46, Issue 6</font>

<tr valign=top>

<td><font size=1 color=#5f5f5f face="sans-serif">Sent by:</font>

<td><font size=1 face="sans-serif">corpora-bounces@uib.no</font></table>

<br>

<hr noshade>

<br>

<br>

<br><tt><font size=2>Today's Topics:<br>

<br>

   1.  corpus of plain text docs in English (petar@lml.bas.bg)<br>

   2. Re:  corpus of plain text docs in English (Mark Davies)<br>

   3.  Call for Papers: "Language Technology for a Multilingual<br>

      Europe" (David Vilar)<br>

   4.  CFP SIGIR 2011 Workshop on "entertain me": Supporting<br>

      Complex Search Tasks (Jaap Kamps)<br>

<br>

<br>

----------------------------------------------------------------------<br>

<br>

Message: 1<br>

Date: Fri, 1 Apr 2011 10:13:28 +0300<br>

From: petar@lml.bas.bg<br>

Subject: [Corpora-List] corpus of plain text docs in English<br>

To: Corpora@uib.no<br>

<br>

Dear Corpora members,<br>

<br>

I am working on a domain-specific machine translation project. I am<br>

looking for a corpus of plain text (historical) documents in English. I<br>

would like to test whether a standard n-gram model, trained on such<br>

docs, could be used to improve other machine translation techniques<br>

designed specifically for historical docs. Could you recommend some corpora?<br>

<br>

Thank you.<br>

<br>

Best regards,<br>

Petar Mitankin<br>

<br>

<br>

<br>

<br>

<br>

------------------------------<br>

<br>

Message: 2<br>

Date: Mon, 4 Apr 2011 08:43:17 -0600<br>

From: Mark Davies &lt;Mark_Davies@byu.edu&gt;<br>

Subject: Re: [Corpora-List] corpus of plain text docs in English<br>

To: "petar@lml.bas.bg" &lt;petar@lml.bas.bg&gt;, "Corpora@uib.no"<br>

&lt;Corpora@uib.no&gt;<br>

<br>

Petar,<br>

<br>

I'm not sure how far back you want the texts. If it's just to the early

1800s or so, you might check the links at the 400 million word Corpus of

Historical American English (</font></tt><a href=http://corpus.byu.edu/coha><tt><font size=2>http://corpus.byu.edu/coha</font></tt></a><tt><font size=2>):

Help / Composition of Corpus. It provides suggestions for some nice text

archives, like Project Gutenberg, Making of America, etc.<br>

<br>

For anything farther back than the early 1800s, you could just use the

older texts from Project Gutenberg, or the many online archives of authors

of Early Modern English. If your library is a member, you'll also want

to check the huge collection at Early English Books Online (EEBO) for the

machine readable (as opposed to the PDF image) texts.<br>

<br>

Best,<br>

<br>

Mark Davies<br>

<br>

============================================<br>

Mark Davies<br>

Professor of (Corpus) Linguistics<br>

Brigham Young University<br>

(phone) 801-422-9168 / (fax) 801-422-0906<br>

<br>

</font></tt><a href="http://davies-linguistics.byu.edu/"><tt><font size=2>http://davies-linguistics.byu.edu</font></tt></a><tt><font size=2><br>

<br>

** Corpus design and use // Linguistic databases **<br>

** Historical linguistics // Language variation **<br>

** English, Spanish, and Portuguese **<br>

============================================ <br>

<br>

<br>


<br>

<br>

<br>

------------------------------<br>

<br>

Message: 3<br>

Date: Tue, 05 Apr 2011 10:52:13 +0200<br>

From: David Vilar &lt;david.vilar@dfki.de&gt;<br>

Subject: [Corpora-List] Call for Papers: "Language Technology for a<br>

Multilingual Europe"<br>

To: CORPORA@UIB.NO<br>

<br>

PDF Version with complete information:<br>

</font></tt><a href="http://www.dfki.de/~davi01/cfp/ws-cfp.en.pdf"><tt><font size=2>http://www.dfki.de/~davi01/cfp/ws-cfp.en.pdf</font></tt></a><tt><font size=2><br>

<br>

Apologies if you receive multiple copies of this call.<br>

<br>

Call for Papers: "Language Technology for a Multilingual Europe"<br>

================================================================<br>

<br>

Overview<br>

--------<br>

<br>

The Workshop aims at bringing various groups together who are concerned<br>

with the broad topic of "Language Technology for a multilingual Europe".<br>

This encompasses on the one hand representatives from research and<br>

development in the field of language technologies, on the other hand<br>

users from quite diverse areas. Two examples of the application of<br>

language technology are (automatic / machine) translation, and processing<br>

of texts from the humanities with methods from language technology, like<br>

automatic topic indexing, text mining, integrating numerous texts and<br>

additional information across languages etc.<br>

<br>

These kinds of application areas and research and development in<br>

language technology have in common that they rely on resources (lexica,<br>

corpora, grammars, ontologies etc.), or that they produce these<br>

resources. A multilingual Europe, being supported by language<br>

technology, is only possible if an adequate, interoperable<br>

infrastructure of resources, including the related tooling, is available<br>

for all European languages.<br>

<br>

In addition, it is necessary that the aforementioned and other<br>

communities of developers and users of language technology stand as one<br>

homogeneous community. Only in this way will it be possible to assure<br>

the long-term political acceptance of the topic "language technology" in<br>

Europe.<br>

<br>

Topics<br>

------<br>

<br>

The workshop aims at bringing research and development from academia and<br>

industry together, to discuss the aforementioned technical and political<br>

prerequisites for language technology in Europe. Submissions may touch<br>

on the following or other aspects of this overall topic:<br>

<br>

- Research and development of language technology in various areas<br>

   (Human Language Technology, ICT, eHumanities, ...)<br>

- Infrastructure for resources in language technology<br>

- Prerequisites for interoperability of language technology based<br>

   applications<br>

- Language technology and standardization<br>

- "Political perspectives" about requirements and the usefulness of<br>

   language technology, from the perspective of research, industry and<br>

   various user communities.<br>

<br>

Important dates<br>

---------------<br>

<br>

Deadline for submission of abstracts: May 15th 2011<br>

Notification of acceptance: June 15th 2011<br>

Workshop: September 27th, the Tuesday before the GSCL conference<br>

<br>

-- <br>

David Vilar Torres<br>

DFKI GmbH, Alt-Moabit 91c, 10559 Berlin<br>

Tel. (+49) 30 238 95 1845<br>

<br>


<br>

<br>

<br>

------------------------------<br>

<br>

Message: 4<br>

Date: Tue, 05 Apr 2011 11:11:26 +0200<br>

From: Jaap Kamps &lt;kamps@science.uva.nl&gt;<br>

Subject: [Corpora-List] CFP SIGIR 2011 Workshop on "entertain me":<br>

Supporting Complex Search Tasks<br>

To: corpora@uib.no<br>

<br>

SIGIR 2011 Workshop on "entertain me": Supporting Complex Search Tasks<br>

July 28, Beijing<br>

</font></tt><a href=http://staff.science.uva.nl/~kamps/entertainme/><tt><font size=2>http://staff.science.uva.nl/~kamps/entertainme/</font></tt></a><tt><font size=2><br>

<br>

Call for Papers: deadline June 3<br>

<br>

<br>

* A Workshop on a Single Query ?!?<br>

<br>

Searchers with a complex information need typically slice-and-dice their

<br>

problem into several queries and subqueries, and laboriously combine the

<br>

answers post hoc to solve their tasks.  This workshop invites discussion

<br>

about any technique, knowledge representation, model or technology to <br>

integrate the search results into a coherent session on a level of <br>

abstraction which matches the original information need.<br>

<br>

Consider planning a social event on the last day of SIGIR, in the <br>

unknown city of Beijing, factoring in distances, timing, and preferences

<br>

on budget, cuisine, and entertainment.  A system supporting the entire

<br>

search episode should "know" a lot, either from profiles or implicit

<br>

information, or from explicit information in the query or from feedback.

<br>

  This may lead to the (interactive) construction of a complexly <br>

structured query, but sometimes the most obvious query for a complex <br>

need is dead simple: "entertain me."  Rather than returning

<br>

ten-blue-lines in response to a 2.4-word query, the desired system <br>

should support searchers during their whole task or search episode, by

<br>

iteratively constructing a complex query or search strategy, by <br>

exploring the result-space at every stage, and by combining the partial

<br>

answers into a coherent whole.<br>

<br>

Although a SIGIR Workshop devoted to a single query may seem <br>

extravagant, this query is just one example of the general problem of <br>

supporting simple and common requests that express complex and dynamic

<br>

needs.<br>

<br>

* Social Evening Program<br>

<br>

Many interesting ideas will come out of the workshop, but how do we know

<br>

if they are any good?  We will have a special breakout group designing

a <br>

mock-up for solving the "entertain me" query, charting out the

<br>

background information (implicit and explicit context), the different <br>

sources (maps, web, social, news, ...), and the needed components and <br>

interaction.  A group of local Peking University grad students is

<br>

available to serve as oracles for local information.<br>

<br>

The scientific evaluation of the resulting "entertainment plan"

will be <br>

done by executing it in the evening after the workshop, with all <br>

participants.<br>

<br>

- Are you willing and able to sponsor the social event?  Please contact

<br>

the organizers for details.<br>

- Do you want to take part?  Read the Call for Submissions and contribute!<br>

<br>

* Call for Submissions<br>

<br>

We invite the submission of papers that think outside the box, from any

<br>

aspect of relevance to the workshop's theme, including:<br>

<br>

- information seeking behavior, interaction, berry-picking;<br>

- information needs and ways of articulating them;<br>

- implicit and explicit feedback;<br>

- exploiting collection structure and semantic annotations;<br>

- exploratory search, HCI, UI and UX design;<br>

- situated search (maps, Geo, customized, personalized, ...);<br>

- entertainment search (broadcasters, content owners, network operators,

<br>

device manufacturers).<br>

<br>

We aim to bring together a varied group of researchers covering both <br>

user and system centered approaches, and together work on ways to make

<br>

IR systems support searchers when interactively solving a complex task,

<br>

such as the entertain me planning problem.<br>

<br>

Help us shape the future of IR!<br>

<br>

- Submit a short 2-page poster or position paper of relevance to <br>

supporting complex tasks, e.g., that identify specific research problems

<br>

and use-cases, develop models/theory of complex tasks and interaction,

<br>

discuss novel interfaces or system components, examine ways of <br>

evaluating, and/or report on preliminary experiments,<br>

<br>

- and take an active part in the discussion at the Workshop.<br>

<br>

The deadline is Monday June 3, 2011; submission details and further <br>

information are on </font></tt><a href=http://staff.science.uva.nl/~kamps/entertainme/><tt><font size=2>http://staff.science.uva.nl/~kamps/entertainme/</font></tt></a><tt><font size=2><br>

<br>

Nick Belkin (Rutgers)<br>

Charlie Clarke (Waterloo)<br>

Ning Gao (Peking University)<br>

Jaap Kamps (Amsterdam)<br>

Jussi Karlgren (SICS)<br>

<br>

<br>

<br>

----------------------------------------------------------------------<br>

Send Corpora mailing list submissions to<br>

corpora@uib.no<br>

<br>

To subscribe or unsubscribe via the World Wide Web, visit<br>

</font></tt><a href=http://mailman.uib.no/listinfo/corpora><tt><font size=2>http://mailman.uib.no/listinfo/corpora</font></tt></a><tt><font size=2><br>

or, via email, send a message with subject or body 'help' to<br>

corpora-request@uib.no<br>

<br>

You can reach the person managing the list at<br>

corpora-owner@uib.no<br>

<br>

When replying, please edit your Subject line so it is more specific<br>

than "Re: Contents of Corpora digest..."<br>

<br>

<br>


<br>

End of Corpora Digest, Vol 46, Issue 6<br>

**************************************<br>

</font></tt>

<br>

<br>