[Corpora-List] help on document comparison for historians
António Branco
Antonio.Branco at di.fc.ul.pt
Mon Jun 6 10:49:01 UTC 2011
Dear all,
Following my previous message concerning the subject above,
there were a number of replies from:
Dina Demner Fushman <ddemner at mail.nih.gov>
Eric Ringger <ringger at cs.byu.edu>
Serge Heiden <slh at ens-lyon.fr>
Paul D Clough <p.d.clough at sheffield.ac.uk>
Tony McEnery <eiaamme at exchange.lancs.ac.uk>
I would like to thank you all for your help.
The compilation of your suggestions follows below.
Best regards,
António
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
= Tools suggested
- Dina Demner Fushman <ddemner at mail.nih.gov>
eTBLAST
http://etest.vbi.vt.edu/etblast3/
- Eric Ringger <ringger at cs.byu.edu>
If running on Windows, download WinMerge from SourceForge.
Similar tools exist for Mac and Linux/Unix.
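For readers who prefer a script to a GUI tool, the same idea can be
sketched with Python's standard difflib module. This is only a minimal
illustration, not one of the tools suggested above: the function name
shared_passages, the min_words threshold and the sample sentences are
all made up for the example. It compares two texts word by word and
only reports passages that appear in the same relative order in both,
so reordered passages would be missed.

```python
from difflib import SequenceMatcher


def shared_passages(text_a, text_b, min_words=5):
    """Return word sequences of at least min_words words found verbatim in both texts.

    Hypothetical helper for illustration; matching is exact and case-sensitive.
    """
    words_a = text_a.split()
    words_b = text_b.split()
    # autojunk=False stops difflib from ignoring very frequent words in long texts.
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    passages = []
    for block in matcher.get_matching_blocks():
        # The final block returned by difflib has size 0 and is filtered out here.
        if block.size >= min_words:
            passages.append(" ".join(words_a[block.a:block.a + block.size]))
    return passages


if __name__ == "__main__":
    # Invented sample sentences, not taken from any real source.
    doc_a = ("In the year of our Lord the king granted the lands "
             "of the abbey to the monks of the north")
    doc_b = ("The chronicle says that the king granted the lands "
             "of the abbey to the bishop instead")
    for passage in shared_passages(doc_a, doc_b):
        print(passage)
```

Lowering min_words reports shorter overlaps; for more than two
documents the function could be called on each pair in turn.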
= Publications suggested
- Serge Heiden <slh at ens-lyon.fr>
Russell Horton, Mark Olsen, and Glenn Roe, "Something Borrowed: Sequence
Alignment and the Identification of Similar Passages in Large Text
Collections", Digital Studies / Le Champ numérique (Forthcoming 2010).
Russell Horton and Les Henderson, "Sequence Alignment and Similarity in
Biology and the Humanities", Chicago Colloquium on Digital Humanities
and Computer Science (DHCS), Northwestern University, November 2010.
Mark Olsen, "From Words to Works: Machine Learning, Sequence Alignment
and Text Mining at ARTFL", Computation Institute, University of Chicago,
June 2010.
Glenn Roe, "Encyclopedic Intertextuality: Identifying Intertextual
Relationships in the Encyclopédie using Sequence Alignment", "Knowledge
Production, Technology, and Cultural Change: Colloquium on the Digital
Encyclopédie" - University of Minnesota, April 23-24, 2009.
Russell Horton and Mark Olsen, "Sequence Alignment, Shared Services, and
Digital Humanities", Project Bamboo Workshop, Tucson, Arizona, January 2009.
All from http://artfl-project.uchicago.edu/content/papers-and-presentations
- Paul D Clough <p.d.clough at sheffield.ac.uk>
Matthew Spencer and Christopher J. Howe, "Collating Texts Using
Progressive Multiple Alignment", Computers and the Humanities, Vol. 38,
No. 3 (Aug., 2004), pp. 253-270.
http://www.jstor.org/pss/30204940
And this:
http://opus.bibliothek.uni-wuerzburg.de/volltexte/2011/5660/pdf/Nassourou_DesignArchitectureCollationSystem.pdf
- Tony McEnery <eiaamme at exchange.lancs.ac.uk>
A couple of papers that you may find of interest looking at this very
issue are listed below. The work was done using a tool developed by
Scott Piao (based on work he was involved in at Sheffield):
Hardie, A, McEnery, T, and Piao, S. (2010) "A corpus-based approach to
text reuse in the newsbooks of the Commonwealth" in Dooley, B (ed.) The
Dissemination of News and the Emergence of Contemporaneity in Early
Modern Europe, Ashgate, Farnham, pp 251-286.
Hardie, A and McEnery, T (2009) "Corpus linguistics and
historical contexts: text reuse and the expression of bias in early
modern English journalism", in R. Bowen, M. Mobärg and S. Ohlander
(eds) Corpora and discourse - and stuff: papers in honour of Karin
Aijmer, Gothenburg Studies in English 96, Acta Universitatis
Gothoburgensis, Göteborg, pp. 59-92.
On 01/06/2011 19:02, Antonio Branco wrote:
>
> Dear all,
>
> A friend of mine is working on medieval history and would
> like to find a (user-friendly) tool that could help her with
> the following functionality: one enters different documents and
> the tool will deliver the excerpts (possibly several paragraphs
> long) that are identical across documents.
>
> Any hint or help will be most welcome. Please reply to me.
> I'll post a summary.
>
> Kind regards,
>
> António Branco
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
--
Dr. Serge Heiden, slh at ens-lyon.fr, http://textometrie.ens-lyon.fr
ENS de Lyon/CNRS - ICAR UMR5191, Institut de Linguistique Française
15, parvis René Descartes 69342 Lyon BP7000 Cedex, tél. +33(0)622003883