[Corpora-List] help on document comparison for historians

António Branco Antonio.Branco at di.fc.ul.pt
Mon Jun 6 10:49:01 UTC 2011


Dear all,


Following my previous message concerning the subject above,
there were a number of replies from:

	Dina Demner-Fushman 	<ddemner at mail.nih.gov>

	Eric Ringger 		<ringger at cs.byu.edu>

	Serge Heiden 		<slh at ens-lyon.fr>

	Paul D Clough 		<p.d.clough at sheffield.ac.uk>

	Tony McEnery 		<eiaamme at exchange.lancs.ac.uk>


I would like to thank you all for your help.
The compilation of your suggestions follows below.

Best regards,

António


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++




= Tools suggested


- Dina Demner-Fushman <ddemner at mail.nih.gov>

eTBLAST
http://etest.vbi.vt.edu/etblast3/



- Eric Ringger <ringger at cs.byu.edu>

If running on Windows, download WinMerge from SourceForge.
Similar tools exist for Mac and Linux/Unix.
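
For anyone who prefers a script to a GUI tool, the functionality asked
for (excerpts that are identical across documents) can be roughly
sketched in Python with the standard-library difflib module. This is
not one of the tools suggested by the respondents; the function name
shared_passages and the min_words threshold are assumptions made only
for illustration.

```python
from difflib import SequenceMatcher


def shared_passages(text_a, text_b, min_words=5):
    """Return word sequences that occur identically in both texts.

    Hypothetical helper: words are split on whitespace and compared
    exactly (case-sensitive), so spelling variants will not match.
    """
    words_a = text_a.split()
    words_b = text_b.split()
    # autojunk=False stops difflib from ignoring very frequent words
    # in long texts.
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    passages = []
    for block in matcher.get_matching_blocks():
        # The last block is a zero-length sentinel; the size filter skips it.
        if block.size >= min_words:
            start = block.a
            passages.append(" ".join(words_a[start:start + block.size]))
    return passages
```

Note that difflib aligns the two texts in order, so a passage that has
been moved to a different position in the second document may be
missed, and medieval spelling variation would need normalising first.
The sequence-alignment papers listed below address those harder cases.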




= Publications suggested


- Serge Heiden <slh at ens-lyon.fr>

Russell Horton, Mark Olsen, and Glenn Roe, "Something Borrowed: Sequence 
Alignment and the Identification of Similar Passages in Large Text 
Collections", Digital Studies / Le Champ numérique (Forthcoming 2010).

Russell Horton and Les Henderson, "Sequence Alignment and Similarity in 
Biology and the Humanities", Chicago Colloquium on Digital Humanities 
and Computer Science (DHCS), Northwestern University, November 2010.

Mark Olsen, "From Words to Works: Machine Learning, Sequence Alignment 
and Text Mining at ARTFL", Computation Institute, University of Chicago, 
June 2010.

Glenn Roe, "Encyclopedic Intertextuality: Identifying Intertextual 
Relationships in the Encyclopédie using Sequence Alignment", "Knowledge 
Production, Technology, and Cultural Change: Colloquium on the Digital 
Encyclopédie", University of Minnesota, April 23-24, 2009.

Russell Horton and Mark Olsen, "Sequence Alignment, Shared Services, and 
Digital Humanities", Project Bamboo Workshop, Tucson, Arizona, January 2009.

All from http://artfl-project.uchicago.edu/content/papers-and-presentations



- Paul D Clough <p.d.clough at sheffield.ac.uk>

Matthew Spencer and Christopher J. Howe, "Collating Texts Using
Progressive Multiple Alignment", Computers and the Humanities,
Vol. 38, No. 3 (Aug., 2004), pp. 253-270
http://www.jstor.org/pss/30204940

And this:
http://opus.bibliothek.uni-wuerzburg.de/volltexte/2011/5660/pdf/Nassourou_DesignArchitectureCollationSystem.pdf



- Tony McEnery <eiaamme at exchange.lancs.ac.uk>

A couple of papers that you may find of interest looking at this very
issue are listed below. The work was done using a tool developed by
Scott Piao (based on work he was involved in at Sheffield):

Hardie, A, McEnery, T, and Piao, S. (2010) "A corpus-based approach to
text reuse in the newsbooks of the Commonwealth" in Dooley, B (ed.) The
Dissemination of News and the Emergence of Contemporaneity in Early
Modern Europe, Ashgate, Farnham, pp 251-286.

Hardie, A and McEnery, T (2009) "Corpus linguistics and
historical contexts: text reuse and the expression of bias in early
modern English journalism", in R. Bowen, M. Mobärg and S. Ohlander
(eds) Corpora and discourse - and stuff: papers in honour of Karin
Aijmer, Gothenburg Studies in English 96, Acta Universitatis
Gothoburgensis, Göteborg, pp. 59-92.





On 01/06/2011 19:02, António Branco wrote:
>
>
>
> Dear all,
>
> A friend of mine is working on medieval history and would
> like to find a (user-friendly) tool that could help her with
> the following functionality: one enters different documents and
> the tool delivers the excerpts (possibly several paragraphs
> long) that are identical across documents.
>
> Any hint or help will be most welcome. Please reply to me.
> I'll post a summary.
>
> Kind regards,
>
> António Branco
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora

-- 
Dr. Serge Heiden, slh at ens-lyon.fr, http://textometrie.ens-lyon.fr
ENS de Lyon/CNRS - ICAR UMR5191, Institut de Linguistique Française
15, parvis René Descartes 69342 Lyon BP7000 Cedex, tél. +33(0)622003883

