[Corpora-List] R: Corpora Digest, Vol 49, Issue 16

rita.calabrese at libero.it rita.calabrese at libero.it
Sat Jul 16 08:49:40 UTC 2011


Dear Sammy Danso,
the quickest way to build an electronic corpus from printed materials is to use
an OCR (Optical Character Recognition) application. You can download free OCR
software from the following website:

http://softi-freeocr.softonic.it/download
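
For a large batch of scanned forms, the OCR step could also be scripted. The
following is only a minimal sketch, assuming the open-source Tesseract
command-line tool is installed and the forms have been scanned to PNG files;
the function names and directory layout are hypothetical, and the software
linked above is a separate desktop application.

```python
import subprocess
from pathlib import Path


def build_ocr_command(image, output_base, lang="eng"):
    """Build a Tesseract call: tesseract <image> <output base> -l <lang>.

    Tesseract writes the recognised text to <output base>.txt.
    """
    return ["tesseract", str(image), str(output_base), "-l", lang]


def ocr_directory(scan_dir, text_dir, lang="eng"):
    """Run OCR on every PNG in scan_dir (hypothetical layout), one text file per scan."""
    text_dir = Path(text_dir)
    text_dir.mkdir(parents=True, exist_ok=True)
    for image in sorted(Path(scan_dir).glob("*.png")):
        command = build_ocr_command(image, text_dir / image.stem, lang)
        # check=True raises CalledProcessError if Tesseract reports a failure
        subprocess.run(command, check=True)
```

Recognition quality on handwritten forms is usually much lower than on printed
text, so the output would still need manual checking.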

Best wishes

Rita Calabrese
University of Salerno
via Ponte don Melillo
84084 Fisciano (SA)
ITALY



>----Original message----
>From: corpora-request at uib.no
>Date: 14/07/2011 5.40
>To: <corpora at uib.no>
>Subject: Corpora Digest, Vol 49, Issue 16
>
>Today's Topics:
>
>   1. Re:  Typing Urdu text in LaTeX (Paul Johnston)
>   2. Re:  Typing Urdu text in LaTeX (Alberto Simões)
>   3. Re:  Typing Urdu text in LaTeX (manaal faruqui)
>   4.  Hebrew texts in Latin lettrs (Yuri Tambovtsev)
>   5. Re:  Hebrew texts in Latin lettrs (Nomi Guthmann)
>   6.  First Call for Papers: 8th Workshop on Syntax &	Semantics
>      (WoSS8) (Géraldine Walther)
>   7. Re:  Which Statistical Test is Suitable (Geoffrey Sampson)
>   8. Re:  Which Statistical Test is Suitable (Geoffrey Sampson)
>   9.  Methodology for capturing corpus from paper to	computer
>      (Samuel Danso)
>  10. Re:  Which Statistical Test is Suitable (chris brew)
>  11. Re:  Which Statistical Test is Suitable (chris brew)
>  12. Re:  Which Statistical Test is Suitable (maxwell)
>  13. Re:  Which Statistical Test is Suitable (maxwell)
>  14. Re:  Which Statistical Test is Suitable (John F. Sowa)
>  15.  The ACL Anthology Searchbench is online (Ulrich Schaefer)
>  16. Re:  Methodology for capturing corpus from paper	tocomputer
>      (Ana Julia)
>  17. Re:  Which Statistical Test is Suitable (fatima zuhra)
>
>
>----------------------------------------------------------------------
>
>Message: 1
>Date: Tue, 12 Jul 2011 12:00:28 +0000
>From: Paul Johnston <paul.johnston at manchester.ac.uk>
>Subject: Re: [Corpora-List] Typing Urdu text in LaTeX
>To: manaal faruqui <manaalfar at gmail.com>, "corpora at uib.no"
>	<corpora at uib.no>
>
>Try something along the lines of
>
>\documentclass[11pt]{article}
>\usepackage{arabtex}
>\begin{document}
>\begin{RLtext}
>\seturdu
>abcdefgijklmnop
>\end{RLtext}
>\end{document}
>
>I don't pretend to speak Urdu but it compiles and looks reasonable.
>
>Paul
>
>-----Original Message-----
>From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of manaal faruqui
>Sent: 12 July 2011 11:53
>To: corpora at uib.no
>Subject: Re: [Corpora-List] Typing Urdu text in LaTeX
>
>I am using the transliteration given here:
>http://en.wikipedia.org/wiki/ArabTeX
>
>On Tue, Jul 12, 2011 at 4:21 PM, manaal faruqui <manaalfar at gmail.com> wrote:
>> Hi All,
>>
>> I have to write a report in which I need to insert Urdu in LaTeX.
>> I have used \usepackage{arabtex} and I am trying to use
>>
>> \texturdu{} to write the Urdu words, but it's saying that it's an
>> "Undefined control sequence".
>>
>> I am using the transliteration given here:
>>
>> and the sty file from here:
>> http://www.tex.ac.uk/tex-archive/language/arabtex/texinput/arabtex.sty
>>
>> Please help.
>>
>> Thanks a lot,
>> Manaal Faruqui
>> 4th year UG student
>> IIT Kharagpur, India
>> http://cse.iitkgp.ac.in/~manaalf
>>
>
>_______________________________________________
>UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>Corpora mailing list
>Corpora at uib.no
>http://mailman.uib.no/listinfo/corpora
>
>
>
>------------------------------
>
>Message: 2
>Date: Wed, 13 Jul 2011 11:20:38 +0100
>From: Alberto Simões <albie at alfarrabio.di.uminho.pt>
>Subject: Re: [Corpora-List] Typing Urdu text in LaTeX
>To: corpora at uib.no
>
>Hello
>
>I am a complete ignorant about Urdu, but if you are able to type Urdu 
>characters directly in UTF8, you can use XeLaTeX to typeset it.
>
>If this is a possibility, let me know and I'll help with the XeLaTeX 
>document structure.
>
>All the best,
>Alberto
>
>-- 
>Alberto Simoes
>CCTC-UM / CEHUM
>
>
>
>------------------------------
>
>Message: 3
>Date: Wed, 13 Jul 2011 15:55:01 +0530
>From: manaal faruqui <manaalfar at gmail.com>
>Subject: Re: [Corpora-List] Typing Urdu text in LaTeX
>To: albie at alfarrabio.di.uminho.pt
>Cc: corpora at uib.no
>
>Thanks all, the problem was solved by the method told by Paul. :)
>
>Manaal
>
>
>
>------------------------------
>
>Message: 4
>Date: Wed, 13 Jul 2011 18:02:36 +0700
>From: "Yuri Tambovtsev" <yutamb at mail.ru>
>Subject: [Corpora-List] Hebrew texts in Latin lettrs
>To: <corpora at uib.no>
>
>Dear Corpora colleagues, do you know any websites of Hebrew texts in Latin
>letters? I cannot read Hebrew letters. However, I'd like to compare Hebrew sound
>chains with those I have in about 300 world languages. Looking forward to
>hearing from you soon (yutamb at mail.ru). Yours sincerely, Yuri Tambovtsev,
>Novosibirsk, Russia
>
>------------------------------
>
>Message: 5
>Date: Wed, 13 Jul 2011 15:01:25 +0300
>From: Nomi Guthmann <nomi.guthmann at googlemail.com>
>Subject: Re: [Corpora-List] Hebrew texts in Latin lettrs
>To: Yuri Tambovtsev <yutamb at mail.ru>
>Cc: corpora at uib.no
>
>Hi Yuri,
>
>The Hebrew Treebank corpus from the Mila Knowledge Center for Processing
>Hebrew has a transliterated version. It is available here
>http://www.mila.cs.technion.ac.il/mila/eng/resources_treebank.html
>The transcription that was used is described in
>http://www.cs.technion.ac.il/~winter/Corpus-Project/paper.pdf
>
>Noemie
>
>
>------------------------------
>
>Message: 6
>Date: Wed, 13 Jul 2011 15:44:35 +0200
>From: Géraldine Walther 	<geraldine.walther at linguist.jussieu.fr>
>Subject: [Corpora-List] First Call for Papers: 8th Workshop on Syntax
>	&	Semantics (WoSS8)
>To: corpora at uib.no
>
>[Apologies for cross-postings]
>
>***FIRST CALL FOR PAPERS***
>
>8th Workshop on Syntax & Semantics (WoSS)
>November 17th-18th, 2011
>Paris, France
>
>*****
>
>We invite PhD students to send abstracts for twenty-minute talks followed by
>a ten-minute discussion or poster presentations on any aspect of theoretical
>linguistics for the 8th Workshop on Syntax & Semantics (WoSS).
>
>WoSS is a series of rotating workshops organized by PhD students from
>'neighbouring' universities (see list below) for PhD students working in
>different domains of generative linguistics, in a broad sense, e.g. syntax,
>semantics, pragmatics, morphology, phonetics, phonology, language acquisition,
>computational linguistics, etc.
>
>The institutions behind WoSS are:
>The University of Nantes
>The University of the Basque Country in Vitoria-Gasteiz (EHU)
>The Universities of Catalonia (UAB, UB, UPF, URV)
>The Universities of Paris 3, Paris 7, Paris 8
>The Universities of Madrid (IUOG, UAM, UCM)
>The University of Siena
>
>This year's WoSS is co-organized by University Paris Diderot (Paris 7) and
>University Paris Vincennes St-Denis (Paris 8), and will take place at the CNRS
>'Pouchet' building, 59 rue Pouchet, 75017 Paris, on November 17th-18th, 2011.
>
>Submission instructions
>
>Abstracts must be anonymous and at most two pages long, examples and
>references included, on an A4 sheet with one-inch (2.54 cm) margins and
>12-point Times New Roman font, single spacing.
>
>Submissions are limited to one individual and one joint abstract per author,
>or two joint abstracts per author. The abstracts must be submitted over
>EasyChair as a PDF attachment by the 31st of August.
>
>https://www.easychair.org/conferences/?conf=woss8
>
>Accepted papers will be presented orally or as posters depending on the nature
>and quality of the work. You may, however, specifically indicate whether you
>would prefer to present your paper as an oral presentation or as a poster.
>
>INVITED SPEAKERS
>
>We have the pleasure to announce that the following speakers will be giving
>an invited talk at WoSS8:
>
>Paolo Acquaviva (University College Dublin)
>Bob Borsley (University of Essex)
>Philippe Schlenker (ENS-NYU)
>
>Important dates:
>
>Deadline for submission: August 31, 2011
>
>Notification of acceptance: October 7, 2011 
>
>Scientific Committee:
>
>Xiaoliang HUANG, Paris 7. 
>Christophe ONAMBELE, Paris 8. 
>Marie PHILIPPE, Paris 8. 
>Géraldine WALTHER, Paris 7. 
>Grégoire WINTERSTEIN, Paris 7. 
>
>A WoSS8 website is currently under construction and will be available soon.
>
>More information about previous WoSS workshops can be found at:
>
>http://www.woss7.univ-nantes.fr/
>
>If you have any questions, please contact us at: 
>
>woss8paris at gmail.com
>
>
>
>
>------------------------------
>
>Message: 7
>Date: Wed, 13 Jul 2011 15:34:05 +0100
>From: Geoffrey Sampson <grs2 at sussex.ac.uk> 
>Subject: Re: [Corpora-List] Which Statistical Test is Suitable
>To: True Friend <true.friend2004 at gmail.com>
>Cc: corpora <corpora at uib.no>, corpora at lists.uib.no
>
>Dear Muhammad Shakir Aziz,
>
>I don't see that anyone else has responded to your query, so let me do so,
>rather late.  I would say that no kind of statistical test could possibly
>indicate whether variant spellings were errors, or allowable alternatives;
>because this question is not to do with numbers.  It is a question about
>where authority over the norms of the language you are concerned with is
>felt to lie, and what that authority says about orthography.  Some
>languages, at some periods, tolerate a wide variety of alternative
>spellings for given words, while other languages (or the same languages at
>other periods) may have extremely tightly-defined norms and strong social
>sanctions against violating them.  Carrying out statistical calculations on
>tables of the incidence of alternatives would not tell you anything about
>this, I believe.
>
>Geoffrey Sampson
>
>
>
>
>------------------------------
>
>Message: 9
>Date: Wed, 13 Jul 2011 15:55:53 +0100
>From: "Samuel Danso" <scsod at leeds.ac.uk>
>Subject: [Corpora-List] Methodology for capturing corpus from paper to
>	computer
>To: "'corpora'" <corpora at uib.no>
>
>Dear All
>
>Please advise on methodology for capturing paper forms into a computer
>corpus.
>
> 
>
>My research involves a collection of 10,000 Verbal Autopsy interviews with the
>mother or a close relative of the deceased, currently on paper forms. How should
>I have these typed onto a PC? Double entry by two independent clerks is twice
>the cost of single entry (with checking by managers); is it really
>necessary?
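
One way to put a number on the double-entry question is to compare two
independent transcriptions of the same form and count where they disagree.
The sketch below is only an illustration using Python's standard difflib
module; the function names and the sample sentences are made up, and
comparing whitespace-separated tokens is an assumption about how the
interviews would be keyed.

```python
from difflib import SequenceMatcher


def entry_discrepancies(first, second):
    """List (token position in first, tokens in first, tokens in second) per mismatch."""
    a, b = first.split(), second.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (i1, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only replace / insert / delete spans
    ]


def agreement_rate(first, second):
    """Similarity of the two token sequences, from 0.0 to 1.0 (identical)."""
    a, b = first.split(), second.split()
    return SequenceMatcher(a=a, b=b, autojunk=False).ratio()


# Hypothetical example: the second clerk mistypes one word.
clerk_a = "the child had fever for three days"
clerk_b = "the child had fever for there days"
for position, left, right in entry_discrepancies(clerk_a, clerk_b):
    print(position, left, right)
```

A pilot on a small sample of forms keyed twice would give an estimate of the
disagreement rate, which could then inform whether double entry of all the
forms is worth the extra cost.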
>
> 
>
>Sammy Danso, 
>
>Leeds University, UK and Kintampo Health Centre, Ghana
>
> 
>
> 
>
>
>------------------------------
>
>Message: 10
>Date: Wed, 13 Jul 2011 11:17:03 -0400
>From: chris brew <cbrew at acm.org>
>Subject: Re: [Corpora-List] Which Statistical Test is Suitable
>To: Geoffrey Sampson <grs2 at sussex.ac.uk>
>Cc: corpora <corpora at uib.no>, corpora at lists.uib.no
>
>I partially agree with Geoffrey Sampson's points. It is certainly true that
>a table of numbers, in isolation, tells you nothing about the question you
>are asking, for the reasons that Professor Sampson gives. And statistical
>tests will not change this situation. To make progress, you need to be
>precise about what you intend to count as a "spelling error". You could for
>example reframe the problem as "how likely is it that the numbers that we
>observe are due to random mistakes in typing?", then proceed to make a
>mathematical model of typing errors. Or you could contrast the typing error
>hypothesis with an alternative hypothesis and frame the question as "Are the
>numbers that we observe more likely to be the result of typing errors or
>more likely to be due to the existence in the writing population of two
>groups of people, one of which always tries to spell the word one way, and
>one of which tries to spell it the other way". It will take some clear
>thinking to get this comparison right, because you have to make a precise
>quantitative judgement on things like the prior probability of finding
>groups that spell differently in the way we hypothesize. From experience of
>US/UK spelling differences, I believe that it would be a tricky and subtle
>matter to come up with suitably precise and useful hypotheses. No surprise
>there: as linguists, we are used to working with challenging and complex
>data.
>
>But, if you do manage to set up sufficiently precise hypotheses, and
>associate numbers with the hypotheses, statistical reasoning definitely can
>help. That's what it is for. This kind of thinking is the basis for all
>statistical tests that I am aware of. What you are never going to find is a
>statistical test that frees you from the necessity of making (or finding in
>the work of other scholars)  a precise and careful analysis of the problem
>you are trying to solve.
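
The two hypotheses contrasted above can be made concrete in a toy model. The
sketch below is an illustration only, not a recommended test: it assumes each
writer's uses of the word are independent, that "typing errors" means every
writer intends variant A and slips to variant B with a small probability
`eps`, and that "two groups" means a fraction `pi` of writers intend B
instead. The values of `eps` and `pi`, the data and all names are invented
for the example; choosing them well is exactly the careful analysis the
message calls for.

```python
from math import comb, log


def binom_pmf(k, n, p):
    """Probability of k occurrences of variant B in n tokens at rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def log_lik_typing_errors(writers, eps):
    """Every writer intends A; B appears only as a slip with probability eps."""
    return sum(log(binom_pmf(k, n, eps)) for k, n in writers)


def log_lik_two_groups(writers, eps, pi):
    """Each writer intends B with prior probability pi, otherwise A."""
    total = 0.0
    for k, n in writers:
        intends_a = (1 - pi) * binom_pmf(k, n, eps)
        intends_b = pi * binom_pmf(k, n, 1 - eps)
        total += log(intends_a + intends_b)
    return total


def log_likelihood_ratio(writers, eps=0.02, pi=0.5):
    """Positive values favour the two-group hypothesis under these assumptions."""
    return log_lik_two_groups(writers, eps, pi) - log_lik_typing_errors(writers, eps)


# Each pair is (tokens spelled with variant B, total tokens) for one writer.
consistent_writers = [(0, 10), (10, 10), (0, 8), (9, 9)]
scattered_slips = [(0, 10), (1, 10), (0, 8), (0, 9)]
print(log_likelihood_ratio(consistent_writers))
print(log_likelihood_ratio(scattered_slips))
```

With writers who each stick to one spelling, the ratio comes out positive;
with rare, scattered B forms it comes out negative. The result depends
entirely on the assumed `eps` and `pi`.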
>
>Chris
>
>-- 
>Chris Brew, Ohio State University
>
>------------------------------
>
>Message: 12
>Date: Wed, 13 Jul 2011 12:52:27 -0400
>From: maxwell <maxwell at umiacs.umd.edu>
>Subject: Re: [Corpora-List] Which Statistical Test is Suitable
>To: chris brew <cbrew at acm.org>
>Cc: corpora <corpora at uib.no>, corpora at lists.uib.no
>
>I am not at all familiar with the literature, but it's possible that the
>literature people have looked at spelling (non-)standardization in the
>period in English between, say, Chaucer (when not only was every writer a
>law unto himself, but an individual writer might have a lot of variation),
>up into the era of spelling standardization (when individual writers could
>be law-abiding citizens or outlaws :-)).  Perhaps similar sorts of things
>happened in other languages that underwent standardization (mostly European
>languages, I'm guessing).
>
>If they have worked on this, a place to start a literature search might be
>the ALLC (Association for Linguistic and Literary Computing) and the
>Association for Computing in the Humanities.  The two orgs have met for
>joint conferences in the last decade, I believe.
>
>   Mike Maxwell
>
>
>
>------------------------------
>
>Message: 14
>Date: Wed, 13 Jul 2011 13:19:14 -0400
>From: "John F. Sowa" <sowa at bestweb.net>
>Subject: Re: [Corpora-List] Which Statistical Test is Suitable
>To: corpora at uib.no
>
>On 7/13/2011 11:17 AM, chris brew wrote:
>> I partially agree with Geoffrey Sampson's points. It is certainly true
>> that a table of numbers, in isolation, tells you nothing about the
>> question you are asking, for the reasons that Professor Sampson gives.
>> And statistical tests will not change this situation...
>
>All statistical methods are based on some model about the processes
>that generate the data.  And as the statistician George Box observed:
>
>    All models are wrong, but some are useful.
>
>Geoffrey Sampson:
>> It is a question about where authority over the norms of the language
>> you are concerned with is felt to lie, and what that authority says
>> about orthography.
>
>Yes, and those authorities could be authors, dictionaries, or some
>official legislation.
>
>CB
>> But, if you do manage to set up sufficiently precise hypotheses,
>> and associate numbers with the hypotheses, statistical reasoning
>> definitely can help.
>
>I agree that statistics can help.  But there are many models for
>generating statistics.  Should you give higher weights to typing
>mistakes, dictionaries, legislation, or common usage?
>
>John
>
>
>
>
>------------------------------
>
>Message: 15
>Date: Wed, 13 Jul 2011 21:26:45 +0200
>From: Ulrich Schaefer <ulrich.schaefer at dfki.de>
>Subject: [Corpora-List] The ACL Anthology Searchbench is online
>To: corpora at uib.no
>
>Dear all,
>
>the ACL Anthology Searchbench is online at http://aclasb.dfki.de (also
>reachable from the ACL Anthology start page aclweb.org/anthology --
>thanks to Min-Yen Kan for integrating it!).
>
>The Searchbench combines semantic, full text and bibliographic search
>in more than 19,000 Computational Linguistics papers of the ACL
>Anthology from the past 47 years, including the complete Journal.
>
>Highlights are
>
>- "statements" search: you can search for subject-predicate-object
>   triples in millions of sentences, where predicates can also be
>   synonyms, and taking passives and sentence negation into account
>
>- combination with bibliographic and full text filters
>
>- search result (filter) URLs can be bookmarked or emailed
>
>- display of search result sentences in original PDF layout.
>   This requires the Adobe Acrobat Reader browser plug-in with
>   Preferences/Search/"external highlight server" enabled and doesn't
>   work well on older, scanned papers (page should always be correct).
>
>The Searchbench itself requires a recent web browser with JavaScript
>enabled.  For details, see "Help" at the bottom left of the Searchbench
>user interface.
>
>The Searchbench is not perfect -- it is a milestone in an ongoing
>research project (TAKE).  There was no manual correction of OCR or NLP
>errors.  Missing author affiliation data of 2010 and 2011 papers will
>be added later.
>
>However, we hope you find it a useful tool also for your scientific
>work.  Your feedback is welcome ("Feedback" button at left bottom)!
>
>
>-- The TAKE Searchbench team Ulrich Schäfer, Bernd Kiefer, Christian
>Spurk, Jörg Steffen and Rui Wang
>
>   ...with thanks to all others who have contributed to this endeavor
>   (see "About" at left bottom, also contains a link to the ACL paper
>   describing the Searchbench internals).
>
>The Searchbench has been developed in the context of the BMBF-funded
>project TAKE, the DFG Cluster of Excellence on Multimodal Computing
>and Interaction (MMCI) and the international DELPH-IN collaboration.
>
>-- 
>Dr. Ulrich Schäfer   http://dfki.de/~uschaefer   phone: +49681857755154
>     DFKI Language Technology Lab, D-66123 Saarbruecken, Germany
>-------------------------------------------------------------------
>    Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
>      Trippstadter Strasse 122, D-67663 Kaiserslautern, Germany
>    Geschaeftsfuehrung: Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster
>(Vorsitzender), Dr. Walter Olthoff. Vorsitzender des Aufsichtsrats:
>Prof. Dr. h.c. Hans A. Aukes. Amtsgericht Kaiserslautern, HRB 2313
>
>
>
>
>------------------------------
>
>Message: 16
>Date: Wed, 13 Jul 2011 12:24:48 -0300
>From: "Ana Julia" <anajulia at corpuslg.org>
>Subject: Re: [Corpora-List] Methodology for capturing corpus from
>	paper	tocomputer
>To: "Samuel Danso" <scsod at leeds.ac.uk>
>Cc: corpora at uib.no
>
>Dear Samuel
>
>I have faced something similar,
>and my solution was to read all the reports aloud (because they were
>handwritten) into my IBM ViaVoice program.  I couldn't think of a better
>strategy at the time... let's see if colleagues have any better solutions
>
>regards,
>
>Ana Julia Perrotti-Garcia
>Scientia Vinces Serv. Trad. Ltda
>Translators of Dental and Medical Texts
>Italiano > Español > Português <> English
>Proficiency in English (CPE) University of Cambridge UK
>Visit our webpage at www.scientiavinces.com/ana/
>São Paulo, Brazil
>
>
>----- Original Message ----- 
>  From: Samuel Danso 
>  To: 'corpora' 
>  Sent: Wednesday, July 13, 2011 11:55 AM
>  Subject: [Corpora-List] Methodology for capturing corpus from paper
>  to computer
>
>
>  Dear All
>
>  Please advise on a methodology for capturing paper forms into a computer
>  corpus.
>
>  My research involves a collection of 10,000 Verbal Autopsy interviews with
>  the mother or a close relative of the deceased, currently on paper forms.
>  How should I have these typed onto a PC? Double entry by two independent
>  clerks is twice the cost of single entry (with checking by managers); is it
>  really necessary?
>
>   
>
>  Sammy Danso, 
>
>  Leeds University, UK and Kintampo Health Centre, Ghana
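
One way to approach the double-entry question above is to pilot it: have two clerks key the same small sample of forms and measure how often they disagree before paying for double entry of all 10,000. The following is only a minimal sketch of that comparison in Python. The form IDs, field names, and sample texts are hypothetical and do not come from the actual Verbal Autopsy forms.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]  # field name -> keyed text (field names are hypothetical)


def normalise(value: str) -> str:
    """Collapse runs of whitespace and ignore case, so trivial differences are not counted."""
    return " ".join(value.split()).lower()


def compare_entries(
    first: Dict[str, Record], second: Dict[str, Record]
) -> Tuple[float, List[Tuple[str, str]]]:
    """Compare two independent keyings of the same forms.

    Returns the share of compared fields that disagree, plus the
    (form id, field) pairs that a manager would need to check on paper.
    """
    mismatches: List[Tuple[str, str]] = []
    compared = 0
    # Only forms keyed by both clerks can be compared.
    for form_id in sorted(first.keys() & second.keys()):
        a, b = first[form_id], second[form_id]
        for field in sorted(a.keys() | b.keys()):
            compared += 1
            if normalise(a.get(field, "")) != normalise(b.get(field, "")):
                mismatches.append((form_id, field))
    rate = len(mismatches) / compared if compared else 0.0
    return rate, mismatches


# Hypothetical pilot sample: the same two forms keyed by two clerks.
clerk_a = {
    "VA0001": {"respondent": "mother", "narrative": "Child had fever for three days"},
    "VA0002": {"respondent": "aunt", "narrative": "Sudden collapse, no prior illness"},
}
clerk_b = {
    "VA0001": {"respondent": "Mother", "narrative": "Child had fever for  three days"},
    "VA0002": {"respondent": "aunt", "narrative": "Sudden colapse, no prior illness"},
}

rate, mismatches = compare_entries(clerk_a, clerk_b)
print(f"field-level disagreement rate: {rate:.2%}")
print("fields to re-check against paper:", mismatches)
```

If the disagreement rate on the pilot sample turns out to be very low, single entry with spot checks may be defensible; if it is high, that is an argument for double entry. Where to draw the line is a judgment call that this sketch does not settle.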
>
>   
>
>   
>
>
>

>------------------------------------------------------------------------------
>
>
>  _______________________________________________
>  UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>  Corpora mailing list
>  Corpora at uib.no
>  http://mailman.uib.no/listinfo/corpora
>------------------------------
>
>Message: 17
>Date: Wed, 13 Jul 2011 20:40:09 -0700 (PDT)
>From: fatima zuhra <fateeshah at yahoo.com>
>Subject: Re: [Corpora-List] Which Statistical Test is Suitable
>To: True Friend <true.friend2004 at gmail.com>
>Cc: corpora at uib.no
>
>Dear Muhammad Shakir Aziz,
>Can you please provide an example (or two) of words that have two
>spellings? I have worked with Pashto text and I have observed that a single
>Pashto word can be spelled in several (more than two) ways.
>One of my projects was concerned with extracting individual words from a
>written Pashto corpus. The system I used for extracting individual Pashto
>words gave me variants of the same word that looked the same at first glance
>(e.g. the grapheme "kaaf" may be written a bit longer than how it is written
>currently in the Urdu spelling of "Shakir" in your name, which will result in
>a variation of this spelling). Are you considering these variations or some
>others?
>
>Regards.
>Fatima Tuz Zuhra
>Ph.D. Scholar and Lecturer, Department of Computer Science,
>University of Peshawar, Pakistan.
>--- On Sun, 7/10/11, True Friend <true.friend2004 at gmail.com> wrote:
>
>From: True Friend <true.friend2004 at gmail.com>
>Subject: [Corpora-List] Which Statistical Test is Suitable
>To: "corpora" <corpora at uib.no>, corpora at lists.uib.no
>Date: Sunday, July 10, 2011, 8:23 PM
>
>Dear Members
>I am working on a research paper regarding spelling variations. In my
>language, Urdu, there are some words which have two spellings. For example,
>the data can be like this:
>
>  Word    Spelling 1    Spelling 2
>  X       24            40
>  Y       600           200
>  Z       300           1000
>
>Now what I want to show is that alternate spellings do exist for this group
>of words and that they are not just spelling errors. Can I use a correlation
>formula to show that the two spellings are related?
>Waiting for your suggestions.
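
For count data like the example above (X: 24 vs 40, Y: 600 vs 200, Z: 300 vs 1000), one option is a chi-square test of independence on the word-by-spelling table rather than a correlation coefficient. The sketch below is only an illustration of that arithmetic in plain Python. The choice of test is an assumption made here, not something proposed in the message, and a significant result only indicates that the preferred spelling differs from word to word. It does not by itself show that a variant is not an error.

```python
from typing import Dict, Tuple

# Counts from the example in the message: word -> (spelling 1, spelling 2).
counts: Dict[str, Tuple[int, int]] = {
    "X": (24, 40),
    "Y": (600, 200),
    "Z": (300, 1000),
}


def chi_square_independence(table: Dict[str, Tuple[int, int]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x 2 table."""
    rows = list(table.values())
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(r[j] for r in rows) for j in range(2)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for r, row_total in zip(rows, row_totals):
        for j in range(2):
            # Expected count if spelling choice were independent of the word.
            expected = row_total * col_totals[j] / grand_total
            chi2 += (r[j] - expected) ** 2 / expected

    df = (len(rows) - 1) * (2 - 1)
    return chi2, df


chi2, df = chi_square_independence(counts)

# Critical value of the chi-square distribution for df = 2 at the 0.05 level.
CRITICAL_005_DF2 = 5.991

print(f"chi-square = {chi2:.1f}, df = {df}")
print("spelling preference differs across words:", chi2 > CRITICAL_005_DF2)
```

With only three words the table is small, so a result like this is best read as a first check. Whether a given variant is a true alternate spelling or an error would still need evidence from beyond the raw counts, for example its spread across authors and texts.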
>
>Regards
>-- 
>Muhammad Shakir Aziz
>
>Masters in Applied Linguistics
>Translator, Course Developer, Linguist for Urdu, Punjabi and English
>
>Urdu:- http://awaz-e-dost.blogspot.com/
>
>English:- http://linguisticslearner.blogspot.com/
>
>Facebook:- http://www.facebook.com/truefriend2004
>
>Skype:- true_friend2004
>
>
>
>----------------------------------------------------------------------
>Send Corpora mailing list submissions to
>	corpora at uib.no
>
>To subscribe or unsubscribe via the World Wide Web, visit
>	http://mailman.uib.no/listinfo/corpora
>or, via email, send a message with subject or body 'help' to
>	corpora-request at uib.no
>
>You can reach the person managing the list at
>	corpora-owner at uib.no
>
>When replying, please edit your Subject line so it is more specific
>than "Re: Contents of Corpora digest..."
>
>
>
>
>End of Corpora Digest, Vol 49, Issue 16
>***************************************
>


