[Corpora-List] Methodology for capturing corpus from paper to computer

Eric Atwell csc6ea at leeds.ac.uk
Sat Jul 16 11:45:31 UTC 2011


Dear Ana,

thanks for your suggestion to use IBM ViaVoice. However, the 10,000
Verbal Autopsy report texts are handwritten reports on paper, currently
stored at Kintampo Health Centre in Ghana, and Sammy is currently
working on his PhD at Leeds University, UK; we need a practical way to
get the Kintampo data-entry staff to capture the handwritten text
onto computer files. My experience with IBM ViaVoice and other
speech-to-text systems is that they require training to individual
speakers to achieve acceptable transcription error-rates; so to use
ViaVoice at Kintampo, each data-entry clerk would have to spend time
training personal language models within ViaVoice before they could
then dictate reports into ViaVoice; and then they would have to proofread
and correct ViaVoice output to remove transcription errors.
Kintampo management would need to be persuaded to adopt ViaVoice (or
other speech-to-text software, e.g. Dragon NaturallySpeaking) as a
long-term solution to text transcription.
Given these potential hurdles to overcome, I conclude that for our
current project, given that ViaVoice is not currently in use at
Kintampo, we should stick to current data-entry methods in use there:
typing at a keyboard. At least we can provide software tailored to
keyboarding the Questionnaire format in the Verbal Autopsies.

Do you agree with my reasoning?  I don't disagree with the use of
speech-to-text in principle; it's just that I see more problems with its
introduction in this specific case.

Regards,

Eric Atwell, Sammy Danso's PhD supervisor at Leeds University


Eric Atwell, Senior Lecturer, Language research group,
  I-AIBS Institute for Artificial Intelligence and Biological Systems
  School of Computing, Faculty of Engineering, UNIVERSITY OF LEEDS
  Leeds LS2 9JT, England.        TEL: 0113-3435430  FAX: 0113-3435468
  WWW: http://www.comp.leeds.ac.uk/arabic
       http://www.comp.leeds.ac.uk/nlp

On Wed, 13 Jul 2011, Ana Julia wrote:

> Dear Samuel
>  
> I have faced something similar,
> and my solution was to read all the reports aloud (because they were handwritten)
> to my IBM ViaVoice program.  I couldn't think of any better
> strategy at the time... let's see if colleagues have any better
> solutions.
>  
> regards,
>  
> Ana Julia Perrotti-Garcia
> Scientia Vinces Serv. Trad. Ltda
> Translators of Dental and Medical Texts
> Italiano > Español > Português <> English
> Proficiency in English (CPE) University of Cambridge UK
> Visit our webpage at www.scientiavinces.com/ana/
> São Paulo, Brazil
>  
> ----- Original Message -----
>       From: Samuel Danso
> To: 'corpora'
> Sent: Wednesday, July 13, 2011 11:55 AM
> Subject: [Corpora-List] Methodology for capturing corpus from paper
> to computer
> 
> Dear All
> 
> Please advise on methodology for capturing paper forms into a computer
> corpus.
> 
>  
> 
> My research involves a collection of 10,000 Verbal Autopsy interviews
> of the mother or a close relative of the deceased, currently on paper forms.
> How should I have these typed onto a PC? Double entry by two independent
> clerks is twice the cost of single entry (with checking by managers);
> is it really necessary?
> 
>  
> 
> Sammy Danso,
> 
> Leeds University, UK and Kintampo Health Centre, Ghana
> 
> 
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
> 
>