[Linganth] [EXT] Re: Recommendations for tools transcribing and analyzing large amounts of data
Roth-Gordon, Jen - (jenrothg)
jenrothg at arizona.edu
Thu Apr 9 19:57:53 UTC 2026
Re: "I'm curious, how do you end up with so much data without first thinking
about how you will handle it?"
Funny! I think many of us find ourselves swimming in data and notes that would fill rooms if printed out (especially for long-term projects). While overwhelming, that would be my definition of a successful research project!
Sending solidarity (in lieu of concrete tech suggestions),
jen
Jennifer Roth-Gordon
Associate Professor Emerita
School of Anthropology
University of Arizona
Tucson, AZ 85721-0030
________________________________
From: Linganth <linganth-bounces at listserv.linguistlist.org> on behalf of Jocelyn Aznar <contact at jocelynaznar.eu>
Sent: Thursday, April 9, 2026 9:28 PM
To: linganth at listserv.linguistlist.org <linganth at listserv.linguistlist.org>
Subject: [EXT] Re: [Linganth] Recommendations for tools transcribing and analyzing large amounts of data
Hi everyone,
I'm curious, how do you end up with so much data without first thinking
about how you will handle it?
As you are within an English department, I assume you work with English?
Do you have some budget? What kind of annotation do you need? Which
format? How do you do your analysis? Using CSV files? XML? Should the
data be reusable by other researchers? Meant for archiving? FAIR? etc.
Using online AI tools is probably not ethical, as you have no way to
know what the companies will do with the data and with what the people
you recorded said... If you have a recent computer, some budget or access to
University servers, you can use, for instance, Whisper for transcription
and a model from Mistral (like the 7B) to do some annotations
automatically. With languages like English, French and the like, it works
quite well. But that requires some scripting.
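
To give an idea of the scripting involved, here is a minimal sketch of the transcription half only, assuming the open-source openai-whisper Python package (and ffmpeg) is installed on your own machine. The function names, the "small" model size and the CSV columns are illustrative choices, not a recommended setup, and the Mistral annotation step is not shown. Whisper does not label speakers or separate overlapping speech, so the "speaker" column is left empty to be filled in by hand.

```python
import csv
import io


def format_timestamp(seconds):
    """Render a time in seconds as HH:MM:SS.mmm for a transcript row."""
    millis = int(round(seconds * 1000))
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def segments_to_csv(segments):
    """Turn Whisper-style segments (dicts with start, end, text) into CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["start", "end", "speaker", "text"])
    for seg in segments:
        # The speaker column stays blank: Whisper does not identify speakers,
        # so it is filled in manually or by a separate diarization step.
        writer.writerow([
            format_timestamp(seg["start"]),
            format_timestamp(seg["end"]),
            "",
            seg["text"].strip(),
        ])
    return buffer.getvalue()


def transcribe_locally(audio_path, model_name="small"):
    """Run Whisper on this machine, so the recording is not uploaded anywhere."""
    import whisper  # assumption: installed with `pip install openai-whisper`

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return result["segments"]
```

With a recording saved as, say, interview01.wav (a hypothetical file name), segments_to_csv(transcribe_locally("interview01.wav")) returns CSV text that can be written to a file and opened in a spreadsheet or imported into an annotation tool.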
Best,
Jocelyn
On 09/04/2026 at 21:13, Nathan Straub 曹內森 wrote:
> Hi Dominika,
>
> I use Vook.ai (an AI-based subscription service) for rapid automatic
> transcription of English. (It also does Spanish, French, Italian,
> Portuguese, and German.) You would likely have to sort out overlaps and
> speaker labels on your own after that.
>
> For field recordings, I liked using SIL's SayMore software, because it
> provided a place to store recordings and break up a recording into short
> breath groups and listen again and again with slow speech and type up
> rough transcriptions, and then I could port the vernacular and free
> translation lines into FLEx.
>
> Which languages are you working with?
>
> Nathan
>
> We are sent into this world for some end. It is our duty to discover by
> close study what this end is & when we once discover it to pursue it
> with unconquerable perseverance.
> JQA at age 12 to his brother Charles (June 1778)
>
> On Thu, Apr 9, 2026, 12:02 Dominika Baran, Ph.D.
> <dominika.baran at duke.edu> wrote:
>
> Dear Colleagues,
>
> I am looking for recommendations of your favorite tool(s), at the
> moment, for processing large amounts of recorded spoken & written
> conversational data (informal interviews, free conversations), for
> both transcription and coding & analysis.
>
> I have about 100 hours of digitally recorded conversations,
> including those among multiple speakers, with lots of simultaneous
> speech, two conversations going on at once, overlap, and code-
> switching (mostly bilingual, occasionally trilingual). I also have
> 13 years of written group chat conversations, which don’t need
> transcribing but total over 300,000 words. I am looking for
> suggestions for software, online or otherwise, for both
> transcription (which is tricky because of the multilingual and
> overlapping conversations) and, more importantly, organization,
> coding, and analysis. It has been a while since I have dealt with
> THIS much data and I am sure there is a lot out there that I don’t
> know about - all and any suggestions of what has worked for folks
> are very much appreciated!
>
> Best,
> Dominika
>
>
> Dominika M. Baran
>
> Associate Professor
>
> English Department
>
> Duke University
>
> Allen Building 303
>
> Durham, NC 27708
>
> Pronouns: she/her/hers
>
> _______________________________________________
> Linganth mailing list
> Linganth at listserv.linguistlist.org
> https://listserv.linguistlist.org/cgi-bin/mailman/listinfo/linganth