From macw at andrew.cmu.edu Wed Jul 1 10:58:32 2020
From: macw at andrew.cmu.edu (Brian MacWhinney)
Date: Wed, 1 Jul 2020 10:58:32 -0400
Subject: TalkBankDB
Message-ID:

(apologies for cross-postings)

Dear Phon,

John Kowalski has developed a database search engine for accessing data in TalkBank. It is called TalkBankDB, and you can get to it from https://talkbank.org/DB . It allows users to search any one of the 14 TalkBank databases, using a variety of selectors. Currently, it is fairly well-tuned for searches in the child language databases (CHILDES, PhonBank, FluencyBank, ASDBank), using things like gender, age, activity type, language, study design, and media type. However, it is not tuned for any of the details of phonological structure. The structures that it retrieves are in CHAT format, and there is no support (yet) for analysis of phonological detail.

The short Manual describes the ways to create searches and provides a few sample exercises to help you learn how to use the system.

After you issue a search pattern, the system responds quickly with matches and gives you seven tabs for further operations:

1. The Transcripts tab lists all the transcripts matching your search. You can click on any one and open the transcript directly in the TalkBank Browser.
2. The Participants tab lists the participants in each matching transcript.
3. The Utterances tab lists all the utterances in all matching transcripts. This can be a very big file, but it still loads and downloads very quickly.
4. The Tokens tab lists each word in the matching transcripts, one word per line, with information about the surface form, the lemma, the part of speech, the name of the transcript, and the speaker. This listing can also be very big.
5. The Token Types tab groups the tokens by type.
6. The Visualizations tab allows you to quickly graph words by frequency across ages. We will add other types of visualizations eventually.
7. The CQL tab allows you to create Corpus Query Language searches by word, lemma, and part of speech. It does not yet support the OR operator.

The results from any of these 7 tabs can be downloaded in spreadsheet format and then opened in Excel, R, or other analysis programs. There is also an R API, but it has not yet been "packaged" or fully tested.

This new facility could be particularly useful for classes this Fall that need to rely on web resources as a replacement for in-person experiments. I hope people can give it a try and then give us feedback about problems, suggestions for improvements, and ways to use the system.

Best regards,

--Brian MacWhinney
Teresa Heinz Professor of Cognitive Psychology, Computational Linguistics, and Modern Languages, CMU

--
You received this message because you are subscribed to the Google Groups "Phon" group.
To unsubscribe from this group and stop receiving emails from it, send an email to phon+unsubscribe at googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/phon/EDF0EFD2-4BD9-49B4-9A7D-513A80BFF8E6%40andrew.cmu.edu.