[Corpora-List] Research Internships at Educational Testing Service (Summer 2012)
Nitin Madnani
nmadnani at gmail.com
Mon Nov 14 15:54:44 UTC 2011
Hi Folks,
The Educational Testing Service (ETS) has just released an
announcement about its 2012 Summer Internship Program in Research. The
areas of interest for these programs encompass computational
linguistics and speech processing in addition to educational
measurement and psychometrics. The NLP and Speech group at ETS is a
vibrant, growing group which consists of thirty scientists and
engineers. As a nonprofit organization engaged in the design and
delivery of educational services, ETS has access to exceptional corpus
resources which offer unique opportunities for NLP and Speech
research.
The specific NLP & Speech summer internship projects along with brief
descriptions are listed below. To see more detailed descriptions and
to apply for the internships, please register at
http://www.ets.org/research/fellowships/summer/
Although specific requirements vary by project, all applicants are
expected to have a strong programming background in the context of NLP
and Speech applications and experience with machine learning. Specific
project requirements are available as part of the detailed project
descriptions at the application web page mentioned above.
1) Answer Typing for c-rater (12-week internship)
ETS has large
collections of short answer responses that have been scored by
c-rater, but these collections would be more useful for assessment
designers and research scientists if they were indexed according to
task-based criteria, rather than chronologically. The project builds
on work in foreign language learning by Meurers, Ott and Ziai, with
the goal of providing, for ETS short answer data, a corpus access
mechanism analogous to the WELCOME system developed in Tübingen.
2) Automatically Evaluating the Signaling of Discourse Relations in
Short Answer Responses (12-week internship)
ETS has large collections
of short answer responses that have been scored by c-rater. C-rater is
designed to score responses on the basis of analytic rubrics that
encode the presence or absence of particular concepts. It is not yet
sensitive to aspects of the student response that are driven by the
relationships between the detected concepts. Particularly relevant
relationships are cause-effect, premise-consequence and temporal
ordering of events and actions. The intern will carry out an
exploratory project on ETS short answer data, using an adapted
version of the random-walk summarization methods of Kok and Brockett
(2010) as well as the ideas of Oberlander and Brew (2000) on stochastic
natural language generation.
3) Applying Speech Enhancement and Robust Speech Processing Technology
on Speech Assessment (12-week internship)
In a large-scale spoken
English test, such as ETS’s TOEFL iBT, responses with poor audio
quality are a critical issue not only for the human-scoring process
but also for newly emerging automated scoring technologies. For human
raters, noisy speech files make a fair and accurate rating more
difficult. For automated assessment, noise substantially degrades the
performance of automatic speech recognition (ASR), a core module of
automated speech assessment systems. This study aims to find
technical solutions for coping with the poor audio quality found in
test-takers' responses.
4) Using Linguistic Features for Automated Feedback about Non-native
Pronunciation and Intonation (12-week internship)
This project will
investigate the use of linguistic features to improve our capabilities
for automated assessment of non-native speech, with a focus on
providing constructive feedback to language learners regarding
pronunciation and intonation. In particular, this project will use
speech recognition and speech processing technology to develop
linguistically relevant features for the delivery aspect of the
construct for non-native constrained speech (such as recited speech
and repeated speech). There is a great need for this type of research,
since many features which are currently used for automated assessment
of non-native speech are not easy for test takers to interpret, and,
thus, do not provide learners with useful feedback about specific
linguistic areas they should improve.
5) Applying Very Large Language Models to Lexical and Semantic
Analysis of Text (12-week internship)
This project will focus on
applying very large distributional language models and very large
n-gram models (both on the scale of billions of words) to some hot
issues in natural language processing. The intern will work on one
specific task from the following list: 1. lexical substitution or
disambiguation, 2. automatic spelling correction, 3. improving word
recognition rate in ASR, 4. automatic estimation of lexical cohesion,
5. detection of collocation errors. Prerequisites: strong background
in NLP, familiarity with language modeling and statistical measures of
word association, some practical knowledge of programming. The goal of
the project is to produce methodology and algorithms (with a view to
publication) as well as actual working code modules. The resources and
methodology will be useful for automated text-scoring engines at ETS.
6) Verifying the Factuality of Statements (12-week internship)
Work on
this project is related to automatic verification of the factuality of
statements made in student essays. The goal is to improve e-rater, an
automated essay scoring system, by rewarding essays containing
factually correct information, especially in student-provided
examples. Based on a large database of statements extracted from the
web, we have a baseline system that estimates the amount and quality
of support a student's statement has in the database. We will be
looking for ways to improve the performance of the baseline system.
Potential questions to consider include, but are not limited to,
handling negative statements and detecting controversial
statements.
7) Using Paraphrase Generation for Improving Educational Assessments
(12-week internship)
Automatic generation of paraphrases has received a
lot of attention recently both as a stand-alone task and in the
context of supporting other NLP tasks such as statistical machine
translation and information retrieval. The NLP group at ETS is working
on several different ways both to advance the state of the art in
paraphrase generation and to apply these advanced techniques to
improve existing ETS products. This internship affords several
research avenues in support of this work: (1) investigating the use of
paraphrase generation for automated reference answer generation in the
context of knowledge-based short-answer tests, (2) exploring the use
of discourse coherence to guide paraphrase generation of
supra-sentential textual units, and (3) building a tool to allow
exploration and comparison of pivot-based paraphrase collections.
8) Using NLP to Develop ELL Grammatical Error Detection Systems
(12-week internship)
One of the biggest challenges facing non-native
speakers of English is learning the correct usage of prepositions and
determiners. Examples of errors include: "They arrived to the town"
(incorrect preposition for that context) and "I studied very hard for
exam" (missing determiner in front of "exam"). This project involves
the task of detecting such errors in learner essays to provide useful
feedback to the writer. Currently, ETS is developing a tool that uses
lexical and syntactic features to detect common ELL grammatical errors
such as incorrect determiners or prepositions. The main aim of the
project to be undertaken by the intern is to investigate more complex
methods and features to improve on our current state of the art.
Possible research avenues include: 1) developing algorithms to detect
other errors such as incorrect verb forms, 2) tailoring a parser for
use on ungrammatical text, 3) word sense disambiguation or semantic
role labeling, 4) automatically extracting data from the web and large
corpora to aid in development of error detection models, and 5) using
Machine Translation/Paraphrase techniques to rewrite an error-filled
sentence in a fluent form.
9) Detecting Plagiarized Speech Responses (12-week internship)
This
summer intern project aims to address the problem of plagiarism in
spoken responses in tests of English. We will develop an automated
plagiarism detection method as a part of SpeechRater, the
ETS-developed automated scoring engine for spoken responses. To
address this problem, we will try different similarity measures, such
as the vector space model (VSM), WordNet similarity and latent semantic
analysis (LSA), to calculate the similarity between a new response and
previously acquired materials and responses. Responses with high
similarity will be flagged as "plagiarism" for human
investigation.
10) Locating and Scoring of Content in Brief Spoken Responses in
English Language Tests (12-week internship)
Previous work on automated
scoring of unpredictable speech has mostly focused on aspects of
fluency, pronunciation and prosody, but little research has been done
related to the content accuracy of a spoken response. The goal of this
internship project is to explore methods for locating, identifying and
analyzing content in short spoken responses to items of two different
language tests.
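
As a rough illustration of the similarity-based flagging described in
project 9, here is a minimal sketch of the vector space model option
only. It is not how SpeechRater or any ETS system works. It assumes
responses are already available as plain-text transcripts, and the
function names (bag_of_words, cosine_similarity, flag_for_review) and
the 0.8 threshold are hypothetical choices made for this example.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its word tokens (raw term counts)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def flag_for_review(response, known_sources, threshold=0.8):
    """Return (index, score) pairs for sources at or above the threshold.

    The threshold is an illustrative assumption, not a tuned value.
    """
    response_vec = bag_of_words(response)
    flagged = []
    for i, source in enumerate(known_sources):
        score = cosine_similarity(response_vec, bag_of_words(source))
        if score >= threshold:
            flagged.append((i, score))
    return flagged
```

Calling flag_for_review(transcript, source_texts) returns the indices
of the source texts that a human reviewer might look at first. A more
realistic setup would likely weight terms (for example with tf-idf) and
account for speech recognition errors in the transcripts.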
Thanks,
Nitin Madnani
Research Scientist,
Text, Language and Computation,
Educational Testing Service,
Princeton
http://www.desilinguist.org