[Corpora-List] Research Internships at Educational Testing Service [CORRECTED]

Nitin Madnani nmadnani at gmail.com
Mon Nov 14 16:43:47 UTC 2011


[Apologies for the incorrect formatting in the previous announcement]


Hi Folks,

The Educational Testing Service (ETS) has just released an announcement
about its 2012 Summer Internship Program in Research. The areas of interest
for these programs encompass computational linguistics and speech
processing in addition to educational measurement and psychometrics. The
NLP and Speech group at ETS is a vibrant, growing group which consists of
thirty scientists and engineers. As a nonprofit organization engaged in
the design and delivery of educational services, ETS has access to
exceptional corpus resources which offer unique opportunities for NLP and
Speech research.

The specific NLP & Speech summer internship projects along with brief
descriptions are listed below. To see more detailed descriptions and to
apply for the internships, please register at
http://www.ets.org/research/fellowships/summer/

Although specific requirements vary by project, all applicants are expected
to have a strong programming background in the context of NLP and Speech
applications and experience with machine learning. Specific project
requirements are available as part of the detailed project descriptions at
the application web page mentioned above.

1) Answer Typing for c-rater (12-week internship)
ETS has large collections of short answer responses that have been scored
by c-rater, but these collections would be more useful for assessment
designers and research scientists if they were indexed according to
task-based criteria, rather than chronologically. The project builds on
work in foreign language learning by Meurers, Ott and Ziai, with the goal
of providing, for ETS short answer data, a corpus access mechanism
analogous to the WELCOME system developed in Tübingen.

2) Automatically Evaluating the Signaling of Discourse Relations in Short
Answer Responses (12-week internship)
ETS has large collections of short answer responses that have been scored
by c-rater. C-rater is designed to score responses on the basis of analytic
rubrics that encode the presence or absence of particular concepts. It is
not yet sensitive to aspects of the student response that are driven by the
relationships between the detected concepts. Particularly relevant
relationships are cause-effect, premise-consequence and temporal ordering of
events and actions. The intern will carry out an exploratory project using
ETS short answer data, adapting the random-walk summarization methods of
Kok and Brockett (2010) as well as the ideas of Oberlander and Brew (2000)
on stochastic natural language generation.

3) Applying Speech Enhancement and Robust Speech Processing Technology on
Speech Assessment (12-week internship)
In a large-scale spoken English test, such as ETS's TOEFL iBT, responses
with poor audio quality are a critical issue not only for the human-scoring
process but also for newly emerging automated scoring technologies. For
human raters, noisy speech files make a fair and accurate rating more
difficult. For automated assessment, noise substantially degrades the
performance of automatic speech recognition (ASR), a core module of
automated speech assessment systems. This study aims to find technical
solutions for coping with the poor audio quality found in test-takers'
responses.

4) Using Linguistic Features for Automated Feedback about Non-native
Pronunciation and Intonation (12-week internship)
This project will investigate the use of linguistic features to improve our
capabilities for automated assessment of non-native speech, with a focus on
providing constructive feedback to language learners regarding
pronunciation and intonation.  In particular, this project will use speech
recognition and speech processing technology to develop linguistically
relevant features for the delivery aspect of the construct for non-native
constrained speech (such as recited speech and repeated speech). There is a
great need for this type of research, since many features which are
currently used for automated assessment of non-native speech are not easy
for test takers to interpret, and, thus, do not provide learners with
useful feedback about specific linguistic areas they should improve.

5) Applying Very Large Language Models to Lexical and Semantic Analysis of
Text (12-week internship)
This project will focus on applying very large distributional language
models and very large n-gram models (both on the scale of billions of
words) to several current problems in natural language processing. The intern will
work on one specific task from the following list: 1. lexical substitution
or disambiguation, 2. automatic spelling correction, 3. improving word
recognition rate in ASR, 4. automatic estimation of lexical cohesion, 5.
detection of collocation errors. Prerequisites: strong background in NLP,
familiarity with language modeling and statistical measures of word
association, some practical knowledge of programming. The goal of the
project is to produce methodology and algorithms (with a view to
publication) as well as actual working code modules. The resources and
methodology will be useful for automated text-scoring engines at ETS.

6) Verifying the Factuality of Statements (12-week internship)
Work on this project is related to automatic verification of the factuality
of statements made in student essays. The goal is to improve e-rater, an
automated essay scoring system, by rewarding essays containing factually
correct information, especially in student-provided examples. Based on a
large database of statements extracted from the web, we have a baseline
system that estimates the amount and quality of support a student's
statement has in the database. We will be looking for ways to improve the
performance of the baseline system. Potential questions to consider include
but are not limited to handling negative statements and detection of
controversial statements.

7) Using Paraphrase Generation for Improving Educational Assessments
(12-week internship)
Automatic generation of paraphrases has received a lot of attention
recently both as a stand-alone task and in the context of supporting other
NLP tasks such as statistical machine translation and information
retrieval. The NLP group at ETS is working on several different ways to
both advance the state of the art in paraphrase generation and apply
these advanced techniques to improve existing ETS products. This internship
affords several research avenues in support of this work: (1) investigating
the use of paraphrase generation for automated reference answer generation
in the context of knowledge-based short-answer tests, (2) exploring the use
of discourse coherence to guide paraphrase generation of supra-sentential
textual units, and (3) building a tool to allow exploration and comparison
of pivot-based paraphrase collections.

8) Using NLP to Develop ELL Grammatical Error Detection Systems (12-week
internship)
One of the biggest challenges facing non-native speakers of English is
learning the correct usage of prepositions and determiners. Examples of
errors include: "They arrived to the town" (incorrect preposition for that
context) and "I studied very hard for exam" (missing determiner in front of
"exam"). This project involves the task of detecting such errors in
learner essays to provide useful feedback to the writer. Currently, ETS is
developing a tool that uses lexical and syntactic features to detect common
ELL grammatical errors such as incorrect determiners or prepositions. The
main aim of the project to be undertaken by the intern is to investigate
more complex methods and features to improve on our current state of the art.
Possible research avenues include: 1) developing algorithms to detect other
errors such as incorrect verb forms, 2) tailoring a parser for use on
ungrammatical text, 3) word sense disambiguation or semantic role labeling,
4) automatically extracting data from the web and large corpora to aid in
development of error detection models, and 5) using Machine
Translation/Paraphrase techniques to rewrite an error-filled sentence in a
fluent form.

9) Detecting Plagiarized Speech Responses (12-week internship)
This summer internship project aims to address the problem of plagiarism in
spoken responses in tests of English. We will develop an automated
plagiarism detection method as a part of SpeechRater, the ETS-developed
automated scoring engine for spoken responses. To address this problem, we
will try different similarity measures, such as the vector space model
(VSM), WordNet similarity and latent semantic analysis (LSA), to calculate
the similarity between a new response and previously acquired materials and
responses. Responses with high similarity will be flagged as potential
"plagiarism" for human investigation.

10) Locating and Scoring of Content in Brief Spoken Responses in English
Language Tests (12-week internship)
Previous work on automated scoring of unpredictable speech has mostly
focused on aspects of fluency, pronunciation and prosody, but little
research has been done related to the content accuracy of a spoken
response. The goal of this internship project is to explore methods for
locating, identifying and analyzing content in short spoken responses to
items of two different language tests.
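
For readers unfamiliar with the vector space model (VSM) similarity named in
project 9, the following is a minimal sketch of the general idea, not the
SpeechRater implementation. It assumes the spoken responses have already been
transcribed to text, uses raw term counts rather than any weighting scheme,
and the function names and the 0.8 threshold are hypothetical choices made
only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the raw term-count vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[term] * counts_b[term] for term in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def flag_for_review(response, known_materials, threshold=0.8):
    """Return (best_score, flagged) for a response against known materials.

    The threshold is a hypothetical value; a real system would tune it.
    """
    best = max(
        (cosine_similarity(response, material) for material in known_materials),
        default=0.0,
    )
    return best, best >= threshold
```

A call such as flag_for_review(transcript, source_texts) returns the highest
similarity score and whether it reaches the threshold, which mirrors the
"flag for human investigation" step described in project 9.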

Thanks,

Nitin Madnani
Research Scientist,
Text, Language and Computation,
Educational Testing Service,
Princeton
http://www.desilinguist.org