Yemen: Language testing: Different facets and parameters

Harold F. Schiffman haroldfs at
Fri May 18 14:04:10 UTC 2007

Language testing: Different facets and parameters

By Dr. Ayid Sharyan, Assistant Professor, Faculty of Education, Sana'a
University ayids at

Students everywhere complain about exams and their results. Both teachers
and students criticize exam papers as invalid, unreliable, not objective,
or covering only one part of the curriculum. For this reason, Sana'a
University formed a team this semester to review its exams for the first
term of 2004-2005. The University of Science and Technology held a series
of workshops with the help of Professor Mahmoud Aukasjha, a visiting
professor from Cairo who is an expert in measurement and testing. I took
part in this activity. As a member of the team of experts entrusted with
evaluating the exams of Sana'a University, I thought it worthwhile to
discuss this topic with a wider readership, to initiate a healthy dialogue
and raise awareness of the issues involved in test construction, now that
exams are approaching.

The word "test" makes learners nervous, and teachers are not happy either.
But can we measure learners' progress without a test? Testing is not only
the mechanism by which learners' achievement is checked. Life at large is
also full of situations where we must choose among alternatives and make a
decision: choosing a life partner, a university major, a job, a place to
stay, a friend, a political party, or the way to dress, speak, and eat.
Job interviews are a form of test used to select new employees. Likewise,
the Department of English cannot admit new entrants without tests. What,
then, does a test actually seek to measure?

A test measures the ability, knowledge, or performance of a candidate. Test
methods in EFL situations range from alternative-response items (yes/no)
and fixed-response or closed-ended items (e.g., choose a, b, c, or d) to
free-response or open-ended items. These tests examine English language
skills such as listening, speaking, reading, and writing. They also examine
sub-skills such as pronunciation, intonation, stress, accuracy, fluency,
literary appreciation, grammar, and vocabulary.

Since exams go hand in hand with any learning process, students who take a
course naturally have to sit a test. Preliminary tests help place
candidates at a suitable level, or diagnose their weaknesses so that these
can be overcome, for example through remedial teaching. From time to time,
teachers need to evaluate what has been covered to see how learning is
progressing. Candidates' performance is measured periodically by formative
tests, which are indispensable for successful learning. At the end of the
course, teachers need to make a decision about learners' level of
attainment. Summative evaluation is crucial here to sum up achievement and
measure gains. Evaluating students also helps to evaluate the whole
program: its input, processing, and output. Evaluation of this sort is
possible if it uses varied types of tests, such as progress, achievement,
proficiency, placement, aptitude, and diagnostic tests.

But does this mean that a pen-and-paper test is the only means of
evaluating EFL learners? What about interviews, observation cards (such as
questionnaires on a Likert scale or the extended technique of Thurstone),
portfolios, progress reports, research projects or reports by students,
goals tables, checklists, and so on? Unfortunately, teachers are often
unaware of measures like these. As a result, traditional testers rely
heavily on pen-and-paper tests, which dominate the educational arena.

But why all this fuss about testing? A test item is the first building
block of the whole national education system. When academic curricula are
evaluated, the grades in a teacher's mark book mean a great deal for
national progress, sometimes more than a standardized test such as the GRE
or TOEFL. Some now call for a standardized comprehensive test to check that
the output meets the minimum requirements of international standards. But
this is not the need of the hour. What is needed now is to improve
teachers' exams so that they measure precisely and thereby ensure the
quality of education. The assessment of education starts from such exams.
Program evaluation or curriculum development will fail unless it takes
testing as its base. If testing is given such importance, one may ask what
to test: knowledge, cognitive skills, practical skills, transferable
skills, or all of them? Since testing is the means of making decisions,
test constructors differ in their opinions about what to test: linguistic
competence or performance. When constructing a test, a test designer
considers a range of levels of knowledge (e.g., memory, comprehension,
application, analysis, synthesis, or evaluation). Other equally
significant factors include comprehensiveness, variety, test format, test
organization, validity, reliability, objectivity, and proper layout.
Characteristics such as authenticity, interactiveness, impact, and
practicality are also essential test requirements.

Since the learner is going to be reduced to a single number (i.e., a
mark), test designers must be fair and objective in reaching decisions
that can shape a test-taker's future. An exam should not only elicit
knowledge; it should also add something new to the learning process.

How can we check that exams are doing what they are supposed to do? Mark
registers or mark sheets reveal discrepancies. The entry point, then, is
the control sheet with its frequency tables, percentages of failures, and
correlations among all courses. A telling evaluation is not possible
without taking a sample of learners' scripts and comparing the answers
with the grades awarded. Samples of learners' assignments, research
projects, and portfolios disclose the exact level of the exam takers and
how input is processed in a program. This reveals what happens during
processing and shows whether the test meets the minimum requirements.
Evaluating exams means assessing the educational system and reporting its
strengths and weaknesses. The curriculum (intended, implemented, or
attained) is seen in the light of exam evaluation. The type of achievement
shows clearly whether the intended curriculum has been achieved or whether
the attained curriculum is something totally different. To judge the
reliability of exams accurately, one needs to bear in mind some criteria.
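The control-sheet checks mentioned above (frequency tables, failure
percentages, and correlations among courses) are simple arithmetic once
marks are recorded. The following is a minimal Python sketch, not a
procedure the author prescribes. The pass mark of 50, the 10-mark band
width, the whole-number marks, and the function names are all hypothetical
assumptions. The marks in the example are invented for illustration only.

```python
from collections import Counter
from math import sqrt

# Hypothetical settings: each institution fixes its own pass mark and bands.
PASS_MARK = 50
BAND_WIDTH = 10


def frequency_table(marks, band_width=BAND_WIDTH):
    """Count how many whole-number marks fall into each band (30-39, 40-49, ...)."""
    counts = Counter((m // band_width) * band_width for m in marks)
    return dict(sorted(counts.items()))


def failure_rate(marks, pass_mark=PASS_MARK):
    """Percentage of candidates scoring below the (assumed) pass mark."""
    if not marks:
        raise ValueError("no marks recorded")
    failed = sum(1 for m in marks if m < pass_mark)
    return 100.0 * failed / len(marks)


def pearson(xs, ys):
    """Pearson correlation between two courses' marks for the same students."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need paired marks for at least two students")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when all marks are equal")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Invented marks for six students in two courses (illustration only).
    grammar = [45, 62, 78, 55, 38, 90]
    writing = [50, 60, 70, 58, 40, 85]
    print("Grammar bands:", frequency_table(grammar))
    print(f"Grammar failures: {failure_rate(grammar):.1f}%")
    print(f"Grammar/writing correlation: {pearson(grammar, writing):.2f}")
```

Figures like these only flag possible discrepancies. A failure rate far out
of line with other courses, or an unexpectedly weak correlation between
related courses, would be a signal to examine a sample of scripts, as the
article recommends.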

An important criterion of exams is a variety of questions, to measure
students' level and gains against the national level of the expected
outcomes. A variety of exams provides valuable feedback on whether they
match the intended curriculum drawn up by educational planners and policy
makers. The accuracy of tests is of paramount importance for screening the
implemented curriculum and finding out exactly which curriculum has been
attained. Comprehensiveness shows what the teacher has covered in his
teaching. A test is a kind of document that reflects the level of both the
teacher and the students. It must therefore have some face validity,
showing that it measures certain prescribed levels of knowledge. In terms
of content, test constructors need to strike a balance among performance,
skills, knowledge, and mastery of rules.

Major types of exams

An essay-type exam is easy to prepare but difficult to mark. It helps to
gauge students' higher levels of thinking: analyzing, organizing, and
discussing ideas. But its problems are many. It takes a lot of time to
correct, and the marking lacks objectivity. Marks may be influenced to a
great extent by the examiner's subjective impression. It is also difficult
to cover all the goals of the course. This type of exam can be improved by
carefully delimiting the aim of the question. Well-phrased questions can
pinpoint exactly what is required, for example:

1. Compare Blake's London with Wordsworth's London in terms of time
and place in the two poems;

2. Give the reasons that led Pip in Great Expectations to believe that
Miss Havisham was his benefactor;

3. By looking at the invocation in Paradise Lost, differentiate between
the fall of Man in Christianity and Islam by referring to the story of
Satan, Adam and Eve.

Essay-type questions can also be improved by deciding the level of
knowledge that each question measures. Questions that require long answers
should be avoided. Thinking in advance about the time allowed, a model
answer, and the marking rules minimizes flaws in these exams. Many test
developers stress avoiding optional questions, so that all testees have
the same experience and are evaluated fairly.

The second commonly used type is the objective exam, in all its
varieties: multiple choice, true/false, fill-in-the-blanks, matching,
rearrangement, etc. This type is known for being easy to correct, and its
marking is objective. Its validity and reliability tend to be higher than
those of essay-type questions. It is more comprehensive, and it can cover
different levels of knowledge at the same time. Its drawbacks are that it
takes a lot of time to prepare, some testees may guess and get marks, and
some find it easy to cheat. Besides its occasional cost, it does not allow
learners to show their ability to write and to organize ideas to express
their opinions.

To sum up, there is no ideal method of language testing, but a combination
of essay-type and objective tests is more effective and more practical.
Such a combination allows for variety and comprehensiveness. It can measure
all levels of knowledge, as well as abilities that become clear in
writing, such as organizing ideas and expressing one's views. Internal
assessment may include other measures such as observation cards,
interviews, and portfolios, as well as research projects. Taken together,
in-term and end-of-term exams give teachers, course developers, curriculum
designers, and program evaluators a fair idea of the testee. This is what
makes language testing an important area of research today.


N.b.: Listing on the lgpolicy-list is merely intended as a service to its members
and implies neither approval, confirmation nor agreement by the owner or sponsor of
the list as to the veracity of a message's contents. Members who disagree with a
message are encouraged to post a rebuttal.

