Wednesday, 7 May 2008
For decades, traditional methods of testing have been criticized for saying relatively little, reliably, about students' ability, as well as for causing anxiety, which can negatively affect students' recall of learned information.
TEST EVALUATION by Lawrence M. Rudner, ERIC/AE 12/93
You should gather the information you need to evaluate a test.
1) Be sure you have a good idea what you want a test to
measure and how you are going to use it.
2) Get a specimen set from the publisher. Be sure it
includes technical documentation.
3) Look at reviews prepared by others. The Buros and Pro-Ed
Test Locators should help you identify some existing
reviews. The Mental Measurements Yearbook (MMY) also contains
references in the professional literature concerning cited
tests. The ERIC
database can also be used to identify existing reviews.
4) Read the materials and determine for yourself whether the
publisher has made a compelling case that the test is
valid and appropriate for your intended use.
There are several guidelines to help you evaluate tests:
o The Code of Fair Testing Practices in Education, which was
available through the ERIC/AE gopher site.
o American Psychological Association (1986) Standards for
Educational and Psychological Tests and Manuals. Washington,
DC: author.
o Equal Employment Opportunity Commission (1978) Uniform
Guidelines on Employee Selection Procedures, Federal
Register 43, 116, 38295 - 38309.
o Society for Industrial and Organizational Psychology (1987)
Principles for the validation and use of personnel selection
procedures, Third edition, College Park, MD: author.
In this brief, we identify key standards from the Standards for
Educational and Psychological Testing established by the American
Educational Research Association, the American Psychological
Association, and the National Council on Measurement in
Education. We describe these standards and questions you may want
to raise to evaluate whether the standard has been met.
We discuss standards concerning
A. Test coverage and use
B. Appropriate samples for test validation and norming
C. Reliability
D. Predictive validity
E. Content validity
F. Construct validity
G. Test administration
H. Test reporting
I. Test and item bias
A. Test coverage and use
There must be a clear statement of recommended uses and a
description of the population for which the test is intended.
The principal question to be asked in evaluating a test is
whether it is appropriate for your intended purposes and your
students. The use intended by the test developer must be
justified by the publisher on technical grounds. You then need
to evaluate your intended use against the publisher's intended
use and the characteristics of the test.
Questions to ask are:
1. What are the intended uses of the test? What types of
interpretations does the publisher feel are appropriate?
Are foreseeable inappropriate applications identified?
2. Who is the test designed for? What is the basis for
considering whether the test is applicable to your students?
B. Appropriate samples for test validation and norming.
The samples used for test validation and norming must be of
adequate size and must be sufficiently representative to
substantiate validity statements, to establish appropriate norms,
and to support conclusions regarding the use of the instrument
for the intended purpose.
The individuals in the norming and validation samples should be
representative of the group for which the test is intended in
terms of age, experience and background.
Questions to ask are:
1. How were the samples used in pilot testing, validation and
norming chosen? Are they representative of the population
for which the test is intended? How is this sample related
to your population of students? Were participation
rates appropriate? Can you draw meaningful comparisons of
your students and these students?
2. Was the number of test-takers large enough to develop stable
estimates with minimal fluctuation due to sampling errors?
Where statements are made concerning subgroups, is the
number of test-takers in each subgroup adequate?
3. Do the difficulty levels of the test and criterion measures
(if any) provide an adequate basis for validating and
norming the instrument? Are there sufficient variations in
test scores?
4. How recent was the norming?
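Question 2's concern about sample size can be made concrete: the standard error of a mean score shrinks with the square root of the sample size, so small norming samples yield unstable norms. A minimal sketch, using invented scores (not data from any actual test):

```python
# Standard error of the mean for a norming sample: a rough gauge of how
# much the sample mean would fluctuate across repeated samples of this
# size. The scores below are invented for illustration only.
import math

def standard_error(scores):
    n = len(scores)
    mean = sum(scores) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / (n - 1))  # sample SD
    return sd / math.sqrt(n)

scores = [48, 52, 55, 60, 61, 63, 67, 70, 72, 75]
print(round(standard_error(scores), 2))
# Quadrupling the sample size roughly halves this standard error.
```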
C. Reliability
The test is sufficiently reliable to permit stable estimates of
student ability.
Fundamental to the evaluation of any instrument is the degree to
which test scores are free from various sources of measurement
error and are consistent from one occasion to another. Sources
of measurement error, which include fatigue, nervousness, content
sampling, answering mistakes, misinterpretation of instructions,
and guessing, will always contribute to an individual's score and
lower the reliability of the test.
Different types of reliability estimates should be used to
estimate the contributions of different sources of measurement
error. Inter-rater reliability coefficients provide estimates of
errors due to inconsistencies in judgment between raters.
Alternate-form reliability coefficients provide estimates of the
extent to which individuals can be expected to rank the same on
alternate forms of a test. Of primary interest are estimates of
internal consistency which account for error due to content
sampling, usually the largest single component of measurement
error.
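One widely used internal-consistency estimate is Cronbach's alpha. The sketch below computes it from a small made-up matrix of item scores (rows are test-takers, columns are items); the data are illustrative only, not drawn from any real test:

```python
# Cronbach's alpha: alpha = k/(k-1) * (1 - sum of item variances /
# variance of total scores). Higher values indicate that the items
# covary, i.e., that the test is internally consistent.

def variance(values):
    """Sample variance (n - 1 denominator)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def cronbach_alpha(scores):
    k = len(scores[0])                                   # number of items
    items = [[row[i] for row in scores] for i in range(k)]
    item_var = sum(variance(col) for col in items)       # sum of item variances
    total_var = variance([sum(row) for row in scores])   # variance of totals
    return (k / (k - 1)) * (1 - item_var / total_var)

# Five test-takers, four items scored 0/1 (invented data)
data = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(data), 3))
```

Note that alpha, like split-half estimates, is inappropriate for speeded tests, for the reason given in question 1 below.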
Questions to ask are:
1. Have appropriate types of reliability estimates been
computed? Have appropriate statistics been used to compute
these estimates? (Split-half reliability coefficients, for
example, should not be used with speeded tests as they will
produce artificially high estimates.)
2. What are the reliabilities of the test for different groups
of test-takers? How were they computed?
3. Is the reliability sufficiently high to warrant the use of
the test as a basis for making decisions concerning
individual test-takers?
D. Predictive validity
The test adequately predicts academic performance.
In terms of an achievement test, predictive validity refers to
the extent to which a test can be appropriately used to draw
inferences regarding achievement. Empirical evidence in support
of predictive validity must include a comparison of performance
on the test being validated against performance on outside
criteria.
A variety of measures are available as outside criteria. Grades,
class rank, other tests, teacher ratings, and other criteria have
been used. Each of these measures, however, has its own
limitations.
There are also a variety of ways to demonstrate the relationship
between the test being validated and subsequent performance.
Scatterplots, regression equations, and expectancy tables should
be provided in addition to correlation coefficients.
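The correlational evidence just mentioned can be sketched in a few lines. The scores and GPA values below are invented solely for illustration; a real validation study would use actual test and criterion data:

```python
# Pearson correlation between test scores and an outside criterion
# (here, hypothetical GPAs), plus the least-squares regression line
# for predicting the criterion from the test score.

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def regression_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / \
            sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx

test_scores = [52, 61, 70, 75, 83, 90]      # invented data
gpas        = [2.1, 2.5, 2.9, 3.0, 3.4, 3.7]

r = pearson_r(test_scores, gpas)
slope, intercept = regression_line(test_scores, gpas)
print(f"r = {r:.3f}, predicted GPA = {intercept:.2f} + {slope:.3f} * score")
```

A correlation coefficient alone can hide nonlinearity or restriction of range, which is why the scatterplots and expectancy tables mentioned above should accompany it.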
Questions to ask are:
1. What criterion measure(s) have been used in evaluating
validity? What is the rationale for choosing this measure?
Is this criterion measure appropriate?
2. Is the distribution of scores on the criterion measure
adequate?
3. What is the basis for the statistics used to demonstrate
predictive validity?
4. What is the overall predictive accuracy of the test? How
accurate are predictions for individuals whose scores are
close to cut-points of interest?
E. Content validity
The test measures content of interest.
Content validity refers to the extent to which the test questions
are representative of the skills in the specified domain.
Content validity will often be evaluated by an examination of the
plan and procedures used in test construction. Did the test
development procedure follow a rational approach that ensures
appropriate content? Did the process ensure that the collection
of items would be representative of appropriate skills?
Questions to ask are:
1. Is there a clear statement of the universe of skills
represented by the test? What is the basis for selecting
this set of skills? What research was conducted to
determine desired test content and/or evaluate it once
the test had been developed?
2. Were the procedures used to generate test content and items
consistent with the test specifications?
3. What was the composition of expert panels used in content
validation? What process was used to elicit their
judgments?
4. How similar is this content to the content you are
interested in testing?
F. Construct validity
The test measures the right psychological constructs.
Construct validity refers to the extent to which a test measures
a trait derived from research or experience that has been
constructed to explain observable behavior. Intelligence, self-
esteem, and creativity are examples of such psychological traits.
Evidence in support of construct validity can take many forms.
One approach is to demonstrate that the items within a measure
are inter-related and therefore measure a single construct.
Inter-item correlation and factor analysis are often used to
demonstrate relationships among the items.
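The inter-item approach can be sketched with invented 0/1 response data; a real analysis would use the full response matrix and, for factor analysis proper, dedicated statistical software:

```python
# Inter-item correlation matrix: a simple first check on whether the
# items hang together as a single construct. Responses are invented
# (rows = test-takers, columns = items).

def corr(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) *
           sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

responses = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
]
items = list(zip(*responses))        # one tuple of scores per item
matrix = [[round(corr(a, b), 2) for b in items] for a in items]
for row in matrix:
    print(row)
```

Uniformly high off-diagonal correlations are consistent with a single construct; a block structure in the matrix suggests several distinct factors.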
Another approach is to demonstrate that the test behaves as one
would expect a measure of the construct to behave. One might
expect a measure of creativity to show a greater correlation with
a measure of artistic ability than a measure of scholastic
achievement would show.
Questions to ask are:
1. Is the conceptual framework for each tested construct clear
and well founded? What is the basis for concluding that the
construct is related to the purposes of the test?
2. Does the framework provide a basis for testable hypotheses
concerning the construct? Are these hypotheses supported by
empirical evidence?
G. Test administration
Detailed and clear instructions outlining appropriate test
administration procedures are provided.
Statements concerning the validity of a test for an intended
purpose and the accuracy of the norms associated with a test can
only generalize to testing situations which replicate the
conditions used to establish validity and obtain normative data.
Test administrators need detailed and clear instructions in order
to replicate these conditions.
All test administration specifications, such as instructions to
test takers, time limits, use of reference materials, use of
calculators, lighting, equipment, assigning seats, monitoring,
room requirements, testing sequence, and time of day, should be
fully described.
Questions to ask are:
1. Will test administrators understand precisely what is
expected of them?
2. Do the test administration procedures replicate the
conditions under which the test was validated and normed?
Are these procedures standardized?
H. Test reporting
The methods used to report test results, including scaled scores,
subtests results and combined test results, are described fully
along with the rationale for each method.
Test results should be presented in a manner that will help
schools, teachers and students to make decisions that are
consistent with appropriate uses of the test. Help should be
available for interpreting and using the test results.
Questions to ask are:
1. How are test results reported to test-takers? Are they
clear and consistent with the intended use of the test? Are
the scales used in reporting results conducive to proper
interpretation?
2. What materials and resources are available to aid in
interpreting test results?
I. Test and item bias
The test is not biased or offensive with regard to race, sex,
native language, ethnic origin, geographic region or other
demographic characteristics.
Test developers are expected to exhibit a sensitivity to the
demographic characteristics of test-takers, and steps should be
taken during test development, validation, standardization, and
documentation to minimize the influence of cultural factors on
individual test scores. These steps may include the use of
individuals to evaluate items for offensiveness and cultural
dependency, the use of statistics to identify differential item
difficulty, and an examination of predictive validity for
relevant subgroups.
Tests are not expected to yield equivalent mean scores across
population groups. To do so would be to inappropriately assume
that all groups have had the same educational and cultural
experiences. Rather, tests should yield the same scores and
predict the same likelihood of success for individual test-takers
of the same ability, regardless of group membership.
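The statistical screening for differential item difficulty mentioned above can be illustrated crudely. The sketch below (with invented 0/1 responses) only compares each item's raw proportion correct across two groups; real differential-item-functioning methods such as Mantel-Haenszel condition on total score so that genuine ability differences are not mistaken for item bias:

```python
# Crude first-pass DIF screen: flag items whose proportion-correct gap
# across two groups exceeds a threshold, for closer expert review.
# All data are invented for illustration.

def p_values(responses):
    """Proportion correct per item (rows = test-takers, columns = items)."""
    n = len(responses)
    return [sum(col) / n for col in zip(*responses)]

group_a = [[1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 1, 0]]
group_b = [[1, 0, 0], [1, 0, 1], [1, 0, 1], [1, 0, 0]]

flagged = [
    i for i, (pa, pb) in enumerate(zip(p_values(group_a), p_values(group_b)))
    if abs(pa - pb) > 0.25          # arbitrary illustrative threshold
]
print("items to review:", flagged)
```

A flagged item is not automatically biased; it is a candidate for the review procedures described in question 1 below.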
Questions to ask are:
1. Were reviews conducted during the test development and
validation process to minimize possible bias and
offensiveness? How were these reviews conducted? What
criteria were used to evaluate the test specifications
and/or test items? What was the basis for these criteria?
2. Were the items analyzed statistically for possible bias?
What method or methods were used? How were items selected
for inclusion in the final version of the test?
3. Was the test analyzed for differential validity across
groups? How was this analysis conducted? Does the test
predict the same likelihood of success for individuals of
the same ability, regardless of group membership?
4. Was the test analyzed to determine the English language
proficiency required of test-takers? Is the English
proficiency requirement excessive? Should the test be used
with individuals who are not native speakers of English?
QUESTION: I'm new to language testing but have taken one language testing course and am very interested in learning more. Can you tell me where I can get more information so I can keep on learning about language testing?
ANSWER: In recent years, more and more people seem to share your interest in language testing, especially in Japan. I would suggest that you start with the following: (a) check out several Internet websites that I will give below, (b), subscribe to one or more language testing journals, (c) join a language testing organization, and (d) read some of the many books that have recently been published.
Check Out Several Internet Websites
A number of Internet websites specialize in language testing issues, but the single best source of information is the "Resources in Language Testing Page" maintained by Glenn Fulcher at http://www.le.ac.uk/education/testing/ltr.html. Fulcher's webpage includes links, reviews, articles, video FAQS, and a searchable database. The links are particularly interesting because they will take you to other websites that he labels as private/commercial websites; personal pages; government websites; ERIC; associations, centers, and councils; universities; ethics and fairness; statistics; tests online; computer based testing; search for a specific test; conferences; awards; ILTA; EALTA; LTRC; and LTEST-L. You may also find "The Language Tester's Guide to Cyberspace" (also from Fulcher) useful at http://www.le.ac.uk/education/testing/ltrfile/cybertxt.html. It provides an excellent general guide to websites listed above.
Bob Godwin-Jones provides other websites on language testing (some of which are unfortunately out of date) in his 2001 article entitled "Emerging technologies: Language testing tools and technologies" [Language Learning & Technology, 5(2), 8-12] available at http://llt.msu.edu/vol5num2/emerging/default.html. His article covers the following topics with many links available within each: computerized testing; internet applications; authoring tools; outlook; and a resource list (including web-based testing resources; organizations and institutions; language tests; sample on-line practice tests; language placement tests on-line; test makers, tools, and templates).
A number of other generic testing and evaluation clearinghouses can also be found on the internet. Check out the following:
Assessment and Evaluation on the Internet ERIC/AE Digest /1996-1/evaluation.htm
Assessment and Evaluation Resources on the NET /assessmentandevalresources.htm
Assessment and Evaluation: Resources on the Internet /Assmntlinks.html
Education Standards and Testing /Education/Standards_and_Testing/
ERIC searches /digests.htm
Subscribe to One or More Language Testing Journals
At the moment there are two primary language testing journals that serve as the gold standard for research in our sub-field. You should definitely consider subscribing to the following two journals at the following websites:
Language Testing /Journals/pages/lan_tes/02655322.htm
Language Assessment Quarterly /shop/tek9.asp?pg=products&specific=1543-4303
[ p. 21 ]
And, don't forget to skim through them when they start to arrive.
Clearly, if you are reading this article, you already know about the JALT Testing and Evaluation SIG newsletter called the SHIKEN, but you may not know that it is regularly available to members of the JALT Testing and Evaluation SIG or that, after publication, the individual articles are available at /test/pub.htm. Another newsletter that may be of interest to language testers is the ILTA Newsletter at http://www.le.ac.uk/education/testing/ltr.html.
Naturally, you should also keep an eye out for articles on language testing in other mainstream second language journals like: Applied Linguistics, JALT Journal, The Language Teacher, Language Learning, Language Teaching, Language Teaching Research, Modern Language Journal, RELC Journal, Studies in Second Language Acquisition (SSLA), TESOL Quarterly, and so forth.
If you find yourself getting pathologically serious about language testing, you will probably want to become conversant with one or more of the following measurement journals in education and psychology:
Applied Measurement in Education
Applied Psychological Measurement /
Educational and Psychological Measurement /journal.aspx?pid=165
Educational Measurement: Issues and practice /pubs/emip.cfm
European Journal of Psychological Assessment /pubs/emip.cfm
International Journal of Testing /publications.htm
Journal of Educational Measurement /pubs/jem.cfm
NCME Newsletter /pubs/ncmenews.cfm
Join a Language Testing Organization
One way to keep in touch with other language testers is to join a language testing organization. Some readers may be surprised to learn that there are a number of organizations that promote language testing. Premier among these is the International Language Testing Association (ILTA) /. However, other regional organizations may be of particular interest to those language testers who happen to live within those particular regions:
Academic Committee for Research on Language Testing (ACROLT) in Israel http://info.smkb.ac.il/home/home.exe/11571
Association of Language Testers in Europe (ALTE) /
European Association for Language Testing and Assessment (EALTA) /
JALT Testing and Evaluation SIG in Japan /test/
Japan Language Testing Association (JLTA) http://www.avis.ne.jp/~youichi/JLTA.html
Midwest Association of Language Testers http://www.public.iastate.edu/~mwalt/homepage.html
Southern California Association for Language Assessment Research (SCALAR) http://www.studentgroups.ucla.edu/scalar/scalar.html
The East Coast Organization of Language Testers (ECOLT)
[ p. 22 ]
One of the primary benefits of joining these organizations is that they often sponsor conferences and workshops. Such events are a useful way to learn about language testing, but they can also help you begin establishing networks of friends and acquaintances in the field. Check the websites above for more on the various conferences and workshops that are going on near you.
Read Some of the Recently Published Books
The following is probably not a complete list of the language testing books that have been published since 1995. However, they are the recent language testing books that I have on my shelf or have on order, and they will serve as a good starting point for reading up on the field. New language testers should scan through the list for titles or authors that interest them, order the book from the publisher or on , and start reading. More established language testers may want to scan through the list to see if there is anything they have missed. So here is my (fairly complete) list of books on language testing published since 1995:
Alderson, C. J. (2000). Assessing reading. Cambridge: Cambridge University (ISBN: 0521599997).
Alderson, C. J., Clapham, C., & Wall, D. (1995). Language test construction and evaluation. Cambridge: Cambridge University (ISBN: 0521478294).
Allison, D. (1999). Language testing and evaluation: An introductory course. Singapore: Singapore University and World Scientific (ISBN: 9971692260).
Bachman, L. F. (2004). Statistical analyses for language assessment. Cambridge: Cambridge University (ISBN: 0521003288).
Bachman, L. F., & Cohen, A. D. (Eds.). (1998). Interfaces between second language acquisition and language testing research. Cambridge: Cambridge University (ISBN: 0521649633).
Bachman, L. F., & Kunnan, A. J. (2005). Statistical analyses for language assessment workbook and CD ROM. Cambridge: Cambridge University (ISBN: 0521609062).
Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice: Designing and developing useful language tests. Oxford: Oxford University (ISBN: 0194371484).
Bachman, L. F., Davidson, F., Ryan, K., Inn-Chull, C. (1995). Studies in language testing 1: An investigation into the comparability of two tests of English as a foreign language. Cambridge: Cambridge University (ISBN: 0521484677).
Bailey, K. (1997). Learning about language assessment: Dilemmas, decisions, and directions. Washington, DC: International Thomson (ISBN: 0838466885).
Banerjee, J., Clapham, C., Clapham, P., & Wall, D. (1999). ILTA language testing bibliography 1990-1999. Lancaster, UK: Lancaster University (ISBN: 1862200734) (the same material is available at /ILTA_pubs.htm).
Barnwell, D. P. (1996). A history of foreign language testing in the United States from its beginning to the present. Tempe, AZ: Bilingual Review. (ISBN: 0927534592).
Blue, G. M., Milton, J., Saville, J. (Eds.). (2000). Language testing and evaluation 1: Assessing English for academic purposes. Frankfurt: Peter Lang (ISBN: 0820453161).
Brown, H. D. (2003). Language assessment: Principles and classroom practices. New York: Pearson Longman ESL (ISBN: 0130988340).
Brown, J. D. (1996). Testing in language programs. Upper Saddle River, NJ: Prentice Hall (ISBN: 0131241575).
Brown, J. D. (2005). Testing in language programs: A comprehensive guide to English language assessment (New ed.). New York: McGraw-Hill College (ISBN: 0072948361).
Brown, J. D. (Ed.). (1998). New ways of classroom assessment. Alexandria, VA: TESOL (ISBN: 0939791722).
Brown, J. D., & Hudson, T. (2002). Criterion-referenced language testing. Cambridge: Cambridge University (ISBN: 0521000831).
Brown, J. D., & Yamashita, S. O. (Eds.). (1995). Language testing in Japan. Tokyo: Japan Association for Language Teaching (ISBN: 4990037006).
Brown, J.D. (translated into Japanese by M. Wada). (1999). Gengo tesuto no kisochishi. [Basic knowledge of language testing]. Tokyo: Taishukan Shoten (ISBN 4469212261).
Brown, J.D., Hudson, T., Norris, J.M., & Bonk, W. (2002). Investigating second language performance assessments. Honolulu, HI: University of Hawaii (ISBN 0824826337).
[ p. 23 ]
Buck, G. (2001). Assessing listening. Cambridge: Cambridge University (ISBN: 0521666619).
Chalhoub-Deville, M. (2000). Studies in language testing 10: Issues in computer-adaptive testing of reading proficiency. Cambridge: Cambridge University (ISBN: 0521653002).
Chapelle, C., & Douglas, D. (2006). Assessing language through computer technology. Cambridge: Cambridge University (ISBN: 0521549493).
Cheng, L. (2005). Studies in language testing 21: Changing language teaching through language testing: A washback study. Cambridge: Cambridge University (ISBN: 0521544734).
Cheng, L., & Watanabe, Y. (Eds.). (2004). Washback in language testing: Research contexts and methods. Mahwah, NJ: Lawrence Erlbaum Associates (ISBN: 0805839879).
Clapham, C. (1996). Studies in language testing 4: The development of IELTS: A study of the effect of background on reading comprehension. Cambridge: Cambridge University (ISBN: 0521567084).
Clapham, C., & Corson, D. (Eds.). (1997). Encyclopedia of language and education: Volume 7, Language testing and assessment. Dordrecht, NL: Kluwer Academic (ISBN: 0792349342).
Cohen, A. D. (1994). Assessing language ability in the classroom (2nd ed.). Boston: Heinle & Heinle (ISBN: 0838442625).
Coombe, C. A., & Hubley, N. J. (2003). Assessment practices. Alexandria, VA: TESOL (ISBN: 1931185077)
Cumming, A., & Berwick, R. (1996). Validation in language testing. Clevedon, UK: Multilingual Matters (ISBN: 1853592951).
Davidson, F., & Lynch, B. K. (2002). Testcraft: A teacher's guide to writing and using language test specifications. New Haven, CT: Yale University (ISBN: 0300090064).
Davies, A. (forthcoming). Studies in language testing 23: Testing English for academic purposes, 1950-2005. Cambridge: Cambridge University (ISBN: 0521834732).
Davies, A., Brown, A., Elder, C., Hill, K., Lumley, T., & McNamara, T. (1999). Studies in language testing 7: Dictionary of language testing. Cambridge: Cambridge University (ISBN: 0521658764).
Douglas, D. (2000). Assessing language for specific purposes. Cambridge: Cambridge University (ISBN: 0521585430).
Ekbatani, G., & Pierson, H. (Eds.) (2000). Learner-directed assessment in ESL. Mahwah, NJ: Lawrence Erlbaum Associates (ISBN: 0805830677).
Elder, C., Brown, A., Grove, E., Hill, K., Iwashita, N., Lumley, T., McNamara, T., & O'Loughlin, K. (Eds.) (2001). Studies in language testing 11: Experimenting with uncertainty: Essays in honour of Alan Davies. Cambridge: Cambridge University (ISBN: 0521772560).
Fulcher, G. (2003). Testing second language speaking. Cambridge: Cambridge University (ISBN: 0582472709).
Genessee, F., & Upshur, J. A. (1996). Classroom-based evaluation in second language education. Cambridge: Cambridge University (ISBN: 0521566819).
Gottlieb, M. (2006). Assessing English language learners: Bridges from language proficiency to academic achievement. Thousand Oaks, CA: Corwin (ISBN: 0761988882).
Green, A. (1998). Studies in language testing 5: Using verbal protocols in language testing research: A handbook. Cambridge: Cambridge University (ISBN: 0521584132).
Hasselgreen, A. (2005). Studies in language testing 20: Testing the spoken English of young Norwegians: A study of testing validity and the role of 'smallwords' in contributing to pupils' fluency. Cambridge: Cambridge University. (ISBN: 0521544726).
Hawkey, R. (2005). Studies in language testing 16: A modular approach to testing English language skills: The development of the Certificates in English Language Skills (CELS) examinations. Cambridge: Cambridge University (ISBN: 0521013321).
Hawkey, R. (forthcoming). Studies in language testing 24: Impact theory and practice: Studies of the IELTS test and Progetto Lingue 2000. Cambridge: Cambridge University.
Hudson, T., & Brown, J. D. (Eds.). (2001). A focus on language test development: Expanding the language proficiency construct across a variety of tests. Honolulu, HI: University of Hawaii (ISBN 0824823516).
Hudson, T., Detmer, E., & Brown, J. D. (1995). Developing prototypic measures of cross-cultural pragmatics. Honolulu, HI: University of Hawaii (ISBN 082481763X).
Hughes, A. (2002). Testing for language teachers (revised). Cambridge: Cambridge University (ISBN: 0521484952).
Kunnan, A. J. (1996). Studies in language testing 2: Test taker characteristics and performance: A structural modeling approach. Cambridge: Cambridge University (ISBN: 0521484669).
[ p. 24 ]
Kunnan, A. J. (2000). Studies in language testing 9: Fairness and validation in language assessment: Selected papers from the 19th Language Testing Research Colloquium, Orlando, Florida. Cambridge: Cambridge University (ISBN: 0521658748).
Kunnan, A. J. (Ed.). (1998). Validation in language assessment. Mahwah, NJ: Lawrence Erlbaum Associates (ISBN: 0805827536).
Lazaraton, A. (2002). Studies in language testing 14: A qualitative approach to the validation of oral language tests. Cambridge: Cambridge University (ISBN: 052180227X).
Luoma, S. (2004). Assessing speaking. Cambridge: Cambridge University (ISBN: 0521804876).
McKay, P. (2006). Assessing young language learners. Cambridge: Cambridge University (ISBN: 0521601231).
McNamara, T. (1996). Measuring second language performance. London: Longman (ISBN: 0582089077).
McNamara, T. (2000). Language testing. Oxford: Oxford University (ISBN: 0194372227).
Milanovic, M., & Saville, N. (Eds.) (1995). Studies in language testing 3: Performance testing, cognition and assessment: Selected papers from the 15th Language Testing Research Colloquium, Cambridge and Arnhem. Cambridge: Cambridge University (ISBN: 0521484465-0).
Milanovic, M., & Weir, C. J. (Eds.) (2004). Studies in language testing 18: European language testing in a global context: Proceedings of the ALTE Barcelona Conference July 2001. Cambridge: Cambridge University (ISBN: 052182897X).
Norris, J. M., Brown, J. D., Hudson, T., & Yoshioka, J. (1998). Designing second language performance assessments. Honolulu, HI: University of Hawaii (ISBN: 0824821092).
O'Loughlin, K. J. (2001). Studies in language testing 13: The equivalence of direct and semi-direct speaking tests. Cambridge: Cambridge University (ISBN: 052166098X).
O'Malley, J. M. (1996). Authentic assessment for English language learners. Boston: Addison Wesley (ISBN: 0201591510).
O'Sullivan, B. (forthcoming). Studies in language testing 17: Issues in testing business English: The revision of the Cambridge Business English Certificates. Cambridge: Cambridge University (ISBN: 0521013305).
Purpura, J. E. (1999). Studies in language testing 8: Learner strategy use and performance on language tests: A structural equation modeling approach. Cambridge: Cambridge University (ISBN: 0521658756).
Purpura, J. E. (2004). Assessing grammar. Cambridge: Cambridge University (ISBN: 052100344X).
Read, J. (2000). Assessing vocabulary. Cambridge: Cambridge University (ISBN: 0521627419).
Röver, C. (2005). Language testing and evaluation 2: Testing ESL pragmatics: Development and validation of a web-based assessment battery. Frankfurt: Peter Lang (ISBN: 082047343X).
Sigott, G. (2004). Language testing and evaluation 1: Towards identifying the C-test construct. Frankfurt: Peter Lang (ISBN: 0820465585).
Spolsky, B. (1995). Measured words. Oxford: Oxford University (ISBN: 0194372014).
Taylor, L. & Falvey, P. (Eds.) (forthcoming). Studies in language testing 19: IELTS collected papers: Research in speaking and writing assessment. Cambridge: Cambridge University.
Teachers of English to Speakers of Other Languages (TESOL). (2001). Scenarios for ESL Standards-Based Assessment. Alexandria, VA: TESOL (ISBN: 0939791900).
University of Cambridge Local Examination Syndicate (UCLES). (1999). Studies in language testing 6: Multilingual glossary of language testing terms. Cambridge: Cambridge University (ISBN: 0521658772).
University of Cambridge Local Examination Syndicate (UCLES). (1999). Studies in language testing 6: Multilingual glossary of language testing terms (Audio CD). Cambridge: Cambridge University (ISBN: 0521658241).
Wall, D. (2005). Studies in language testing 22: The impact of high-stakes testing on classroom teaching: A case study using insights from testing and innovation theory. Cambridge: Cambridge University (ISBN: 0521542499).
Weigle, S. C. (2002). Assessing writing. Cambridge: Cambridge University (ISBN: 0521784468).
Weir, C. J. & Milanovic, M. (2003). Studies in language testing 15: Continuity and innovation: Revising the Cambridge Proficiency in English Examination 1913-2002. Cambridge: Cambridge University (ISBN: 0521813506).
Weir, C., Huizhong, Y., & Yan, J. (2000). Studies in language testing 12: An empirical investigation of the componentiality of L2 reading in English for academic purposes. Cambridge: Cambridge University (ISBN: 0521653819).
Yamashita, S. O. (1996). Six measures of JSL pragmatics. Honolulu, Hawaii: University of Hawaii. (ISBN: 0824819144).
[ p. 25 ]
As you are checking out some websites, subscribing to a couple of language testing journals, joining a language testing organization (and attending their conferences and workshops), and reading a half dozen books on language testing, you might also consider doing some actual nuts-and-bolts language testing. Such nuts and bolts involve getting your hands dirty by doing some actual language test development, which in turn involves writing test items, administering the items, item analyzing the results, revising the test on the basis of those results, validating the test, and doing research based on the test. Since doing language testing is the fun part, I recommend getting started with that as soon as possible. Once you take this hands-on step, you'll truly be hooked. Enjoy!
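The item analysis mentioned above can be sketched in a few lines. This is a minimal illustration with invented 0/1 responses: item facility (proportion correct) and a simple upper-minus-lower discrimination index, not a full psychometric analysis:

```python
# Classical item analysis: facility = proportion answering correctly;
# discrimination = facility in the top half of scorers minus facility
# in the bottom half. Rows = test-takers, columns = items (0/1).

def item_analysis(responses):
    totals = [sum(row) for row in responses]
    ranked = [row for _, row in sorted(zip(totals, responses), reverse=True)]
    half = len(ranked) // 2
    upper, lower = ranked[:half], ranked[-half:]
    n, k = len(responses), len(responses[0])
    facility = [sum(row[i] for row in responses) / n for i in range(k)]
    discrimination = [
        sum(row[i] for row in upper) / half - sum(row[i] for row in lower) / half
        for i in range(k)
    ]
    return facility, discrimination

data = [            # invented responses from six test-takers
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
fac, disc = item_analysis(data)
print([round(f, 2) for f in fac], [round(d, 2) for d in disc])
```

Items with very high or very low facility, or near-zero discrimination, are the usual candidates for revision or deletion in the test-development cycle described above.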
Testing is considered a way to systematically measure a person's ability or knowledge, and it is formalized as a set of techniques or procedures (Brown, 2001). Testing also plays an important part in language learning and evaluation in classroom settings. In fact, a number of Second Language Acquisition (SLA) and Language Testing (LT) researchers have discussed the roles of testing and its development in their fields. This paper provides a historical overview of the development of those views over the last three decades. Some fundamental concepts and criteria for good language testing will also be examined. Next, after pointing out some differences in viewpoint between SLA and LT, I will further explore the roles of language testing in SLA research by focusing on the knowledge and ability which underlie learners' performance. Lastly, on the basis of two key SLA studies, I will argue that, in order to improve the quality of language testing and to provide good opportunities for language learning, it is necessary to consider the test-taking process and the strategies required by the communicative approaches which are currently recommended. In this regard, Brindley (2001, p. 139) states:
Many language tests and assessments used nowadays often contain tasks which resemble the kinds of language-use situations that test takers would encounter in using the language for communicative purposes in everyday life.
Historical developments in language testing in SLA research
During the 1960s and 1970s, language testing techniques were heavily influenced by structural linguistics (Chew, 2005). The analysis of language favoured by behaviourist approaches (e.g. Skinner's) led to discrete-point testing; that is to say, tests were designed to assess learners' mastery of different areas of the linguistic system in isolation [e.g. grammatical knowledge, vocabulary, pronunciation, etc.] (see Bachman & Cohen, 1998; Brindley, 2001). It was Chomsky (1965) who first rejected such approaches and proposed an underlying rule-based knowledge system. From the early 1970s, however, communicative theories were widely adopted among linguists, who began to focus on "communicative proficiency rather than on mere mastery of structures" in language teaching (Richards, 2001, p. 153). This trend significantly influenced the methods of language teaching and the roles of language testing, although it is plausible that social changes prompted the new theories in the first place, and that the theories were then refined to support practice more closely. Hymes took Chomsky's work further, but also reacted against some aspects of it. For Hymes (1972), the social context of language was essential, and appropriateness was viewed as being as important as grammatical correctness. Discrete-point teaching and testing models were gradually replaced by models which aimed to integrate the various elements of language learning. A theory of communicative competence was developed further by Canale and Swain (1980). They also raised two controversial issues related to second language teaching and testing which will be explored later:
whether communicative competence and linguistic competence are mutually inclusive or separate, and
whether one can usually distinguish between communicative competence and performance (Spolsky, 1985, p. 183).
In line with the new trends described by Richards (2001), since the 1970s language testers have been seeking more pragmatic and integrative assessment formats, such as cloze tests and dictations. McNamara (2000, pp. 14-15) points out this need by stating:
the necessity of assessing the practical language skills of foreign students led to a demand for language tests which involved an integrated performance on the part of the language user. The discrete point tradition of testing was seen as focusing too exclusively on knowledge of the formal linguistic system for its own sake rather than the way such knowledge is used to achieve communication.
For instance, Oller (1979) proposed the Unitary Competence Hypothesis, which reflects the view that "performance on a whole range of tests depends on the same underlying capacity in the learner - the ability to integrate grammatical, lexical, contextual, and pragmatic knowledge in test performance" (McNamara, 2000, p. 15). This theory, however, is no longer accepted as a plausible model of how language is processed.
Nowadays, with the widespread adoption of communicative language teaching (CLT) principles, language tests tend to include more practical tasks that reflect real-world settings (see Brindley, 2001). Although we can still encounter many discrete-point questions, recent tests seem to offer more diverse and alternative evidence for assessment, congruent with the communicative paradigm. For example, oral examinations such as role-plays of speech acts, structured interviews, and information-gap exercises, which contain tasks reflecting real-life demands, are now common in many language tests.
Fundamental concepts of good language testing
Although language testing has been influenced by social changes such as those described above, certain fundamental aspects remain widely accepted. According to Bell (1981, pp. 197-198), it is important to bear in mind what a test actually tests and how well it does so. Bell's view may be considered together with Harris's analysis (1969, pp. 21-22), which suggests that there are three key criteria for evaluating any 'good' test, namely, reliability, validity, and practicality.
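Of these three criteria, reliability is the one that can be estimated directly from score data. As a minimal illustration, with invented scores rather than data from any study cited here, the following Python sketch computes Cronbach's alpha, an internal-consistency estimate that reduces to KR-20 for right/wrong items:

```python
def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def cronbach_alpha(scores):
    """Cronbach's alpha: alpha = k/(k-1) * (1 - sum(item vars) / total var)."""
    k = len(scores[0])  # number of items
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical data: rows = examinees, columns = items (1 = correct).
scores = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]

print(round(cronbach_alpha(scores), 3))  # prints 0.667
```

An alpha this low would be unacceptable for a real test, which is the point: the statistic gives a concrete handle on "how well" a test measures, in Bell's sense, rather than leaving reliability as an abstract ideal.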