June 2004 -- Volume 8, Number 1

Testing Second Language Speaking

Glenn Fulcher (2003)
Harlow: Pearson Longman
Pp. xxi + 288
ISBN 0-582-47270-9 (paper)
£19.99

The testing of second language speaking is a relatively new field, even within the young discipline of applied linguistics. Among the reasons for this, we can list the ephemeral nature of speech, the relative lack of interest in the spoken language shown by pre-1970s linguists, and the difficulty of devising objective assessment criteria. Glenn Fulcher's book Testing Second Language Speaking, a new addition to the Pearson Longman series Applied Linguistics and Language Study, documents the short history of testing spoken English and provides abundant information about the current methodology of testing speaking.

The book leads us gently into the subject with an outline of the history of speaking tests in English since 1913. This outline concludes that speaking tests have always been credited with importance, but that the lack of consensus regarding criteria, and the difficulty of fitting speaking into the framework of quantitative psychometric testing fashionable in the USA, meant that until the 1970s speaking tests were generally not placed on the same level as pencil-and-paper tests.

One of the main problems underlying speaking tests is that "speaking" is a difficult construct to define. Speech can be broken down into pronunciation and intonation, accuracy and fluency, or it can be categorized in terms of strategies, or it can be regarded as a form of interaction and analyzed using the methods of pragmatics or discourse analysis. The problem is that in the course of a normal conversation, all of these aspects are important. If testers try to separate out the strands, they may well find that the ecology of speaking is different in different successful speakers. This means that the accurate speaker may communicate effectively, but slowly, whereas the fluent speaker may sacrifice accuracy for the sake of rapid communication (Skehan, 1998). Fulcher does not go into much detail about the recent research concerning trade-offs of this kind, but he does succeed in conveying the main points about the difficulty of defining speech, and the problems this poses for the tester. Ultimately, he concludes that "the purpose of testing second language speaking is similar to that of a driving test. The purpose of a speaking test is to collect evidence in a systematic way (through elicitation techniques or tasks) that will support an inference about the construct as we define it from the summary of the evidence (the 'score'). We will also be interested in the learner's ability to perform in a range of situations much wider than those that can be sampled during the test" (p. 47). To provide a valid speaking test, it is necessary to capture the relevant aspects of speaking on the one hand, and prevent interference in the score from irrelevant factors, on the other.

After defining what speaking is and what a speaking test should do, Fulcher proceeds to review some of the tasks currently used in second language speaking tests. As far as the task is concerned, the fundamental questions are: will the task elicit a performance that can be scored, and will it be possible to make inferences from the score to the construct we intend to measure? Examples from different examination boards are used to show different approaches to eliciting representative performance, and to explain the problems with each approach. [-1-]

Although Fulcher maintains that the task type is important, he makes a strong case that difficulty resides not in the task itself, but in an interaction of tasks, conditions and test-takers. He therefore underlines the importance of the rating scale as the main means of operationalizing the construct that a particular test is supposed to measure, which means that this construct should be absolutely central to the rating scale. This leads into the question of how to devise specifications for particular speaking tests, which should bring together the various theoretical aspects of language testing in a concrete, usable form. This part of the book will be useful for anyone involved in developing new speaking tests, and is of some interest to oral examiners. More than anything else, this section of the book, with its analysis of various examples of real test specifications, underlines the difficulty involved in moving from a needs analysis, to a test specification, to a real test with tasks and a rating scale.

The reliability of any test of spoken language hinges on the role of oral examiners or raters. Unfortunately, there is abundant evidence that inter-rater reliability tends to be low, which is why large examination boards are now devoting considerable time and effort to examiner training and standardization. This is a costly procedure, but as Fulcher points out, it may only be the tip of the iceberg as far as the costs of testing speaking are concerned. In the chapter on "Raters, training and administration", the author takes the unusual step of providing a budget (in pounds sterling, presumably at 2003 rates) for what it would cost to design an oral test, from start (defining test purpose) to finish (research). As Fulcher points out, economic issues are really at the heart of any testing operation, and there may have to be a trade-off between what an institution can afford and the degree of validity that is obtained.

The last two chapters of the book are dedicated to evaluating and researching second language speaking tests. The chapter on evaluation presents a useful summary of the findings of empirical research into direct (face-to-face interview) and indirect (computer or tape-mediated) speaking tests, concluding that the two types of test rely on different construct definitions, and that they may be equally valid, within different parameters.

The section on quantitative analysis of speaking test results begins with a short introduction, for the benefit of the non-specialist, to the statistical methods used, although I feel that I would have benefited from a lengthier explanation of how each quantitative experiment was set up, how the statistics were obtained, and what they were supposed to show. I found the section on qualitative data far more revealing, particularly the brief samples of self-report data. Since examinee perceptions are fundamental to the functioning of the basic test constructs, I found it surprising that so little space was devoted to this aspect, or to the burgeoning literature concerning the ethnography of communication in classrooms and tests (Mercer, 1995; van Lier, 1996, 1998).

Apart from these minor points, this book provides a much-needed overview of the issues involved in second language speaking tests. Fulcher succeeds in integrating practice and theory, meeting the challenge of making a difficult area accessible to busy language professionals. Testing Second Language Speaking is an essential book for anyone involved in the design of speaking tests, and is useful reading for examiners, test administrators, MA students and anyone interested in gaining a thorough understanding of testing spoken language.

References

Mercer, N. (1995). The Guided Construction of Knowledge: Talk amongst teachers and learners. Clevedon: Multilingual Matters.

Skehan, P. (1998). A Cognitive Approach to Language Learning. Oxford: Oxford University Press.

van Lier, L. (1996). Interaction in the language curriculum: Awareness, Autonomy, and Authenticity. London: Longman.

van Lier, L. (1998). Constraints and resources in classroom talk: Issues of equality and symmetry. In Byrnes, H. (ed.), Learning foreign and second languages: Perspectives in research and scholarship. New York: The Modern Language Association, 157-182.

Ruth Breeze
Universidad de Navarra
<rbreeze@unav.es>

© Copyright rests with authors. Please cite TESL-EJ appropriately.

Editor's Note: Dashed numbers in square brackets indicate the end of each page for purposes of citation.

[-2-]