STATEMENT OF AUTHORSHIP
I certify that the minor thesis entitled ‘Oral Language Testing at Tay Nguyen University: Current Practices and Recommendations for Improvement’ and submitted in partial fulfilment of the requirements for the degree of Master of Arts in TESOL is the result of my work, except where otherwise acknowledged, and that this minor thesis or any part of the same has not been submitted for a higher degree to any other university or institution.
The research reported in this thesis was approved by Hanoi University of Foreign Studies.
Signed: Le Thi Phuong Nhi, Tay Nguyen University
Dated: 24 February 2008
TABLE OF CONTENTS
STATEMENT OF AUTHORSHIP
TABLE OF CONTENTS
ACKNOWLEDGEMENTS
GLOSSARY
ABSTRACT
LIST OF FIGURES AND TABLES
CHAPTER 1: INTRODUCTION
1.1 The Problem
1.1.1 Theoretical Perspective
1.1.2 Practical Perspective
1.2 Aims and Overview of the Thesis
CHAPTER 2: LITERATURE REVIEW
2.1 Typical Features of Spoken Language
2.2 Communicative Approach to Testing Oral Language Ability
2.3 Theoretical Framework for Oral Test Development
2.3.1 Design Stage
2.3.2 Operationalization Stage
2.3.3 Administration Stage
2.4 Major Considerations in Operationalization of Speaking Tests
2.4.1 Level Scale
2.4.2 Oral Test Types and Elicitation Techniques
2.4.2.1 The Direct Interview Type
2.4.2.2 The Pre-arranged Information Gap Tests
2.4.2.3 Tests Where the Learner Prepares in Advance
2.4.2.4 Mechanical/Entirely Predictable Tests
2.4.3 Marking Key
2.5 Qualities of a Good Test
2.5.1 Validity
2.5.2 Reliability
2.5.3 Practicality
2.6 Summary
CHAPTER 3: METHODOLOGY
3.1 Research Questions
3.2 Data Collection Instruments
3.2.1 The Checklist
3.2.2 The Observation
3.2.3 The Questionnaire
3.3 Procedures
3.4 Summary
CHAPTER 4: RESULTS AND DISCUSSION
4.1 Evaluation of TNU Current Development Process of Oral Language Tests
4.1.1 Review of TNU Current Development Process of Oral Language Tests
4.1.2 The Observation Results
4.1.3 Analysis of the Results
4.2 Evaluation of TNU Staff’s Perceptions of Oral Testing
4.2.1 Results
4.2.2 Analysis of the Results
4.3 Summary
CHAPTER 5: RECOMMENDATIONS AND CONCLUSION
5.1 Recommendations for TNU Oral Testing Practices
5.1.1 Recommendations for TNU Development Process of Achievement Speaking Tests
5.1.1.1 Rating/Level Scale
5.1.1.2 Blueprint for Development of Achievement Speaking Tests at TNU
5.1.1.3 Standardisation Meeting
5.1.1.4 Supportive Test Taking Environment
5.1.1.5 Use of Test Results for Teaching Evaluation
5.1.2 Practical Applications to the Operationalization Process of Speaking Tests for First-Year Students
5.1.2.1 Suggested Tasks in the TLU Domain for Inclusion in Speaking Tests for First-Year Students
5.1.2.2 Two Sample Achievement Speaking Tests for First-Year Students
5.2 Conclusion
REFERENCES
APPENDICES
Appendix 1: Three Achievement Speaking Tests Used at TNU
Appendix 2: Achievement Speaking Test for the Second-Year Students (Term 2 – School Year 2002-2003)
Appendix 3: The Tapescript of the Test Recorded
Appendix 4: Survey Questionnaire (Phiếu khảo sát)
ACKNOWLEDGEMENTS
I would like to express my deepest gratitude to my thesis supervisor, Ms. Nguyễn Thị Thanh Hà, who greatly assisted and encouraged me with insightful discussions, valuable comments and criticisms during the preparation and completion of this thesis.
I would also like to extend my special thanks to the organisers of this master course, Ms. Phạm Kim Ninh, Head of the Department of Post Graduate Studies of Hanoi University of Foreign Studies, Ms. Nguyễn Thái Hà, Ms. Phạm Thu Hương, the staff of this department, and the leaders of Tay Nguyen University.
I am also grateful for the permission to attend this master course given by the leaders of Tay Nguyen University, and especially to the staff members of the English Section for their assistance and participation in the research project.
GLOSSARY
This glossary is intended to give working definitions of terms used frequently in this thesis in order to help readers understand the author’s intended meaning.
Communicative stress: This term means the degree of difficulty of a task that a speaker has to carry out. This difficulty refers to all the conditions under which the speaker has to perform the task.
Elicitation technique: An elicitation technique is the procedure for performing a task on which inferences about a speaker’s language ability are based.
Interactional function: This term refers to one of the two functions of spoken language. A speaker producing an interactional instance of spoken language wants to make the atmosphere of the interaction pleasant.
Level scale or rating scale: A level or rating scale as used in this thesis is a document displaying classified levels of learners’ language knowledge and what learners can do at each level.
Long turn: A long turn is a string of utterances that a speaker produces.
Oral test type: A type of oral test refers to the way in which a test task requires test takers to perform.
Short turn: A short turn is speech of one or two utterances that a speaker produces.
Test administration: Test administration involves the delivery of a set of test tasks to a group of test takers under specified conditions.
Test design: Test design in this thesis refers to the production of a principled statement as a basis for writing actual tests and administering them.
Test development: Test development is the entire process of creating and using a test. It involves test design, test operationalization and test administration.
Test operationalization: Test operationalization is the process of producing actual tests. It involves developing test task specifications and test structure.
Test structure: Test structure refers to the number of test tasks included in a test.
Test task specifications: Test task specifications state in detail what a test is designed to measure and how it will be tested.
Transactional function: This term refers to one of the two functions of spoken language. A speaker producing a transactional instance of spoken language means to convey his intentions and messages.
ABSTRACT
Assessment of oral language proficiency at Tay Nguyen University (TNU), where the author of this thesis works, has been claimed to be extremely problematic. This thesis takes a critical look at the reality of oral English language testing at this institution to point out its strengths and weaknesses and the cause(s) of the existing drawbacks or problems.
In order to achieve this, a study was carried out to evaluate TNU’s current practices and TNU staff’s perceptions of oral language testing. The methods employed in the study include: (1) a detailed analysis of the current oral test development process based on Bachman & Palmer’s theoretical framework for test development, and (2) a survey of how much the staff know about oral skill assessment.
The results of the study show that (1) the current oral testing practices at this institution are far from consistent with language testing theory, and (2) the staff have limited and insufficient knowledge of oral language testing. These findings serve as the basis for seven practical recommendations for the improvement and standardisation of TNU’s current oral testing practices.
The seven recommendations are as follows. Recommendations 1, 2, 3, 4 and 5 apply Bachman & Palmer’s theoretical framework for test development and can be considered guidelines for developing speaking tests in general. Recommendations 6 and 7 are intended specifically for the sample operationalization of speaking tests for first-year students.
LIST OF FIGURES AND TABLES
Table 1.1: The second-year students’ oral test results
Table 1.2: The third-year students’ oral test results
Figure 2.1: Continuum of Spoken Language Production
Figure 2.2: Conditions of Communicative Stress in a Task
Figure 2.3: Success of Meaning Negotiation
Figure 2.4: The Model of Test Development
Table 2.1: Level Scale of Language Proficiency Based on the Global Scale by Council of Europe
Table 2.3: Oral Test Types and Elicitation Techniques
Table 4.1: A Checklist for Oral Test Development
Table 4.2: Summary of Oral Test Types Used in the Achievement Speaking Test for the Second-Year Students (School Year 2002-2003)
Table 4.3: Summary of the Students’ Oral Test Performance in the Achievement Speaking Test for the Second-Year Students
Table 4.4: Correct Answers for the Questions in the Questionnaire
Table 4.5: Teachers’ Assessment Priority Perception of Interactional and Transactional Short Turns
Table 4.6: Teachers’ Assessment Priority Perception of Transactional Long Turns
Table 4.7: Teachers’ Choice of Number of Tasks for a Speaking Test
Table 4.8: Teachers’ Choice of Elicitation Techniques for Levels of Proficiency
Table 4.9: Teachers’ Choice of Specific Test Tasks for Level of Proficiency
Table 4.10: Teachers’ Choice of Steps to Be Considered in Oral Test Design and Operationalization
Table 4.11: Teachers’ Confidence in Students’ Test Results
Table 4.12: Teachers’ Lack of Confidence in Students’ Test Results
Table 5.1: The Marking Scales for Task 1 of the Sample Term 1 Achievement Speaking Test
Table 5.2: The Marking Scales for Task 2 of Sample Term 1 Achievement Speaking Test
Table 5.3: The Marking Scales for Task 1 of Sample Term 2 Achievement Speaking Test
Table 5.4: The Marking Scales for Task 2 of Sample Term 2 Achievement Speaking Test
CHAPTER 1: INTRODUCTION
This thesis reports the results of a study carried out to investigate the current practices of oral testing at Tay Nguyen University (TNU) in order to point out the existing problems and to make some practical suggestions for improvement. This introductory chapter describes in detail the problem the thesis attempts to solve, states the objectives of the study, and provides an overview of the thesis.
1.1 The Problem
1.1.1 Theoretical Perspective
Theoretically, as has become clear through empirical studies in language testing, there has been ‘a shift from using assessment as a way to keep students in their place to using assessment as a way to help students find their place in school and in the world community of language users’ (Cohen, 1996, p. 3). In line with this now widespread view, language tests have come to be considered extremely helpful for students and teachers, and even for administrators. Madsen (1983, pp. 4-5) points out the importance of language testing by demonstrating that properly made tests can
‘1. help create positive attitudes towards instruction by giving students a sense of accomplishment and a feeling that the teacher’s evaluation of them matches what he has taught them.
2. help students learn the language by requiring them to study hard, emphasizing course objectives, and showing them where they need to improve.
3. help teachers and administrators by confirming progress that has been made and showing how they can best redirect their future efforts.’
Therefore, competence in language testing, and particularly in the oral language testing under review in this thesis, is claimed to be crucial if language teachers are to develop language tests properly. This thesis attempts to provide a clear discussion of how to become competent in oral language testing. Answering this question will in turn help to evaluate TNU’s current oral testing practices.
1.1.2 Practical Perspective
Apart from the above theoretical concern, this thesis also grows out of a practical consideration regarding the researcher’s work at TNU as an English teacher and assessor of students’ oral test performance. The problem identified in this thesis has taken root from the existing oral testing practices at TNU. The following are two tables of oral test results of the second-year students (School Year 2001-2002) and of the third-year students (School Year 2002-2003).
| Students with | Term 1 | Term 2 | Average |
|---|---|---|---|
| Mark 4 | 0% | 6.5% | 3.25% |
| Marks 5 and 6 | 38% | 41.5% | 39.75% |
| Mark 7 | 33.5% | 27% | 30.25% |
| Marks 8 and 9 | 28.5% | 25% | 26.75% |
Table 1.1: The second-year students’ oral test results
| Students with | Term 1 | Term 2 | Average |
|---|---|---|---|
| Mark 4 | 0% | 0% | 0% |
| Marks 5 and 6 | 26% | 39% | 32.5% |
| Mark 7 | 40% | 33% | 36.5% |
| Marks 8 and 9 | 34% | 28% | 31% |
Table 1.2: The third-year students’ oral test results
The two tables reveal that more than half of the second-year students (57%) and roughly two-thirds of the third-year students (67.5%) got high marks (7, 8 and 9). However, in talks with the researcher about their speaking ability, and particularly about their results in earlier speaking tests, the majority of the students who had got high marks seemed very reluctant to agree that their test results reflected their actual ability to use English for communication. For example, they still found it hard to use English either to communicate their ideas satisfactorily or to make themselves fully understood in a real instance of communication. So why did they get such high scores?
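The Average column and the share of high marks can be recomputed from the Term 1 and Term 2 columns of the two tables. The following is a minimal sketch of that arithmetic only; the dictionary layout and the function names `term_average` and `high_mark_share` are illustrative assumptions and not part of the study's instruments.

```python
# Term 1 and Term 2 percentages transcribed from Tables 1.1 and 1.2.
# The data layout and helper names are assumptions made for illustration.
TABLE_1_1 = {  # second-year students
    "Mark 4": (0.0, 6.5),
    "Marks 5 and 6": (38.0, 41.5),
    "Mark 7": (33.5, 27.0),
    "Marks 8 and 9": (28.5, 25.0),
}
TABLE_1_2 = {  # third-year students
    "Mark 4": (0.0, 0.0),
    "Marks 5 and 6": (26.0, 39.0),
    "Mark 7": (40.0, 33.0),
    "Marks 8 and 9": (34.0, 28.0),
}


def term_average(term1, term2):
    """Simple mean of the two term percentages for one mark band."""
    return (term1 + term2) / 2


def high_mark_share(table, high_bands=("Mark 7", "Marks 8 and 9")):
    """Sum of the averaged percentages for the bands counted as high marks."""
    return sum(term_average(*table[band]) for band in high_bands)


for label, table in (("Table 1.1", TABLE_1_1), ("Table 1.2", TABLE_1_2)):
    for band, terms in table.items():
        print(f"{label} {band}: {term_average(*terms):.2f}%")
    print(f"{label} high marks (7, 8 and 9): {high_mark_share(table):.2f}%")
```

Because each term column sums to 100%, the averaged column should do so as well, which gives a quick consistency check on the figures.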
The same question was put forward for discussion with the teachers and the assessors of speaking skill. Most of them believed that those students deserved high scores because they actually performed their tasks fluently and were able to answer the examiners’ questions. Nevertheless, when the author asked these teachers what particular speaking abilities they expected to see in the students’ test performance, what exact criteria their assessment was based on, and what the detailed procedure for test design was, none of them provided a specific and clear answer. They could not point out the speaking abilities expected, they had assessed students’ test performance intuitively, and they had constructed oral tests in their own way. Thus, it can be concluded that there seems never to have been any official detailed guidance for the construction and administration of oral tests at this institution.
Most of the staff members whom I have discussed this matter with have shared the same worry and showed interest in how to gain a scientific approach to assessing our students’ oral ability properly and fairly. In other words, we all would like to find out how to appropriately measure the students’ speaking ability and how to write useful speaking tests. This aims at helping to ensure fairness for the students, and improve and maintain the training quality of the institution.
All the concerns described above indicate an urgent need to evaluate the reality of oral language testing at TNU in the light of language testing theory. A thorough study was therefore carried out in an attempt to help the staff give oral testing an adequate position in their training program.
1.2 Aims and Overview of the Thesis
This thesis has two main aims: firstly, to investigate the existing oral testing practices at TNU; and secondly, to make suggestions for improvement. Based on Bachman and Palmer’s theoretical framework for developing language tests, the author recommends a procedure for speaking test development in an attempt to provide a thorough understanding of how to develop an oral test properly. It is hoped that these recommendations can serve as guidelines for oral language test development not only at TNU but also at other institutions throughout Vietnam. The data used for the two purposes above were collected from two different sources: (1) a detailed analysis of current oral testing practices at TNU, in particular the development procedure of an achievement speaking test; and (2) a questionnaire survey given to 12 TNU teachers of English to investigate their perceptions of oral testing and thus to find out the cause(s) of the current practices.
The thesis consists of five chapters as follows:
Chapter 1 identifies the problem and provides an overview of the thesis.
Chapter 2 reviews the literature related to major issues in oral language testing such as essential features of spoken language, a theoretical framework for test development in general and development of oral tests in particular, and major qualities of a test.
Chapter 3 describes the methodology employed in the study. In order to evaluate current practices, the study involves describing the existing practices of spoken language testing at TNU, and investigating the staff’s perceptions of oral testing by delivering questionnaires to 12 staff members.
Chapter 4 presents the results of the study, and analyses the results to point out findings.
Chapter 5 makes some practical recommendations for standardisation of TNU oral testing practices, and provides a summary of the main details of the whole thesis with a conclusion ending the thesis.
CHAPTER 2: LITERATURE REVIEW
Chapter 1 presents the background information of the study. This chapter looks at main issues of oral testing. The discussion of the issues is meant to give a theoretical foundation on which to develop a framework for developing oral tests. The chapter discusses the following issues: (1) typical features of spoken language, (2) communicative approach to testing oral language ability, (3) theoretical framework for test development, (4) major considerations in construction of oral test tasks and tests, and (5) qualities of a good test.
2.1 Typical Features of Spoken Language
Spoken language was long neglected in language teaching before it came to be recognised as being as essential as written language and the other aspects of the discipline. Since then, learners of a foreign language have been encouraged to learn how to produce spoken language forms spontaneously, not simply to utter written language sentences.
The features of spoken language reviewed here will help to specify typical and important areas of language knowledge to be involved in the process of testing speaking skill.
The most distinctive feature of spoken language is its functions. Brown and Yule (1983) demonstrate that spoken language serves two functions in terms of a speaker’s intention: the interactional function and the transactional function. The former refers to the kind of spoken language speakers use to make the atmosphere of their interaction pleasant, whereas the latter is found in interactions where speakers mainly want to convey their intentions and messages. Accordingly, Brown and Yule (1983, p.13) assert that interactional language is listener-oriented while transactional language is message-oriented.
In interactional situations the participating speakers do not challenge each other to communicate information, and tend to end up feeling friendly and comfortable with each other. In transactional situations information transmission requires language exchanges between interlocutors to be understandable and appropriate. Obviously, ‘all foreign learners of English, who wish to learn the spoken form of the language, need to be able to express their transactional intentions’. They must know how to make clear the ideas to be communicated, even in their own mother tongue environment, yet it is easier to make themselves understood in their own language than in a new language.
Another crucial feature of spoken language is the length of its production, that is, whether the language is orally produced at length or not. Speech consisting of only one or two utterances is defined as a short turn, and speech consisting of a string of utterances is defined as a long turn by Underhill (1987, p.16). Taking short turns is of course less demanding than taking long turns. When taking a transactional long turn, a speaker is immediately ‘responsible for creating a structured sequence of utterances which must help the listener(s) to create a coherent mental representation of what he is trying to say’.
With regard to these two features, a product of spoken language can be placed on a continuum running from interactional short turns through transactional short turns to transactional long turns. The difficulty of spoken language production increases from one extreme of the continuum to the other, and the level of difficulty is shown in Figure 2.1 below. Clearly, the teaching as well as the testing of speaking skill should gradually follow this continuum according to learners’ level of language proficiency.
[Figure 2.1 labels: Interactional, Transactional; Short turns, Long turns]
Figure 2.1: Continuum of Spoken Language Production
The above figure indicates that content to be taught or assessed should be graded according to the difficulty of the tasks intended for the course purposes. The degree of this difficulty is determined by communicative stress, which involves three conditions under which a speaker feels more or less comfortable in producing what he has to (Brown and Yule, 1983, p.34). The less stressful a task is, the easier it is for speakers to carry out. These three conditions are features of the context, state of knowledge of the listener and type of task, as shown in Figure 2.2 below.
Communicative Stress
- Features of the context: the listener; the situation
- State of knowledge of the listener: the language; the information
- Type of task: status of knowledge; structure of the task
Figure 2.2: Conditions of Communicative Stress in a Task
The listener refers to the relationship between the speaker and the listener, or the number of listeners he is talking to. The situation is concerned with the speaking environment (is it familiar or unfamiliar, and private or public?). The language relates to the listener(s)’ language proficiency in comparison with the speaker’s, and the information is what the listener wants or needs. Status of knowledge refers to the degree of familiarity of the task’s topic, and structure of the task refers to the purpose of the task or the difficulty of the task itself. This difficulty ranges from static relationships to abstract relationships between what is being talked about and what is going to be said. Obviously, tasks involving ‘abstract relationships are more difficult than those involving the description of static and dynamic relationships’ (Nunan, 1991, p. 48). O’Malley & Pierce (1996, p. 76) state that these relationships correspond to an increase in difficulty levels. The tasks intended for the purpose(s) of teaching or testing should thus be graded according to these relationships as follows:
- Static relationships
Describing an object or photograph
Instructing someone to draw a diagram
Instructing someone how to assemble a piece of equipment
Describing/instructing how a number of objects are to be arranged
Giving route directions
- Dynamic relationships
Story-telling
Giving an eye-witness account
- Abstract relationships
Opinion-expressing
Justifying a course of actions
(Brown & Yule, 1983, p.109)
The difficulty of tasks additionally depends upon the number of relationships, elements, factors or characters within each task. For instance, ‘a short narrative involving a single character and only two or three events may be easier than a lengthy description covering many details and relationships’.
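The grading principle described above, in which difficulty rises from static through dynamic to abstract relationships and also grows with the number of elements in a task, can be sketched as a simple ordering. This is only an illustration: the class `SpeakingTask`, the function `difficulty_score`, the weight `type_weight` and the element counts are hypothetical assumptions, not values taken from Brown and Yule (1983) or from this study.

```python
from dataclasses import dataclass

# Rank of each relationship type, following the static < dynamic < abstract order.
RELATIONSHIP_RANK = {"static": 0, "dynamic": 1, "abstract": 2}


@dataclass(frozen=True)
class SpeakingTask:
    name: str
    relationship: str  # "static", "dynamic" or "abstract"
    elements: int      # hypothetical count of objects, characters or events


def difficulty_score(task, type_weight=3):
    """Combine relationship type and number of elements into one score.

    type_weight is an illustrative assumption: it sets how many extra
    elements are treated as equivalent to moving up one relationship type.
    """
    return type_weight * RELATIONSHIP_RANK[task.relationship] + task.elements


def grade_tasks(tasks, type_weight=3):
    """Return the tasks ordered from least to most demanding."""
    return sorted(tasks, key=lambda task: difficulty_score(task, type_weight))


tasks = [
    SpeakingTask("Opinion-expressing", "abstract", 3),
    SpeakingTask("Lengthy description with many details", "static", 8),
    SpeakingTask("Short narrative with a single character", "dynamic", 3),
    SpeakingTask("Describing an object or photograph", "static", 1),
]

for task in grade_tasks(tasks):
    print(difficulty_score(task), task.relationship, "-", task.name)
```

With these assumed numbers the short narrative is graded as easier than the lengthy description, which mirrors the point that the number of elements can outweigh the relationship type.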
Furthermore, ensuring that what they produce in a new language is appropriately interpreted is extremely demanding for speakers/learners. To achieve this, learners must first process their acquired knowledge of language and then produce utterances that are linguistically acceptable and socioculturally appropriate, and the utterances must conform to the cooperative principle (Celce-Murcia and Olshtain, 2000, pp.168-171). This principle refers to the general rules of how to maintain the flow of exchange between interlocutors, which means that ‘the speaker wants to be understood and interpreted correctly and the hearer wants to be an effective decoder of the messages he receives’. The successful communication of a speaker’s ideas is illustrated in Figure 2.3 below.
[Figure 2.3 labels: Linguistic Knowledge; Sociocultural Knowledge; Cooperative Principle; Production/Interpretation of Spoken Language]
Figure 2.3: Success of Meaning Negotiation
Linguistic knowledge, or organizational knowledge (Bachman and Palmer, 1996; and Bachman, 1990), includes grammatical knowledge (i.e. knowledge of vocabulary, morphology, syntax, and phonology/graphology), and textual knowledge (i.e. rules of cohesion and coherence, and knowledge of rhetorical organisation). Sociocultural knowledge, or pragmatic knowledge (Bachman and Palmer, 1996; and Bachman, 1990), is associated with ‘(1) characteristics of the individuals who take part in the communicative exchange, (2) features of the situation in which this exchange takes place, (3) the goal of the exchange, and (4) features of the communicative medium through which the exchange is carried out’.
The assessment of learners’/students’ production of spoken language, or their oral test performance, is based entirely on the two major features of spoken language: its interactional and transactional functions, and its production length. Therefore, the criteria for assessment must be founded on these two features, and they vary according to learners’ language proficiency level or the difficulty of the test tasks. In particular, the criteria for assessing learners’ interactional and transactional short turns should focus more on test takers’ communicative reaction and successfully negotiated ideas than on content, size, cohesion or coherence, as is the case for transactional long turns.
To sum up, the two functions and the length of spoken language production are closely associated with what is to be tested in a test of oral ability, and how to ensure the success of spoken language production is primarily related to how to test or assess learners’ oral ability. The next section will discuss the suitable approach to making the right inferences from learners’ oral test performance.
2.2 Communicative Approach to Testing Oral Language Ability
Testing oral ability in a language is one of the most important aspects of language testing. This ability is an extremely difficult skill to assess, as Heaton (1988) and Brown & Yule (1983) point out. Partly because of the difficulty of treating speaking tests in the same way as other more conventional tests, the testing of speaking skill has generally received little attention. In a genuine speaking test, real people meet face to face and talk to each other. Hence, it is the people and what passes between them that are important, whereas the test instrument is secondary. More precisely, oral tests should be designed around the people involved so that they can be encouraged to talk to each other as naturally as possible.
For several decades, a new theory of language and language use has exerted a considerable influence on language teaching and potentially on language testing. For example, Hymes’s theory of communicative competence is concerned not only with language forms but also with the ability to use language in a socio-cultural context. Communicative competence in oral language ‘requires control of a wide range of phonological and syntactic features, vocabulary, and oral genres and the knowledge of how to use them appropriately’ (Butler et al., 2000, p.2). Although the relevance of this theory to language testing was recognized more or less immediately, it took a long time for its actual impact on practice to be felt in the development of communicative language tests. McNamara (2000, pp. 16-17) characterises communicative language tests as having two features:
- they are performance tests, requiring assessment to be carried out when the learner or candidate is engaged in an extended act of communication;
- they pay attention to the social roles candidates are likely to assume in the real world settings, and offer a means of specifying the demands of such roles in detail.
The communicative approach to spoken language testing involves assessment of how language is used in real communication. Accordingly, Heaton (1988) states that most communicative language tests aim to ‘incorporate tasks which approximate as closely as possible to those facing the students in real life’. Success in actual language performance is judged in terms of the effectiveness of the communication which takes place rather than formal linguistic accuracy. Consequently, the assessment of learners’ production of spoken language or test performance should concentrate more on the efficacy of interaction than on the accuracy of language forms.
In addition, the following four characteristics of communicative language tests mentioned by Brown and Gonzo (1995, pp.421-422) provide a broad basis for both the design and use of language tests.
First, such tests create an ‘information gap,’ requiring test takers to process complementary information through the use of multiple sources of input. ... The second characteristic is that of task dependency, with tasks in one section of the test building upon the content of earlier sections. ... Third, communicative tests can be characterized by their integration of test tasks and content within a given domain of discourse. Finally, communicative tests attempt to measure a much broader range of language abilities – including knowledge of cohesion, functions, and sociolinguistic appropriateness – than did earlier tests, which tended to focus on the formal aspects of language – grammar, vocabulary, and pronunciation.
Applied specifically to oral testing, all speaking tests that share the purpose of measuring test takers’ speaking ability in real interactions are expected to assess authentic language use in context and the ability to communicate meaning, that is, to include all the characteristics mentioned above. As previously discussed, the ability to communicate meaning is assured by the success of meaning negotiation (Figure 2.3 above) in actual acts of interaction.
Speaking tests aim at eliciting test takers’ ability to communicate ideas, and how to do this depends upon the content of test tasks or questions that fit students’ level of language proficiency. Test takers’ different levels of language proficiency can be reflected in the degree of difficulty of test tasks. As reviewed in section 2.1, this degree of task difficulty, called communicative stress, should be taken into account in the teaching and testing of speaking skill, especially on the part of teachers, test developers and assessors. A thorough understanding of the issue helps testers to make informed judgements of ‘what type of speaking activity the student would find reasonably ‘unstressful’ at a particular point in his course’ (Brown and Yule, 1983, p.107). Obviously, oral test tasks are to be graded mainly in accordance with the degree of communicative stress.
To sum up, the appropriate approach, in my view, to assessing learners’ production of spoken language is to measure the extent to which they are able to convey and achieve the intended purposes of a particular test task successfully. In other words, learners’ performance on an oral test task should be examined in terms of communicative effectiveness or success of meaning negotiation. However, the success of this way of assessing depends greatly on the communicative stress under which test tasks are designed. Therefore, the next two sections will closely review further factors that must be taken into account during the construction of speaking tests.
2.3 Theoretical Framework for Oral Test Development
The two previous sections have discussed the major features of spoken language that must be considered in assessing spoken language production, and have presented the communicative approach as the most adequate approach to such assessment. This section describes in detail the theoretical framework for developing language tests, which is later applied to the development of speaking tests.
Whether a test is useful or not depends largely on the test development process. Bachman and Palmer (1996) divide test development into three stages: the design stage, the operationalization stage and the administration stage.
This process of test development is illustrated in Figure 2.4.
Test Development
Design stage
- Purpose(s) of the test
- Tasks in the TLU domain
- Characteristics of the test takers
- Construct to be measured
- Plan for evaluation of test usefulness
- Resources
Operationalization stage
- Test task specifications
- Blueprint
Administration stage
- Giving the test
- Collecting test results
- Analyzing the results
Figure 2.4: The Model of Test Development
The design stage involves describing and identifying all the factors related to the test. These factors are the purpose(s) of the test as a whole, target language use tasks, test takers’ characteristics, language ability to be measured, usefulness of the test and resources.
The operationalization stage consists of two sub-stages. The former is the development of test task specifications referring to the purpose of individual test tasks, the construct to be measured, the setting, time allotment, instructions for responding to the task, characteristics of test input, and scoring method. The latter is the development of a blueprint – a description of ‘how test tasks will be organized to form actual tests’. The blueprint is therefore the structure of a test including the number of test tasks/parts and the relative importance of tasks/parts intended for the purpose(s) of the whole test, and the specifications of each test task.
The administration stage involves giving the test to a group of specific test takers, collecting test results, and analyzing the results.
2.3.1 Design Stage
The Design stage involves six activities all aiming at producing a design statement as a principled basis for the other two stages (Bachman & Palmer, 1996, p. 88). The six activities are as follows:
(1) Description of the test purpose(s). First, the specific inferences about language ability and capacity for language use to be made from the test takers’ performance are explicitly stated, and then the specific decisions to be based on these inferences are set out.
(2) Identification and description of test tasks in the target language use (TLU) domain. A set of the TLU task types is characterized as the basis for developing actual test tasks.
(3) Description of the characteristics of the test takers. The characteristics believed to be particularly relevant to test development include personal characteristics, topical knowledge, general level and profile of language ability, and predictions about test takers’ potential affective responses to the test.
(4) Definition of the construct/ability to be measured. The components of language ability to be assessed through the test task(s) are critically determined.
(5) Development of a plan for evaluating test usefulness. This plan consists of three parts as follows:
- an initial consideration of the appropriate balance among the six qualities of usefulness and the setting of minimum acceptable levels for each,
- the logical evaluation of usefulness, and
- procedures for collecting qualitative and quantitative evidence during the administration stage.
(Bachman & Palmer, 1996, p. 133-134)
(6) Identification of resources and development of a plan for their allocation and management. The resources refer to the people, material and time involved in test development. The balance between the resources available and required for test development should be taken into account in order to provide a good plan for how to allocate and manage them.
2.3.2 Operationalization Stage
The Operationalization stage needs to be closely examined in order to help the staff concerned at TNU gain a thorough understanding of the stage and then gradually improve their oral testing practices.
As mentioned above, this stage focuses on the structure of a test – the blueprint involving the number of test tasks/parts and specifications of each test task.
Test task specifications are described as follows:
1. The purpose of the test task.
2. The definition of the construct to be measured.
3. The characteristics of the setting of the test task.
4. Time allotment.
5. Instructions for responding to the task.
6. Characteristics of input, response, and relationship between input and response.
7. Scoring method.
(Bachman & Palmer, 1996, p. 172-173)
These test task specifications can be interpreted in the context of oral testing as:
- The purpose of the test task.
- Specified components of oral ability to be tested.
- The place where the task or language act occurs.
- Expected duration of task performance.
- Specified and understandable instructions.
- Adequate areas of linguistic, pragmatic and topical knowledge.
- Marking key.
In order to produce sound test task specifications, test writers should make informed judgements about the major considerations in oral test operationalization, which are reviewed in detail in Section 2.4.
2.3.3 Administration Stage
The administration of a test mainly involves three activities: giving the test to a particular group of test takers, gathering the test results and analyzing the results (Bachman & Palmer, 1996, p. 91). In particular, procedures for administering a test include preparing the testing environment, communicating the instructions, maintaining a supportive test-taking environment, and collecting the test papers. These all aim at ‘guiding test takers through the process of taking the test in accordance with the procedures specified in the test blueprint.’
2.4 Major Considerations in Operationalization of Speaking Tests
The communicative approach is considered one of the most adequate ways to measure learners’ oral language ability. In order to apply this assessment approach successfully, the oral test tasks intended for a specific test must be designed with a degree of difficulty, or communicative stress, that fits the test takers’ level of language proficiency. To operationalize a speaking test that fits this level, test writers or teachers must (1) know the exact level of the test takers, then (2) choose suitable oral test types or elicitation techniques for the test tasks, and finally (3) design the method of marking each task. The following three sub-sections discuss these factors, which should be carefully considered during the operationalization of speaking tests.
2.4.1 Level Scale
The explicit classification of test takers’ language knowledge levels helps to grade test tasks according to the communicative stress. It is displayed on a formal document including established ‘criterion levels of oral language proficiency based on the goals and objectives of classroom instruction’ (O’Malley & Pierce, 1996, p.65). This document, called a level scale or rating scale by Underhill (1987), is a series of short descriptions of different levels of language ability in terms of test takers’ or students’ language knowledge. It describes in brief what a typical learner at each level can do so that teachers and assessors can analytically select or grade test tasks that best fit each level, and can easily decide on the score to give each student in a test.
The following is an example of a level scale with four major levels (Table 2.1 below), based on the level scale introduced in ‘Hệ thống Định chuẩn Trình độ Ngoại ngữ của Hội đồng Châu Âu’ (The Council of Europe’s System of Foreign Language Proficiency Standards) by Vũ Thị Phương Anh and Nguyễn Thị Kim Thư (2003).
| Level | A typical learner at this level can |
| --- | --- |
| Elementary | introduce oneself and others; ask and answer questions about personal details such as where he/she lives, people he/she knows and things he/she has; interact in a simple way provided that the other person talks slowly and clearly and is prepared to help; use very simple expressions related to areas of the most immediate relevance (e.g. very basic family information, shopping, local geography, employment). |
| Pre-intermediate | communicate in simple and routine tasks requiring a simple and direct exchange of information on familiar and routine matters; describe in simple terms aspects of his/her background, immediate environment and matters in areas of immediate need; talk simply about familiar matters regularly encountered in work, school, leisure, etc. |
| Intermediate | use the language in most situations likely to arise when travelling in an area where the language is spoken; make a simple connected presentation on topics which are familiar or of personal interest; describe experiences and events, dreams, hopes and ambitions; briefly give reasons and explanations for opinions and plans. |
| Upper-intermediate | interact with a degree of fluency and spontaneity that makes regular interaction with native speakers possible without strain for either party; make a clear and detailed presentation on a wide range of subjects; give opinions on topical issues and explain the advantages and disadvantages of various options. |
Table 2.1: Level Scale of Language Proficiency Based on the Global Scale by Council of Europe
2.4.2 Oral Test Types and Elicitation Techniques
Once the test takers’ proficiency level has been explicitly identified on the level scale, adequate oral test types and proper elicitation techniques must be carefully selected to fit the test takers’ level and the testing situation. This sub-section reviews types of oral test together with their elicitation techniques, for these two aspects are interrelated. An elicitation technique involves the procedures for performing each test task, and a test task itself represents a test type.
Underhill (1987) classifies oral tests/test tasks into four main types: (1) the direct interview type, (2) the pre-arranged information gap tests, (3) tests where the learner prepares in advance, and (4) mechanical/entirely predictable tests. Each type requires specific techniques, called elicitation techniques, to elicit test takers’ language performance. The following four sub-sections summarize these four oral test types in turn, together with the elicitation techniques involved.
2.4.2.1 The Direct Interview Type
The direct interview is the most common and authentic type of oral test; there is no script and no preparation on the test taker’s part. The assessor or interviewer, of course, prepares carefully, but not so rigidly as to control exactly what the test taker says. This may make it difficult to assess test performance consistently and reliably. (Underhill, 1987, p. 31)
Assessors should be flexible in choosing suitable and feasible techniques for eliciting performance on tasks of this type in a specific testing situation. The most common elicitation techniques used in this case are discussion/conversation, interview, form-filling, and question and answer.
Discussion/conversation is associated with interaction between two or more people in which the assessor should create the right atmosphere in a very short time so that the test taker can respond to it. The topics discussed and the directions taken by the conversation are the result of this interaction. (Underhill, 1987, p. 45)
Interview, to some extent, is quite similar to discussion/conversation, but an interview is structured. That is to say, the assessor or interviewer maintains firm control and keeps the initiative; whatever the test taker says is more or less a response to the interviewer’s questions or statements. (Underhill, 1987, p. 54-56)
Form-filling is a technique in which the test taker and interviewer work together to fill in a form or questionnaire. The questions are usually related to the test taker’s personal details, professional situation or language needs. Question and answer refers to a set of disconnected questions raised by the tester. The questions are graded according to difficulty to elicit the test taker’s opinions on certain topics. This technique may involve using different question types, giving cues for question formation, and naming. (Underhill, 1987, p. 58-59)
2.4.2.2 The Pre-arranged Information Gap Tests
In such tests, an information gap between two test takers, or between a test taker and the assessor, is deliberately created by the test designer. The test taker’s success and speed in bridging that gap are taken as an indication of his oral proficiency. (Underhill, 1987, p. 32)
The elicitation techniques proposed for this type of oral test are learner-learner description and re-creation, picture story and role-play.
Learner-learner description and re-creation technique requires one test taker to describe a design or construction of model building materials to another test taker who has to reconstruct the model from the description alone, without seeing the original. The technique consists of reporting description to partner, map-reading, and comparing models. (Underhill, 1987, p. 56-58)
Picture story is widely used with more advantages than disadvantages. Before performing the test task, the test taker is given a picture or a sequence of pictures to look at. Then the test taker describes the picture(s) or story freely before being asked questions related to the story. This technique includes using several similar pictures, ordering pictures to create a picture story, using live action, and vocabulary naming from pictures. (Underhill, 1987, p. 66-69)
The role-play technique involves two people, each of whom takes on a particular role in a given situation. A few minutes before the test, the test taker is given a set of written instructions to prepare with, and then carries out his role in the given situation. This technique can be used between an assessor and a student, or between students. (Underhill, 1987, p. 51-52)
2.4.2.3 Tests Where the Learner Prepares in Advance
Tests of this type give the test taker a sufficient amount of time to prepare the task. The preparation time will range from a few minutes for a blank dialogue to several hours or days for a presentation. (Underhill, 1987, p. 33)
The underlying techniques may be oral report, reading blank dialogue, and re-telling a story.
Oral report technique requires the learner to give an oral presentation on a given topic lasting from five to ten minutes. He or she can refer to the notes, but reading aloud is strongly discouraged. The use of such aids as an overhead projector, a board or flipchart diagrams is encouraged if appropriate. At the end of the presentation, the test taker has to answer all the questions raised by the tester. This technique can be applied by making a mini-presentation with limited preparation time, and by identifying a topic of personal interest at a previous stage. (Underhill, 1987, p. 47-49)
Reading blank dialogue is used in a test context in which the learner is provided with a dialogue with only one part written in, and prepares the missing lines in a few minutes. The interviewer reads through the given lines and the test taker fills in the blanks aloud. (Underhill, 1987, p. 64-66)
Re-telling a story technique requires the test taker to re-tell a story in his own words after reading it. The test taker is not allowed to refer back to the written text once he has begun to re-tell it. This can be carried out by using notes, using a set text, and using an unseen text. (Underhill, 1987, p. 73-75)
2.4.2.4 Mechanical/Entirely Predictable Tests
Mechanical-type tests determine in advance what the test taker is expected to say, for there is always a single correct answer. This complete predictability makes such tests unauthentic and non-communicative. Hence, they cannot be used to measure the test taker’s oral fluency, but only grammatical knowledge or the mechanical aspects of speech such as pronunciation, stress and intonation patterns. (Underhill, 1987, p. 33)
Tests of this type encompass such elicitation techniques as reading aloud, sentence transformation, sentence repetition, translating/interpreting, sentence completion, and sentence correction.
Reading aloud technique requires the test taker to read aloud to the tester, either a passage of text, or part of a dialogue in which the tester or another testee reads the other part. This technique may consist of reading scripted dialogue with someone else reading the other part, reading text with phonetic markers, reading sentences containing minimal pairs, spelling aloud, and reading from a table. (Underhill, 1987, p. 76-78)
Sentence transformation is the technique in which the test taker is given a stimulus sentence and is asked to orally transform it into a different grammatical pattern. This technique allows rapid testing of particular structural areas and an estimation of the test taker’s ability to correct himself. (Underhill, 1987, p. 84-85)
Sentence repetition technique is used in a test in which the test taker listens to a set of sentences or utterances, and then repeats them as accurately as possible. The technique may include repeating sentences of increasing length and repetition of sentences with specific language areas. (Underhill, 1987, p. 86-87)
The translating/interpreting technique requires the test taker to translate into the target language a short passage from a familiar native-language text. This technique may have such variations as translating in both directions, translating an unprepared passage, translating test in the language laboratory, and translating disconnected words or phrases. (Underhill, 1987, p. 79-81)
The sentence completion technique is used in a test context in which the test taker is asked to complete a series of sentences with the last few words missing from each. The technique may consist of using written tests, using gapfill to check discourse reference, text completion, using spoken cues, and completing a well-known saying. (Underhill, 1987, p. 81-83)
Sentence correction technique presents the test taker with a sentence containing an error. The test taker’s task is to identify the error and to correct it. The test taker can also be given a chance to correct his own errors. (Underhill, 1987, p. 84)
The four types of oral test/test task combined with different elicitation techniques are summarized in Table 2.3 below.
| Test types | Elicitation techniques |
| --- | --- |
| The direct interview type | Discussion/conversation; Interview; Form-filling; Question and answer |
| The pre-arranged information gap | Learner-learner description and re-creation; Picture story; Role-play |
| Tests where the learner prepares in advance | Oral report; Reading blank dialogue; Re-telling a story |
| Mechanical/entirely predictable tests | Reading aloud; Sentence transformation; Sentence repetition; Translating or interpreting; Sentence completion; Sentence correction |
Table 2.3: Oral Test Types and Elicitation Techniques
No single test type or elicitation technique can be said to be the best for an oral test task or an oral test as a whole, for each has its own advantages and disadvantages. An elicitation technique may be suitable in one testing situation but inappropriate in others. For example, the reading aloud technique may be well suited to measuring elementary learners’ pronunciation, intonation and stress, but poorly suited to measuring intermediate learners’ speaking ability, as this technique is considered to be uncommunicative. Therefore, it is advisable to combine various test types and elicitation techniques in a test of overall oral ability (Underhill, 1987, p. 37-38). This combination depends on the test purpose and on the areas of language competence and ability that are intended to be seen in test takers’ performance.
Additionally, oral test tasks ‘differ with regard to whether they call for the use of static relationships, dynamic relationships, or abstract relationships’ (O’Malley & Pierce, 1996, p. 76). These relationships are mentioned in Section 2.1. The selection of oral test types for test tasks is therefore necessarily related to the degree of difficulty corresponding to these relationships. Consequently, O’Malley & Pierce (1996, p. 69) propose that the test tasks selected and designed should challenge the language proficiency level(s) of test takers without frustrating them.
2.4.3 Marking Key
During the operationalization of oral tests, the classification of test takers’ level of language proficiency is to be carefully considered in order to choose adequate test types and elicitation techniques in which the test takers’ language performance can be best shown. While designing particular test tasks, test writers or teachers should also decide how each test task is to be marked, and therefore build up a guideline that helps the assessors to mark each task. This guideline is called a marking key or marking protocol by Underhill (1987).
A marking key is a set of procedures specified in advance that tells assessors what they are supposed to do step by step in the process of marking each test task/question. Test writers can make the marking quicker and more reliable by drawing up a detailed marking guide that tells the marker how to mark each question.
Underhill (1987, p. 95) identifies the aims of a marking key as follows:
- To anticipate problems that the marker is likely to face, and to suggest how to cope with them.
- To maintain the aims of the test by directing the marker’s attention to the language areas that are most important, and by giving general guidelines for dealing with unusual responses.
- To describe the purpose of each question/task.
A marking key that fulfils these aims thus helps to increase the consistency of measurement, that is, reliability (see 2.5.2). Oral tests call for subjective judgement on the part of assessors, and thus do not have as high a degree of reliability as tests that require objective judgement, such as multiple-choice or cloze tests with either completely right or completely wrong answers. In order to help assessors achieve the highest possible degree of reliability, it is essential to provide them with a comprehensible marking key that fulfils the three aims identified by Underhill.
The most important factor in a marking key is the distribution of marks among the specific speaking sub-skills that are intended to be measured. These speaking sub-skills are named mark categories by Underhill (1987). The kind of categories defined in a test should be based on the teaching program and should reflect the way in which the teaching syllabus expresses the aims of the program (Underhill, 1987). Mark categories can be based on two models: the traditional model of language components and the more recent model of performance criteria. The former refers to the components of language proficiency (grammar, vocabulary, pronunciation and intonation, style and fluency, content, etc.) while the latter refers to the components of language performance or performance criteria (flexibility, accuracy, appropriacy, independence, hesitation, etc.).
The focus on a number of different language sub-skills or categories can also help to improve marker reliability; the assessor is supposed to give each test taker a separate mark for each category. All these separate marks are then combined to give the overall score, which is related to the process of weighting. In most oral tests or test tasks some categories are more emphasized than others according to the test purpose(s), so a weighting system is used as shown in the following example taken from Underhill (1987, p. 97).
Grammar: marked out of 10, then multiplied by 3
Vocabulary: marked out of 10, then multiplied by 3
Pronunciation: marked out of 10, then multiplied by 2
Fluency: marked out of 10, then multiplied by 1
Content: marked out of 10, then multiplied by 1
Total score 10
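The arithmetic of this weighting system can be sketched in a few lines of Python. The category weights are those of the example above; the candidate’s marks, the function names, and the reading of the final total as a score rescaled to a 10-point scale are assumptions made purely for illustration.

```python
# Category weights taken from the example above (Underhill, 1987, p. 97).
WEIGHTS = {
    "grammar": 3,
    "vocabulary": 3,
    "pronunciation": 2,
    "fluency": 1,
    "content": 1,
}


def weighted_total(marks, weights=WEIGHTS):
    """Multiply each category mark (0-10) by its weight and add the results."""
    for category in weights:
        if not 0 <= marks[category] <= 10:
            raise ValueError(f"{category} must be marked out of 10")
    return sum(marks[category] * weight for category, weight in weights.items())


def rescaled_to_ten(marks, weights=WEIGHTS):
    """Assumption: bring the weighted total back to a 10-point scale
    by dividing it by the sum of the weights."""
    return weighted_total(marks, weights) / sum(weights.values())


# Hypothetical marks for one test taker (not data from the study).
sample_marks = {
    "grammar": 7,
    "vocabulary": 6,
    "pronunciation": 8,
    "fluency": 5,
    "content": 9,
}

print(weighted_total(sample_marks))
print(rescaled_to_ten(sample_marks))
```

With these weights the highest possible weighted total is 100, since the weights add up to 10 and each category is marked out of 10. Changing the weights changes which sub-skills dominate the overall score, which is why the weighting should follow the test purpose.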
In sum, the marking key plays an essential role in ensuring reliability in the design of language tests in general, and of oral language tests in particular. It must be involved in the whole process of test development from the beginning. Language teachers or test developers should thus give careful consideration to the ways of marking test performance throughout the test construction process.
In the oral test operationalization process, consequently, language teachers or test designers must take great care over not only the selection of test types and proper elicitation techniques for the intended test tasks but also the design of a marking key for each test task.
2.5 Qualities of a Good Test
The previous three sections are concerned with the techniques and procedures for developing oral language tests whereas Section 2.5 is related to the qualities of a good test, i.e. whether the test results can reveal test takers’ actual ability to orally use the language. A test used to elicit test takers’ actual language proficiency must reveal such qualities as validity, reliability and practicality.
2.5.1 Validity
Test validity is generally concerned with the degree to which a test actually measures what it is supposed to measure. In other words, it refers to the correspondence between the abilities to be assessed and the real indication of these abilities in a test, so a test is said to be invalid when there is no relationship between them. The concept of validity includes such aspects as content validity, construct validity and predictive validity. A test is said to have content validity if its content represents a sample of the language skills, structures, etc. with which it is meant to be concerned (Hughes, 1989). When embarking on test construction, a test writer should first draw up a table of test specifications, describing in very clear and precise terms the particular language skills and areas to be included in the test. No less important is the construct validity of a test. A test with construct validity is capable of measuring certain specific characteristics in accordance with a theory of language behaviour and learning. In other words, construct validity ‘examines whether the instrument permit inferences about underlying abilities.’ (Cohen, 1996). According to Hughes (1989), the word ‘construct’ refers to ‘any underlying ability or trait which is hypothesised in a theory of language ability’. This ability or trait is defined by Bachman and Palmer (1996) as ‘the domain of generalisation to which our score interpretations generalize’. Certain learning theories or constructs are believed to underlie the acquisition of abilities and skills. The third aspect, predictive validity, concerns the degree of agreement between the results of the test and those provided by some important task at some future point.
2.5.2 Reliability
If test validity is defined as accuracy of measurement, test reliability is related to consistency of measurement. A reliable test score will be consistent across different characteristics of the testing situation. Unless test scores are relatively consistent, they cannot give any information at all about the ability measured. Another aspect of overall test reliability is rater reliability. Raters must maintain consistency in their own marking standards. This kind of reliability is called intra-marker reliability (Underhill, 1987). Likewise, the same work marked by different raters should produce similar results; this is called inter-marker reliability, also by Underhill. If some raters rate more severely than others, the ratings of different raters are not consistent, and the scores obtained cannot be considered reliable. Oral tests call for subjective judgement on the part of the marker, so the scores awarded in an oral test cannot be assumed always to have high reliability.
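As a rough illustration of inter-marker consistency, the following Python sketch compares the marks that two raters give to the same performances. The function names and the sample marks are hypothetical, and the two measures used (a Pearson correlation and a mean difference in marks) are one possible way of looking at rater agreement, not a procedure taken from Underhill (1987) or Bachman and Palmer (1996).

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation: do the two raters rank performances alike?"""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def mean_severity_gap(xs, ys):
    """Average of (first rater's mark - second rater's mark).
    A positive value means the second rater marks more severely."""
    return sum(x - y for x, y in zip(xs, ys)) / len(xs)


# Hypothetical marks out of 10 given by two raters to the same five test takers.
rater_a = [6, 7, 8, 5, 9]
rater_b = [5, 6, 7, 4, 8]

print(pearson(rater_a, rater_b))
print(mean_severity_gap(rater_a, rater_b))
```

In this made-up case the two raters order the test takers identically, yet the second rater is consistently one mark more severe. A high correlation alone therefore does not guarantee that the scores of different raters are interchangeable.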
It is also necessary to recognize that inconsistencies cannot be eliminated entirely. Nevertheless, it is possible to minimize the effects of those potential sources of inconsistency that are under control in test design (Bachman and Palmer, 1996). Amongst the factors affecting test performance, the characteristics of the test tasks are partly under control. In language test design and development, thus, it is possible to minimize variations in the test task characteristics that do not correspond to variations in target language tasks.
Test administration, which also bears on reliability, has not been given proper attention at some universities at the present time. Administering a test involves exam invigilators and such test conditions as classrooms, equipment, materials, exam rules, and procedures for dealing with test takers’ cheating.
2.5.3 Practicality
Test practicality pertains to ‘the ways in which a test will be implemented, and, to a large degree, whether it will be developed and used at all’ (Bachman & Palmer, 1996, p. 35). It concerns practical matters such as the amount of time, human and material resources available for constructing a test, administering it, marking it, and interpreting the results. If the test resources required for implementing a test exceed the resources available, the test will be impractical. Human resources are a crucial component of test construction and administration involving such individuals as test writers, scorers or raters, and test administrators as well as clerical and technical support personnel. In fact, not all institutions have sufficient staff to be in charge of all these well-defined roles. One person may be in charge of several functions. Test writers, key personnel in the process of test development, are involved not only in writing tests but also in collecting materials, editing and recording. Material resources include space (the number of classrooms, language labs needed), equipment (typewriters, computers, cassette players, overhead projectors), test materials (test booklets, answer sheets, audiotapes). Time consists of test development time and the time required to complete the parts of each stage of the test development process.
Moreover, the specific types and amounts of resources required may differ according to the design of a specific test, and available resources may vary from one situation to another. Test practicality thus can only be determined for a specific testing situation. To determine the practicality of a given test, test developers must take into account the resources required to develop the test, and the management and allocation of the resources available.
2.6 Summary
Chapter 2 has considered the main features of oral testing, particularly spoken language and considerations on how to elicit students’ overall speaking ability. Production of spoken language is examined in a continuum of the language functions and a success of meaning negotiation. Next, the communicative approach to assessing production of spoken language is considered one of the best ones. And Bachman & Palmer’s theoretical framework for test development is reviewed as the basis for description and evaluation of current oral testing practices at TNU. Also, this framework is used as the main foundation on which suggestions for improvement of TNU oral testing problems are based. In addition, major considerations in operationalizing speaking tests include a level scale, selection of test types, elicitation techniques, and a marking key. Finally, for a test to be valid, it must also be reliable and practical. Validity is associated with accuracy of measurement, and reliability refers to the consistency of measurement. Practicality, more or less important, concerns the ways in which the test will be implemented in a given situation, or whether the test will be used at all.
CHAPTER 3: METHODOLOGY
Chapter 2 has reviewed major issues in oral language testing in order to provide an adequate understanding of its theory, which serves as the basis for investigating current practices at TNU. In particular, Bachman and Palmer’s theoretical framework for developing tests provides the basis for evaluating oral testing practices at TNU and for suggesting how their drawbacks can be remedied. In order to investigate TNU’s current oral testing practices, the researcher (1) analysed the present process of oral test development at this institution, and (2) surveyed the staff’s perceptions of oral language testing. This chapter consists of three sections. The first raises the two research questions. The second presents the data collection instruments used to carry out (1) and (2). The third describes the procedure for conducting the study.
3.1 Research Questions
The study aims to answer the following two questions.
1. What are the strengths and weaknesses of the oral test development procedure at TNU?
2. What are the English teaching staff’s perceptions of oral language testing?
3.2 Data Collection Instruments
In order to answer the two research questions, the researcher collected information from two sources: (1) a situation analysis and (2) a questionnaire. Firstly, the situation analysis was carried out with a checklist based on Bachman and Palmer’s framework for test development. To ensure the reliability of the information from the situation analysis, one end-of-term speaking test was observed and tape-recorded. Secondly, the questionnaire was developed to elicit the staff’s perceptions of oral language testing. This section therefore describes three instruments in turn: the checklist, the observation, and the questionnaire.
3.2.1 The Checklist
The situation analysis used a checklist (shown below) developed from Bachman and Palmer’s framework for test development, reviewed in 2.3. The checklist covers four factors: (1) the test design stage, (2) the test operationalization stage, (3) the test administration stage, and (4) the use of test results.
| 1. Test Design Stage | |
| Are the purposes of oral tests explicitly identified? | |
| Which kind(s) do the oral tests include? (Selection / Placement / Diagnosis / Achievement) | |
| Is a set of the TLU tasks presented? | |
| Is there an official document including detailed instructions on students’ language proficiency levels? | |
| Is there an official document including detailed instructions on construct or language ability to be measured? | |
| Are there any criteria set for evaluating test quality? | |
| 2. Test Operationalization Stage | |
| Are there any official guidelines on the number of test tasks to be included in a particular speaking test? | |
| Are the specifications of each test task provided? | |
| 3. Test Administration | |
| Are the assessors informed of how to mark the test tasks before the test is administered? | |
| Is the testing environment well prepared? | |
| Is a supportive test taking environment maintained? | |
| Are the instructions for each test task made clear to the students? | |
| 4. Use of Test Results: Is the information from test results used for | |
| ...grading the students? | |
| ...evaluation of the effectiveness of instructional programmes? | |
| ...the teachers’ modification of teaching methods and materials? | |
3.2.2 The Observation
Observation of a particular achievement speaking test was expected to provide evidence supplementing the analysis of the current oral test development process, specifically the operationalization and administration stages. Observation enables the researcher to collect data firsthand, and the data gathered describe the observed phenomena as they take place in their natural settings (Nachmias, 1996, p.206); this kind of information can therefore be considered reliable.
The observation focused on the oral test type(s) and underlying elicitation techniques in use (See Appendix 2), the time spent on each student’s test performance, and the interaction between the assessors and test takers, which was recorded on tape and then transcribed (See Appendix 3).
3.2.3 The Questionnaire
In order to elicit TNU staff’s perceptions of oral language testing, a questionnaire was formulated and delivered to the staff members. A questionnaire was chosen as an appropriate way to elicit respondents’ knowledge because the respondents are not put under time pressure, i.e. they answer questions in their own time and at their own pace, and because anonymous responding is likely to make them feel free and comfortable in answering (Gillham, 2000; Nachmias, 1996).
The respondents are 12 English language teachers at TNU who are involved in the development of oral tests. They are aged between 27 and 50. They all had tertiary training in language teaching at different educational institutions throughout Vietnam.
The questionnaire consists of 10 questions, 8 of which are used to elicit the teaching staff’s perceptions of oral testing. These 8 questions are based on the theory of oral language testing reviewed in Chapter 2. The other two concern the staff’s working experience and their qualifications in language testing. The questions are as follows:
Questions 1 and 2 relate to the functions of spoken language and the communicative approach to measuring speaking skill.
Questions 3 and 4 relate to oral test types and elicitation techniques used in a test of speaking ability.
Question 5 relates to grading test tasks and validity.
Question 6 relates to the design and operationalization process of oral tests, and reliability.
Questions 7 and 8 relate to reliability of a test.
In general, questions 1 to 8 are closed questions, and questions 9 and 10 are open-ended.
For questions 1 and 2, the respondents are required to rank the options in order of priority. Question 3 requires them to choose one of four options. Questions 4 and 5 require the respondents to tick the level(s) of language proficiency that the elicitation techniques and particular questions fit. Questions 6 and 8 allow more than one choice. For question 7, the respondents are required to choose one of three options.
Questions 9 and 10 are meant to investigate possible sources of the differences in the subjects’ perceptions of oral language testing.
3.3 Procedures
The study of TNU’s current oral language testing practices involved two steps. Firstly, the researcher analysed the development process of speaking tests at this institution from School Year 1998-1999, when the researcher started working at this institution, up to the present. The current practices were evaluated against Bachman and Palmer’s theoretical framework for test development, reviewed in 2.3 (Chapter 2), in order to identify their strengths and weaknesses.
To ensure the reliability of the information from the analysis, one end-of-term speaking test was observed and tape-recorded. The information from this observation is particularly meant to confirm the evaluation of the test operationalization and administration stages. The test under observation was used for the second-year students at the end of Term 2 of School Year 2002-2003 (See Appendix 2). It was administered on the morning of June 25th, 2003 at TNU.
There were 60 test takers altogether, divided into 3 groups of 20 in 3 separate rooms. Because of time limitations and because the 3 groups were tested simultaneously, only 10 students were chosen at random to be audio-recorded. Their performance was recorded on tape and then transcribed (See the tapescript in Appendix 3). Each group was examined by 2 assessors. The students drew lots for the test task or topic and prepared for 5 to 10 minutes before performing it. The test consisted of 8 topics altogether (See Appendix 2). Neither the assessors nor the students were aware of being recorded.
Secondly, the investigation of TNU current oral testing also involved a survey of the English teaching staff’s perceptions of oral skill assessment. The survey was conducted in the form of questionnaires. The questionnaires were delivered to the respondents and collected one week later. The respondents were clearly informed of the purpose of the questionnaire.
The questionnaire (See Appendix 4) was written in Vietnamese so that the respondents’ varying familiarity with some technical terms in language testing would not affect their understanding of the questions and thus distort their responses.
3.4 Summary
Chapter 3 has raised the two research questions and described the research methods employed, particularly the subjects of the study, the data collection instruments, and the way the study was conducted. The data collection instruments include a checklist summarizing the oral test development process at TNU, tape recordings of actual test performance used to supplement the analysis of the operationalization and administration of achievement speaking tests, and a questionnaire formulated to investigate the staff’s perceptions of oral testing. The data gathered are analysed in the next chapter.
CHAPTER 4: RESULTS AND DISCUSSION
Chapter 3 has outlined the two research methods used to carry out the study: the situation analysis and the questionnaire survey. The situation analysis is meant to evaluate the current oral test development at TNU while the survey is aimed at investigating the staff’s understanding of oral language testing. Chapter 4 is thus divided into two sections evaluating (1) the current development process of speaking tests and (2) the staff’s perceptions of oral skill assessment.
4.1 Evaluation of TNU Current Development Process of Oral Language Tests
As previously mentioned, the analysis of TNU’s current development process for speaking tests and the observation of one actual end-of-term speaking test are intended to support the evaluation of the practices of developing speaking tests. This section (1) starts with a detailed review of the present development process, summarized in the checklist, (2) presents the results gathered from the observation of the particular achievement speaking test, and (3) analyses the results in order to conclude whether TNU’s current procedures for oral test development are consistent with the theoretical framework.
4.1.1 Review of TNU Current Development Process of Oral Language Tests
The information in this sub-section about the existing practices of developing English speaking tests at TNU is essential to the critical evaluation of these practices. TNU’s current oral test development process is described in relation to four main factors: (1) the test design stage, (2) the test operationalization stage, (3) the test administration stage, and (4) the use of test results.
First of all, the oral tests used at this institution have been formally administered at the end of each term in order to measure what the students have actually achieved after a particular period of learning. This kind of test is called a final achievement test by Hughes (1989) and McNamara (2000), a progress and grading test by Bachman & Palmer (1996), a course test by Davies (2000), or a final or attainment test by Heaton (1990). All the teachers, test writers and assessors have always known that they should elicit the students’ actual ability to use the language in real communication, and especially the language knowledge and ability they are required to have acquired by the end of a particular term. Oral tests at this institution are therefore explicitly identified as achievement tests from the very start, and both the teachers and the assessors have been aware of this.
However, neither the Department nor the English Section has produced a formal document including explicit classification of students’ language proficiency levels, particular areas of language ability or construct to be assessed and sets of TLU tasks identified for all the levels. Also, they have not established any criteria for test quality evaluation.
Secondly, as regards the operationalization process, the test designers have had no official document of the kind mentioned above describing the different levels of language proficiency in terms of the students’ language knowledge, i.e. a level scale (discussed in 2.3.1), and the areas of language ability to be tested. Additionally, the teachers or test writers have not received any detailed guidelines or instructions on the number of test tasks to be included in the test(s) or on test task specifications. Nevertheless, they have been informed of the administration time of the test(s) in advance in order to ensure punctual test production and submission.
Therefore, the teachers or test designers have produced the speaking tests in their own way, and most of the oral tests conducted make use of merely one oral test type (tests where the learner prepares in advance) and one elicitation technique (oral report), and consist of one test task/part (See Appendix 1; these three tests were used for the same class for three terms in succession). Furthermore, as shown in these three tests, none of the test tasks/questions is accompanied by either external prompts helping the students make a structured presentation or explicit instructions quantifying the language knowledge and ability needed to perform the task.
Thirdly, concerning the administration process, there has been no detailed guidance in the form of either a meeting among the group of administrators and assessors or an official document regarding the students’ language proficiency level, i.e. level scale mentioned above, and no guidelines on method(s) of marking students’ performance on each test task. In short, the assessors are never informed of or provided with these two important things before test administration.
At the end of each term the oral test administration takes place with three classes of 60 students on average. The test administration for one class is allotted half a day, and each class is usually divided into two groups of about 30 students in two separate rooms. The time for each student’s test performance is therefore about 4 to 5 minutes. During test administration, students are called in five at a time to draw lots for test questions or tasks and to prepare for 5 to 10 minutes. Then each student presents his/her preparation in front of two assessors. The two assessors often raise some questions, which may or may not be related to the student’s presentation. Sometimes the assessors do not ask any questions. The student’s final score is the average of the marks given by the two assessors. Meanwhile, the other students waiting for their turn stand along the corridor talking; that is to say, a supportive testing environment is not maintained.
Last but not least, after all the students finish their performance, test results are analysed to grade the students. However, the teachers are not provided with, and are not allowed to keep, the list of students’ test scores. The oral test results at this institution are thus not used either to evaluate the effectiveness of the instructional programs or to modify the teaching methods and materials.
TNU’s oral test development described above can be reviewed by means of the checklist (Table 4.1), specially designed to highlight the strong and weak points of the current practices. The answer ‘Yes’ is ticked ‘✓’ and ‘No’ is crossed ‘×’.
| 1. Test Design Stage | |
| Are the purposes of oral tests explicitly identified? | ✓ |
| Which kind(s) do the oral tests include? (Selection / Placement / Diagnosis / Achievement) | ✓ Achievement |
| Is a set of the TLU tasks presented? | × |
| Is there an official document including detailed instructions on students’ language proficiency levels? | × |
| Is there an official document including detailed instructions on construct or language ability to be measured? | × |
| Are there any criteria set for evaluating test quality? | × |
| 2. Test Operationalization Stage | |
| Are there any official guidelines on the number of test tasks to be included in a particular speaking test? | × |
| Are the specifications of each test task provided? | × |
| 3. Test Administration | |
| Are the assessors informed of how to mark the test tasks before the test is administered? | × |
| Is the testing environment well prepared? | × |
| Is a supportive test taking environment maintained? | × |
| Are the instructions for each test task made clear to the students? | ✓ |
| Are the test tasks in use attached with any limitation of knowledge? | × |
| 4. Use of Test Results: Is the information from test results used for | |
| ...grading the students? | ✓ |
| ...evaluation of the effectiveness of instructional programmes? | × |
| ...the teachers’ modification of teaching methods and materials? | × |
Table 4.1: A checklist for Oral Test Development
Table 4.1 summarizes TNU’s current oral testing practices: 12 of its 16 items are crossed ‘×’, which reveals shortcomings in speaking test development at this institution and indicates a considerable gap between practice and theory.
4.1.2 The Observation Results
Table 4.2 displays the oral test types used during the administration of the end-of-term speaking test for the second-year students (Term 2, School Year 2002-2003; see Appendix 2). The information in this table is used to evaluate the oral test types in use in sub-section 4.1.3.
| Students | Oral test types | | | |
| | Direct interview | Pre-arranged information gap | Tests where the learner prepares in advance | Mechanical/entirely predictable tests |
| 1 | | | ✓ | |
| 2 | | | ✓ | |
| 3 | | | ✓ | |
| 4 | | | ✓ | |
| 5 | | | ✓ | |
| 6 | | | ✓ | |
| 7 | | | ✓ | |
| 8 | | | ✓ | |
| 9 | | | ✓ | |
| 10 | | | ✓ | |
Table 4.2: Summary of Oral Test Types Used in the Achievement Speaking Test for the Second-Year Students (School Year 2002-2003)
Table 4.2 indicates that only one test type was in use. Yet, as discussed in 2.4.2 (Chapter 2), a valid speaking test, particularly an achievement test intended to measure overall oral proficiency, should combine at least two oral test types.
Table 4.3 presents the elicitation technique(s) employed to elicit the 10 students’ ability during the achievement test mentioned above, together with each student’s topic number or test question, the duration of the test performance, and the interaction with the assessors. All these details were recorded and are transcribed in Appendix 3.
| Students | Elicitation techniques involved in tests where the learner prepares in advance | | | Topic number | Time (minutes) | Interaction focus |
| | Oral report | Reading blank dialogue | Retelling a story | | | |
| 1 | ✓ | | | 8 | 5 | Student’s presentation without any questions from the assessors |
| 2 | ✓ | | | 2 | 4.5 | Student’s presentation without any questions from the assessors |
| 3 | ✓ | | | 1 | 5 | Student’s presentation with 1 question raised by the assessor |
| 4 | ✓ | | | 8 | 3 | Student’s presentation without any questions from the assessors |
| 5 | ✓ | | | 6 | 3 | Student’s presentation without any questions from the assessors |
| 6 | ✓ | | | 2 | 2.5 | Student’s presentation without any questions from the assessors |
| 7 | ✓ | | | 3 | 3.5 | Student’s presentation without any questions from the assessors |
| 8 | ✓ | | | 5 | 2.5 | Student’s presentation without any questions from the assessors |
| 9 | ✓ | | | 1 | 5 | Student’s presentation with 2 questions raised by the assessor |
| 10 | ✓ | | | 4 | 3 | Student’s presentation without any questions from the assessors |
Table 4.3: Summary of the Students’ Oral Test Performance in the Achievement Speaking Test for the Second-Year Students
As can be seen in Table 4.3, oral report was the only elicitation technique employed throughout this achievement test. It is one of the three main elicitation techniques used in tests where the learner prepares in advance (See 2.4.2.3, Chapter 2). Additionally, the assessors raised no questions after the students’ presentations, except in the cases of students 3 and 9.
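The figures in Table 4.3 can also be summarised with a short calculation. The following is a minimal sketch in Python that transcribes the topic numbers, performance times and numbers of assessor questions from the table; the variable and function names are illustrative assumptions and are not part of the original study.

```python
from statistics import mean

# Rows transcribed from Table 4.3:
# (student, topic number, performance time in minutes, questions asked by assessors)
observations = [
    (1, 8, 5.0, 0),
    (2, 2, 4.5, 0),
    (3, 1, 5.0, 1),
    (4, 8, 3.0, 0),
    (5, 6, 3.0, 0),
    (6, 2, 2.5, 0),
    (7, 3, 3.5, 0),
    (8, 5, 2.5, 0),
    (9, 1, 5.0, 2),
    (10, 4, 3.0, 0),
]


def summarise(rows):
    """Return basic descriptive figures for the observed test performances."""
    minutes = [m for _, _, m, _ in rows]
    questioned = [s for s, _, _, q in rows if q > 0]
    return {
        "mean_minutes": mean(minutes),
        "shortest": min(minutes),
        "longest": max(minutes),
        "students_questioned": questioned,
    }


summary = summarise(observations)
print(summary)
```

On these figures the mean performance time is 3.7 minutes, within a range of 2.5 to 5 minutes, and only students 3 and 9 were asked any questions.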
4.1.3 Analysis of the Results
The evaluation of TNU current oral testing practices is carried out in relation to four factors described in 4.1.1: (1) test design stage, (2) test operationalization stage, (3) test administration stage, and (4) use of test results.
As can be seen in Table 4.1, oral tests are explicitly identified as achievement tests from the very start. Clear identification of the test type at the beginning of a course is beneficial because the teachers can integrate the test content into the teaching program. As pointed out by Brown (1994), Heaton (1990), Hughes (1989) and Ur (1996), achievement tests should be integrated into the teaching program and related directly to the classroom lessons or units, the syllabus or curriculum. Therefore, students’ performance on an achievement test reveals their achievement or progress at the end of a course of study (Bachman & Palmer, 1996), and an achievement test of speaking skill is a means of eliciting students’ progress in overall speaking ability after a course/term of study.
However, the product of this stage, which involves four crucial components (the students’ profile of language ability, the construct/ability to be measured, sets of test tasks in the TLU domain, and a plan for test quality evaluation), has, as described in 4.1.1, never been produced and presented to the teachers as a principled basis or set of guidelines for the other two stages. This indicates that the first stage of oral test development at TNU is far from consistent with the theoretical framework reviewed in 2.3.1 (Chapter 2). This mismatch in turn leads to the staff’s improper practices in the other two stages.
- Test Operationalization Process
Apart from the mismatch between practice and theory mentioned above, a key fact shown in Table 4.1 is that the Department and English Section have not provided any specific guidance, i.e. a blueprint, for the speaking test construction process, namely (1) the number of test tasks to be included in a speaking test, and (2) the specifications of each test task. These two factors are analysed in turn.
Firstly, as previously discussed, an achievement test of speaking skill is a means of eliciting students’ progress in overall speaking ability after a course of study, yet most of the achievement speaking tests in use at TNU fail to serve this purpose because they make use of merely one type of oral test or one test task, tests where the learner prepares in advance (Table 4.2), combined with only one elicitation technique, oral report (Table 4.3). This is partly because no blueprint is provided. Underhill (1987) points out that an oral test rarely consists of only one elicitation technique; it usually involves several techniques placed in a sequence. The reasons he provides for including more than one technique in an oral test are as follows:
- It is more authentic to use a mix of techniques, with the learner doing different things with the language....
- An oral test that consists only of Question and Answer, for example, will naturally favour learners who are good at answering questions....
- To help improve the consistency of assessment, a change of tasks during a test can be used as an opportunity to swap interviewers and so combine multiple tasks with multiple assessment....
- A live test with several different parts is more flexible and can be adapted quickly to meet changing circumstances or different needs....
(Underhill, 1987, p.38)
Such test tasks have probably been carefully discussed in class, and the students are expected to produce ‘well-prepared’ talk; even predictable questions can be prepared in advance. Of course, ‘the task(s) on which the student has to perform may be generally familiar in form to the student, but the student cannot ‘prepare’ a written version of what he will say’ (Brown & Yule, 1983, p.120). He must prove to the assessors that in his test performance he has learned to use, not to repeat, what he has been taught. What we as examiners want to know when testing a student is not whether the student has learned what he has been taught, but whether he is able to produce an extended piece of spoken English appropriate to the communicative situation he encounters (Brown & Yule, 1983, p.120).
This popular kind of oral test at TNU is therefore far from useful in measuring the students’ overall oral language proficiency, and can be said to lack construct validity and reliability (See 2.5, Chapter 2).
Secondly, the absence of specifications for particular test tasks, especially the components of oral ability to be tested, the relevant areas of language knowledge, and a marking key, contributes to some extent to the inadequacy of the tests produced by the teachers or test designers. It can also be said that communicative stress is not considered in oral test construction.
As can be seen in the four achievement speaking tests (See Appendices 1 & 2), none of the test questions/tasks (topics) is accompanied by external prompts helping the students make a structured presentation or by explicit instructions quantifying the language knowledge and ability needed to perform the tasks.
It is essential for test writers to provide clear instructions helping test takers to organise a spoken presentation, because students are always encouraged to produce effectively organised speech so that the listener finds it easy to follow what is being said (Brown & Yule, 1983, p.119).
Also, in order to write test tasks fitting students’ proficiency levels, test writers need to give explicit instructions quantifying language knowledge and ability. The quantification of performance on a particular task depends largely on the grading of tasks according to cognitive difficulty (Brown & Yule, 1983, p.121). To put it another way, the same task type can be made easier or more difficult. For example, describing a room with 8 elements is more difficult than describing a room with 5 elements. Test designers or teachers of speaking skill should therefore always bear in mind informed judgements of the degree of this cognitive difficulty or communicative stress (Figure 2.2, Chapter 2) during the test operationalization process.
Besides, no official instructions on criteria for marking students’ test performance are provided; thus, the test writers/teachers are unaware of the importance of scoring method(s) for each test task, and they never design a marking key (See 2.4.3, Chapter 2) instructing assessors how to assess students’ performance on test tasks. As discussed in 2.4.3, in a marking key, language and skill categories are identified and awarded separate marks according to the test purpose(s). As Underhill (1987, p.94) points out, the aim of a marking key is ‘to save time and uncertainty by specifying in advance, as far as possible, how markers should approach the marking of each question or task’. With the help of a marking key and the level scale mentioned above, assessors can mark a test more quickly and reliably, for each language or skill category is marked separately.
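The idea of awarding separate marks per category and then combining the two assessors’ marks can be illustrated with a small sketch in Python. The categories and maximum marks below are hypothetical assumptions chosen only for illustration; they are not TNU’s scheme, nor one proposed by Underhill (1987). The averaging of the two assessors’ totals follows the procedure described in 4.1.1.

```python
# Hypothetical marking key: category -> maximum mark.
# These categories and weights are illustrative assumptions only.
MARKING_KEY = {
    "fluency": 3,
    "accuracy": 3,
    "vocabulary": 2,
    "pronunciation": 2,
}


def total_mark(marks, key=MARKING_KEY):
    """Sum one assessor's category marks after checking them against the key."""
    if set(marks) != set(key):
        raise ValueError("marks must cover exactly the categories in the key")
    for category, mark in marks.items():
        if not 0 <= mark <= key[category]:
            raise ValueError(
                f"{category} mark {mark} is outside 0..{key[category]}"
            )
    return sum(marks.values())


def final_score(assessor_a, assessor_b, key=MARKING_KEY):
    """Average the two assessors' totals, as in the procedure in 4.1.1."""
    return (total_mark(assessor_a, key) + total_mark(assessor_b, key)) / 2


# Invented example marks for one student from two assessors.
a = {"fluency": 2, "accuracy": 2, "vocabulary": 1, "pronunciation": 2}
b = {"fluency": 3, "accuracy": 2, "vocabulary": 1, "pronunciation": 1}
print(final_score(a, b))
```

Because every category has an explicit maximum, a mark outside the agreed range is rejected rather than silently accepted, which is one way a marking key may reduce uncertainty between assessors.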
- Test Administration Process
Table 4.1 and 4.3 indicate that TNU speaking test administration reveals many a shortcoming. These weak points include (1) lack of test admi