Downing, L. L. (1994). Criterion shaped behaviour: Pitfalls of performance appraisal. International Journal of Selection and Assessment, 2, 1-21.
PART 1
Criterion Shaped Behavior:
Pitfalls of Performance Appraisal
Leslie L. Downing
State University of New York
College at Oneonta
One of the major issues of recent years in this country has been labeled the "Crisis in Education." In September 1989, State Governors and Federal Government officials met in a "Summit Meeting" at the University of Virginia to develop a common resolve and a tentative plan for coping with long-standing problems in the field of education. One recurrent theme of that meeting was "accountability." On that subject, the closing section of the official statement released at the end of the meeting said, "When goals are set and strategies for achieving them are adopted, we must establish clear measures of performance and then issue annual Report Cards on the progress of students, schools, the states, and the Federal Government" (New York Times, 1989). An earlier passage advocated a "...system that develops first-rate teachers and creates a professional environment that provides real rewards for success with students, real consequences for failure, and the tools and flexibility to get the job done." This is tough talk, but it expresses sentiments that are easy to agree with. It reflects a view, however, that too readily accepts several basic assumptions, many of which are so firmly believed that their validity is not questioned, and some of which, this paper will argue, are insupportable upon closer examination.
Three of the basic, yet unstated, assumptions inherent in the Governors' report are as follows:
1. Evaluations of the performances of students, teachers, schools, states, or of the entire country, can be accomplished readily, and with reliability and validity commensurate with the major purposes to which these evaluations will be put.
2. If students, teachers, schools, etc., are rewarded for high scores, or punished for low scores, on these yet-to-be-developed measures of performance, they will inevitably change their behavior in ways that contribute to more desirable educational outcomes.
3. It is always better to evaluate than not to evaluate. The report of the Governors (New York Times, 1989) reflects the view that many desirable consequences can follow from evaluation of performance, and totally ignores the possibility that performance evaluation can, under some circumstances, result in unintended and undesirable effects.
There are numerous problems to be faced when developing a system for performance evaluation, and whether it is feasible to design measures that are adequate for their intended purposes depends upon many considerations. Furthermore, the behavioral consequences to be expected from making valued outcomes contingent upon performance scores are not always positive or desirable ones. In some instances, given the limitations of the evaluation systems available for use, it may be preferable not to measure performance at all.
This paper attempts to clarify some of these very important issues, and to develop a theoretical framework within which one can make informed decisions about the likely consequences of imposing various methods of performance evaluation on the behavior of the individuals or institutions being evaluated. Application of the theory can facilitate the design of systems of evaluation that are less subject to the numerous pitfalls common to those currently in use, or to those systems otherwise likely to be implemented in the future.
While the examples and illustrations of concepts to be used come primarily from the field of education, the theory presented and the major implications derived from it will be equally applicable to other institutional contexts where evaluation of performance is used as a basis for allocating valued outcomes.
Functions of Performance Evaluation
Evaluation of performance can potentially serve any one or more of several different functions. The major uses are for Assessment, for provision of Feedback, and to produce Criterion Shaped Behavior. Adequate design of performance evaluation systems must take into account the conflicting goals and requirements of these functions.
Assessment.
Traditional measurement theory, as covered in general texts (cf. Landy and Trumbo, 1978), is concerned almost exclusively with the assessment function of testing. Assessment need not imply any contingent rewards or punishments, either intrinsic or extrinsic, for high or low scores, nor is it intended to have an impact upon the level of performance of the individuals or institutions being evaluated. Those being evaluated are not expected to react to the presence of the evaluation system, but to continue passively to behave as though that system had never been enacted. Most texts fail even to acknowledge that such behavioral consequences occur, or to offer a language or a system for understanding such effects. Those effects that are noted are labeled "reactance," and are discussed as sources of invalidity of the measuring instrument. Ideally, to reduce this reactance, assessment is done without those whose performance is being evaluated even knowing that measurement is taking place. Such "unobtrusive measurement" (Webb, Campbell, Schwartz, and Sechrest, 1966) is the ideal for purposes of research or program evaluation, in which measurement is a tool used to ascertain the existence or the magnitude of influences on performance. In the field of education, one example of a performance measure designed exclusively for assessment purposes is the National Assessment of Educational Progress, the NAEP (see Koretz, 1989). It is a test given at regular intervals to a random sample of the country's schoolchildren at different grade levels, to track increases or decreases in student academic achievement over the course of many years. Average scores of students at different grade levels, in different academic areas, are reported as evidence of the effectiveness of education in the country.
Evaluation of teacher performance may be done to serve such assessment functions, perhaps to evaluate the adequacy of a new method for training teachers. Most often, however, evaluation of teachers is done with the explicit intention of influencing their behavior, i.e., of shaping improvements in their performance. There are two related but separate functions of performance evaluation, each of which is intended to produce such behavior changes.
Feedback.
For the feedback function, evaluation may be a very narrowly defined, behavior-specific indicator of right or wrong, good or bad, better or worse, and it may be made known only to the performer, with no outside evaluator involved once the system has been designed and implemented. It may also be unrelated to extrinsic rewards or costs, depending for its effectiveness upon the intrinsically rewarding satisfaction of having performed well. Many who find fault with standardized testing see such feedback as essential for improving performance. Such feedback is basic to teaching machines and to individualized programmed instruction of students (Skinner, 1984). Even tests, in certain applications, can be viewed as instruments of feedback. Frederiksen (1984) has suggested that innovative tests might be used to provide feedback to students when teaching skills not ordinarily evaluated at all, such as the solving of unstructured problems. The goal of feedback is to shape desired performance through the immediate receipt of intrinsic rewards contingent upon success. Feedback is the means through which video games captivate players of all ages. Where skillfully employed, feedback from performance evaluations can produce high levels of motivation and of desirable performance behavior.
For teachers, feedback may occur in the daily receipt of information about whether the students are learning. This information may come from grades on homework assignments, or from less structured sources, such as looks of perplexity or insight on students' faces. A teacher who is intrinsically rewarded by feedback indicative of having taught well may respond with changes in behavior designed to further increase the ratio of successes to failures.
Note the differences between the performance evaluations likely to be most effective for purposes of assessment and those most effective for purposes of feedback. Assessment works best when the performer does not react at all to its implementation. It involves no reward, intrinsic or extrinsic, for good versus bad performance. It is often best accomplished when the performer is not even aware that evaluation is occurring. It happens infrequently, perhaps once a year, for purposes of assessing the performance levels of a program, or of a school or school system. It may be done only on a single occasion, as part of a research project to evaluate the effectiveness of a new selection procedure, training method, or environmental improvement. Results of such assessment need to be known to the organization but, to avoid reactance, should not be known by the performer.
Feedback, on the other hand, is supposed to be reactive. Its goal is to alter the behavior of the individual whose performance is being measured. To do so, results must be made known to the performer, but not to the organization. Rewards for successful performance are essential for feedback to work, but the rewards are most likely to be intrinsic ones, such as the satisfaction of being correct or of having demonstrated competence. To be effective, these rewards should occur after short delays and on a frequent basis. It is also usually best if feedback information is not made available to the performer's supervisor or organization, for to do so opens up the possibility of extrinsic rewards or punishments administered from such sources, which may conflict or interfere with the intrinsic reward processes basic to the effectiveness of the feedback function.
Criterion Shaped Behavior.
A third function of performance evaluation, like assessment, uses infrequently given criterion measures of performance but, unlike assessment and like feedback, is intended to be reactive. It depends for its effectiveness upon extrinsic rewards, and of necessity involves long delays between the performance and the reward. This function requires that both the performer and the organization know about the evaluation system and about the evaluation achieved by the individual performer. The performer needs to know in order to understand what has been rewarded, so that motivation to do more of the same will facilitate improved performance. The organization needs to know in order to determine whom to reward and whom not to reward. In practice, as in "merit money" programs, such evaluations are done infrequently and, of necessity, knowledge of results is long delayed. It is the use of performance evaluation for these behavior-shaping functions that is the major concern of this paper.
These three functions of performance evaluation are quite different from each other, not only in their intended effects, but in the characteristics of the evaluation system most likely to facilitate those effects. It is essential that the intended functions be clearly thought out, for a system most likely to be effective for one purpose may be woefully lacking for another. Performance evaluations of students, of teachers, or even of institutions will nearly always shape behavior, whether or not they are explicitly used for that purpose. If those being evaluated know, or even guess, that their performance is being evaluated, they will find out, or guess at, the nature of the actual criterion measure being used. If they then discover, or assume, that outcomes they value will be made contingent upon scores obtained on that measure, and if they can devise behaviors believed to increase those scores, they will increase their tendency to engage in whichever of those behaviors they believe will increase those scores with the least amount of effort. It is this effect, behavior change that results from efforts to increase scores on some Actual Criterion Measure of performance, that this paper calls "Criterion Shaped Behavior."
Being mindful of the assessment, feedback and criterion shaped behavior functions of performance evaluation is necessary if one is to avoid unintended and undesirable behavioral consequences.