Note: The following is based upon a portion of an earlier work.
For the complete version of the original publication, see:

Downing, L. L. (1994).  Criterion shaped behaviour: Pitfalls of performance appraisal.
International Journal of Selection and Assessment, 2, 1-21.

PART 1

INTRODUCTION

Criterion Shaped Behavior:
 Pitfalls of Performance Appraisal

 Leslie L. Downing
 State University of New York
 College at Oneonta

         One of the major issues of recent years in this country has been labeled the "Crisis in Education."   In September, 1989, State Governors and Federal Government officials held a "Summit Meeting" at the University of Virginia to develop a common resolve and a tentative plan for coping with long-standing problems in the field of education.  One recurrent theme of that meeting was "accountability."  On this theme, the closing section of the official statement released at the end of the meeting said, "When goals are set and strategies for achieving them are adopted, we must establish clear measures of performance and then issue annual Report Cards on the progress of students, schools, the states, and the Federal Government" (New York Times, 1989).  An earlier comment advocated a "...system that develops first-rate teachers and creates a professional environment that provides real rewards for success with students, real consequences for failure, and the tools and flexibility to get the job done."  This is tough talk, but it reflects sentiments that are easy to agree with.  It expresses a view, however, that too readily accepts the validity of several basic assumptions, many of which are so firmly believed that their validity is not questioned, and some of which, this paper will argue, are insupportable upon closer examination.

Three of the basic, yet unstated, assumptions inherent in the Governors' report are as follows:

        1.  Evaluations of the performances of students, teachers, schools, states, or of the entire country, can be accomplished readily, and with reliability and validity commensurate with the major purposes to which these evaluations will be put.

        2.  By rewarding students, teachers, schools, etc., for high scores, or punishing them for low scores, on these yet to be developed measures of performance, they will inevitably change their behavior in ways that contribute to more desirable educational outcomes.

        3.  It is always better to evaluate than not to evaluate.  The report of the Governors (New York Times, 1989) reflects the view that many desirable consequences can follow from evaluation of performance, and totally ignores the possibility that performance evaluation can, under some circumstances, result in unintended and undesirable effects.

        There are numerous problems to be faced when developing a system for performance evaluation, and whether or not it is feasible to design measures that are adequate for their intended purposes is dependent upon many considerations.  Furthermore, the behavioral consequences to be expected from making valued outcomes contingent upon performance scores are not always positive or desirable ones.  In some instances, given the limitations of the evaluation systems available for use, it may be preferable not to measure performance at all.
        This paper attempts to clarify some of these very important issues, and to develop a theoretical framework in the context of which one can make informed decisions about the likely consequences of imposing various methods of performance evaluation on the behavior of the individuals or institutions being evaluated.  Application of the theory can facilitate the design of systems of evaluation that are less subject to the numerous pitfalls common to those currently in use, or to those systems otherwise likely to be implemented in the future.
       While the examples and illustrations of concepts to be used come primarily from the field of education, the theory presented and the major implications derived from it, will be equally applicable to other institutional contexts where evaluation of performance is used as a basis for allocating valued outcomes.

Functions of Performance Evaluation

        Evaluation of performance can potentially serve any one or more of several different functions.  The major uses are for Assessment, for provision of Feedback, and to produce Criterion Shaped Behavior.  Adequate design of performance evaluation systems must take into account the conflicting goals and requirements of these functions.

         Assessment.  Traditional measurement theory, as covered in general texts (cf. Landy and Trumbo, 1978), is concerned almost exclusively with the assessment function of testing.  Assessment need not imply any contingent rewards or punishments for high or low scores, either intrinsic or extrinsic, nor is it intended to have an impact upon the level of performance of the individuals or institutions being evaluated.  Those being evaluated are not expected to react to the presence of the evaluation system, but to passively continue to behave as though that system had never been enacted.  Most texts fail to even acknowledge that such behavioral consequences occur, or to offer a language or a system for understanding such effects.  Those effects that are noted are labeled "reactance," and are discussed as sources of invalidity of the measuring instrument.  To reduce this reactance, assessment is ideally done without those whose performance is being evaluated even knowing that measurement is taking place.  Such "unobtrusive measurement" (Webb, Campbell, Schwartz, and Sechrest, 1966) is the ideal for purposes of research or program evaluation, with respect to which measurement is a tool used to ascertain the existence or the magnitude of influences on performance.  In the field of education, one example of a performance measure that is designed exclusively for assessment purposes is the National Assessment of Educational Progress, the NAEP (see Koretz, 1989).  It is a test given to a random sample of school children in the country from different grade levels, at regular intervals, to track the rate of increase or decrease in student academic achievement over the course of many years.  Average scores of students at different grade levels, in different academic areas, are reported as evidence of the effectiveness of education in the country.
         Evaluation of teacher performance may be done to serve such assessment functions, perhaps to evaluate the adequacy of a new method for training teachers.  Most often, however, evaluation of teachers is done with the explicit intention of influencing their behavior, i.e., of shaping improvements in their performance.  There are two related but separate functions of performance evaluation each of which is intended to produce such behavior changes.

         Feedback.  For the feedback function, evaluation may be a very narrowly defined, behavior-specific indicator of right or wrong, good or bad, better or worse, and it may be made known only to the performer, with no outside evaluator involved subsequent to design and implementation of the system.  It may also be unrelated to extrinsic rewards or costs, depending for its effectiveness upon the intrinsically rewarding satisfaction of having performed well.  Many who find fault with standardized testing see such feedback as essential for improving performance.  Such feedback is basic to teaching machines and to individualized programmed instruction of students (Skinner, 1984).  Even tests, in certain applications, can be viewed as instruments of feedback.  Frederiksen (1984) has suggested that innovative tests might be used to provide feedback to students in teaching skills not ordinarily evaluated at all, such as solving of unstructured problems.  The goal of feedback is the shaping of desired performance through the immediate receipt of intrinsic rewards contingent upon success.  Feedback is the means through which video games captivate players of all ages.  Where skillfully employed, feedback of performance evaluations can produce high levels of motivation and of desirable performance behavior.
        For teachers, feedback may occur in the daily receipt of information concerning whether or not the students are learning.  This may be from grades on homework assignments, or from less structured sources, such as looks of perplexity or insight on students' faces.  A teacher who is intrinsically rewarded by feedback indicative of having taught well may respond with changes in behavior designed to further increase the ratio of successes to failures.
        Note the differences between performance evaluations likely to be most effective for purposes of assessment, compared to those most effective for purposes of feedback.  Assessment works best when the performer does not react at all to its implementation.  It involves no reward, intrinsic or extrinsic, for good versus bad performance.  It is often best accomplished where the performer fails even to be aware that evaluation is occurring.  It happens infrequently, perhaps once a year, for purposes of assessing the performance levels of a program, or of a school or school system.  It may only be done on a single occasion, as part of a research project to evaluate the effectiveness of a new selection procedure, or training method, or environmental improvement.  Results of such assessment need to be known to the organization, but to avoid reactance, should not be known by the performer.
        Feedback, on the other hand, is supposed to be reactive.  Its goal is to alter the behavior of the individual whose performance is being measured.  To do so, results must be made known to the performer, but not to the organization.  Rewards for successful performance are essential for feedback to work, but the rewards are most likely to be intrinsic ones, such as the satisfaction experienced at being correct, or of having demonstrated competence.  To be effective, these rewards should follow the performance with little delay and should occur frequently.  It is also usually best if feedback information is not made available to the performer's supervisor or organization, for to do so opens up the possibility of extrinsic rewards or punishments administered from such sources, which may conflict or interfere with the intrinsic reward processes basic to the effectiveness of the feedback function.

        Criterion Shaped Behavior.  A third function of performance evaluation, like assessment, uses infrequently given criterion measures of performance, but unlike assessment, and like feedback, is intended to be reactive.  It depends for its effectiveness upon extrinsic rewards, and of necessity involves long delays between the performance and the reward.  This function requires that both the performer and the organization know the evaluation system and the evaluation achieved by the individual performer.  The performer needs to know in order to understand what has been rewarded, so that motivation to do more of the same can lead to improved performance.  The organization needs to know in order to decide whom to reward and whom not to reward.  In practice, such as in "merit money" programs, such evaluations are infrequently done, and of necessity, knowledge of results is long delayed.  It is the use of performance evaluation for these behavior shaping functions that is the major concern of this paper.

        These three functions of performance evaluation are quite different from each other, not only in their intended effects, but in the characteristics of the evaluation system most likely to facilitate those effects.  It is essential that the intended functions be clearly thought out, for a system most likely to be effective for one purpose may be woefully lacking for another.  Performance evaluations of students, of teachers, or even of institutions, whether explicitly intended to shape behavior or not, will nearly always do so.  If those being evaluated know, or even guess, that their performance is being evaluated, they will find out, or guess at, the nature of the actual criterion measure being used.  If they then discover, or assume, that outcomes they value will be made contingent upon scores obtained on that measure, and if they can devise behaviors believed to increase those scores, they will increase their tendency to engage in whichever of those behaviors they believe will increase those scores with the least amount of effort.  It is this effect, behavior change that results from efforts to increase scores on some Actual Criterion Measure of performance, that this paper calls "Criterion Shaped Behavior."
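
        The chain of conditions just described can be made concrete with a brief sketch in Python.  It is offered only as an illustration, not as part of the theory as originally stated.  The class name, the function name, the numeric values, and the three example behaviors are all hypothetical, and reducing "belief" and "effort" to single numbers is a simplifying assumption.  The sketch shows the one point that matters: once the performer believes that evaluation is occurring and that valued outcomes depend on the score, the favored behavior is the one believed to raise the score at the least effort, whether or not it serves the outcome the evaluator intended.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Behavior:
    """A candidate behavior as the performer perceives it (hypothetical model)."""
    name: str
    believed_score_gain: float  # performer's belief about its effect on the criterion score
    effort: float               # performer's perceived cost of engaging in it


def favored_behavior(
    behaviors: List[Behavior],
    believes_evaluated: bool,
    believes_outcomes_contingent: bool,
) -> Optional[Behavior]:
    """Return the behavior the performer is predicted to favor, or None.

    Assumption: criterion shaping occurs only when the performer believes
    (or guesses) that evaluation is taking place AND that valued outcomes
    depend on the resulting score.
    """
    if not (believes_evaluated and believes_outcomes_contingent):
        return None
    # Keep only behaviors the performer believes will raise the score.
    candidates = [b for b in behaviors if b.believed_score_gain > 0]
    if not candidates:
        return None
    # Among those, the least effortful one is favored.
    return min(candidates, key=lambda b: b.effort)


# Illustrative values only; none of these come from the original paper.
options = [
    Behavior("drill students on likely test items", 5.0, 1.0),
    Behavior("redesign the course around deeper understanding", 5.0, 8.0),
    Behavior("volunteer for committee work", 0.0, 2.0),
]

choice = favored_behavior(options, believes_evaluated=True,
                          believes_outcomes_contingent=True)
print(choice.name)  # drill students on likely test items
```

        Note that the sketch never consults what the evaluator actually wants.  Only the performer's beliefs about the Actual Criterion Measure, and the effort required, enter into the choice.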

        Being mindful of the assessment, feedback and criterion shaped behavior functions of performance evaluation is necessary if one is to avoid unintended and undesirable behavioral consequences.
