Note: The following is based upon a portion of an earlier work.
For the complete version of the original publication, see:

Downing, L. L. (1994).  Criterion shaped behaviour: Pitfalls of performance appraisal.
International Journal of Selection and Assessment, 2, 1-21.

PART 4.

IMPLICATIONS FOR THE EVALUATION OF TEACHERS




        In this paper I have attempted to develop a general theory of Criterion Shaped Behavior and to show how this theory can be used to better understand the consequences of imposing evaluation systems on the performance behavior of those being evaluated.  My emphasis has been on describing the potential for undesirable consequences to follow from the implementation of some evaluation programs that are currently common, and of others recently proposed by individuals and groups who are genuinely interested in improving the state of education in the United States today.  My concern is that in the rush to improve the system, through more thorough evaluation of teacher performance and more explicit consequences made contingent upon successful and unsuccessful performance, such changes could unwittingly make things worse rather than better.
        It is not my position that evaluation, and contingent reward and punishment for good and bad performance, cannot or should not be implemented.  My hope is that where such programs are established, they are established wisely.  With attention paid to the potential pitfalls of such systems, many of the adverse consequences can be minimized or avoided.  Without such awareness, they most certainly will not be.  In the following sections I outline some of the major points to which designers of evaluation systems are encouraged to attend.

Implications of the Universal Theory of Performance.

        A sensitivity to the issues addressed by the theory can increase one's ability to design an evaluation and compensation system capable of minimizing the problems discussed previously.  Let us begin with the summary statement of the determinants of desirable behavior change.  The Probability of Performance, Pr(P), is a positive function of Motivation to Perform, Mp, but Motivation yields Performance only to the extent that Ability to Perform, Ap, is high and Constraints inhibiting Performance, Cp, are low.  This set of relationships is formally presented in Appendix 1 as Statement 15:  Pr(P) = f[(Mp) X (Ap - Cp)].
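        To make the multiplicative structure of Statement 15 concrete, here is a minimal Python sketch. It assumes, purely for illustration, that f clips its argument to the interval [0, 1] and that Mp, Ap, and Cp are scored on a hypothetical 0-to-1 scale; the theory itself does not specify f or these scales. The point it shows is that no amount of Mp can compensate when Ap - Cp is at or below zero.

```python
def probability_of_performance(mp: float, ap: float, cp: float) -> float:
    """Illustrative reading of Statement 15: Pr(P) = f[Mp x (Ap - Cp)].

    Assumptions (not from the source): f clips the product to [0, 1],
    and all three inputs are on a hypothetical 0-1 scale.
    """
    raw = mp * (ap - cp)
    # Clip so the result stays a valid probability.
    return min(1.0, max(0.0, raw))


# A highly motivated teacher who lacks the required ability (Ap = 0.1)
# and faces a strong constraint (Cp = 0.6): motivation cannot help.
print(probability_of_performance(mp=0.9, ap=0.1, cp=0.6))  # 0.0

# Same motivation once ability is raised and the constraint removed.
print(probability_of_performance(mp=0.9, ap=0.8, cp=0.0))
```

Because Mp multiplies (Ap - Cp) rather than adding to it, raising motivation only pays off after ability has been raised or constraints removed, which is the order of reasoning the following paragraphs recommend.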
        Having proposed a new system, the designer should reason as follows.  If we assume, for the moment, that the numerous factors that combine to yield Motivation (Mp) have in fact done so adequately, we must concern ourselves with the Ability to perform the desired behaviors (Ap), and with whether environmental Constraints (Cp) are likely to interfere with performance.  Suppose, for example, that a high school mathematics teacher's score on the envisioned Actual Criterion Measure were determined in part by an assessment of his or her students' ability to write a simple computer program to solve an equation.  No amount of teacher motivation is likely to be effective here if the teacher lacks sufficient knowledge of computer programming to teach it to the students.  The system designer should not expect a merit system to increase teacher performance under such circumstances.  What might be proposed instead is that the teacher be given the means to take coursework in computer programming, with monetary or release-time incentives sufficient to make this possible.
        Returning to our theory, we next ask what role environmental Constraints (Cp) may be playing in our example.  One will quickly realize that, even given Motivation and Ability, the system will not improve teacher performance unless students have sufficient access to computers in the classroom.  Without such computers, investing money in a merit system or in teacher training would be entirely ineffective.  Clearly, then, if money is available, it should be spent on purchasing the necessary computers.  Many other Constraints may need to be overcome, such as inadequate prior training of students, unavailability of class time for computer instruction, inflexible curriculum requirements, and so on.  Until these are adequately addressed, a merit system, whose effects, if any, will be on teacher motivation, is doomed to failure.
        Let us now assume adequate Ability and the absence of Constraints.  The system designer must then be sensitive to the determinants of Performance Motivation (Mp), which depends on the extent to which the Subjective Expected Utility of Effort to Perform exceeds that of Effort directed to the best alternative Non-Performance behavior.  Formally, this is Statement 14:  Mp = f(SEUep - SEUenp).  There are two distinguishable categories of Non-Performance behavior.  One category includes all behavior not directed to increasing scores on the Actual Criterion Measure.  The other includes all behavior directed to increasing such scores that fails to increase Ultimate Criterion Factors, i.e., Contamination CSBs and Deficiency CSBs.
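        Statement 14 compares Performance Effort against the best alternative, not against some average alternative. The Python sketch below assumes, for illustration only, that f is the identity and that SEUenp is the largest utility among the available Non-Performance efforts; the utility values are hypothetical and in arbitrary units. It shows that a single sufficiently attractive alternative is enough to make Mp negative.

```python
def performance_motivation(seu_ep: float, seu_enp_alternatives: list[float]) -> float:
    """Illustrative reading of Statement 14: Mp = f(SEUep - SEUenp).

    Assumptions (not from the source): f is the identity, and SEUenp is
    the largest SEU among the available Non-Performance efforts
    (taken as 0.0 if no alternative is listed).
    """
    # Motivation is judged against the *best* alternative.
    seu_enp = max(seu_enp_alternatives, default=0.0)
    return seu_ep - seu_enp


# Hypothetical utilities: Performance Effort vs. two alternatives
# (e.g., effort aimed only at raising scores, and leisure).
print(performance_motivation(5.0, [3.0, 2.0]))  # 2.0: performance is favored
print(performance_motivation(5.0, [3.0, 6.5]))  # -1.5: the best alternative wins
```

Taking the maximum, rather than a sum or mean, reflects the paragraph's wording that Performance Effort must beat "the best alternative Non-Performance."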
        Turning to Statement 14 of the Universal Theory of Performance, we focus our attention first on SEUep, for which we must go to Statement 9:  SEUep = Sum(i=1 to n) {[Pr(P|Ep)] X [SEUp]} - Uep.  That is, the Subjective Expected Utility of Effort to Perform is the sum, over all values from i=1 to i=n, of the product of Pr(P|Ep) and SEUp, minus the cost of Effort (Uep).  In essence, this is the perceived probability that effort will indeed result in the desired performance, weighted by the subjective utility of that performance, minus the cost of that effort.  For the new incentive system to be effective, the teacher must place high value on the expected outcome of desired performance, must believe that the outcome will indeed be contingent upon performance, and must believe that he or she is capable of such performance through the exertion of sufficient effort.  The perception of constraints or of a lack of ability would reduce the motivation even to attempt to improve performance.  If this perception is accurate, then, as previously discussed, even the exertion of effort will be ineffective at improving performance.
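        Statement 9 can be read as an ordinary expected-utility sum with a cost term subtracted. The Python sketch below assumes, for illustration, that each index i is one possible performance result, described by a perceived Pr(P|Ep) and its SEUp; the `Outcome` structure and all numbers are hypothetical, not from the source. Comparing a confident teacher with a doubtful one shows how perceived constraints or lack of ability can push SEUep below zero even when the reward is valued.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """One term i of the Statement 9 sum (hypothetical structure)."""
    pr_p_given_ep: float  # Pr(P|Ep): perceived chance that effort yields this performance
    seu_p: float          # SEUp: subjective expected utility of that performance


def seu_of_effort(outcomes: list[Outcome], effort_cost: float) -> float:
    """Illustrative reading of Statement 9:
    SEUep = sum over i = 1..n of Pr(P|Ep) x SEUp, minus the cost of Effort (Uep).
    """
    expected = sum(o.pr_p_given_ep * o.seu_p for o in outcomes)
    return expected - effort_cost


# Hypothetical numbers: a teacher who believes effort will probably succeed...
confident = [Outcome(0.8, 10.0), Outcome(0.2, 0.0)]
# ...versus one who perceives constraints or lacks ability.
doubtful = [Outcome(0.2, 10.0), Outcome(0.8, 0.0)]

print(seu_of_effort(confident, effort_cost=4.0))  # 4.0
print(seu_of_effort(doubtful, effort_cost=4.0))   # -2.0
```

The reward (SEUp = 10) and the cost of effort are identical in both cases; only the perceived Pr(P|Ep) differs, and that alone reverses the sign of SEUep.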
        The contingent outcome of our proposed merit system must be sufficiently valued by the teacher.  If it is money, as is usually the case, we must ask whether a given teacher places sufficient value on money.  One might guess that if money were this person's primary motivator, he or she would not have chosen to become a teacher in the first place.  Even if money is valued, the amounts involved may be too small to yield a very high Subjective Expected Utility for Performance Behavior.  It may require hundreds of hours of extra work to earn a bonus of $200.  Likewise, if the amount is large but the perceived probability of its actually resulting from the behavior is small, SEUp will be small, and so will the resulting motivation to perform.  In a school system, these perceptions of contingencies may reflect trust or confidence in the goodwill or good judgment of one's administrative superiors.  Unfortunately, these qualities are not present at very high levels in many school systems.
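        The arithmetic behind these two cases can be worked out in a few lines of Python. The 200-hour figure, the $2,000 bonus, and the 10% perceived probability are hypothetical values chosen only to stand in for "hundreds of hours" and a "large but uncertain" bonus. The helper `expected_bonus_per_hour` is likewise hypothetical; it treats money as the only outcome and ignores any non-monetary utility, which the paragraph itself suggests may matter more to many teachers.

```python
def expected_bonus_per_hour(bonus: float, pr_award: float, extra_hours: float) -> float:
    """Expected dollars per extra hour of work (hypothetical helper).

    bonus       -- size of the merit bonus in dollars
    pr_award    -- perceived probability that the bonus actually follows performance
    extra_hours -- additional hours of work needed to earn it
    """
    return bonus * pr_award / extra_hours


# A $200 bonus, assumed certain, for a hypothetical 200 extra hours.
print(expected_bonus_per_hour(200, 1.0, 200))  # 1.0

# A larger $2,000 bonus that the teacher believes has only a 10% chance of being paid.
print(expected_bonus_per_hour(2000, 0.1, 200))
```

Both cases work out to an expected return of about one dollar per extra hour: a small certain bonus and a large doubtful one can be equally weak incentives.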
        Now, let us assume all of the following:  high ability, low constraints, high value for the outcome of performance, and a high perceived probability both that the outcome will in fact follow performance and that effort will lead to such performance.  We must now ask about the Subjective Expected Utility of Effort directed at Non-Performance behavior that is incompatible with Performance, SEUenp.  If a teacher perceives that valued outcomes can be achieved more efficiently by exerting Non-Performance Effort than by exerting Performance Effort, then our theory predicts a low level of motivation to perform (see Statement 14).
        The role of available alternative behaviors is critically important according to the theory, for the motivation to do anything must always be assessed relative to the motivation to do something else.  In the field of education, especially for women and for members of minority groups, the availability of potentially rewarding alternatives has increased in recent years.  If women of 20 years ago were adequately motivated to teach for low levels of valued outcomes, it was because attractive alternative roles were rarely available to them.  The math teacher in our example who has learned computer programming will very likely find positions outside of teaching that pay considerably more money and that perhaps also offer higher professional status, better working conditions, and maybe even shorter hours.  Teachers who possess all of the skills and abilities necessary to be effective teachers will have more such alternatives than will unskilled and ineffective teachers.  Even if the effective teacher stays in the system, he or she will have increasing opportunities for part-time or summer employment outside of it, and will be less able to use such time in the service of becoming a more effective teacher.  Small and uncertain monetary incentives based upon meritorious performance are unlikely to have much effect on such individuals.

Implications of the Theory of Criterion Shaped Behavior.

        In developing Actual Criterion Measures, the designer of a performance evaluation system is encouraged to take into account potentially undesirable consequences, Contamination CSBs and Deficiency CSBs, as well as potentially desirable Relevant CSBs.  This should be done for any performance evaluation instrument, whether its "intended" use is for feedback, for assessment, or for the shaping of desirable behaviors.  The Universal Theory of Performance addressed the issue of whether or not an individual is motivated to engage in behaviors directed toward increasing scores on the Actual Criterion Measure.  If we assume that the basic requirements for such motivation have been met, then we should address the issue of what specific changes in behavior are likely to result from motivation to increase one's score.  CSB Theory describes three categories of such behavior:  desirable Relevant CSBs, and undesirable Contamination and Deficiency CSBs.  A designer of a new evaluation and compensation system wants to increase only the desirable behaviors, and should be sensitive to system characteristics that will promote or interfere with that objective.
        The validity of the Actual Criterion Measure is the primary consideration in implementing a new evaluation system.  A perfectly valid measure would assess all factors of the Ultimate Criterion, and no other factors.  Invalidity therefore takes two forms:  Deficiency, the failure to assess some Ultimate Criterion factors, and Contamination, the assessment of factors outside the Ultimate Criterion.  We will address these one at a time.

        Criterion Deficiency.  To anticipate problems resulting from Criterion Deficiency, one must ask to what extent desirable performance factors are excluded from the Actual Criterion Measure, and evaluate the likelihood and importance of potential decreases in such factors.  The theory states that such decreases will occur in unmeasured desirable behavior where such behavior is incompatible with the performance of Relevant and of Contamination CSBs, both of which might be expected to increase.  The designer is encouraged to develop a list of factors in the Ultimate Criterion and a list of factors in the Actual Criterion Measure.  A list of those factors in the Ultimate Criterion that are not in the Actual Criterion Measure can then be deduced.  One should expect a reduction in these factors to the extent that incompatible Relevant and Contamination CSBs are expected to increase.
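        The list-comparison procedure just described amounts to a set difference. The Python sketch below uses invented factor names for a mathematics teacher; none of them come from the source. It also computes the reverse difference, measured factors that lie outside the Ultimate Criterion, which by the definition of validity given above are the openings for Contamination CSBs.

```python
# Hypothetical factor lists for a teacher evaluation; the source gives no such lists.
ultimate_criterion = {
    "subject-matter mastery",
    "problem-solving skill",
    "student interest in mathematics",
    "student standardized test scores",
}
actual_criterion_measure = {
    "student standardized test scores",
    "rapport with the principal",
}

# In the Ultimate Criterion but not measured: factors at risk from Deficiency CSBs.
deficient = ultimate_criterion - actual_criterion_measure
# Measured but outside the Ultimate Criterion: openings for Contamination CSBs.
contaminating = actual_criterion_measure - ultimate_criterion

print(sorted(deficient))
print(sorted(contaminating))
# A perfectly valid measure would leave both sets empty.
print(not deficient and not contaminating)  # False
```

The set difference only identifies which factors are at risk; how much each is expected to decline still depends on the judgment, described above, of how strongly incompatible Relevant and Contamination CSBs will increase.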

If important desirable factors are expected to decrease given such an analysis, the designer must consider the following options:

        1.  Accept such reduction in desirable factors on the grounds that anticipated increases in other desirable behaviors
            (Relevant CSBs) are more important, and are worth the expected loss.
        2.  Devise a new or revised Actual Criterion Measure that assesses more of the Ultimate Criterion factors, and is
            thus less subject to Deficiency CSBs.
        3.  Decide not to implement the new system on the grounds that it may produce too many undesirable effects to be
            offset by anticipated positive ones.

        These are not easy choices to make, for many of the necessary pieces of information will be knowable only through subjective estimation.  One must nevertheless choose from among the alternatives listed.  To ignore problems of Criterion Deficiency leads one to the first option by default.  To overestimate the undesirable effects leads one to option three, not implementing the system at all.  Option two is most attractive where a better measure can in fact be developed at low cost and within a reasonable period of time.  Unfortunately, time and money will often be insurmountable barriers to improving existing measures.
        A further problem is that changes in the measure that reduce Deficiency CSBs may unintentionally increase Contamination CSBs.  Where broad subjective ratings replace narrow objective measures for the purpose of reducing Criterion Deficiency, problems of Criterion Contamination can be expected to increase.

        Criterion Contamination.  Any Actual Criterion Measure whose score can be influenced by factors that are not part of the Ultimate Criterion is potentially susceptible to Contamination CSBs.  These range from cheating to brown-nosing and, in addition to reducing Ultimate Criterion Factors, will frequently bring with them additional undesirable effects, many of which have been discussed previously.

A designer of a new system is encouraged to address these possible effects directly and then to choose from among the following alternatives:

        1. Proceed to implement the system in spite of such Contamination effects on the grounds that they are not so
            negative as to outweigh anticipated positive effects, i.e., increases in Relevant CSBs.
        2. Develop a new or revised Actual Criterion Measure that is less susceptible to Contamination effects.
        3. Decide not to implement the system on the grounds that it may produce more undesirable than desirable effects.

        As with the options listed for problems of Criterion Deficiency, choices related to Criterion Contamination will very likely be difficult ones.  To ignore Contamination problems is to choose option 1 by default.  To overestimate their magnitude is to decide, unnecessarily, not to implement the new system.  Choosing option 2 requires time and money to develop a new or revised Actual Criterion Measure less subject to problems of Contamination.  It may be difficult, or even impossible given one's resources, to reduce problems of Criterion Contamination without simultaneously increasing problems of Criterion Deficiency.  This is the same tradeoff, discussed previously, between broad subjective measures, which are likely to be low in Criterion Deficiency but high in Criterion Contamination, and narrow objective measures, which are likely to be low in Criterion Contamination but high in Criterion Deficiency.
        Improving the Actual Criterion Measure to reduce both Contamination and Deficiency problems at the same time is the ideal, but the ideal may be impossible to achieve.  The goal is to develop a measure that is both broad enough to cover most factors of the Ultimate Criterion and objective enough to avoid most of the problems that can result from subjective ratings.  This will be most feasible when evaluating performances in which the factors of the Ultimate Criterion are relatively few and relatively concrete, as may be the case for the performance of used car salesmen.  The task becomes more difficult as the desired performances become more complex and more abstract, as is likely to be true in the evaluation of students, teachers, administrators, doctors, artists, and many others.  With hard work, creativity, and sensitivity to the issues raised by the Universal Theory of Performance and the Theory of Criterion Shaped Behavior, it may become possible to evaluate even these performances in such a way that the overall behavioral consequences of the appraisal system are more beneficial than harmful to organizational objectives.
 
