Downing, L. L. (1994). Criterion shaped behaviour: Pitfalls
of performance appraisal.
International Journal of Selection and Assessment, 2,
1-21.
PART 4.
IMPLICATIONS FOR THE EVALUATION OF TEACHERS
In this paper I have attempted
to develop a general theory of Criterion Shaped Behavior, and to show how this
theory can be used to better understand the consequences of imposing evaluation
systems on the performance behavior of those being evaluated. My emphasis
has been on describing the potential for undesirable consequences to follow
from the implementation of some evaluation programs, such as are currently
common, and from some recently suggested by individuals and groups who
are genuinely interested in improving the state of education in the United
States today. My concern is that in the rush to improve the system,
by more thorough evaluation of teacher performance, and by more explicit
consequences made contingent upon successful and unsuccessful performance,
such changes could, unwittingly, make things worse rather than better.
It is not my position that
evaluation and contingent reward and punishment for good and bad performance
cannot, or should not, be implemented. My hope is that where such
programs are established, they are designed wisely. With attention
paid to the potential pitfalls of such systems many of the adverse consequences
can be minimized or avoided. Without such awareness they most certainly
will not. In the following section I outline some of the major points
to which designers of evaluation systems are encouraged to attend.
Implication of the Universal Theory of Performance.
A sensitivity to issues addressed
by the theory can increase one's ability to design an evaluation and compensation
system capable of minimizing problems previously discussed. Let us
begin with the summary statement of the determinants of desirable behavior
change. The Probability of Performance, Pr(P), is a positive
function of Motivation to Perform, Mp, but Motivation only yields
Performance to the extent that Ability to Perform, Ap, is high,
and Constraints inhibiting Performance, Cp, are low. This
set of relationships is formally presented in Appendix 1, as Statement
15: Pr(P) = f[(Mp) × (Ap - Cp)].
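As a minimal sketch of Statement 15, the relationship can be written out in Python. The theory leaves the function f unspecified, so the code below assumes, purely for illustration, that Mp, Ap, and Cp are each scored between 0 and 1 and that f simply clamps the product to that range. The function name and the scores are hypothetical.

```python
def probability_of_performance(mp, ap, cp):
    """Illustrative reading of Statement 15: Pr(P) = f[(Mp) x (Ap - Cp)].

    Assumption (not from the paper): inputs lie in [0, 1] and f clamps
    the product to [0, 1]. The paper does not specify the form of f.
    """
    raw = mp * (ap - cp)
    return max(0.0, min(1.0, raw))


# Hypothetical scores: a highly motivated teacher with no relevant ability.
print(probability_of_performance(mp=0.9, ap=0.0, cp=0.0))

# Hypothetical scores: motivated and able, but constraints cancel the ability.
print(probability_of_performance(mp=0.9, ap=0.8, cp=0.8))
```

Under these assumptions both calls yield zero, which mirrors the point of the statement: motivation contributes nothing unless ability exceeds the constraints on performance.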
Having proposed a new system,
the designer should reason as follows. If we assume, for the moment,
that the numerous factors combining to yield motivation (Mp) have
in fact adequately done so, we must concern ourselves with issues of Ability
to perform the desired behaviors, (Ap), and with whether or not
environmental Constraints, (Cp), are likely to interfere with performance.
Suppose, for example, that a high school mathematics teacher's score on
the Actual Criterion Measure envisioned would be determined, in part, by
assessment of his or her students' ability to write a simple computer program
to solve an equation. No amount of teacher motivation is likely to
be effective here if the teacher lacks sufficient knowledge of computer
programming to teach it to the students. The system designer will
not expect a merit system to increase teacher performance under such circumstances.
What might be proposed is that means be made available for the teacher
to take coursework in computer programming, with sufficient monetary or
release time incentives extended to make this possible.
Returning to our theory,
we will ask ourselves what role environmental Constraint, (Cp),
may be playing in our example. One will quickly realize that, even
given Motivation and Ability, the system will not improve teacher
performance unless there is sufficient access in the classroom to computers
for student use. Lacking such computers, investment of money in a
merit system, or in teacher training, would be totally ineffective.
Clearly, then, if there is money available it should be spent on the purchase
of the necessary computers. Many other possible Constraints may need
to be overcome, such as inadequate prior training of students, unavailability
of class time for computer instruction, inflexible curriculum requirements,
and so on. Until these are adequately addressed, a merit system,
whose effects, if any, will be on teacher motivation, is
doomed to failure.
Let us assume adequate Ability,
and the absence of Constraints. Now the system designer must be sensitive
to the determinants of Performance Motivation (Mp), which is determined
by the extent to which Subjective Expected Utility of Effort to Perform
is greater than that of Effort directed to the best alternative Non-Performance.
Formally this is Statement 14: Mp = f(SEUep - SEUenp).
There exist two distinguishable categories of Non-Performance behavior.
One category includes all behavior not directed to increasing scores on
the Actual Criterion Measure. The other includes all behavior directed
to increases in such scores that fail to increase Ultimate Criterion Factors,
i.e., Contamination CSBs, and Deficiency CSBs.
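Statement 14 can be sketched in the same spirit. The code below assumes, for illustration only, that f returns the difference when it is positive and zero otherwise, and that the "best alternative" is simply the Non-Performance behavior with the highest Subjective Expected Utility. The alternative names and utility values are hypothetical.

```python
def motivation_to_perform(seu_ep, seu_alternatives):
    """Illustrative reading of Statement 14: Mp = f(SEUep - SEUenp).

    seu_alternatives maps each Non-Performance behavior to its SEU.
    Assumption (not from the paper): f(x) = max(0, x).
    Returns (Mp, name of the best alternative).
    """
    # SEUenp is taken from the single most attractive alternative.
    best = max(seu_alternatives, key=seu_alternatives.get)
    seu_enp = seu_alternatives[best]
    return max(0.0, seu_ep - seu_enp), best


# Hypothetical utilities for one teacher.
alternatives = {"summer employment": 12.0, "raising scores by cheating": 7.0}
print(motivation_to_perform(10.0, alternatives))
```

The sketch makes one feature of the statement explicit: raising SEUep has no effect on motivation until it exceeds the utility of the best competing behavior.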
Starting with the Universal
Theory of Performance, Statement 14, we focus our attention first on SEUep,
for which we must go to Statement 9: SEUep = Subjective Expected Utility
of Ep = sum over all values from i=1 to i=n of the product [Pr(P|Ep)] × [SEUp],
minus the cost of Effort (Uep). This means essentially the perceived
probability that effort will indeed result in the desired performance,
weighted by the value placed on that performance, minus the cost of that effort. For the new incentive system to
be effective, the teacher must place high value on the expected outcome
of desired performance, must believe that that outcome will indeed be contingent
upon performance, and must believe that he or she is capable of such performance
through the exertion of sufficient effort. The perception of
constraints or of lack of ability would reduce the motivation to even exert
an effort to improve performance. If this perception is valid, then
as previously discussed, even the exertion of effort will be ineffective
at improving performance.
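The computation described by Statement 9 can be sketched as follows. The representation of outcomes as (probability, utility) pairs, the function name, and all numbers are assumptions made for illustration. The paper gives no numeric scale for utility or for the cost of effort.

```python
def seu_of_effort(outcomes, effort_cost):
    """Illustrative reading of Statement 9.

    outcomes: list of (Pr(P_i | Ep), SEU of P_i) pairs, one per
              possible performance outcome i = 1..n.
    effort_cost: Uep, the cost of the effort, on the same scale.
    """
    expected_value = sum(prob * utility for prob, utility in outcomes)
    return expected_value - effort_cost


# Hypothetical case: a valued outcome (200 units) that the teacher
# believes effort will produce only half the time, at a cost of 150.
print(seu_of_effort([(0.5, 200.0)], effort_cost=150.0))  # -50.0
```

In this hypothetical case the result is negative even though the outcome is valued, because the perceived probability of achieving it is too low to cover the cost of the effort.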
The contingent outcome of
our proposed merit system must be sufficiently valued by the teacher.
If it is money, as is usually the case, we must ask if a given teacher
places sufficient value on money. One might guess that if money were
the primary motivator for this person, he or she would not have chosen
to become a teacher in the first place. Even granted that money is
valued, the amounts that are involved may be too small to yield a very
high Subjective Expected Utility for Performance Behavior. It may
require hundreds of hours of extra work to achieve a bonus of $200.
Likewise, if the amount is large but the perceived probability of its actually
resulting from the behavior is small, the SEUp will be small, and consequent
motivation to perform will be small. In a school system, these perceptions
of contingencies may reflect trust or confidence in the goodwill or good
judgment of one's administrative superiors. These qualities are unfortunately
not at very high levels in many school systems.
Now, let us assume all of
the following: high ability, low constraints, high value for the
outcome of performance, and high perceived probability that the outcome
will in fact follow performance, and that effort will lead to such performance.
We must now ask about the Subjective Expected Utility of Effort directed
at Non-Performance behavior that is incompatible with Performance, SEUenp.
If a teacher perceives that valued outcomes can be more efficiently
achieved by exerting Non-Performance Effort than by exerting Performance
Effort, then our theory predicts a low level of motivation to perform (see
Statement 14).
The role of available alternative
behaviors is critically important according to the theory, for the motivation
to do anything must always be assessed relative to the motivation to do
something else. In the field of education, especially for women
and for members of minority groups, the availability of potentially rewarding
alternatives has increased in recent years. While women of 20 years
ago may have been adequately motivated to teach for low levels of valued
outcomes, this was so because attractive alternative roles were rarely available
to them. The math teacher in our example who has learned computer
programming will very likely find places outside of teaching that will
pay considerably more money, and that will perhaps at the same time offer
higher professional status, better working conditions, and maybe even shorter
hours. Teachers who possess all of the skills and abilities necessary
to be effective teachers will have more such alternatives than will unskilled
and ineffective teachers. Even if the effective teacher stays in
the system, he or she will have increasing opportunities for part-time,
or for summer employment outside of the system, and will be less able to
utilize such times in the service of becoming a more effective teacher.
Small and uncertain monetary incentives based upon meritorious performance
are unlikely to have much effect for such individuals.
Implication from the Theory of Criterion Shaped Behavior.
The designer of a performance
evaluation system is encouraged in developing Actual Criterion Measures
to take into account potentially undesirable consequences, Contamination
CSBs and Deficiency CSBs, as well as potentially desirable Relevant CSBs.
This should be done for any performance evaluation instrument, whether
its "intended" use is for feedback, for assessment, or for shaping of desirable
behaviors. The Universal Theory of Performance addressed the issue
of whether or not an individual is motivated to engage in behaviors directed
toward increasing scores on the Actual Criterion Measure. If we
assume that the basic requirements for such motivation have been met, then
we should address the issue of what specific changes in behavior are likely
to result from motivation to increase one's score. CSB Theory
describes three categories of such behavior: desirable Relevant
CSBs, and undesirable Contamination and Deficiency CSBs.
A designer of a new evaluation and compensation system wants to increase
only the desirable behaviors, and should be sensitive to system characteristics
that will promote or interfere with that objective.
Validity of the Actual Criterion
Measure is the primary consideration for implementation of a new evaluation
system. A perfectly valid measure would assess all factors of the
Ultimate Criterion, and no other factors. Invalidity relates to problems
of Deficiency and of Contamination. We will address these one at
a time.
Criterion Deficiency. To anticipate problems resulting from Criterion Deficiency, one must ask about the extent to which desirable performance factors are excluded from the Actual Criterion Measure, and evaluate the likelihood and importance of potential decreases in such factors. The theory states that such decreases will occur in unmeasured desirable behavior where such behavior is incompatible with the performance of Relevant and of Contamination CSBs, both of which might be expected to increase. The designer should be encouraged to develop a list of factors in the Ultimate Criterion, and a list of factors in the Actual Criterion Measure. A list of those factors in the Ultimate Criterion that are not in the Actual Criterion can then be deduced. One should expect reduction in these factors to the extent that incompatible Relevant and Contamination CSBs are expected to increase.
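The list comparison described above amounts to a set difference, and can be sketched as follows. The factor names are hypothetical examples, not taken from any actual evaluation instrument. The reverse difference is included because factors measured but not part of the Ultimate Criterion are the candidates for Contamination.

```python
def compare_criteria(ultimate, actual):
    """Compare factor lists for the Ultimate Criterion and the
    Actual Criterion Measure (both given as collections of names)."""
    ultimate, actual = set(ultimate), set(actual)
    return {
        "relevant": ultimate & actual,       # desired and measured
        "deficient": ultimate - actual,      # desired but not measured
        "contaminating": actual - ultimate,  # measured but not desired
    }


# Hypothetical factor lists for a teacher evaluation.
ultimate = ["subject mastery", "critical thinking", "student motivation"]
actual = ["subject mastery", "principal's impression"]

result = compare_criteria(ultimate, actual)
print(sorted(result["deficient"]))
```

The "deficient" set is the list of factors in which reduction should be anticipated, to the extent that they are incompatible with behaviors the measure does reward.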
If important desirable factors are expected to decrease given such an analysis, the designer must consider the following options:
1. Accept such reduction in desirable factors on the grounds that anticipated increases in other desirable behaviors (Relevant CSBs) are more important, and are worth the expected loss.
2. Devise a new or revised Actual Criterion Measure that assesses more of the Ultimate Criterion factors, and is thus less subject to Deficiency CSBs.
3. Decide to not implement the new system on the grounds that it may produce too many undesirable effects to be offset by anticipated positive ones.
These are not easy choices
to make, for many of the necessary pieces of information will only be knowable
through subjective estimation. One must nevertheless make the choice
from the alternatives listed. To ignore problems of Criterion Deficiency
leads one to the first option by default. To overestimate the undesirable
effects leads one to option three, not implementing the system at all.
Option two is most attractive where a better measure can in fact be developed
at low cost and in a reasonable period of time. Unfortunately, time
and money will often be insurmountable barriers to improvement in existing
measures.
A further problem is that
changes in the measure that reduce Deficiency CSBs may unintentionally
increase Contamination CSBs. Where broad subjective ratings replace
narrow objective measures for purposes of reducing Criterion Deficiency,
it can be expected that problems of Criterion Contamination will increase.
Criterion Contamination. Any Actual Criterion Measure whose score can be influenced by factors that are not part of the Ultimate Criterion is potentially susceptible to Contamination CSBs. These range from cheating to brown-nosing and, in addition to reducing Ultimate Criterion Factors, will frequently bring with them additional undesirable effects, many of which have been previously discussed.
A designer of a new system is encouraged to straightforwardly address these possible effects, and to then choose from the following alternatives:
1. Proceed to implement the system in spite of such Contamination effects on the grounds that they are not so negative as to outweigh anticipated positive effects, i.e., increases in Relevant CSBs.
2. Develop a new or revised Actual Criterion Measure that is less susceptible to Contamination effects.
3. Decide not to implement the system on the grounds that it may produce more undesirable than desirable effects.
As with the options listed
for problems of Criterion Deficiency, choices related to Criterion Contamination
will very likely be difficult ones. To ignore Contamination problems
is to choose option 1 by default. To overestimate their magnitude
is to unnecessarily decide to not implement the new system. To choose
option 2 requires time and money to develop a new or revised Actual Criterion
Measure less subject to problems of Contamination. It may be difficult,
or even impossible given one's resources, to reduce problems of Criterion
Contamination without simultaneously increasing problems of Criterion Deficiency.
This tradeoff is the same as previously discussed between broad subjective
measures, which are likely to be low in Criterion Deficiency but high in
Criterion Contamination, and narrow objective measures, which are likely
to be low in Criterion Contamination but high in Criterion Deficiency.
Deciding to improve the
Actual Criterion Measure to reduce both Contamination and Deficiency problems
at the same time is the ideal, but the ideal may be impossible to achieve.
The goal is to develop a measure that is both broad enough to cover most
factors of the Ultimate Criterion and objective enough to avoid most
of the problems that can result from subjective ratings. This will
be most feasible for evaluation of performances where the factors in the
Ultimate Criterion are relatively few, and are relatively concrete, as
may be the case for performance of used car salesmen. The task becomes
more difficult as the desired performances become more complex and more
abstract, as is likely to be true in the evaluation of students, teachers,
administrators, doctors, artists, and many others. With hard work,
creativity, and sensitivity to the issues raised by the Universal Theory
of Performance and the Theory of Criterion Shaped Behavior, it may become
possible to evaluate even these performances in such a manner that the
overall behavioral consequences of the appraisal system are more beneficial
than they are harmful to organizational objectives.