Note: The following is based upon a portion of an earlier work.
For the complete version of the original publication, see:

Downing, L. L. (1994).  Criterion shaped behaviour: Pitfalls of performance appraisal.
International Journal of Selection and Assessment, 2, 1-21.

PART 2

THE THEORY OF CRITERION SHAPED BEHAVIOR

 Leslie L. Downing
 State University of New York
 College at Oneonta





        The Theory of Criterion Shaped Behavior (CSB) herein developed is based upon well established principles from the field of psychology.  It relies extensively upon the traditional areas of learning and of measurement, and it also utilizes many of the findings in social psychology, as they relate to biases in person perception, and in cognitive psychology, especially concerning issues in information processing.  The intended function of the theory is to make explicit the effects that the imposition of performance evaluation systems can be anticipated to have on the behaviors of those being evaluated.  The Theory of Criterion Shaped Behavior is based upon a broader Universal Theory of Performance, to be developed in the next section of this paper.  Specifically, it predicts the behavioral consequences, derived from the Universal Theory, that are likely to result from the imposition of evaluation systems having various features and characteristics.

UNIVERSAL THEORY OF PERFORMANCE
        Learning theorists, especially the Skinnerians (e.g., Skinner, 1938, 1953), have thoroughly developed the concept of behavior shaping as related to changing the behavioral criteria upon which contingent reinforcement is based.  The Criterion Shaped Behavior Theory (CSB Theory) developed here simply makes more explicit than is usually the case exactly how these phenomena relate to issues involving performance evaluations of people in academic and other organizational settings.  Behavior shaping in the field of learning is the means by which organisms, whether laboratory rats or school teachers, increase their frequency of engaging in whatever behaviors are systematically followed by reinforcement.  The behavior expected to increase is not a sequence of motor responses, but is rather a class of behaviors, called operants, the occurrence of which is systematically followed by contingent reinforcement.  For a rat in an operant conditioning box, for example, the operant response class may include all behaviors that have the effect of depressing the lever sufficiently to trigger the switch that releases the food pellet reinforcer into the food cup.
        In other terms, the operant class is defined by a set of performance criteria, which if satisfied result in reinforcing consequences.  The Skinnerians have frequently demonstrated that changes in these criteria result in changes in behavior (Skinner, 1953).  Specifically, behaviors are increased when they satisfy the criterion, or set of criteria, upon which reinforcement is based.  Changes in the criterion result in shaping of, or gradual acquisition of, whatever behaviors satisfy the new criterion, assuming that the contingent consequences are in fact reinforcing.
        Where people's performances are being evaluated, and where subsequent administration of valued consequences is made contingent upon some criterion of that performance, it is expected that the class of behaviors, operants, that result in such consequences will increase in strength, frequency, or probability of occurrence.  Few will argue with this basic prediction.  In fact the entire rationale for improving performances of teachers, etc., by making them accountable, and by differentially rewarding individuals based upon performance scores, is dependent upon the validity of such a prediction.
        When psychologists disagree with each other, it is not usually on the validity of the basic predictions, but is on which theoretical system best explains and describes the mediating effects for the performances of people in varied situations.  Few contemporary psychologists accept the Skinnerian view as applied to such explanation, which considers only directly observable variables and negates the relevance of internal or hypothetical constructs, such as expectancy and cognition.  Cognitive/expectancy theory (e.g., Tolman, 1932, 1948), and social learning theory (e.g., Rotter, 1972; and Bandura, 1977a) have led the way to most current thinking about the mechanisms responsible for mediating the relationship between responses, contingent reinforcement, and changing or shaping of criterion relevant to performance.  In this view, prior reinforcement has an effect because it alters the expectancy that a response, or class of responses, will be followed by contingent, valued outcomes.  It is the change in expectancy that produces a change in behavior as the individual attempts to maximize some hypothetical outcome (O), expected value (EV), or subjective expected utility (SEU).  An SEU for a given response is the subjectively experienced expectation that that response will be followed by a contingent outcome, multiplied by the subjective value attached to that outcome.  Where more than one outcome of a response is possible, the SEU will be the sum of the separate SEUs associated with those outcomes.  The SEU associated with a given performance (P), will be referred to as SEUp.
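        The SEU arithmetic just described can be illustrated with a short sketch.  The function name, the two outcomes, and the numeric expectancies and values are hypothetical and chosen only for illustration; the theory itself states only that each outcome contributes its expectancy multiplied by its subjective value, and that these contributions are summed.

```python
from typing import Iterable, Tuple


def subjective_expected_utility(outcomes: Iterable[Tuple[float, float]]) -> float:
    """Sum of (subjective expectancy x subjective value) over all possible outcomes."""
    return sum(expectancy * value for expectancy, value in outcomes)


# Hypothetical response with two possible outcomes:
#   a merit raise     (expectancy 0.5,  subjective value 10)
#   an extra workload (expectancy 0.25, subjective value -4)
seu_p = subjective_expected_utility([(0.5, 10.0), (0.25, -4.0)])
print(seu_p)  # 0.5 * 10 + 0.25 * (-4) = 4.0
```

        A negative subjective value simply lowers the total, so a response with an attractive but uncertain reward and a likely cost can end up with a low SEUp.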
        While most performance theories limit themselves to describing influences surrounding a single performance, for our purposes it is necessary to describe, also, the role of other available response alternatives.  Each non-performance (NP), will also have a subjective expected utility associated with it (SEUnp).  Unless otherwise specified, the SEUnp is to be understood as the highest SEU available for a non-performance alternative response that is incompatible with the occurrence of P.
        Motivation to perform (Mp), or to not perform (Mnp), will be based upon these SEUs, but will be further conditioned by the expectation that Effort to Perform (Ep), or to not perform (Enp), will in fact produce the performance in question.  Bandura's (1977b) concept of "self-efficacy" likewise concerns subjective beliefs about one's ability to convert effort into performance.  Even this Motivation to Perform does not, however, directly translate into performance, for many factors influence whether that effort will be successful.  Chief among these is Ability, but others, including environmental factors, may impose additional Constraints.
        Many versions of expectancy theory have been developed, using some or all of the concepts described above.  The version developed here is largely an elaboration and combination of several of these (cf. Tolman, 1948; Vroom, 1964; Porter and Lawler, 1968; Rotter, 1972; and Bandura, 1977a, 1977b), with the role of alternative available responses owing a debt to the Thibaut and Kelley (1959) social exchange theory concept of Comparison Level for Alternatives.  The result is what I will call a Universal Theory of Performance.  The theory is based upon a set of variables, and relationships between variables, as delineated in Appendix 1.  While complete understanding of the Universal Theory of Performance may be dependent upon examination of these Appendix 1 definitions and theoretical statements, it is hoped that the general reader will be capable of understanding the implications of the theory for Criterion Shaped Behavior from the following discussion.
        Basically, the theory states that the Motivation to perform a behavior is a direct result of one's subjectively held expectation that trying to perform the behavior will likely result in that behavior, which will itself be followed, with high probability, by consequences that are highly valued.  This "subjective expected utility" of effort to perform, relative to a comparable "subjective expected utility" of effort to do something else, determines Motivation to perform.  Actual Performance, however, only follows from Motivation, to the extent that one possesses an Ability to perform, and to the extent one is not subject to environmental Constraints that prevent or inhibit performance.
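        The verbal statement above can be made concrete with a rough numerical sketch.  The formal definitions belong to Appendix 1, which is not reproduced in this section, so the functional forms below are assumptions made only for illustration: motivation is taken as the effort-weighted SEU of performing minus that of the best incompatible alternative, and performance as positive motivation scaled by ability and by freedom from constraint.  All function and parameter names are hypothetical.

```python
def motivation_to_perform(e_p: float, seu_p: float, e_np: float, seu_np: float) -> float:
    """Assumed form: effort-weighted SEU of performing minus that of the best alternative.

    e_p, e_np   -- expectancy (0..1) that effort produces the (non-)performance
    seu_p, seu_np -- subjective expected utility of performing / of the alternative
    """
    return e_p * seu_p - e_np * seu_np


def expected_performance(motivation: float, ability: float, constraint: float) -> float:
    """Assumed form: only positive motivation converts into performance,
    scaled by ability (0..1) and reduced by environmental constraint (0..1)."""
    return max(motivation, 0.0) * ability * (1.0 - constraint)


# Hypothetical case: performing is the better option, but ability is moderate
# and the environment is moderately constraining.
m = motivation_to_perform(e_p=0.5, seu_p=8.0, e_np=1.0, seu_np=2.0)
print(m)                                  # 0.5 * 8 - 1.0 * 2 = 2.0
print(expected_performance(m, 0.5, 0.5))  # 2.0 * 0.5 * 0.5 = 0.5

# If an incompatible alternative is more attractive, motivation turns negative
# and no performance is expected, whatever the ability.
m_alt = motivation_to_perform(e_p=0.5, seu_p=8.0, e_np=1.0, seu_np=6.0)
print(expected_performance(m_alt, 1.0, 0.0))  # 0.0
```

        Other forms, such as a ratio rather than a difference, would express the same qualitative claims; the sketch is meant only to show how the variables interact.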
        The Universal Theory of Performance, as applied to situations in which individuals are evaluated with criterion measures of their performance, and compensated in relation to the scores that they obtain, can be roughly stated as follows:

            Individuals will be MOTIVATED to increase the rate of those behaviors that they believe will most reliably
            and with least effort increase their scores on Actual Criterion Measures of performance; to the extent that
            highly valued outcomes are expected to be contingent upon such increases; and to the extent that other,
            incompatible behaviors, are not perceived as being more efficient at achieving desirable outcomes.
            Actual PERFORMANCE will be positively related to such motivation to the extent that ability to perform is
            high, and constraints preventing performance are low.

            Criterion Shaped Behavior (CSB), is defined as any change in behavior that results from efforts to
            increase one's score on an Actual Criterion Measure of Performance.  Probability of performance of
            such CSBs will be a function of the Motivational, Ability, and Constraint variables of the Universal
            Theory of Performance.

        Whether the behaviors intended to increase scores will be desirable or undesirable is dependent upon the specific characteristics of the performance measure.  The system we use for clarifying the important characteristics of performance evaluation systems is elaborated in the following section.

MEASUREMENT THEORY AND CRITERION SHAPED BEHAVIOR

        The CSB Theory utilizes many concepts from traditional measurement theories.  The concept of the Ultimate Criterion (Thorndike, 1949) involves both a perfect, hypothetical, measure of "ideal" performance, and the set of all factors (i.e., behaviors and characteristics) assessed by such a measure.  In our terms these are respectively the Ultimate Criterion Measure and the Ultimate Criterion Factors.
        The Actual Criterion Measure is the specific instrument used to assess performance scores of those being evaluated.  Actual Criterion Factors are all of the behaviors and characteristics of those being evaluated that can influence scores on the Actual Criterion Measure.

        Figure 1 represents these Ultimate and Actual Criterion Measures as two overlapping circles, and identifies the three resulting sectors as Criterion Contamination, Criterion Relevance, and Criterion Deficiency.



Figure 1: Overlap of the Ultimate and the Actual Criterion Measures

Criterion Relevant Measures and Factors are those Actual Criterion Measures and Factors that overlap Ultimate Criterion Measures and Factors.  Criterion Relevance is, essentially, the validity of the Actual Criterion Measure.

Criterion Contamination Measures and Factors refer to that portion of the Actual Criterion Measures and Factors that does not overlap with Ultimate Criterion Measures and Factors.  Criterion Contamination is one source of invalidity of the Actual Criterion Measure.

Criterion Deficiency Measures and Factors refer to that portion of the Ultimate Criterion Measures and Factors that does not overlap with Actual Criterion Measures and Factors.  Criterion Deficiency is one source of invalidity of the Actual Criterion Measure.
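        The three sectors defined above correspond to simple set operations on the two collections of factors.  The sketch below is illustrative only: the function name and the example factors are hypothetical, not drawn from any actual evaluation instrument.

```python
def criterion_sectors(ultimate: set, actual: set) -> dict:
    """Split Ultimate and Actual Criterion Factors into the three sectors of Figure 1."""
    return {
        "relevance": actual & ultimate,      # measured and part of ideal performance
        "contamination": actual - ultimate,  # measured but not part of ideal performance
        "deficiency": ultimate - actual,     # part of ideal performance but not measured
    }


# Hypothetical factors for a teaching evaluation.
ultimate_factors = {"quantity of graduates", "quality of education"}
actual_factors = {"quantity of graduates", "rater's liking for the teacher"}

for sector, factors in criterion_sectors(ultimate_factors, actual_factors).items():
    print(sector, sorted(factors))
```

        A perfectly valid measure would leave both the contamination and the deficiency sets empty.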

        For purposes of assessment validity, some theorists argue that failure of a measure, the Actual Criterion, to fully capture the Ultimate Criterion, i.e. Criterion Deficiency, may not be very important (Landy & Trumbo, 1978, p. 138).  If a measure is assumed to be non-reactive, then it is true that measuring only a part of the Ultimate Criterion is sufficient, as long as that part is highly correlated with the Ultimate Criterion as a whole.  Once a measure becomes reactive, however, this approach is doomed to failure, for not only will its use lead to undesirable Deficiency CSBs, to be described below, but as a result of this effect the correlation of Actual Criterion Measure scores with the Ultimate Criterion will steadily decrease.  Koretz (1989) has convincingly argued, without the aid of a systematic theory of Criterion Shaped Behavior, that the previously mentioned NAEP assessment test will be invalidated by attempts to broaden its use by rewarding states or school systems whose students achieve the highest scores (U.S. Department of Education, 1987).  The inherent Criterion Deficiencies in the NAEP test now in use are not much of a problem, because the test is non-reactive: it is used only for purposes of assessment.  Rewarding high scorers will make it reactive, and problems of deficiency will result in undesirable behaviors (e.g., teaching to the test) which will in time render scores meaningless.
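        The erosion of validity described above can be illustrated with a toy calculation.  Everything in the sketch is hypothetical: five imagined teachers, an Ultimate Criterion assumed to be the simple sum of a measured factor (quantity) and an unmeasured factor (quality), and an assumed pattern in which two teachers shift effort from quality to quantity once scores are rewarded.  It shows one way the correlation could fall; it is not a result from any data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ultimate(quantity, quality):
    """Assumed Ultimate Criterion: quantity plus quality for each teacher."""
    return [q + k for q, k in zip(quantity, quality)]


# Non-reactive use: the measured part (quantity) tracks the whole.
quantity_before = [1, 2, 3, 4, 5]
quality_before = [1, 2, 3, 4, 5]

# Reactive use: the first two teachers move effort from quality to quantity.
shifted = [1, 2, 0, 0, 0]
quantity_after = [q + s for q, s in zip(quantity_before, shifted)]
quality_after = [k - s for k, s in zip(quality_before, shifted)]

r_before = pearson(quantity_before, ultimate(quantity_before, quality_before))
r_after = pearson(quantity_after, ultimate(quantity_after, quality_after))
print(round(r_before, 3), round(r_after, 3))  # the second value is lower than the first
```

        In this contrived case the scores of the teachers who shifted effort rise while their standing on the Ultimate Criterion does not, which is what weakens the correlation.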
        Criterion Contamination is an acknowledged problem for assessment, for it directly invalidates a measure by allowing factors other than performance to influence scores.  As we will see, both Contamination and Deficiency are major problems for Feedback and Criterion Shaped Behavior functions of performance evaluation.

Criterion Shaped Behavior (CSB)

        Criterion Shaped Behavior refers to any change in behavior that results from Motivation to increase one's score on an Actual Criterion Measure of Performance.  In the Universal Theory of Performance, Motivation to Perform, Mp, and Performance Behavior, Bp, can refer to any behavior one cares to specify.  The behavior of major interest in a performance evaluation situation is behavior that increases one's score on the Actual Criterion Measure.  In fact, this is a set of three types of behaviors, any one of which may result in such increased scores.  Of these three types of CSB, one is desirable, from the point of view of the organization employing the evaluation system, and two are undesirable.

Desirable CSBs

        Relevant CSBs.  Any changes in behavior resulting from efforts to increase scores on the Actual Criterion Measure are desirable (for the evaluator) to the extent that they increase Ultimate Criterion Factors.  The only behaviors that increase both Actual and Ultimate Criterion Factors are those that affect Criterion Relevant Factors.  Using the Universal Theory of Performance, we can predict that:

        To the extent that a Relevant CSB is low in effort and is high in both the expectation of contingent consequences
        and in the subjective value placed upon those consequences, and to the extent that one believes effort to produce
        that behavior will be successful in doing so, the strength of motivation to enact that CSB will be high.

        To the extent that any incompatible non-performance is believed to be a more efficient alternative means of
        achieving valued outcomes, performance motivation will be low.

        Assuming a high level of performance motivation, performance behavior will be high to the extent that ability to
        perform is high and constraints inhibiting performance are low.

        It is exactly this variety of CSBs that advocates of evaluation-based merit awards envision.  What is typically assumed is that valued outcomes made contingent upon some performance will lead to increases in such performance.  In fact, such increases are not expected if effort is too high, if contingent rewards are of insufficient value, if the perceived contingency between effort to perform and performance, or between performance and valued outcomes is too low, if ability to perform is too low, or environmental constraints inhibiting performance are too high.  Nor will performance be increased if alternative non-performance behaviors are perceived to be more efficient at achieving valued outcomes.

        This cumbersome list of limitations is presented here to convey an important point.  Though increases in Criterion Relevant Behavior (Relevant CSBs) may occur following imposition of a performance-based reward system, there will be numerous situations in which such an increase is not to be expected.  If it is correctly assumed, for example, that increased teaching effectiveness behaviors are believed to lead to highly valued contingent outcomes (e.g., merit money), then these Relevant CSBs may increase; but not if the teacher perceives that the effort required to increase scores on the actual criterion measure by such relevant behaviors is too high.  This may be the case if a teacher believes that many extra hours would be required, or additional schooling, or learning a new and possibly intimidating technology, e.g., computers.  The incentive would not be expected to work if the perception of a contingency between the desired performance and the valued outcome were unclear or lacking in credibility, as may occur for a teacher who has little trust in the administration; nor would it be effective for a teacher who has little faith in his or her ability to convert effort into successful performance, as would be expected for teachers lacking sufficient internal locus of control, or self-esteem.  Even a teacher who is motivated to exert the effort for improved performance will fail to demonstrate an increase if necessary abilities, such as intelligence, skills, or previously acquired knowledge or training are lacking, or if environmental constraints prevent effort from being converted into desired improvement in performance.  Such constraints as having inadequate facilities or equipment, ill prepared students, or inadequate administrative support may lead current efforts to fail, which can lay the groundwork for a loss of motivation for even attempting improvement in the future.
        Perhaps most disturbing is the fact that incentives will only be expected to work if the expected increase in valued outcomes for desired improvement in performance is greater than what is perceived to be available from incompatible alternative behaviors.  If tending bar or waiting tables is believed to be a more efficient means of increasing one's income, or if coaching Little League is a more efficient means of achieving a valued sense of accomplishment, or if running for the city council is an easier way to fulfill one's power or status needs, these activities will increase at the expense of improved teaching performance.
        If we assume that all of these problems have been solved, and that increases in Relevant CSBs will indeed occur, it is still important to note that only some of the behaviors that constitute ideal performance have been measured, those in the Criterion Relevance sector of Figure 1.  Those desirable behaviors that have not been measured are in the Criterion Deficiency sector, and none of these will be expected to increase.  Behaviors that have not been assessed by the Actual Criterion Measure cannot influence scores on that measure and thus valued outcomes cannot be contingent on them.  Therefore, the SEUs associated with performance, or with efforts to perform such behaviors, will not be increased by a merit system.  Of course this is not a problem if we have a perfectly valid measure with which all Ultimate Criterion Factors are fully assessed.  The fact is, however, that all Actual Criterion Measures are deficient to some degree, and matters are made worse by the fact that they are frequently deficient with respect to major or even essential factors.
        Assume, for example, a system that is very good at adequately measuring quantities (e.g., How many students graduated?).  Such quantitative factors will very likely get measured, and will fall in the Criterion Relevant sector.  Now assume that it is very bad at measuring qualities (e.g., How educated had they become?).  Such qualities are likely to not get measured at all, and so these will very likely fall in the Criterion Deficient sector.  What results is a merit system that is quite effective at increasing the quantity of graduates, but is less effective at increasing the quality of education.  This point is raised also in discussions of test formats, especially concerning the widespread use of multiple choice tests (Frederiksen, 1984).  Skills of students that are readily measured by such tests do get measured, and as a result the system responds by teaching those skills, leading to increases, or at least prevention of decreases, in such performance over time.  Skills difficult to measure by such tests do not get measured, hence do not get taught, and consequently performance fails to increase, and may even decrease.
        In summary, those desirable behaviors that do get measured (Relevant CSBs) may increase as a result of imposing a performance-based reward system, but in order for such an increase to occur many separate conditions, involving effort, ability, constraints, and the perceptions of values and contingencies for both performance and for non-performance behaviors must be satisfied.  Even then, no increase should be expected for those desirable behaviors that are not measured, those in the Criterion Deficiency sector of Figure 1.  While failure of the system to increase desirable behaviors is of major importance, a potentially more important consideration is the possibility that performance evaluation will produce an increase in undesirable behaviors.

Undesirable CSBs

        Two separate classes of Criterion Shaped Behavior can be defined in relationship to the two non-overlapping segments of Figure 1, both of which are undesirable.

        Deficiency CSBs.  It was shown above, as a limitation on desirable increases in Criterion Relevant Factors, that important or essential factors may fall in the Criterion Deficiency category and consequently not be subject to such increases.  What often occurs, however, is that desirable behaviors that fall in the Criterion Deficient sector not only fail to increase, but will actually decrease.  The major reason for this decrease is that other behaviors, those that do increase scores on the Actual Criterion Measure, will increase, leaving less time or energy available for behaviors which fail to increase these scores.  In terms of the Universal Theory of Performance, any performance will decrease to the extent that some alternate non-performance is perceived to be more efficient at achieving valued outcomes.  Thus any anticipated increase in Relevant CSBs, or in the yet to be described Contamination CSBs, will be expected to result in a decrease in those desirable behaviors falling in the Deficiency sector.  For example, if the quantity of performance is assessed (Criterion Relevant Factor), but its quality is not (Criterion Deficient Factor), a likely result is a decrease in quality.  Thus Deficiency CSBs are those which decrease desirable behaviors in the Criterion Deficiency category.

        This is what is occurring in the examples given by Frederiksen (1984) concerning the decrease in teaching efforts in areas not measured by standardized tests of student achievement.  If teachers' merit increases are given for increases in their students' test scores, we expect a deficiency-shaped decrease in teacher efforts, however worthy, that fail to be reflected in those scores.  In fact, the greater the incentive associated with teacher performances that do get measured, the greater will be the decrease in those performances that are not measured.  Critics of a state-wide performance appraisal system in the Texas schools, without the aid of a systematic theory of why or how undesirable effects might be expected, have argued that good teachers have been prevented from engaging in effective teaching by the need to enhance students' scores on state-wide standardized tests.  It was contended that "teaching to the tests" resulted in a decline in many aspects of learning that were not measured by these tests.  Whether or not such fears are likely to be valid would depend upon the extent to which the tests used were Criterion Deficient.
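        The inverse relationship between the size of the incentive and the effort left for unmeasured behaviors can be sketched numerically.  The proportional allocation rule, the 40-hour budget, and the SEU values are assumptions made only for illustration; the theory predicts the direction of the shift, not this particular functional form.

```python
def allocate_hours(total_hours: float, seu_by_behavior: dict) -> dict:
    """Split a fixed time budget in proportion to each behavior's SEU (assumed rule)."""
    total_seu = sum(seu_by_behavior.values())
    return {name: total_hours * seu / total_seu
            for name, seu in seu_by_behavior.items()}


def unmeasured_hours(incentive: float) -> float:
    """Hours left for unmeasured teaching when a merit incentive is attached
    to the measured behavior only (all values hypothetical)."""
    seus = {
        "measured: raising test scores": 2.0 + incentive,
        "unmeasured: other worthy teaching": 2.0,
    }
    return allocate_hours(40.0, seus)["unmeasured: other worthy teaching"]


for incentive in (0.0, 2.0, 6.0):
    print(incentive, round(unmeasured_hours(incentive), 1))
# Hours for unmeasured teaching fall from 20.0 to 13.3 to 8.0 as the incentive grows.
```

        Because the time budget is fixed, any added value attached to the measured behavior is paid for by the unmeasured one, even though the unmeasured behavior's own SEU has not changed.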

        Contamination CSBs.  Motivated increases in behaviors which increase Actual Criterion Measure scores by increasing Criterion Contamination Factors, called Contamination CSBs, can have multiple undesirable effects.  Contamination CSBs increase scores on the Actual Criterion Measure without increasing the types of behaviors that increase Relevant Factors.  These are undesirable for two different types of reasons.
        First, they take time away from behaviors that would produce desirable effects in both the Criterion Relevant and the Criterion Deficient categories.  One example of Contamination is rater bias.  To the extent that the Actual Criterion Measure is influenced by a bias of an evaluator or rater, the validity of the performance measure is reduced.  In this case, invalidity results from scores reflecting not only desired performance behaviors, but also how much the rater likes or is otherwise biased for or against the person being rated.  Contamination shaped behavior involves the active manipulation of the score one receives by the performance of behaviors not included in the Ultimate Criterion.  In this case, behaviors that exploit the potential for such bias are likely to increase.  It is the motivated increase in such behaviors that I call Contamination CSBs.  If the Actual Criterion Measure contains a Contaminating rater bias factor, and a Relevant performance quantity factor, and is Deficient in assessing the performance quality factor, then "buttering up the boss" to enhance one's score through Criterion Contaminating rater bias could lead to a reduction in both quality and quantity.  While both "buttering up the boss" and increasing performance quantity would increase one's score on the Actual Criterion Measure, since both require time, any time spent on one takes away from time available to spend on the other.  The choice of which response is to be made will depend upon their perceived relative efficiency at achieving valued outcomes.  If "buttering up the boss" is more efficient than increasing or maintaining performance quantity, then an actual reduction in quantity might occur.  Thus, Contamination CSBs can result in decreases in all types of Ultimate Criterion Factors, both those in the Criterion Relevance category and those in the Criterion Deficiency category.
        Secondly, Contamination CSBs may have other undesirable features, beyond their effects of reducing desirable behavior.  Cheating on a test, for example, a type of Contamination CSB, may not be incompatible with desired performance, and thus may increase one's score on the Actual Criterion Measure without reducing either Criterion Relevant or Criterion Deficiency categories of behavior.  It might be argued that such an effect is not detrimental to performance, but simply fails to facilitate performance.  One might be concerned, however, that cheating has additional negative impact upon the system, by creating distrust, cynicism, and feelings of inequity, especially in those who refrained from cheating.  Our point is that such an appraisal system is not only without merit, for it fails to increase relevant behavior, but that it may have negative side-effects which make the consequences of using it worse than those of having done no appraisal at all.
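        The trade-off between "buttering up the boss" and increasing performance quantity, discussed above, amounts to choosing whichever behavior is perceived to yield the most score per unit of time.  The sketch below illustrates that choice; the class, the function, and all of the numbers are hypothetical, and "efficiency" is assumed here to mean perceived score gain divided by hours required.

```python
from typing import List, NamedTuple


class Behavior(NamedTuple):
    name: str
    sector: str        # "relevant", "contamination", or "deficiency"
    score_gain: float  # perceived gain on the Actual Criterion Measure
    hours: float       # perceived time required


def most_efficient(behaviors: List[Behavior]) -> Behavior:
    """Return the behavior with the highest perceived score gain per hour."""
    return max(behaviors, key=lambda b: b.score_gain / b.hours)


candidates = [
    Behavior("increase performance quantity", "relevant", score_gain=4.0, hours=8.0),
    Behavior("buttering up the boss", "contamination", score_gain=3.0, hours=2.0),
    Behavior("improve performance quality", "deficiency", score_gain=0.0, hours=8.0),
]

choice = most_efficient(candidates)
print(choice.name, choice.sector)  # buttering up the boss contamination
```

        Quality, being unmeasured, has no perceived score gain and is never chosen on these terms; quantity is chosen only if the perceived payoff from rater bias is reduced below 0.5 points per hour.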
        The CSB Theory clearly delineates three categories of behavior change that may result from imposition of an appraisal system: increases in Criterion Contamination Factors (Contamination CSBs), increases in Criterion Relevant Factors (Relevant CSBs), and decreases in Criterion Deficiency Factors (Deficiency CSBs).  Of these, only increases in Relevant CSBs are desirable.  Ultimate Criterion Factors that are not measured are expected to decrease, and Actual Criterion Factors that don't reflect Ultimate Criterion Factors are expected to increase.  We can even expect a decrease in those Ultimate Criterion Factors that are assessed (i.e., Criterion Relevant Factors) to the extent that increasing scores through Contamination CSBs is perceived to be more efficient.
         The CSB Theory can be used to help design evaluation systems that maximize the occurrence of desirable and minimize the occurrence of undesirable behavior change.  It can sensitize evaluators to the potential for negative as well as positive consequences of imposing evaluation systems, and it can possibly promote a more enlightened debate about when, where, and in what way, we should impose evaluation systems upon individuals or institutions in educational as well as in other organizations.
        Others have at times addressed some of the issues developed here (cf. Popham, 1983; Frederiksen, 1984; Koretz, 1989) as they relate to systems of evaluation in the field of education.  Some of what CSB Theory predicts has been pointed out less systematically in Kerr's (1975) discussion, "On the folly of rewarding A, while hoping for B."  None of these, however, employs a theory, model, or framework with respect to which the numerous pitfalls of evaluation systems can be anticipated, or with the help of which better systems might be devised.  Hopefully CSB Theory as developed here will provide the necessary guidance for designing more adequate systems for performance evaluation for the future.
