
National Weather Service Training Center
Hydrometeorology & Management Division

A Framework for Evaluation

December 1996


1. Introduction

Before getting into specific evaluation methods, it is useful to look at training evaluation from a broader perspective. This perspective can provide a framework upon which detailed evaluation methods can be built. The four-level approach of Kirkpatrick (1994) provides an excellent model for this view. The purpose of this chapter is to describe the Kirkpatrick model, to discuss its implications, and to show how it can be used in the development of an effective training evaluation program.

2. Four Levels of Evaluation

Kirkpatrick's model provides a conceptual framework to assist in determining what data should be collected for evaluation purposes. This data collection and evaluation process must be planned as part of the design and development segment of lesson preparation. Otherwise, it is possible to miss an opportunity to collect data that is needed for the evaluation process.

Kirkpatrick's model contains four levels of evaluation with four corresponding questions (from Phillips, 1991):

a. Reaction Level: Were the participants pleased with the program?

b. Learning Level: What did the participants learn in the program?

c. Behavior Level: Did the participants change their behavior based on what was learned?

d. Results Level: Did the change in behavior positively affect the organization?

In the next four sections each of these levels will be explored in depth.

3. Reaction Level

At the reaction level the basic question is: Were the participants pleased with the program? How did they feel about such things as the lesson or course material, the instructors, the facilities used for the class, and the methodology? Think of reaction level evaluation as measuring "customer satisfaction". A positive reaction to a lesson does not ensure that a person has learned anything, but a negative reaction to a lesson almost certainly reduces the chance for learning.

Kirkpatrick cites four reasons why measuring reaction is important. First, it provides valuable feedback on a lesson. Second, it shows trainees that the trainers are there to help them do their jobs better and that the trainers need feedback to make this happen. Third, it provides quantitative information about the training for management review. Finally, it provides quantitative information that can be used to establish standards for later classes.

Kirkpatrick also lists eight guidelines for evaluating at the reaction level.

a. Get a reaction to both the lesson subject and the lesson instructor. Keep questions dealing with these two ingredients separate. Decide what specific aspects of the lesson trainees should comment on and build questions on these items into the evaluation instrument.

b. The form that is used for the evaluation should quantify student reaction. When these types of questions are designed, it is better to use a range of quantitative responses rather than a simple yes-no answer. For example, if you ask "How worthwhile was the course for you?", provide a range of responses like: very worthwhile, worthwhile, not very worthwhile, and a waste of time.

c. On the other hand, also encourage written comments and suggestions. Allow the student "open space" to comment on certain aspects of the program and on the program in general. Use open ended questions to accomplish this goal. These questions are more difficult to summarize in a statistical sense but provide an opportunity for students to say what they feel. The reaction level instrument should be a blend of subjective and objective questions.

d. It is important to get a 100 percent immediate response on reaction level instruments. This means that there must be enough time on the final day of a course for students to fill out the questionnaire or answer interview questions prior to leaving. Plan this time into the lesson or course. If a student takes an end-of-course critique home with him/her, the result is a set of responses that are not an "immediate reaction" to the training.

e. It is important to get honest responses to questions. Names on questionnaires may have to be optional to get honest answers.

f. If enough reaction level evaluations are available, standards can be built from them. If this is done, ensure that these standards establish a minimum level of acceptable reaction and do not represent levels that can be attained by only a few instructors or lessons.

g. Once standards are established, individual training events can be measured against the standard using a reaction level instrument.

h. The evaluation forms should be reviewed by both the instructors involved in a training event and management supporting the training. This process will help trainers improve their presentation and provide management with information on how well the training was conducted. However, don't get caught in chasing individual comments. Over a series of training events, comments will show a spectrum of responses. Any conclusions derived from the evaluation forms should be based on statistics from a series of courses.
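The guidelines above on quantifying responses (item b) and on basing conclusions on statistics from a series of courses (items f through h) can be illustrated with a short Python sketch. It is only an illustration: the numeric values assigned to the four responses, the function names, the sample data, and the minimum standard of 3.0 are all assumptions made for this example and do not come from Kirkpatrick.

```python
from statistics import mean

# Hypothetical 4-point scale built from the sample responses in guideline b.
SCALE = {
    "very worthwhile": 4,
    "worthwhile": 3,
    "not very worthwhile": 2,
    "a waste of time": 1,
}


def course_score(responses):
    """Mean numeric rating for one course's questionnaire responses."""
    return mean([SCALE[r] for r in responses])


def summarize(courses, minimum=3.0):
    """Average each course, then compare the series mean to a minimum standard.

    `minimum` is an illustrative threshold, not a value from the source.
    """
    per_course = {name: course_score(r) for name, r in courses.items()}
    overall = mean(list(per_course.values()))
    return per_course, overall, overall >= minimum


# Invented responses for two courses in a series.
courses = {
    "course_1": ["very worthwhile", "worthwhile", "worthwhile", "not very worthwhile"],
    "course_2": ["worthwhile", "worthwhile", "very worthwhile", "very worthwhile"],
}
per_course, overall, meets_standard = summarize(courses)
print(per_course, overall, meets_standard)
```

With the sample data, course_1 averages 3.0, course_2 averages 3.5, and the series average of 3.25 meets the assumed minimum. Written comments still have to be read separately; a numeric summary of this kind covers only the objective questions.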

In the WFO/RFC environment, a reaction level evaluation can be accomplished in two ways: an end-of-training questionnaire, or an end-of-training interview. In each case, keep the list of questions short and relevant to the training. Setting standards, as described in items f and g above, does not apply in a WFO/RFC training environment. SOOs and DOHs should be concerned about quality training but there is no need to set formal standards within a WFO/RFC.

4. Learning Level

The second level of Kirkpatrick's model deals with learning. Learning is "the extent to which participants change attitudes, improve knowledge, and/or increase skill" as a result of attending a training program. In order to effectively evaluate learning, the training must have a specific objective against which evaluation can be done. Measuring learning is more difficult and more time-consuming than measuring reaction.

Learning measures should be objective and quantifiable. Things such as paper-and-pencil tests, performance or skill practices, and simulations can be used to measure learning. Kirkpatrick suggests four guidelines for evaluation of learning.

a. Use two groups, a control group and an experimental group. The control group does not receive any training while the experimental group does. Any difference in learning between these two groups can be explained by learning that took place because of the training. This approach may not be practical with small organizations. For example, it will not work at a typical WFO/RFC.

b. Evaluate changes in the knowledge level, attitudes, or skills of the trainee. Changes can be measured by using pre- and post- tests. Knowledge and attitudes can be evaluated by "paper-and-pencil" style tests while skill measurement requires some type of "performance" test. In a WFO/RFC environment, the pre-test may be as simple as critical observation of the trainee prior to the training or the use of a more formal approach such as an explicit pre-test.

c. It is best to evaluate everyone involved in a training program. In some cases, particularly with a large group of trainees, it may be more practical to select a sample group and focus the learning evaluation on them. Considering the relatively small number of trainees in a typical WFO/RFC, 100 percent evaluation of learning should be the goal.

d. If the trainee has not learned anything, who is to blame? The answer may be a surprise: It may be you, the instructor. The effectiveness of an instructor is measured by how well students learn. If trainees are not learning the lesson material, ask: What can I change or do better to improve learning?
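Guidelines a and b above amount to simple arithmetic: compare each trainee's pre-test and post-test scores and, where a control group is practical, subtract the control group's gain from the trained group's gain. The following Python sketch shows that arithmetic. The function names and all scores are hypothetical and chosen only for illustration.

```python
from statistics import mean


def mean_gain(pre, post):
    """Average post-test minus pre-test score across paired trainees."""
    if len(pre) != len(post):
        raise ValueError("pre and post scores must be paired per trainee")
    return mean([after - before for before, after in zip(pre, post)])


def training_effect(trained_pre, trained_post, control_pre, control_post):
    """Gain of the trained group beyond the gain of the untrained control group."""
    return mean_gain(trained_pre, trained_post) - mean_gain(control_pre, control_post)


# Illustrative scores only (percent correct); not data from any real class.
trained_pre, trained_post = [55, 60, 70], [80, 85, 95]
control_pre, control_post = [58, 62, 68], [60, 65, 72]

gain = mean_gain(trained_pre, trained_post)
effect = training_effect(trained_pre, trained_post, control_pre, control_post)
print(gain, effect)
```

Here the trained group gains 25 points on average and the control group gains 3, so roughly 22 points of the gain can be attributed to the training. With only a handful of trainees, as at a typical WFO/RFC, such numbers are indicative rather than statistically conclusive.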

Evaluation of learning is important and must be part of all training. Explicit evaluation measures will be covered in a later chapter. However, even though a person learns something, that does not mean he/she will apply the learning on the job or that it will affect the overall operations of the organization. These factors are considered in the next two sections.

5. Behavior Level

The question posed at this level checks on how the training affects job performance. Did the participants change their behavior based on what was learned? Has the job performance of the trainee improved because of the training? Just as a favorable reaction to a training program does not mean that any learning has occurred, learning lesson material does not guarantee that the learning will be applied on the job.

Kirkpatrick cites four conditions necessary for change to occur:

a. "The person must have a desire to change."
b. "The person must know what to do and how to do it."
c. "The person must work in the right climate."
d. "The person must be rewarded for changing."

A training program can influence the first item and provide the knowledge, skills, and attitudes needed for the second. However, the third and fourth items must be provided at the local level. Kirkpatrick cites five kinds of climate in a typical office:

a. A preventing climate: In this climate, local management forbids the trainee to use or apply any of the training material to the job. This type of climate raises a serious question: Why did the trainee attend the training in the first place?

b. A discouraging climate: In this climate, local management does not forbid application of the training to the job, but makes it obvious that changes in the way things are done are not desired.

c. A neutral climate: In this climate, training is ignored.

d. An encouraging climate: In this climate, local management encourages the staff to learn and apply that learning to the job.

e. A requiring climate: In this climate, local management knows what the staff has learned at a specific training class and ensures that the learning is applied to the job.

As a SOO or DOH, which type of environment do you foster? Do you send people to training that is never used? Do you ensure that training is applied to the job following a training session?

A SOO or DOH is in an excellent position to evaluate training at the behavior level because he/she can directly observe the trainee in the work environment and see if the training is being used. This is one advantage that a centralized training facility such as the National Weather Service Training Center does not have.

Kirkpatrick lists two things that should be considered at the behavior level of training evaluation. First, the trainee must be given an opportunity to change his/her behavior. Second, it is very difficult to predict when a change in behavior will occur. These two factors imply that evaluation at the behavior level cannot be completed immediately after a training session. There must be a delay to allow the change to happen.

Kirkpatrick also provides seven guidelines for evaluating behavior:

a. Use a control group. This option is similar to the control group discussed at the learning level. The difference is the focus of the measurement. At the learning level, differences in knowledge, skills, or attitudes were measured, while at the behavior level, differences in job performance are measured and attributed to the training.

b. As mentioned above, give the trainee time to review the training material and apply it to his/her job. Unless the training is provided to implement a dramatic change in the way things are done, the trainee will likely experiment with any changes prior to full implementation of those changes.

c. Evaluate job performance both before and after the training. Through this comparison any change can be observed and the change attributed to training.

d. Who should be interviewed or surveyed in order to determine behavior changes? Who is in the best position to see any change that occurs? It may be the trainee, his/her immediate supervisor or subordinates; or it may be the trainee's peers. As a decision is made about who to select for the evaluation, four questions should be considered:
Who is best qualified to observe the change?
Who is most reliable in providing honest responses?
Who is most available to answer questions or complete a survey?
Are there any reasons why one of the four people mentioned above should not be used?

e. Do you measure behavior changes in all trainees or just focus on a sample? The answer to this question may depend upon the size of the group who are trained. In general, the more people evaluated, the better. In a WFO/RFC environment, it should be relatively easy to assess behavior change for all trainees.

f. It may be necessary to repeat a behavior level evaluation several times in order to see the full impact of the change. For example, a series of evaluation surveys at three, six, and nine months after the training might be appropriate in some situations. It is good to remember, however, that factors other than the training can have an impact on job performance if the evaluation is conducted too long after the training itself.

g. It may be important to consider the cost of the evaluation. In most instances, the cost of evaluation at the behavior level is primarily staff time. If "time equals money" is important, this factor must be considered.

Evaluation at this level is more complex than at the learning or reaction level. Nevertheless, it is important and should be built into any training program. Surveys and interviews work well at this level, as well as direct observation, something fairly easily accomplished at a WFO/RFC. In some cases statistical comparisons of an individual's performance both before and after training may be in order.
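A before-and-after comparison repeated at several follow-up times, as suggested in guidelines c and f above, can be tabulated with a few lines of Python. This is a minimal sketch under stated assumptions: the 1-to-10 performance rating, the function name, and all values are invented for the example.

```python
def change_from_baseline(baseline, followups):
    """Return {months_after_training: rating - baseline}, ordered by month."""
    return {
        months: rating - baseline
        for months, rating in sorted(followups.items())
    }


# Hypothetical supervisor rating of one trainee on a 1-to-10 scale.
baseline = 6.0                       # observed before the training
followups = {9: 7.5, 3: 6.5, 6: 7.5}  # months after training -> rating

changes = change_from_baseline(baseline, followups)
for months, delta in changes.items():
    print(f"{months} months: {delta:+.1f}")
```

In this invented case the rating rises by 0.5 at three months and by 1.5 at six months, then holds steady at nine months. The longer the interval, the more care is needed in attributing the change to the training rather than to other factors.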

6. Results Level

A results level evaluation focuses on the impact of training on an organization. How did the training save costs, improve work output, implement quality changes, reduce turnover, or improve interpersonal communications for the organization as a whole? This level does not look at any one individual but at the aggregate effect of all individuals in the organization. The final or ultimate objective of any training program should be centered at this level.

Evaluation at this level is very important, but also very difficult to accomplish. Changes in any organization occur with or without training. Isolating the effects of training is not always easy to do. Trainers must usually be satisfied with "evidence of positive change" due to training rather than "proof of positive change".

Kirkpatrick offers several guidelines for evaluation at the results level. In name, these are the same as those discussed at the behavior level: use a control group, if practical; allow time for the training to be applied; measure both before and after the training; repeat the evaluation more than once; and consider both cost and benefit of the evaluation process. The difference at this level is the focus on the organization as a whole.

At a WFO/RFC this evaluation level would apply to the impact of training on the office as a whole, rather than on individual changes. As at the behavior level, statistical comparison might be useful here, as well as comparison of costs, productivity, or quality measures.

It should be obvious by now that to do evaluation at the behavior or results level, there is a need to collect a variety of statistical information both before and after the training. Some of this information may be part of an on-going quality control or verification effort at a WFO/RFC. In other cases, the statistics may have resulted from a needs analysis used to identify office training requirements. In any case, good evaluation requires good data upon which to base the evaluation.

7. Implications and Use at the WFO/RFC Level

The four-level Kirkpatrick model of evaluation provides a framework upon which training evaluation can be built. When a training program or lesson is designed, the designer (the SOO or DOH) must ask: How do I want to evaluate this training? Am I interested in just the reaction of the trainee to the training? Am I interested in determining if my students learn anything? Should I care if the training is used operationally? Or do I want this training to have a positive impact on the entire office? The answers to these questions need to be expressed as part of the program or lesson objectives. Similarly, there is a need to determine, during the design phase, how these questions will be answered. What things should be measured before, during, and after the training to ensure that the training was effective?

For training programs or lessons conducted at a WFO or RFC, it should be fairly easy to provide evaluation at each level. Some evaluation methods have been discussed as part of the model discussion. Other methods will be covered in subsequent chapters of this manual.


References

Kirkpatrick, Donald L., 1994: Evaluating Training Programs: The Four Levels. Berrett-Koehler Publishers, San Francisco, 231 pp.

Phillips, Jack J., 1991: Handbook of Training Evaluation and Measurement Methods, 2nd Edition. Gulf Publishing, Houston, 415 pp. [See pages 44-45]


Review Questions and Exercises


Use the following questions to review the content of this lesson.

(1) Match the evaluation level with the question corresponding to that level.

_____ Reaction Level   A. What did the participants learn from the program?

_____ Learning Level   B. Did a change positively affect the organization?

_____ Behavior Level   C. Were the participants pleased with the program?

_____ Results Level    D. Did the participants change the way they did things because of the training?

(2) The Reaction Level can be thought of as measuring:

A. Student performance changes
B. Customer satisfaction
C. Organizational improvements
D. What the trainee learned

(3) Which of the following should be considered when evaluating at the Reaction Level? [more than one answer possible]

_____ Evaluate only the lesson content

_____ Obtain both subjective and objective responses

_____ Get 100 percent response from the trainees

_____ Honesty is important

_____ Only the course instructor should review the Reaction Level results

(4) The Learning Level can be thought of as measuring:

A. Student performance changes
B. Customer satisfaction
C. Organizational improvements
D. What the trainee learned

(5) Learning Level evaluation should measure changes in:

A. Knowledge
B. Skill
C. Attitudes
D. All of the above

(6) The Behavior Level can be thought of as measuring:

A. Student performance changes
B. Customer satisfaction
C. Organizational improvements
D. What the trainee learned

(7) Match the kind of climate with its description:

_____ Preventing     A. Training is ignored.

_____ Discouraging   B. Using training is forbidden.

_____ Neutral        C. Using training is required.

_____ Encouraging    D. Changes are allowed but not welcome.

_____ Requiring      E. Changes are allowed but not required.


(8) The Results Level can be thought of as measuring:

A. Student performance changes
B. Customer satisfaction
C. Organizational improvements
D. What the trainee learned

Complete the Following Exercises

Consider a lesson you have taught recently. Review that lesson from the perspective of Kirkpatrick's four levels of evaluation. Are there any changes you would make in the evaluation methods used with that lesson?


Answers to the Review Questions


(1) Reaction Level - C
  Learning Level - A
  Behavior Level - D
  Results Level - B

(2) B

(3) _____ Evaluate only the lesson content
__X__ Obtain both subjective and objective responses
__X__ Get 100 percent response from the trainees
__X__ Honesty is important
_____ Only the course instructor should review the Reaction Level results

(4) D

(5) D

(6) A

(7) Preventing - B
  Discouraging - D
  Neutral - A
  Encouraging - E
  Requiring - C

(8) C

