Thursday, August 25, 2005

Evaluation professionals are undervalued

The evaluation of training is too important to be left to trainers.

Unless certification is involved, the quality assurance processes applied to formal learning initiatives in most organizations are rudimentary at best, whether at the individual intervention level, at the strategic enterprise level, or at any point in between.

That does not mean that the quality of training is poor, just that we have no real data to support our feeling that we are doing a good job. Training departments are usually stretched thin, and don’t have the time or resources to do a “proper” quality assurance job at either the course level or at the aggregate departmental level.

Informal evaluation is done by trainers, most of whom know whether an activity is “going well” or not. Formal feedback is carried out at course end, and emphasizes learner reactions, likes and dislikes. And, if the activity involves testing, there are always the scores to look at.

These are all important elements in assessing the quality of a learning experience, and they provide valuable feedback to a trainer. But they are not enough, not by a long shot.

When left in the hands of trainers and instructional designers, the focus of evaluation is too micro, too inward looking. The purpose of training is to improve organizational performance through improving the performance of individuals and teams. Learning evaluation should serve that purpose, as a quality assurance tool. To do that, evaluation has to be pan-curricular, and must adopt a higher level perspective.

This “helicopter view” is hard to achieve if the responsibility for designing and implementing evaluation is too course-specific. Yet who has the time, or the mandate, to step back from a busy course development or training schedule and get strategic?

Only the largest firms have dedicated evaluation resources who know what they are doing and have the credibility to influence policy. And even those resources are becoming imperiled by the inroads that the LMS is making.

Does it matter? There are several reasons why it does.

Implementing a regimen that elevates the strategic importance of evaluation (across all levels) and places it on a more professional level will do two vital things. It will significantly improve the effectiveness and efficiency of all learning activities; and it will save a tremendous amount of unnecessary, unhelpful, or redundant work.

My fear is that with the advent of LMS-based evaluation and record-keeping, the information we have about the quality of our learning activities is becoming more narrowly focused, and its usefulness is becoming further diluted. Just as LMS functionality tends to constrain the nature of our design of instruction, it constrains the nature of our inquiry into its impact.

I’d like to see more training departments creating evaluation units and staffing them with a trained expert or two who can help get past the simplistic "smile-sheet & ROI" approach and start building systems that put the important issues on the dashboards of individual trainers, instructional designers, and senior learning managers.

Some LMS tools claim to be able to do just that. But, as with all tools, without a trained and committed hand to guide them, they simply don’t get used.

Just as most of us never use more than 5% of the potential of our spreadsheet software, so the potential of these emerging tools goes largely unrealized. Automation was supposed to help us do things better; in reality, it often makes us complacent. We dumb down our expectations, dumb down our evaluations, and ultimately dumb down the business impact of our training endeavors.

The “hot career” of the past five years was Instructional Systems Design. I’d like to see companies valuing Learning Evaluation professionals as highly. They can contribute substantially to the quality of training, and to the business results that it achieves.


Original in TrainingZONE Parkin Space column of 19 August 2005

Thursday, August 11, 2005

Revisiting Kirkpatrick's Level One

Whenever I am involved in an evaluation project, I advocate getting rid of the smile sheet completely, and replacing that tortured questionnaire with one closed question, plus an open follow-up to encourage respondents to reveal what really matters to them: “Would you recommend this course to a friend or colleague? Why or why not?”

The response tells you unambiguously about the level of satisfaction of the learner, and any clarification offered tells you about the issues that really matter to that learner. That’s more than is called for at Level 1, especially if you have done a good job of testing your training intervention before rolling it out live.

It’s not always possible to reduce things to one question, but I see it as a starting point in the negotiation. I tend to be somewhat dismissive of Level 1 evaluations. That is not because they serve no purpose (they are vital), but because they attract way too much attention at the expense of business impact studies, and because they are often poorly designed and inaccurately interpreted.

Every training intervention needs some kind of feedback loop, to make sure that – within the context of the learning objectives – it is relevant, appropriately designed, and competently executed.

At Level 1 the intention is not to measure if, or to what extent, learning took place (that’s Level 2); nor is it intended to examine the learner’s ability to transfer the skills or knowledge from the classroom to the workplace (Level 3); nor does it attempt to judge the ultimate impact of the learning on the business (Level 4). Level 1 of Kirkpatrick’s now somewhat dated “four levels” is intended simply to gauge learner satisfaction.

Typically, we measure Level 1 with a smile sheet, a dozen Likert-scaled questions about various aspects of the experience. At the end of the list we’ll put a catch-all question, inviting any other comments. I won’t repeat the reasons why the end-of-course environment in which such questions are answered is not conducive to clear, reasoned responses. But the very design of such questionnaires is ‘leading’ and produces data of questionable validity, even in a calm and unhurried environment.

Far too many of the smile sheets that I see put words or ideas into the mouths of learners. We prompt for feedback on the instructor's style, on the facilities and food, on the clarity of slides. The net effect is to suggest to respondents (and to those interpreting the responses) that these things are all equally important, and that nothing outside of the things asked about has much relevance. By not prompting respondents, you are more likely to get to the things that, for them, are the real burning issues. Open questions are not as simple to tabulate, but they give you an awful lot to chew on.

Now the one-question approach does not necessarily give you all the data that you need to continuously fine-tune your training experience – but neither does the typical smile sheet. Trainers need to understand that sound analytical evaluations often require multi-stage studies. Your end-of-course feedback may indicate a problem area, but will not tell you specifically what the problem is. A follow-on survey, by questionnaire, by informal conversation, or by my preferred means of a brief focus group, will tell you a great deal more than you could possibly find out under end-of-course conditions.

The typical smile sheet is a lazy and ineffective approach to evaluating learner satisfaction. It may give you a warm and comfortable feeling about your course or your performance as a trainer, or it may raise a few alarm flags. But the data that it produces is not always actionable, is rarely valid, and often misses the important issues.

In market research, or any statistical field for that matter, there are two important errors that good research tries to mitigate. Known as Type One and Type Two errors, they are, respectively, the error of seeing something that is not there and the error of missing something important that is there. I have never heard anyone address these error types in their interpretation of Level 1 results.
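
To make that concrete, here is a rough, purely illustrative simulation of how easily a handful of end-of-course questionnaires can commit both errors; the satisfaction scores, the class size of twelve, and the 4.0 “alarm” threshold are all invented for the example.

    import random

    def class_mean(true_score, n=12):
        # crude model: each learner's rating is the true score plus noise, clipped to the 1-5 scale
        return sum(min(5, max(1, random.gauss(true_score, 1.0))) for _ in range(n)) / n

    random.seed(1)
    trials = 10_000

    # Type One flavour: a genuinely good course (true score 4.3) dips below the alarm threshold
    false_alarms = sum(class_mean(4.3) < 4.0 for _ in range(trials)) / trials

    # Type Two flavour: a genuinely weak course (true score 3.7) scrapes past the threshold
    missed_problems = sum(class_mean(3.7) >= 4.0 for _ in range(trials)) / trials

    print(f"Good course flagged as a problem: {false_alarms:.0%}")
    print(f"Weak course waved through: {missed_problems:.0%}")

The exact percentages are beside the point; the point is that with a dozen respondents, both kinds of mistake happen far more often than we care to admit.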

We see in our smile-sheet results what we want to see, and react to those things that we regard as relevant. If we are so smug in our knowledge that we know what is going on anyway, why do we bother with token smile sheets at all?


Original in TrainingZONE Parkin Space column of 5 August 2005

Tuesday, August 02, 2005

Meaningful metrics beyond ROI

There is a common misconception in business that, because they work with them all the time, financial people understand numbers. They like to reduce everything to money – what did it cost or what did it make? They insist on dealing in certainties and absolutes, where every column balances to the penny. But the real world does not work like that. The real world is characterized by imperfections, probabilities, and approximations. It runs on inference, deduction, and implication, not on absolute irrefutable hard-wiring. Yet we are constantly asked to measure and report on this fuzzy multi-dimensional world we live in as if it were a cartoon or comic book, reducing all of its complexity and ambiguity to hard financial “data.”

We struggle for hours (often for days or weeks) to come up with the recipe for “learning ROI.” The formula itself is simple, but the machinations by which we adjust and tweak the data that go into that formula are anything but simple. Putting a monetary value on training’s impact on business is fraught with estimation, negotiation, and assumption – and putting a monetary value on the cost of learning is often even less precise.
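
For the record, the formula really is trivial; the weeks go into manufacturing the two numbers that feed it. A minimal sketch, with figures invented purely for illustration:

    # The standard training ROI arithmetic. Both inputs are assumptions; in practice
    # each one is the product of estimation, negotiation, and allocation.
    programme_costs = 120_000     # design, delivery, facilities, learners' time away from the job
    monetary_benefits = 180_000   # value attributed to the resulting performance improvement

    roi_percent = (monetary_benefits - programme_costs) / programme_costs * 100
    print(f"ROI = {roi_percent:.0f}%")   # looks precise, but is only as good as the estimates above

Two lines of arithmetic; the rest is argument about what goes into them.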

Yet when was the last time you saw an ROI figure presented as anything other than an unqualified absolute? If you tried for statistical accuracy and said something like, “this project will produce 90% of the desired ROI, 95% of the time with a 4% error margin,” you’d be thrown out of the boardroom. You simply can’t use real statistics on an accountant, because the average bean-counter can’t tell a Kolmogorov-Smirnov from an Absolut-on-the-rocks. The implicit message: don’t tell us the truth; just give us numbers that conform to our unrealistic way of measuring the business.

We spend way too much time trying to placate financial people by contorting our world to fit their frame of reference, and we allow them to judge and often condemn our endeavors according to criteria that are irrelevant or inappropriate. Perhaps there is some comfort in knowing that the problem is not unique to training. In a couple of decades in marketing, I have seen plenty of good brands ruined by ill-conceived financial policies, usually to the long-term detriment of the company as a whole.

But you don’t need to be a statistician or an accountant to make a strong business case based on logic and deduction, and there is no need to be pressured into using the preferred descriptive framework of a book-keeper. The pursuit of the measurement of ROI in training is often a red herring that distracts from the qualitative impacts that our work has on the performance of the business. ROI is typically not the best measure of that, and, after making all of the heroic assumptions and allocations needed to arrive at it, that magic ROI figure may well be a false indicator of impact.

Unfortunately, the indicators that are useful and reasonably accurate are often hard to convert to financial data, so they do not get taken seriously. And, compounding the problem, training managers themselves often ignore these indicators because they are not captured at the course level. Our focus too often is on the quality of courses rather than on the quality of our contribution to the business in total.

We need to widen the focus. While learner satisfaction, test results, and average cost of butts-on-seats are useful metrics, it is only after our learners have returned to work that we can begin to see how effective the learning experience really was. What are some of the indicators that let us know how we are doing? Many of them are produced already, often by the financial people themselves, and tracking them over time gives good insights into where we are doing well and where we might need to pay more attention.

Some of those metrics include:

  • Training costs per employee.
  • Enrolment rates and attendance rates.
  • Delivery modes, plans against actuals.
  • Percentage of target group that is “compliant”.
  • Time from eligibility to compliance, or to proficiency.
  • Percentage of workforce trained in particular skill areas.
  • Learning time as percentage of job tenure.
  • Availability, penetration, and usage rates of help systems.
  • Skill gap analyses tracked over time.
  • Productivity (such as, for example, number of new clients per 100 pitches).
  • Attrition rates.

There are many, many more. Metrics such as these let us put on the manager’s dashboard indicators of performance in areas such as operational performance, compliance, efficiency, effectiveness, and workforce proficiency, as well as harder to capture dimensions such as motivation and readiness for change. Training departments need to think “outside the course” and come up with ways to derive the right indicators in a way that is inexpensive and unobtrusive.
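
As a purely hypothetical illustration of how little machinery some of these indicators need, here is a sketch that derives two of them (the percentage of the target group that is compliant, and the time from eligibility to compliance) from a handful of invented training records:

    from datetime import date

    # Invented records: when each employee became eligible for a required course,
    # and when (if ever) they completed it.
    records = [
        {"eligible": date(2005, 1, 10), "compliant": date(2005, 2, 21)},
        {"eligible": date(2005, 1, 10), "compliant": None},
        {"eligible": date(2005, 3, 1),  "compliant": date(2005, 5, 16)},
        {"eligible": date(2005, 4, 4),  "compliant": date(2005, 4, 29)},
    ]

    completed = [r for r in records if r["compliant"]]
    compliance_rate = len(completed) / len(records)
    days_to_compliance = [(r["compliant"] - r["eligible"]).days for r in completed]
    average_days = sum(days_to_compliance) / len(completed)

    print(f"Percentage of target group compliant: {compliance_rate:.0%}")
    print(f"Average days from eligibility to compliance: {average_days:.0f}")

Tracked quarter by quarter, indicators like these say far more about the department’s contribution to the business than any stack of course-level feedback forms.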

One of my frequent recommendations is that training departments learn something about data collection from their marketing colleagues, and set up the ability to run surveys and focus groups to investigate learner satisfaction, customer attitudes, job impacts, and manager perceptions. This skill is often absent in training departments, which is a pity because these methods can produce great insights and save money and time. If you build this capacity into your training organization, getting a read on Levels 3 and 4 can become as much a part of your evaluation regimen as gathering smile sheets. You don’t have to interrogate the universe if you can pick a small sample. And you can produce real data and real trends that go down very well in the boardroom.
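
If the statistics of sampling feel daunting, they needn’t be; the sample-size arithmetic that market researchers rely on is a one-liner. A sketch, assuming a 95% confidence level and a margin of error of plus or minus five percentage points:

    import math

    def sample_size(margin=0.05, z=1.96, p=0.5, population=None):
        # classic sample size for estimating a proportion; p=0.5 is the most conservative assumption
        n = (z ** 2) * p * (1 - p) / margin ** 2
        if population:
            # finite population correction when the workforce size is known
            n = n / (1 + (n - 1) / population)
        return math.ceil(n)

    print(sample_size())                  # roughly 385 responses, however large the organization
    print(sample_size(population=2000))   # noticeably fewer for a 2,000-person workforce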

Original in TrainingZONE Parkin Space column of 22 July 2005