Measuring Effectiveness in the Real World

In a field of research seemingly obsessed with developing, cataloging and replicating evidence-based program models, one study has found a unique path to one of the most basic truths of youth work.

Model, schmodel. To be effective, youth programs must identify and target specific groups and behaviors, and even then, they are only as effective as their youth workers.

You already knew that? Well, the truly groundbreaking gospel of the monograph, “Understanding Prevention Effectiveness in Real-World Settings: The National Cross-Site Evaluation of High Risk Youth Programs (HRY),” is how its authors arrived at that conclusion.

The study introduces an innovative way to link model program evaluation and real-world practice: Use an evaluation process that embraces diversity and bias in programs, staff and participants, rather than tightly controlling for them as possible contaminants of outcome data.

Evaluators J. Fred Springer and Soledad Sambrano rejected choosing either a “gold standard” randomized clinical trial rigidly designed to test a program model, or a meta-analysis designed to compare the outcomes of various programs in order to find the “best.” Instead, they designed a large multi-site evaluation of HRY – a program focused on preventing tobacco, alcohol and drug use – that observed and documented variations in design and implementation across sites. Then they statistically isolated and stripped away those differences until they were left with the “true” effects of each program on its participants.

“This is a very different approach,” said Springer, a research director with California-based EMT Associates. “There was no intent in this study to find the best program.”

Instead, the study showed how to provide “a context [for] understanding the selection, application and adaptation of program models” – something the evidence base for those models can’t do, Springer wrote in the study monograph, published in The American Journal of Drug and Alcohol Abuse (Vol. 31, No. 3, 2005).

In other words, Springer said in an interview, “Our approach recognizes that it’s all about the understanding and skills of the individual practitioners.”

Underestimated Effects

In 1995, the U.S. Center for Substance Abuse Prevention (CSAP) began a six-year, multi-site evaluation of its High Risk Youth Demonstration Grant Program, which from 1986 to 1996 funded more than 500 substance abuse prevention programs for youth ages 12 to 18. The highly rigorous evaluation of the demonstration included 48 programs, 5,934 youth participants and 4,539 nonparticipating youth from sites throughout the country.

When the evaluators initially analyzed outcome data for each program and pooled data from all 48, they found few significantly positive effects on participants’ use of tobacco, alcohol or marijuana when compared with the control group.

But Springer and Sambrano, the former branch chief for assessment and application at CSAP, knew those effects were just the most visible tip of a much larger iceberg. They had designed their evaluation to look deeper.

The evaluation employed “informed observers” who visited sites with a 200-page protocol that helped them collect multiple layers of data on participants, staff, community, and program design and implementation. It included self-report questionnaires administered to participants four times over a 2 1/2-year period, and it relied on the cataloged details of more than 217,000 contacts made with individual youth through the programs.

The evaluators broke down the data and statistically isolated the “real-world” differences among programs – such as prevention strategies, gender and substance use characteristics of participants, comparison group access to similar prevention programs in the communities and staff adherence to the program model. They discovered factors in the programs that were working against each other, nullifying the impacts.

For example: The HRY study collected data on the degree to which youth in comparison groups had the opportunity to participate in other prevention programs in their schools and communities. When sites that offered comparison youth a “high” opportunity to participate in other programs were removed from statistical models, HRY seemed to have an impact. That is, in “low opportunity” areas, the substance use trends for HRY participants showed significantly less growth than did the trends for comparison youth.
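The logic of that site-level filtering can be sketched in a few lines. The records and numbers below are invented purely for illustration (the article reports no raw data), and the variable names are my own, but the sketch shows the mechanism the evaluators describe: when comparison youth in "high opportunity" sites also receive prevention services, the pooled contrast shrinks, and restricting the analysis to low-opportunity sites reveals a larger program effect.

```python
# Hypothetical, illustrative records: (group, opportunity, use_growth).
# use_growth stands in for the change in a 30-day substance-use measure
# over the study period; higher means use grew more.
records = [
    ("hry", "low", 0.2), ("hry", "low", 0.3), ("hry", "high", 0.6),
    ("comparison", "low", 0.8), ("comparison", "low", 0.9),
    ("comparison", "high", 0.7),
]

def mean_growth(recs, group, opportunity=None):
    """Mean use growth for a group, optionally restricted to one site type."""
    vals = [g for grp, opp, g in recs
            if grp == group and (opportunity is None or opp == opportunity)]
    return sum(vals) / len(vals)

# Pooled contrast: comparison youth in "high opportunity" sites also got
# prevention services elsewhere, diluting the apparent program effect.
pooled_effect = mean_growth(records, "comparison") - mean_growth(records, "hry")

# Restricting to low-opportunity sites isolates the contrast the evaluators
# reported: HRY participants' use grew significantly less than comparisons'.
low_opp_effect = (mean_growth(records, "comparison", "low")
                  - mean_growth(records, "hry", "low"))
```

With these toy numbers the low-opportunity contrast is larger than the pooled one, which is the pattern the HRY evaluators found when they removed high-opportunity sites from their statistical models.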

Similarly, programs with strong effects on one subgroup but none on others showed weak combined effects, which masked the benefit of the intervention for the subgroup that was affected.

Consider what happened when evaluators removed from the statistical model those youth who had not used any of the target substances in the past 30 days. Among the remaining youth, there was a significant difference in use between participants and the control group in areas with “low opportunity” for other such programs.
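The subgroup-masking point works the same way. Again the data and names below are hypothetical, for illustration only: youth who were not recent users at baseline cannot reduce use they do not have, so pooling them with users dilutes the measured effect, and filtering them out uncovers it.

```python
# Hypothetical records: (group, recent_user_at_baseline, use_change).
# Negative use_change means the youth's substance use declined.
records = [
    ("hry", True, -0.4), ("hry", True, -0.5),
    ("hry", False, 0.0), ("hry", False, 0.0),
    ("comparison", True, 0.3), ("comparison", True, 0.4),
    ("comparison", False, 0.0), ("comparison", False, 0.1),
]

def mean_change(recs, group, users_only=False):
    """Mean change in use for a group, optionally among baseline users only."""
    vals = [c for grp, user, c in recs
            if grp == group and (user or not users_only)]
    return sum(vals) / len(vals)

# Pooled: nonusers sit near zero change in both groups and mask the contrast.
pooled_effect = mean_change(records, "comparison") - mean_change(records, "hry")

# Users only: the program's effect on recent users stands out clearly.
users_only_effect = (mean_change(records, "comparison", users_only=True)
                     - mean_change(records, "hry", users_only=True))
```

This is the same statistical move as the site filtering above: remove the cases for which the intervention could not plausibly register an effect, then re-estimate.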

“By using some statistical adjustments and looking at effect sizes,” Springer said, evaluators can say that “prevention programs are probably a lot more effective than we are able to show” with standard research methods.

Design and Implementation

Although meta-analysis comparisons of prevention programs have long been able to identify certain intervention designs that are more effective than others, design differences alone could not explain all of the variations in program effects. The large HRY sample revealed that even similarly designed programs could have significantly different effects.

Springer and Sambrano believed their approach could uncover what made the difference.

The evaluators categorized each program’s prevention strategy as focused on behavioral skills, information delivery, recreation, personal affect (such as self-esteem), or a mix of strategies. They also categorized program delivery methods as focused on building connections or on self-reflective learning.

They used data about program implementation – including levels of staff training and the number of hours per week youth were involved – to categorize programs as having high or low model adherence, and high or low program intensity.

They found that when all of the programs’ characteristics were considered, the overall effect on 30-day substance use was not significant. But when they looked closer at individual programs and components, they found important differences.

Only five characteristics showed positive effects on their own: a focus on life skills, a connection-building delivery, a cohesive structure, an introspective-learning delivery and intense contact (more than 3.3 hours per week). The effect was greatest for programs that implemented at least four of those characteristics.
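A program could, in principle, score itself against that five-characteristic profile. The trait names below are my own labels for the five characteristics the article lists, and the example programs are invented:

```python
# The five characteristics the evaluation associated with positive effects.
EFFECTIVE_TRAITS = {
    "life_skills_focus",
    "connection_building_delivery",
    "cohesive_structure",
    "introspective_learning_delivery",
    "intense_contact",  # more than 3.3 hours of contact per week
}

def trait_count(program_traits):
    """Count how many of the five effective characteristics a program has."""
    return len(EFFECTIVE_TRAITS & set(program_traits))

# Hypothetical program profiles.
program_a = {"life_skills_focus", "connection_building_delivery",
             "cohesive_structure", "intense_contact"}
program_b = {"recreation_focus", "information_delivery"}

count_a = trait_count(program_a)  # 4 of 5: the range linked to the largest effects
count_b = trait_count(program_b)  # 0 of 5
```

Per the study, programs implementing at least four of the five showed the greatest effects, so `count_a` would place program A in that strongest tier.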

Programs “can learn from this monograph by looking at what we looked at,” Sambrano said. “What were those aspects that really made a difference? They can perhaps make sure that they are doing some of the things that proved effective in this evaluation.”

Do Models Work?

“The model program approach, in my mind, is demeaning to the individual practitioner, because it says ‘in order to have an effect, you’ve got to have a formula,’ ” Springer said.

According to Sambrano, despite the field’s trend toward focusing on the implementation of model programs, “it’s very difficult to implement [model programs] exactly as they were developed, because they were developed by researchers in a setting that was very specific for what they were trying to test.”

Trying to reproduce those conditions in the real world is “not practical,” she said.

While model developers would like to see their programs implemented exactly as intended, Sambrano said that programs “have to make adjustments based on their community, based on their participants. And that means that evaluation has to be reconfigured to look specifically at what that new program looks like.”

Contact: J. Fred Springer, EMT Associates Inc.,