Detecting Danger: Can We Do Better?

By Dick Mendel

Posted on September 1, 2008

For child welfare and juvenile justice workers, it can literally be a matter of life and death: predicting which new cases are dangerous and require urgent action, and which are less perilous and can be handled with a lighter touch.

The question is: How do you best assess whether a child should be removed from home, or whether a teen should be locked up?

Is it through structured risk-assessment checklists, which have increasingly replaced individual judgment and are trumpeted as the best practice? Or will a radically new approach make these predictions far more accurate?

The latter claim is being made by veteran youth scholar Ira Schwartz and his son David, a social worker and technology buff, who describe their model as “far more sophisticated, accurate, and predictive” than current methods.

A prominent skeptic charges that the Schwartzes are making unwarranted claims based on a small and “seriously oversold” body of research.

The dispute is impassioned and fraught with implications for youth work. Luckily, the question may soon be put to the test.

If the final details can be arranged, the Schwartzes’ new-fangled “neural network” risk prediction model will stand toe-to-toe in a supervised competition against the field’s longstanding standard-bearer, a statistically validated risk-assessment checklist championed by the National Council on Crime and Delinquency’s Children’s Research Center.

“Let’s test them out against one another,” says Aron Shlonsky, a respected child welfare scholar, who is unaffiliated with either approach and will referee the experiment. “Let’s have a fair fight and see what happens.”

State of the Art: Better, but Flawed

Despite the high stakes, risk assessment in child welfare and juvenile justice was until recently a matter of professional judgment, based on the experience, instincts, prejudices and intuition of front-line staff. Then a raft of studies found that structured, statistically derived methods consistently improve the accuracy of risk assessments when compared with informal clinical judgments.

“There aren’t too many things we can say with certainty in [the social work] field, but I think this is a done deal, tested over and over and over again,” Shlonsky says. “Statistically driven and actuarially validated risk-assessment tools are more accurate than risk predictions based on clinical judgment.”

Today, most child welfare agencies and state juvenile justice agencies employ some type of formal risk-assessment tool, as do many local juvenile courts and probation agencies. Typically, these assessment instruments are composed of a checklist of presumed risk factors tied to a point system that scores cases as high, medium, or low risk.

Nonetheless, risk assessment remains problematic. Methods used to develop and apply the instruments vary widely. Despite overwhelming evidence that risk instruments developed through statistical analysis are more accurate, many agencies rely instead on intuition and consensus among professional staff. Risk instruments designed for one population are sometimes applied to another.

As Canadian scholars Della Knoke and Nico Trocmé described in a 2005 paper for the Oxford University Press journal Brief Treatment and Crisis Intervention, “Most risk-assessment models [are] developed and implemented with little or no research to establish validity or reliability and with little, if any, empirical testing.”

Validation studies show that even the best of the current risk-assessment tools, while far better than unstructured personal judgments, aren’t very precise.

For instance, when Minnesota child welfare officials tested a brand-new, state-of-the-art actuarial risk-assessment tool in 2006, children in families deemed “high risk” were 5½ times as likely to suffer subsequent abuse as children in families deemed “low risk.” That sounds good, but officials also found this: More than half of the families in which children suffered harm had not been labeled high risk, and in most of the cases labeled high risk, there was no documented abuse.
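A rough sketch can show how all three findings can hold at once. The counts below are invented for illustration and are not the Minnesota figures; the tier sizes and the variable names are assumptions chosen only so that the high-to-low ratio comes out to 5.5.

```python
# Hypothetical counts for illustration only; these are NOT the Minnesota figures.
tiers = {
    "low": {"families": 500, "harmed": 20},
    "moderate": {"families": 300, "harmed": 36},
    "high": {"families": 200, "harmed": 44},
}


def harm_rate(tier):
    """Share of families in a tier with later documented harm."""
    return tiers[tier]["harmed"] / tiers[tier]["families"]


# How much more often harm follows a high-risk label than a low-risk label.
risk_ratio = harm_rate("high") / harm_rate("low")

# Of all families with later harm, the share that had been labeled high risk.
total_harmed = sum(t["harmed"] for t in tiers.values())
share_of_harm_flagged = tiers["high"]["harmed"] / total_harmed

# Of all families labeled high risk, the share with later documented harm.
share_of_flags_with_harm = harm_rate("high")

print(f"high vs. low risk ratio: {risk_ratio:.1f}")
print(f"harmed families labeled high risk: {share_of_harm_flagged:.0%}")
print(f"high-risk families with later harm: {share_of_flags_with_harm:.0%}")
```

With these made-up counts, the high-risk tier is 5.5 times as dangerous as the low-risk tier, yet it captures only 44 percent of the families where harm occurred, and only 22 percent of the families it flags go on to have documented harm.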

According to Knoke and Trocmé, “Studies suggest that the predictive accuracy of risk assessment instruments is limited.”

An Intelligent Approach

David Schwartz came face to face with these limitations as a young social worker in Delaware in the mid-1990s. Fresh out of graduate school, Schwartz was working for a private child welfare agency, Child Inc., when he was assigned to work with a panel investigating the recent deaths of abused children.

As Schwartz delved into the science of risk assessment, he became intrigued with the potential of artificial intelligence, a new approach to computing that was showing great promise in sorting and analyzing large data sets. Unlike traditional computers, which operate from a single processor and follow pre-set rules and programs, artificial intelligence mirrors the human brain by using a connected set of simpler processors – known as a “neural network” – to recognize patterns and sort vast fields of data.

Neural networks and other artificial intelligence techniques are widely used. Credit card companies use neural networks for fraud detection, and insurance companies use them to flag fraudulent claims. The controversial “Total Information Awareness” anti-terrorism program introduced by the Bush administration in 2003 used artificial intelligence to mine data and identify potential terrorists.

Schwartz says that as he scanned the scene in 1996 and 1997, “Every field was applying these new methods to look at their data, even garbage collection. But not our data in the child welfare and human services field.”

The little bit of neural network research in child welfare and criminal justice was hardly promising. A 1996 National Institute of Justice study reported that neural networks did not show “any gains in accuracy” over traditional methods in predicting recidivism of adult offenders released from federal prisons.

David’s father, Ira Schwartz, was in a position to help. A well-known youth scholar and former administrator of the U.S. Office of Juvenile Justice and Delinquency Prevention, the elder Schwartz was serving as dean of the University of Pennsylvania School of Social Work. He convened an interdisciplinary work group at the university to explore the potential for applying neural networks and other new computational techniques in the human services field.

(Ira Schwartz is now chairman of the board of the American Youth Work Center, which publishes Youth Today.)

Together, father and son launched what has turned out to be a quixotic, often frustrating and still unrealized quest to demonstrate clearly the effectiveness of neural networks and put them to use in youth work.

A Breakthrough?

The Schwartzes reached out to engineering professor Iraj Zandi, a long-time member of the University of Pennsylvania engineering faculty who was becoming interested in neural networks. They told him about a national public-use child welfare database, the National Incidence Study of Child Abuse and Neglect, which held data on thousands of child protection cases from 42 U.S. counties. In 1998 Zandi assigned Adam Kaufman, a top student in his undergraduate senior design class, to begin working with the database.

With the Schwartzes’ help, Kaufman identified the variables that could be most relevant for determining risk. He then sifted through every case in the data set, weeding out those with mistakes or missing records, and reformatted the data for processing by a neural network. The result was a data set of 1,767 cases, each with 141 variables.

Using a rudimentary, off-the-shelf neural network software program, Kaufman began “training” that program on 1,150 of the cases by running the data set through it hundreds of times. This allowed the network to devise a risk prediction model and repeatedly reweight the algorithm to improve its accuracy. Then Kaufman tested the network on the final 617 cases in the data set.
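The train-then-test routine can be sketched in a few lines. This is not Kaufman’s software or data: the sketch below assumes a single artificial neuron rather than a full network, two invented variables instead of 141, and synthetic cases generated by a simple hidden rule. Only the 1,150/617 split is taken from the account above.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable


def make_case():
    """Return one synthetic case: two made-up risk scores and a 0/1 outcome."""
    x = [random.random(), random.random()]
    y = 1 if x[0] + x[1] > 1.0 else 0  # hidden rule the model must discover
    return x, y


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


cases = [make_case() for _ in range(1767)]
train, holdout = cases[:1150], cases[1150:]  # 1,150 to train, 617 to test

weights = [0.0, 0.0]
bias = 0.0
learning_rate = 0.1


def predict_prob(x):
    """Model's current estimate that the outcome is 1 for case x."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Run the training cases through the model many times, nudging the weights
# after each case in the direction that shrinks the prediction error.
for epoch in range(200):
    for x, y in train:
        error = predict_prob(x) - y
        weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
        bias -= learning_rate * error

# Score the model only on cases it never saw during training.
correct = sum((predict_prob(x) >= 0.5) == (y == 1) for x, y in holdout)
holdout_accuracy = correct / len(holdout)
print(f"holdout accuracy: {holdout_accuracy:.1%}")
```

Holding back the last 617 cases matters because a model can memorize quirks of the cases it trained on; accuracy on unseen cases is the fairer measure.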

At first blush, Kaufman’s results appeared impressive. The model correctly identified whether children had suffered harm in nearly 90 percent of the 617 test cases. The test yielded 16 inaccuracies (predicting harm where it didn’t happen or predicting no harm where it did), and found 48 cases too difficult to predict.
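The arithmetic behind “nearly 90 percent” can be checked directly from the three figures reported above. The sketch assumes, as the numbers suggest, that the 48 cases the model declined to predict were counted as not correct; the variable names are illustrative.

```python
test_cases = 617   # cases held back for testing
wrong = 16         # predictions that turned out to be inaccurate
undecided = 48     # cases the model found too difficult to predict

# Assumption: everything not wrong or undecided was predicted correctly.
correct = test_cases - wrong - undecided

# Accuracy over all test cases, treating undecided cases as misses.
accuracy_all = correct / test_cases

# Accuracy over only the cases where the model ventured a prediction.
accuracy_decided = correct / (test_cases - undecided)

print(f"correct predictions: {correct}")
print(f"accuracy over all 617 cases: {accuracy_all:.1%}")
print(f"accuracy over decided cases: {accuracy_decided:.1%}")
```

That works out to 553 correct predictions, or 89.6 percent of all 617 cases. Setting aside the 48 undecided cases would lift the figure to about 97.2 percent, which shows how much the headline number depends on how such cases are counted.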

Thrilled by the results, the Schwartzes detailed the experiment in an article published in 2004 by the respected Children and Youth Services Review, which described “how more accurate and effective child welfare tools and instruments can be developed using [neural network] technology.”

Or a ‘Mickey Mouse’ Test?

However, that paper has been subject to harsh criticism. Zandi says he had not known before being contacted for this story that Kaufman’s experiment had been published in an academic journal, and expressed shock. “The network was crude, a student project,” he said. “A Mickey Mouse design.”

Also, the journal article reported only the network’s overall accuracy rate; it omitted data on how accurately it identified specifically those cases where abuse occurred. (The Schwartzes say the data are no longer available because the files have been lost.)

Finally, unlike most risk-assessment studies in child welfare, the Schwartz-Kaufman study sought to identify which cases had already experienced abuse, rather than the much more difficult and important challenge of predicting which cases will suffer future abuse.

After hearing David Schwartz give a presentation two years ago, Chris Baird of the National Council on Crime and Delinquency (NCCD) wrote a scathing critique. “The study involves a short-term prediction of an outcome that should be relatively easy to predict,” chastised Baird, who oversees the council’s work on risk prediction and has published several widely cited articles on actuarial risk prediction.

Because only 8 percent of cases within the data set had actually suffered harm, Baird argued that the neural network’s 90 percent accuracy rate was misleading. “We could ‘predict’ that no case would meet the harm standard and be right 92 percent of the time,” he wrote. “No added value is produced unless the model correctly identifies a large proportion of the few cases that are actually ‘positives.’ …

“We cannot determine if the model was successful in identifying the true positives … although, (based on what was presented) it seems unlikely.”
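Baird’s point about base rates is easy to reproduce. The sketch below assumes a hypothetical batch of 1,000 cases with the 8 percent harm rate he cites, then scores a “model” that never predicts harm; the case count and function name are illustrative.

```python
def evaluate(actual, predicted):
    """Return (overall accuracy, share of true harm cases correctly flagged)."""
    hits = sum(1 for a, p in zip(actual, predicted) if a == p)
    true_positives = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    positives = sum(actual)
    accuracy = hits / len(actual)
    sensitivity = true_positives / positives if positives else 0.0
    return accuracy, sensitivity


# Hypothetical batch: 1,000 cases, 8 percent of which involved harm.
actual = [1] * 80 + [0] * 920

# A do-nothing "model" that predicts no harm for every case.
always_no_harm = [0] * len(actual)

accuracy, sensitivity = evaluate(actual, always_no_harm)
print(f"overall accuracy: {accuracy:.0%}")
print(f"harm cases identified: {sensitivity:.0%}")
```

The do-nothing model is right 92 percent of the time while identifying none of the 80 harm cases, which is why overall accuracy alone says little when the outcome is rare.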

Zandi, the engineering professor, subsequently addressed some of the concerns on his own by analyzing the child welfare data with eight state-of-the-art neural network designs. One of them achieved an overall accuracy rate of 93 percent, and correctly identified 30 of the 33 cases in the data set where harm had occurred. It also incorrectly predicted harm in 27 cases.
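The counts reported for Zandi’s best design allow the two measures Baird asked about to be worked out. The sketch uses only the three figures above; the total number of cases tested is not given here, so overall accuracy is not recomputed.

```python
harm_cases = 33        # cases in the data set where harm had occurred
harm_found = 30        # harm cases the network correctly identified
false_alarms = 27      # cases where harm was predicted but had not occurred

# Share of actual harm cases that the network caught.
sensitivity = harm_found / harm_cases

# Share of the network's harm predictions that were correct.
harm_predictions = harm_found + false_alarms
precision = harm_found / harm_predictions

print(f"harm cases caught: {sensitivity:.1%}")
print(f"harm predictions that were correct: {precision:.1%}")
```

By this arithmetic the network caught about 91 percent of the harm cases, while just under half of its 57 harm predictions were false alarms.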

Baird of NCCD remains dismissive, because Zandi also focused only on identifying cases where abuse had already occurred. “All they are doing is predicting … if you can call it that … the seriousness of an event that already occurred,” Baird says. “If you know just a few things about the report, such a prediction is relatively easy. I am absolutely positive that we could do that with an actuarial model and get results at least as good.”

Trying Again

For their next project, the Schwartzes assembled a stronger team and produced what even Baird concedes was “a much better study that looked at the kinds of things you should look at.”

The study examined risk prediction in juvenile justice, focusing on a database maintained by Peter Jones, a criminologist at Temple University. (Ira Schwartz became Temple’s provost in 2001.) The database included more than 8,000 Philadelphia youths who had been found delinquent from 2000 through 2002 and sentenced to a treatment program or correctional facility.

To design the neural network, the Schwartzes teamed up with artificial intelligence and data mining expert Zoran Obradovic, who directs an information science research center at Temple. The study, published in the Temple Law Review in 2006, reports that the neural network predicted recidivism with eerie precision. Among cases that the model identified as low risk, the recidivism rate was just 3 percent, compared with an 81 percent rate for those designated high risk.

If it could be achieved at the front end of the juvenile process, such precision would be “a quantum leap forward” over existing risk-prediction methods, Baird says. But he remains skeptical, noting that the Schwartzes provided few details about the methods used in their study.

“I know of no field where such results have been produced, let alone human services, where data are notoriously unreliable and results need to [be] predicted well into the future,” Baird says.

Ira Schwartz countered by e-mail that Baird’s actuarial risk prediction studies “have NEVER, I repeat, NEVER, at least to my knowledge, ever generated as robust findings as our two studies.”

Head-to-Head

After publication of their juvenile justice study two years ago, the Schwartzes’ research on risk prediction in child welfare and juvenile justice slowed to a crawl – sidetracked by lack of funding and an inability to gain access to data from child welfare and juvenile justice providers. Baird and Zandi concur that funding for risk-prediction research has become increasingly scarce, both from federal research sponsors and federal agencies.

The Schwartzes have tried to forge partnerships with state and local child welfare and juvenile justice agencies, hoping to use their data to further test neural networks. No one has accepted. “The child welfare and juvenile justice fields just aren’t very much research-based,” Ira Schwartz says. “Social work is not an evidence-based profession.”

However, two recent events have brightened the outlook.

In July, New York State’s Office of Children and Family Services agreed to fund a project involving IBM and the Schwartzes (through a for-profit consulting firm they’ve created) to explore the potential of neural networks to improve risk prediction in child welfare.

An even bigger break came earlier this year, when a chapter written by the Schwartzes was included in a new book, Child Welfare Research: Advances for Policy and Practice, thus exposing the neural network risk prediction concept to a large new audience in academic and professional circles.

“The neural network approach holds a great deal of promise, though it remains relatively untested with respect to its effectiveness in child welfare settings,” says Shlonsky, the child welfare scholar, who is co-editor of the book. “There’s clearly enough there to validate further research.”

In his role as editor, Shlonsky shared a copy of the Schwartzes’ article with Baird at NCCD. In the back-and-forth that ensued, David Schwartz challenged Baird to test the neural network approach head-to-head against NCCD’s actuarial risk-prediction approach. After Shlonsky volunteered to coordinate the competition and ensure that both models operated under the same rules, Baird agreed.

“The process will take a while to unfold,” Shlonsky says, “but it looks like it’s going to happen. I’m very impressed that both parties are scientists. They’re both willing to test their methods head-to-head and live with the results.”

“I can only hope that the Schwartzes are right,” Shlonsky adds. “But when I see 97 percent accuracy, I want to know more. I’m a scientist. I’m trained to be skeptical.

“Let me see behind the curtain.”

Detecting Danger: Can We Do Better?

Bowing to overwhelming evidence that structured risk-assessment instruments outperform individual judgment in predicting the outcomes of child welfare and juvenile justice cases, a growing number of agencies in both fields now employ structured checklists to determine risk levels.

But that doesn’t mean they’re following best practice.

Scholars conclude that risk-assessment instruments may be of little value unless they’re developed and validated through statistical analysis, applied to the population for which they were intended and updated periodically to reflect changes in that population.

In a new compilation of essays published by Oxford University Press (Child Welfare Research: Advances for Policy and Practice), Ira and David Schwartz and three colleagues complain that ill-conceived risk instruments “are not much better than a flip of the coin.” Worse, unless they are analyzed for racial impact, risk-assessment instruments may violate children’s and families’ rights by inadvertently injecting racial bias into the process.

In the same volume, Judith Rycus and Ronald Hughes at the North American Resource Center for Child Welfare note, “The tools used by many child welfare agencies to guide critical case decisions often demonstrate poor reliability and validity or have simply never been researched and tested prior to their implementation.”

“Most jurisdictions use some form of risk assessment, but many use unreliable instruments, with little or no predictive validity,” adds Aron Shlonsky, a leading child welfare scholar at the University of Toronto.

The Schwartzes, along with Temple University criminologist Peter Jones, proposed a series of quality-control measures to ensure the accuracy of risk-assessment instruments in child welfare and juvenile justice. “Maintaining the integrity of the risk-assessment process by identifying and requiring adherence to minimal standards of quality is imperative,” they wrote in a 2006 article in the Temple Law Review. “There is no time for delay.”

Among their proposals:

• All risk instruments should be developed through statistical analysis, tested for reliability at the time of development, then revalidated at least once every three years.

• A risk instrument developed for youth or families in one locality should not be applied in another without being revalidated for the new area’s population.

• A risk instrument developed for a specific population should not be applied to another population – whether different in terms of race, gender, age or prior history – even within the same locality, unless the instrument is tested and validated for the new population.

• Every risk instrument should be analyzed for possible gender or racial bias, to ensure that its predictive validity holds equally for all groups.
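One simple way to look at whether predictive validity holds across groups is to compare, group by group, how much worse outcomes are for cases labeled high risk than for cases labeled low risk. The sketch below is an illustration under invented assumptions, not a method taken from the Temple Law Review article; the groups, counts and function names are all hypothetical.

```python
# Hypothetical counts: (group, risk label) -> (cases with the bad outcome, total cases).
counts = {
    ("group A", "high"): (40, 100),
    ("group A", "low"): (10, 100),
    ("group B", "high"): (25, 100),
    ("group B", "low"): (20, 100),
}


def outcome_rate(group, label):
    """Share of cases in a group and risk tier that had the bad outcome."""
    bad, total = counts[(group, label)]
    return bad / total


def high_to_low_ratio(group):
    """How strongly the instrument separates high from low risk within a group."""
    return outcome_rate(group, "high") / outcome_rate(group, "low")


ratios = {group: high_to_low_ratio(group) for group in ("group A", "group B")}
for group, ratio in ratios.items():
    print(f"{group}: high-risk cases fare {ratio:.2f} times worse than low-risk cases")
```

In this invented example the instrument separates risk levels sharply for group A (a ratio of 4.0) but barely at all for group B (1.25), the kind of gap that revalidation for each population is meant to catch.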

– Dick Mendel

Neural Networks: The Evidence

Neural networks offer the potential to produce “far more sophisticated, accurate, and predictive” risk-assessment tools than are now available, Ira and David Schwartz contend in their most recent paper on the subject.

Chris Baird of the National Council on Crime and Delinquency has questioned their research and argued, “There is just a lot more needed before anyone should conclude that neural networks produce better results” than conventional actuarial methods.

Here is what the small amount of current evidence says:

• The most compelling evidence for neural networks comes from a study the Schwartzes conducted, examining juvenile recidivism. In that study, youth identified by a neural network prediction model as high risk recorded an actual re-offending rate of 81 percent, while those deemed low risk had a rate of just 3 percent.

The study boasted, correctly, that this level of predictive accuracy far exceeds anything available through current methods.

However, these results also far exceed anything documented in other research on neural networks. “If this level of predictive accuracy were attainable through neural networks,” Baird says, “it would bring some industries (sports betting comes to mind) to a screeching halt.” And Baird complains that the Schwartzes have provided few details on their methods. “How did they do it?” he asks. “What factors did they use?”

• The Schwartzes’ other study – examining a neural network’s ability to determine child welfare cases in which children had “sustained harm” – was plagued by methodological limitations, and other research has not produced many grounds for optimism.

• A study in the state of Washington in 2000 found that neural networks produced superior results to other methods in predicting risk in child welfare cases. In a 2003 study examining youth released from a Nebraska training school, a neural network accurately predicted recidivism in 74 percent of cases – far better than conventional approaches. That study included only 166 youths.

• Other research points in the opposite direction. In 1996, a National Institute of Justice paper examining recidivism among adult offenders did not show “any gains in accuracy by using neural networks to predict recidivism.” A 2003 study of child protection cases among U.S. Air Force families found that neural networks were slightly less accurate in predicting case outcomes than were conventional statistical techniques.

– Dick Mendel

Resources

Chris Baird, Executive Vice President

Children’s Research Center

Madison, Wis.

(608) 831-8882

CBaird@mw.nccd-crc.org

David Schwartz, CEO

Q-linx

Conshohocken, Pa.

(610) 733-7140

dschwartz@qlinx.com