“Don’t Trust One-Offs”

Jim Manzi in City Journal:

[…]

Another way of putting the problem is that we have no reliable way to measure counterfactuals—that is, to know what would have happened had we not executed some policy—because so many other factors influence the outcome. This seemingly narrow problem is central to our continuing inability to transform social sciences into actual sciences. Unlike physics or biology, the social sciences have not demonstrated the capacity to produce a substantial body of useful, nonobvious, and reliable predictive rules about what they study—that is, human social behavior, including the impact of proposed government programs.

The missing ingredient is controlled experimentation, which is what allows science positively to settle certain kinds of debates. How do we know that our physical theories concerning the wing are true? In the end, not because of equations on blackboards or compelling speeches by famous physicists but because airplanes stay up. Social scientists may make claims as fascinating and counterintuitive as the proposition that a heavy piece of machinery can fly, but these claims are frequently untested by experiment, which means that debates like the one in 2009 will never be settled. For decades to come, we will continue to be lectured by what are, in effect, Keynesian and non-Keynesian economists.

Over many decades, social science has groped toward the goal of applying the experimental method to evaluate its theories for social improvement. Recent developments have made this much more practical, and the experimental revolution is finally reaching social science. The most fundamental lesson that emerges from such experimentation to date is that our scientific ignorance of the human condition remains profound. Despite confidently asserted empirical analysis, persuasive rhetoric, and claims to expertise, very few social-program interventions can be shown in controlled experiments to create real improvement in outcomes of interest.

[…]

After reviewing experiments not just in criminology but also in welfare-program design, education, and other fields, I propose that three lessons emerge consistently from them.

First, few programs can be shown to work in properly randomized and replicated trials. Despite complex and impressive-sounding empirical arguments by advocates and analysts, we should be very skeptical of claims for the effectiveness of new, counterintuitive programs and policies, and we should be reluctant to trump the trial-and-error process of social evolution in matters of economics or social policy.

Second, within this universe of programs that are far more likely to fail than succeed, programs that try to change people are even more likely to fail than those that try to change incentives. A litany of program ideas designed to push welfare recipients into the workforce failed when tested in those randomized experiments of the welfare-reform era; only adding mandatory work requirements succeeded in moving people from welfare to work in a humane fashion. And mandatory work-requirement programs that emphasize just getting a job are far more effective than those that emphasize skills-building. Similarly, the list of failed attempts to change people to make them less likely to commit crimes is almost endless—prisoner counseling, transitional aid to prisoners, intensive probation, juvenile boot camps—but the only program concept that tentatively demonstrated reductions in crime rates in replicated RFTs was nuisance abatement, which changes the environment in which criminals operate. (This isn’t to say that direct behavior-improvement programs can never work; one well-known program that sends nurses to visit new or expectant mothers seems to have succeeded in improving various social outcomes in replicated independent RFTs.)

And third, there is no magic. Those rare programs that do work usually lead to improvements that are quite modest, compared with the size of the problems they are meant to address or the dreams of advocates.

Razib Khan at Discover Magazine:

A friend once observed that you can’t have engineering without science, making the whole concept of “social engineering” somewhat farcical. Jim Manzi has an article in City Journal which reviews the checkered history of scientific methods as applied to humanity: “What Social Science Does—and Doesn’t—Know: Our scientific ignorance of the human condition remains profound.”

The criticisms of a scientific program as applied to humanity are deep, and two-pronged. As Manzi notes, the “causal density” of human phenomena makes teasing causation from correlation very difficult. Additionally, the large scale and humanistic nature of social phenomena make it ethically and practically impossible to apply the methods of scientific experimentation to them. This is why social scientists look for “natural experiments,” or involve extrapolation from “WEIRD” subject pools. But as Manzi notes, many of the correlations themselves are highly context-sensitive and not amenable to replication.

Arnold Kling:

If David Brooks is going to give out his annual awards for most important essays, I would nominate this one.

One of the lessons that is implicit in the essay (and that I think that Manzi ought to make explicit) is, “Don’t trust one-offs.” That is, do not draw strong conclusions based on a single experiment, no matter how well constructed. Instead, wait until many experiments have been conducted, in a variety of settings and using a variety of techniques. An example of a one-off that generated a lot of recent excitement is the $320,000 kindergarten teacher study.

Mark Kleiman:

I’m sorry, but this is incoherent. What is this magical “trial-and-error process” that does what scientific inquiry can’t do? On what basis are we to determine whether a given trial led to successful or unsuccessful results? Uncontrolled before-and-after analysis, with its vulnerability to regression toward the mean? And where is the mystical “social evolution” that somehow leads fit policies to survive while killing off the unfit?

Without any social-scientific basis at all (unless you count Gary Becker’s speculations) we managed to expand incarceration by 500 percent between 1975 and the present. Is that fact – the resultant of a complicated interplay of political, bureaucratic, and professional forces – to be accepted as evidence that mass incarceration is a good policy, and the “counter-intuitive” finding that, past a given point, expanding incarceration tends, on balance, to increase crime be ignored because it’s merely social science? Should the widespread belief, implemented in policy, that only formal treatment cures substance abuse cause us to ignore the evidence to the contrary provided by both naturalistic studies and the finding of the HOPE randomized controlled trial that consistent sanctions can reliably extinguish drug-using behavior even among chronic criminally-active substance abusers?

For some reason he doesn’t specify, Manzi regards negative trial results as dispositive evidence that social innovators are silly people who don’t understand “causal density.” So he accepts – as well he should – the “counter-intuitive” result that juvenile boot camps were a bad idea. But why are those negative results so much more impressive than the finding that raising offenders’ reading scores tends to reduce their future criminality?

Surely Manzi is right to call for methodological humility and catholicism; social knowledge does not begin and end with regressions and controlled trials. But the notion that prejudices embedded in policies reflect some sort of evolutionary result, and therefore deserve our respect when they conflict with the results of careful study, really can’t be taken seriously.
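Kleiman’s aside about regression toward the mean is easy to demonstrate: select the worst performers on a noisy measure, remeasure them later, and they “improve” with no intervention at all. A minimal simulation, purely illustrative (none of the numbers or code below come from the authors quoted here):

```python
import random

random.seed(0)  # deterministic illustration

# Each unit has a stable true level; each observation adds independent noise.
N = 10_000
true_level = [random.gauss(50, 10) for _ in range(N)]
before = [t + random.gauss(0, 10) for t in true_level]
after = [t + random.gauss(0, 10) for t in true_level]  # no intervention at all


def mean(xs):
    return sum(xs) / len(xs)


# "Treat" the worst-scoring decile, as an uncontrolled program evaluation might.
worst = sorted(range(N), key=lambda i: before[i])[: N // 10]

print(f"selected group, before: {mean([before[i] for i in worst]):.1f}")
print(f"selected group, after:  {mean([after[i] for i in worst]):.1f}")
# With no treatment whatsoever, the "after" mean drifts back toward 50, so a
# naive before-and-after comparison reports a phantom improvement.
```

This is exactly why an uncontrolled before-and-after analysis cannot arbitrate Manzi’s “trial-and-error process”: a control group is needed to separate a real effect from selection plus noise.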

Manzi responds at The American Scene:

This leads Kleiman to ask:

What is this magical “trial-and-error process” that does what scientific inquiry can’t do? On what basis are we to determine whether a given trial led to successful or unsuccessful results? Uncontrolled before-and-after analysis, with its vulnerability to regression toward the mean? And where is the mystical “social evolution” that somehow leads fit policies to survive while killing off the unfit?

I devoted a lot of time to this related group of questions in the forthcoming book. The shortest answer is that social evolution does not allow us to draw rational conclusions with scientific provenance about the effectiveness of various interventions, for methodological reasons including those that Kleiman cites. Social evolution merely renders (metaphorical) judgments about packages of policy decisions as embedded in actual institutions. This process is glacial, statistical and crude, and we live in the midst of an evolutionary stream that we don’t comprehend. But recognition of ignorance is superior to the unfounded assertion of scientific knowledge.

Kleiman then goes on to ask this:

Without any social-scientific basis at all (unless you count Gary Becker’s speculations) we managed to expand incarceration by 500 percent between 1975 and the present. Is that fact – the resultant of a complicated interplay of political, bureaucratic, and professional forces – to be accepted as evidence that mass incarceration is a good policy, and the “counter-intuitive” finding that, past a given point, expanding incarceration tends, on balance, to increase crime be ignored because it’s merely social science?

My answer is yes, it should be counted as evidence, but it is not close to dispositive. We cannot glibly conclude that we now live in the best of all possible worlds. I devoted several chapters to trying to lay out some principles for evaluating when, why and how we should consider, initiate and retrospectively evaluate reforms to our social institutions.

Kleiman’s last question is:

Should the widespread belief, implemented in policy, that only formal treatment cures substance abuse cause us to ignore the evidence to the contrary provided by both naturalistic studies and the finding of the HOPE RCT that consistent sanctions can reliably extinguish drug-using behavior even among chronic criminally-active substance abusers?

My answer to this is no, and a large fraction of the article (and the book) is devoted to making the case that exactly such randomized trials really are the gold standard for the kind of knowledge that is required to make reliable, non-obvious predictions that rationally outweigh settled practice and even common sense. The major caveat to the evaluation of this specific program (about which Kleiman is deeply expert) is whether or not the experiment has been replicated, as I also make the argument that replication is essential to drawing valid conclusions from such experiments – the principle that Arnold Kling called in a review of the article, “Don’t trust one-offs.”

Steven Pearlstein at WaPo.

Steve Sailer:

That all sounds plausible, but I’ve been a social science stats geek since 1972, when the high school debate topic was education, so I’m aware that Manzi’s implications are misleading.

First, while experiments are great, correlation studies of naturally occurring data can be extremely useful. Second, a huge number of experiments have been done in the social sciences.

Third, the social sciences have come up with a vast amount of knowledge that is useful, reliable, and nonobvious, at least to our elites.

For example, a few years ago, Mayor Bloomberg and NYC schools supremo Joel Klein decided to fix the ramshackle admissions process to the gifted schools by imposing a standardized test on all applicants. Blogger Half Sigma immediately predicted that the percentage of Asians and whites admitted would rise at the expense of blacks and Hispanics, which would cause a sizable unexpected political problem for Bloomberg and Klein. All that has come to pass.

This inevitable outcome should have been obvious to Bloomberg and Klein from a century of social science data accumulation, but it clearly was not obvious to them.

No, the biggest problem with social science research is not methodological; it’s that we just don’t like the findings. The elites of America don’t like what the social sciences have uncovered about, say, crime, education, discrimination, immigration, and so forth.

Andrew Sullivan:

But there is a concept in this crucial conservative distinction between theoretical and practical wisdom that has been missing so far: individual judgment. A social change can never be proven in advance to be the right answer to a pressing problem. We can try to understand previous examples; we can examine large randomized trials; but in the end, we have to make a judgment about the timeliness and effectiveness of certain changes. It is the ability to sense when such a moment is ripe that we used to call statesmanship. It is that quality that no wonkery can ever replace.

It is why we elect people and not algorithms.

Will Wilkinson:

In my thinking about the contrasts between Rawlsian and Hayekian liberalism, I’ve begun to think about the former as the “liberalism of respect” and the latter as the “liberalism of discovery.” The liberalism of discovery recognizes the pervasiveness of our ignorance and the necessity of liberty for the emergence of useful knowledge. I would argue that the ideal of a social order embodying respect for persons as free and equal–the ideal of the liberalism of respect–comes to seem appealing only after a society has attained a certain level of economic development and general education, and these are largely consequences of a prior history of the relatively free play of the mechanisms of discovery celebrated by liberals like Hayek and Jim. But liberals of respect have tended to overlook the conditions under which people come to find their favored ideal worth aspiring to, and so have tended to fail to acknowledge in their theories of justice the role of the institutions of discovery in creating and maintaining a society of mutual respect and fair reciprocity.

Via Sullivan, Kleiman responds to Manzi:

I suppose I’ll have to read Manzi’s book to find out how existing practices constitute “(metaphorical) judgments about packages of policy decisions;” I’m inclined to regard them as mostly mere resultants-of-forces, with little claim to deference. (Thinking that existing arrangements somehow embody tacit knowledge is a different matter from thinking that big changes are likely to have unexpected consequences, mostly bad, though both are arguments for caution about grand projects.)

I’m also less unimpressed than Manzi is with how much non-obvious stuff about humans living together the social sciences have already taught us. That supply and demand will, without regulation, come into equilibrium at some price was a dazzling and radical social-scientific claim when Adam Smith and his friends suggested it. So too for Ricardo’s analysis of comparative advantage, which, while it doesn’t fully support the free-trade religion that has grown up around it, at least creates a reasonable presumption that trade is welfare-increasing.

The superiority of reward to punishment in changing behavior; the importance of cognitive-dissonance and mean-regression effects in (mis)shaping individual and social judgments; the intractable problem of public-goods contributions; the importance of social capital; the problems created by asymmetric information and the signaling processes it supports; the crucial importance of focal points; the distinction between positive-feedback and negative-feedback processes; the distinction between zero-sum and variable-sum games; the pervasiveness of imperfect rationality in the treatment of risk and of time-value, and the consequent possibility that people will, indeed, damage themselves voluntarily: none of these was obvious when proposed, and all of them are now, I claim, sufficiently well-established to allow us to make policy choices based on them, with some confidence about likely results. (So, for that matter, is the Keynesian analysis of insufficient demand and what to do about it.)

But, if I read Manzi’s response correctly, my original comment allowed a merely verbal disagreement to exaggerate the extent of the underlying substantive disagreement. If indeed Manzi can offer some systematic analysis of how to look at existing institutions, figure out which ones might profitably be changed, try out a range of plausible changes, gather careful evidence about the results of those changes, and modify further in light of those results, then Manzi proposes what I would call a “scientific” approach to making public policy.

Manzi responds to Kleiman:

I think that he is reading my response correctly. While I don’t think that “all I meant” was that “you shouldn’t read some random paper in an economics or social-psych journal” and propose X, I certainly believe that. Most important, I acknowledge enthusiastically his “sauce for the goose is sauce for the gander” point that the recognition of our ignorance should apply to things that I theorize are good ideas, as much as it does to anything else. The law of unintended consequences does not only apply to Democratic proposals.

In fact, I have argued for supporting charter schools instead of school vouchers for exactly this reason. Even if one has the theory (as I do) that we ought to have a much more deregulated market for education, I more strongly hold the view that it is extremely difficult to predict the impacts of such drastic change, and that we should go one step at a time (even if on an experimental basis we are also testing more radical reforms at very small scale). I go into this in detail for the cases of school choice and social security privatization in the book.

Megan McArdle:

I have been reading with great interest the back-and-forth between Mark Kleiman and Jim Manzi on how much more humble we ought to be about new policy changes.  I know and like both men personally, as well as having a healthy respect for two formidable intellects, so I’ve greatly enjoyed the exchange.

Naturally, this has put me in mind of just how hard it is to predict policy outcomes–how easy it is to settle on some intuitively plausible outcome, without considering some harder-to-imagine countervailing force.

Consider the supply-siders. The thing is intuitively appealing: when we get more money from working, we ought to be willing to do more of it. And it is a mathematical truism that revenue must be maximized at some point. Why couldn’t we be on the right-hand side of the Laffer Curve?

It was entirely possible that we were; unfortunately, it wasn’t true. And one of the reasons that supply-siders failed was that they were captivated by that one appealing intuition. In economics, it’s known as the “substitution effect”–as your wages go up, leisure becomes more expensive relative to work, so you tend to do less of the former, more of the latter.

Unfortunately, the supply-siders missed another important effect, known as the “income effect”.  Which is to say that as you get richer, you demand more of some goods, and less of others.  And one of the goods you demand more of as you get richer–a class of goods known as “superior goods”–is leisure.

Of course, some people are so driven that they will simply work until they drop in the traces.  But most people like leisure.  So say you raise the average wage by 10%.  Suddenly people are bringing home 10% more income every hour.  Now, maybe this makes them all excited so they decide to work more.  On the other hand, maybe they decide they were happy at their old income, and now they can enjoy their old income while working 9% fewer hours.  Cutting taxes could actually reduce total output.

(We will not go into the question of how much most people can control their hours–on the one hand, most people can’t, very well, but on the other hand, those who can tend to be the high-earning types who pay most of your taxes.)

Which happens depends on which effect is stronger.  In practice, apparently neither was strong enough to thoroughly dominate, at least not when combined with employers who still demanded 40 hour weeks.  You do probably get a modest boost to GDP from tax cuts.  But you also get falling tax revenue.
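McArdle’s arithmetic here can be checked directly. A small sketch (the $20 wage and 40-hour week are hypothetical figures chosen only to make the percentages concrete):

```python
# A 10% raise means the old income takes fewer hours to earn.
old_wage, hours = 20.0, 40.0
new_wage = old_wage * 1.10
old_income = old_wage * hours  # $800/week

# Pure income effect: keep the old paycheck, buy leisure with the rest.
hours_for_same_income = old_income / new_wage
cut = 1 - hours_for_same_income / hours

print(f"{hours_for_same_income:.1f} hours, a {cut:.1%} reduction")
# → 36.4 hours, a 9.1% reduction
```

So a 10% raise lets a worker keep the same income on roughly 9% fewer hours, which is the margin on which the income effect can offset the substitution effect.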

Naturally, even-handedness demands that I here expose the wrong-headedness of some liberal scheme. And as it happens, I have one all ready in the oven here: the chimera of reducing emergency room use. The argument that health care reform could somehow at least partially pay for itself by keeping people from using the emergency room was always dubious. As I and others argued, there’s not actually that much evidence that people use the emergency room because they are uninsured–rather than because they have to work during normal business hours, are poor planners, or are afraid that immigration may somehow find them at a free clinic.

Moreover, we argued, non-emergent visits to the emergency room mostly use the spare capacity of trauma doctors; the average cost may be hundreds of dollars, but the marginal cost of slotting ear infections in when you don’t happen to have a sucking chest wound is probably pretty minimal.

But even I was not skeptical enough to predict what actually happened in Massachusetts, which is that emergency room usage went up after they implemented health care reform.

Filed under Go Meta
