Round-Up of Some Reactions to Experiments and Social Science
I spent some time speaking with Steve Pearlstein, who has written an excellent column for the Washington Post on experimentation that relates business experimentation to social scientific experimentation.
Will Wilkinson has a very thoughtful post on my article, in which he describes the difference between what he terms “liberty of discovery” and “liberty of respect”. I think these are somewhat like what I have called “liberty as means” and “liberty as goal”. He has what I think are some very smart things to say in it.
Andrew Sullivan says this in a post:
Jim ably dispatches some salient challenges. But there is a concept in this crucial conservative distinction between theoretical and practical wisdom that has been missing so far: individual judgment. A social change can never be proven in advance to be the right answer to a pressing problem. We can try to understand previous examples; we can examine large randomized trials; but in the end, we have to make a judgment about the timeliness and effectiveness of certain changes. It is the ability to sense when such a moment is ripe that we used to call statesmanship. It is that quality that no wonkery can ever replace.
It is why we elect people and not algorithms.
Just so, in my view.
In 1957, psychology and social science researcher Donald Campbell famously developed the idea of distinguishing between internal and external validity of an experiment:
“Validity can be evaluated in terms of two major criteria. First, and as a basic minimum, is what can be called internal validity: Did in fact the experimental stimulus make some significant difference in this specific instance? The second criterion is that of external validity, representativeness or generalizability: To what populations, settings or variables can this effect be generalized?”
In the book, I term the problem of external validity one of “predictive generalization”, i.e., answering the question of “What will happen if I execute policy X in the future?”, and distinguish this from what I call the problem of “strategic generalization”, i.e., answering the question of “Should I execute policy X?” I try to show why what Andrew says is inherently true – that, as far as I can see, even in a situation in which normative concerns are not in play and we have agreement on goals, there is no series of experiments or other analyses that can ever answer the second question in real-life situations.
But I think that the primary implication of this realization is to be very hesitant to take strategic leaps, and to do so only when other options appear to be foreclosed.
Mark Kleiman in the latest round of an exchange with me says:
But, if I read Manzi’s response correctly, my original comment allowed a merely verbal disagreement to exaggerate the extent of the underlying substantive disagreement. If indeed Manzi can offer some systematic analysis of how to look at existing institutions, figure out which ones might profitably be changed, try out a range of plausible changes, gather careful evidence about the results of those changes, and modify further in light of those results, then Manzi proposes what I would call a “scientific” approach to making public policy.
If all Manzi means when he disses “social science” is that you shouldn’t just read some random paper in an economics or social-psych journal and propose some insanely risky venture such as privatizing Social Security or voucherizing public education or wiping out labor unions based on that paper, then I’m happy to stand shoulder-to-shoulder with him against irresponsible radicalism and for cautious and evidence-sensitive approaches to bringing about social improvement.
I think that he is reading my response correctly. While I don’t think that “all I meant” was that “you shouldn’t read some random paper in an economics or social-psych journal” and propose X, I certainly believe that. Most important, I acknowledge enthusiastically his “sauce for the goose is sauce for the gander” point that the recognition of our ignorance should apply to things that I theorize are good ideas, as much as it does to anything else. The law of unintended consequences does not only apply to Democratic proposals.
In fact, I have argued for supporting charter schools instead of school vouchers for exactly this reason. Even if one has the theory (as I do) that we ought to have a much more deregulated market for education, I more strongly hold the view that it is extremely difficult to predict the impacts of such drastic change, and that we should go one step at a time (even if on an experimental basis we are also testing more radical reforms at very small scale). I go into this in detail for the cases of school choice and social security privatization in the book.
Finally, Steve Sailer says this:
First, while experiments are great, correlation studies of naturally occurring data can be extremely useful. Second, a huge number of experiments have been done in the social sciences.
Third, the social sciences have come up with a vast amount of knowledge that is useful, reliable, and nonobvious, at least to our elites.
I claim that the purpose of science is to create useful, reliable, non-obvious predictive rules, and that experiments are a necessary but not sufficient component of this enterprise, as they are the most severe available test of reliability. So, a given correlation study might be “extremely useful”, but it does not eliminate the need for experimental tests of its assertions. I think that his first point implies that correlation studies and experiments are alternatives or substitutes, while I believe they are complementary.
I have attempted to collate every relevant, sufficiently large developed-world RFT reported in journals in the history of global social science. I don’t think I have been able to find every one, but I believe there have been, at most, a few thousand. In comparison, there have been something like 350,000 RCTs for therapeutics, and one company (Capital One) reportedly does more randomized field experiments per month than have been done in all of social science. More to the point, the number of findings of statistically significant positive results of social interventions that have been demonstrated in replicated RFTs appears to me to be tiny.
To Steve’s third point, my argument was not that we have not produced “knowledge” in some general sense, but that:
[F]ew programs can be shown to work in properly randomized and replicated trials. Despite complex and impressive-sounding empirical arguments by advocates and analysts, we should be very skeptical of claims for the effectiveness of new, counterintuitive programs and policies, and we should be reluctant to trump the trial-and-error process of social evolution in matters of economics or social policy.
Can this argument be placed in terms of technocratic experimentation which is associated with a statist approach vs spontaneous order in a free market? If so, I have some thoughts to add. If not, then I don’t have much to add. I’m sorry I haven’t had the time to read all you’ve written on this.
— Mike Farmer · Aug 4, 09:36 AM · #
Mike,
IMHO, it’s complicated. On one hand, there is some inherent conflict between claims to scientific knowledge and freedom (e.g., if we know that drug X won’t help cure the target disease and is dangerous, it seems so logical to make it illegal), but on the other hand certain kinds of freedom and openness seem to be required for science to operate. Not sure if that helped.
— Jim Manzi · Aug 4, 09:47 AM · #
“I claim that the purpose of science is to create useful, reliable, non-obvious predictive rules”
Chatting with my wife about all this over coffee this AM, and of course she asks the most insightful question: Is meteorology science? If so, why? If not, why not?
— Tony Comstock · Aug 4, 09:55 AM · #
“if we know that drug X won’t help cure the target disease and is dangerous, it seems so logical to make it illegal”
If “we” know this then why would we sell the drug or take the drug? If a company knowingly sells a drug that is dangerous and ineffective, wouldn’t they be open to a lawsuit, even if they could get someone to buy it? I understand the impulse to get government involved in oversight, but a part of the social science issue is considering what-ifs – what if society knew they had to research all drugs sold because there is no protection agency, just information on the market that can be accessed – and what if the druggist knew he could lose his/her business if he/she harms someone with dangerous, ineffective drugs – wouldn’t these two safeguards, with people taking responsibility over what they put in their bodies, and druggists protecting themselves from lawsuits, be enough? There’s also the fact that it’s hard to imagine a druggist who would intentionally harm a customer, for both moral and business reasons.
— Mike Farmer · Aug 4, 11:11 AM · #
One also has to wonder why a pharmaceutical company would go to the expense of manufacturing and marketing a dangerous, ineffective drug if no one will sell or buy it, and it also makes them liable for damages.
— Mike Farmer · Aug 4, 11:21 AM · #
Mike,
Yes, that’s the key point – what is knowledge and who gets to decide?
If you tell a physicist that heavy things fall faster than light things, she still knows you’re wrong, in the scientific meaning of the term “know”. How we act on that is another matter. I was very careful to say that it seems so logical to ban drug X.
— Jim Manzi · Aug 4, 11:41 AM · #
“drug X”
The name escapes me, but I’m sure at least some of you will remember the pharma-style ads for a “male enhancement product” that were running on TV a few years back.
The scam was remarkable in a number of ways, but with respect to this convo, what the scammer ended up going to jail for was fraud; not on the (in)efficacy of the product (there were no actual product claims) but on the money-back offer.
In this instance (and others) I find the after the fact approach unsatisfying, but as with Buzz Aldrin, it’s hard to make a rule, or even speculate on a prototype.
— Tony Comstock · Aug 4, 11:56 AM · #
“I find the after the fact approach unsatisfying”
As I find State, nanny-type preventative measures, anticipating violations or unhealthy, or unwise, choices, unsatisfying. The State determines that people will act this way, so they should prevent that by coercing them to act this way. The State doesn’t have enough knowledge, enough information, to control behavior in order to reach its preferred mode of behavior leading to the State’s preferred outcomes, nor enough wisdom to know if the preferred mode of behavior or outcomes are better than the free choices of the actors.
— Mike Farmer · Aug 4, 12:44 PM · #
Not to mention, the State does not have the right to do so. They may now have the power, but not the right.
— Mike Farmer · Aug 4, 12:47 PM · #
Why are school vouchers “insanely risky”? We have housing vouchers for poor people, because we figured out that it was better public policy to let poor people have some semblance of a choice rather than building big government-owned housing projects.
— Stuart Buck · Aug 4, 12:57 PM · #
Enzyte!
Mike, I’ve muddied my point by trying to be too clever.
As for “rights”, take it up with KVS. He’ll set you straight.
— Tony Comstock · Aug 4, 01:19 PM · #
Let’s take the discussion between Jim and me down from the plane of the abstract to the historical.
I wrote:
“Third, the social sciences have come up with a vast amount of knowledge that is useful, reliable, and nonobvious, at least to our elites.”
Jim responded:
“I claim that the purpose of science is to create useful, reliable, non-obvious predictive rules, and that experiments are a necessary but not sufficient component of this enterprise, as they are the most severe available test of reliability. So, a given correlation study might be “extremely useful”, but it does not eliminate the need for experimental tests of its assertions.”
Jim is responding as if I were saying that good correlation studies had found lots of programs that sound like they ought to be implemented, and should be implemented without experimentation. Actually, what I was referring to was the opposite: the once-famous series of correlation studies that showed the limits of potential social programs.
The history of social science findings in America from the Great Society onward has been, on the whole, one of liberal social scientists coming up with conservative, if not pessimistic or even reactionary, results. This is largely forgotten today now that “neoconservative” mostly refers to foreign policy views, but the term originated in large part to describe sophisticated social scientists such as Daniel Patrick Moynihan, James Q. Wilson, Edward Banfield, and so forth, whose studies’ results didn’t support the optimistic spirit of the Sixties.
For example, a correlation study that was extremely famous to old codgers like me was the 1966 Coleman Report. The 1964 Civil Rights Act gave $1 million for a national state-of-the-art study of school achievement by race. James S. Coleman’s report two years later came as a major buzzkill for the Great Society spirit of the age. It was intended to show from existing variations in school spending that spending more on black students would close the gap in school achievement. To the dismay of everybody, Coleman found little evidence for that. Instead, student achievement correlated best with what students brought from home.
The one positive finding was that students benefited from higher IQ teachers, but since that would suggest, on net, firing black teachers and hiring white teachers in newly integrated schools, Coleman left that out, an omission that troubled his conscience until he confessed it in the 1990s.
Now, I would quite agree with Jim that, say, we shouldn’t take Coleman’s findings implying that we should fire lower IQ (and disproportionately black) teachers and replace them with higher IQ (and disproportionately white) teachers on faith without running several RFTs.
But, in general, good correlation studies over the last half century have not come up with a lot of suggestions for liberal social programs that look like they ought to work. Jim emphasizes how many liberal ideas have failed at the experimentation stage, but his important point should be augmented by noting how many liberal ideas have failed even before that, at the level of correlation studies.
— Steve Sailer · Aug 4, 03:41 PM · #
Yes, Steve, I agree this discussion, per se, is best argued on historical evidence, the limits of social science, and the pros and cons of experimentation, but it matters if the experimentation is managed by government on subjects, or whether the experimentation is an organic process in a free society. In other words, freedom allows myriad experiments between voluntary actors which can lead to organic change, while government experimentation on subjects has its historical problems and doesn’t sound very appealing – the State has a way of tinkering, misreading the results and resisting feedback, whereas among voluntary social innovators there’s no political pressure to force results and if something doesn’t work, doesn’t catch on, is harmful, then it’s on to something else – much like the process in learning organizations – Peter Senge has done great work in this area, micromodeling and such, learning to trace effects back to original causes then breaking out of loops. Sorry about heading out into a different direction – I have ADD or something.
— Mike Farmer · Aug 4, 04:23 PM · #
Following up Mike’s point, look how the impetus in K-12 education is toward nationalization of methods. You would think that diversity of K-12 practices at the state and district level would be considered a godsend for helping figure out better practices. Yet, the opposite feeling dominates today.
How come?
One reason is that more glamorous politicians move up to the national level and then want to tinker in the traditionally local affair of K-12.
But a big reason is that all the districts and states have failed to eliminate the racial gaps. All of them.
So, rather than anybody in power or influence stopping and saying, well, maybe making our overwhelming goal in K-12 the elimination of racial achievement gaps was a mistake, they say, “Failure at the state level proves we must nationalize the war on racial gaps! Every state has failed no matter what they tried, so we at the national level will try one thing and that will work!”
It’s lunatic logic, but that’s what makes you popular today.
— Steve Sailer · Aug 4, 05:01 PM · #
Jim,
I think you are confusing chaotic processes with lack of external validity. When social scientists get to engineer the entire system the predictions get pretty good, when physicists have to predict messy real world phenomena the predictions get bad.
See my treatment:
http://modeledbehavior.com/2010/08/04/jim-manzis-sensitive-dependence-on-initial-observations/
— Karl Smith · Aug 4, 05:04 PM · #
Civil penalties would be an insufficient deterrent. It’s too easy to make up some new “medicine” or “tonic”, rake in the dough from desperate people, pay out huge bonuses, then go bankrupt.
Criminal penalties—jail time—would be necessary to make the “after the fact” approach work.
It matters, but not that much. Even putting aside “animal spirits”, even assuming that everyone is acting rationally, people can still find themselves in prisoner’s dilemmas and tragedies of the commons—times when what’s best for each individual and what’s best for everyone diverge.
— Consumatopia · Aug 4, 05:57 PM · #
As Arnold Kling says, “robust results in social science that run counter to preferred techniques in social engineering get ignored; flimsy studies that support social engineering get a lot of play, only to be discredited later.”
The essential problem with the American social sciences isn’t methodological, it’s bigotry against realism.
There is very little money in telling the truth, but there is a huge risk of ostracism, so we don’t hear much about reality.
Who makes more money lecturing corporations each year on what the social sciences have found: Charles Murray or Malcolm Gladwell?
— Steve Sailer · Aug 4, 06:42 PM · #
Karl,
I read the post, which I found very interesting. Thanks for the link.
In your comment, you say that:
“I think you are confusing chaotic processes with lack of external validity.”
I don’t think I’m confusing them, because I don’t think they’re alternatives. (As I’m sure you know, but for clarity) lack of external validity, in this context, just means that if I find that A appears to cause B in an experiment, it doesn’t mean that I will observe the same effect consistently if I attempt to replicate the experiment. In effect, I find “hidden conditionals” through replication. A chaotic system is one kind of environment in which I will often find this problem.
Fair enough, but such systems weren’t the subject of my article (although I do discuss laboratory experimental economics and its close relationship to RFTs in the book).
Claiming that physics has produced a large body of useful, reliable nonobvious predictive rules is not the same as saying that it has explained all physical phenomena.
— Jim Manzi · Aug 4, 07:26 PM · #
Any source for the Capital One claim?
— Roving Bandit · Aug 4, 07:40 PM · #
Small typo in the post: the company’s name is “Capital One”, not “Capitol One”
— Will Townes · Aug 4, 07:48 PM · #
Dear Jim:
When promoting your book on experimentation, I would recommend losing the emphasis on the stimulus. It’s distracting. It raises unanswerable questions and off-topic thoughts like, “How do you run an experiment on a $787 billion stimulus? How do you get it finished in time? Should you use, I don’t know, Maui as a laboratory for a $787 million stimulus? I’d volunteer to go help Maui spend $787 million!”
Education is a much better field for your emphasis on testing. The big emphasis in this century has been on closing the racial gap in education achievement, and that’s the exact opposite of responding to a semi-unique historical economic crisis.
In K-12 performance, instead of having a unique crash, we just have pretty much the same old same old, over and over and over: Asians do best, whites next, then Mexicans (if they speak English), then blacks. So, that’s much more amenable to testing than the stimulus.
But, here’s the reason I can’t get too excited about your emphasis that we must run scientific experiments before we accept happy initial results: I just don’t see many happy initial results. And many of the ones I’ve seen hyped, I’ve dismantled just using Occam’s Razor.
For example, we already are testing informally by having thousands of different school districts, all of them with mandates from on high to Close The Gaps. Countless school districts put their test results online, and yet I’ve never heard of a school district where blacks are both numerous and do as well as whites. (Maybe there’s one out there somewhere. But you’d think you would hear about it if there were — there’s a lot of fame and money to be made by being the educator or social scientist who Closed the Gap.)
I’ve seen lots of claims that blacks in this or that single school are at the white statewide average, but that’s not the same as closing the racial gap, just as the black students at Harvard are a lot smarter on average than the national white average, but they aren’t as smart on average as the white students at Harvard. (See what I mean about Occam’s Razor?)
Your article would tend to give the impression that, say, we are constantly hearing about school districts where blacks are doing as well as whites, but then when we try to replicate that happy result in another school district, it turns out not to be replicable. But, in reality, we never even get that initial positive result. (For instance, for a long consideration of the performance of upscale black populations in prestigious suburban school districts, Google: Shaker Heights John Ogbu.)
In general, what I’ve noticed in following this field since 1972 is that what happens in racial gap research and funding is that naive newcomers, like Bill Gates or Roland Fryer, rush into the field on the assumption that obviously nobody smart has ever really tried to Close the Gap before, so just through my own brilliance I’ll be able to do it. After all, the alternatives are that either the Gap can’t be closed, or I’m not as smart as I think I am, and we know that neither alternative could possibly be true.
Then what happens is that they try their favorite ideas, they don’t work terribly well, the newcomers become depressed and go off and work on something different.
Overall, I just don’t see that the big problem with American social science is methodological. Instead, it’s religious. People want to believe in the evidence of things unseen. They don’t want to believe their lying eyes.
— Steve Sailer · Aug 4, 08:29 PM · #
Roving Bandit:
http://www.leader-values.com/Downloads/CBI/Journal_Issue_8.pdf
Will,
Thanks, I’ll correct it.
— Jim Manzi · Aug 4, 08:31 PM · #
Jim’s emphasis on experimentation could be useful in K-12 education, but there’s little enthusiasm for rigorous experimentation since the overriding goal of everybody who is anybody – Bill Gates, Barack Obama, George W. Bush, the late Ted Kennedy – has been to leave no child behind, to close the racial gap in school achievement. And, endless social science research over the last half century has repeatedly failed to achieve that.
Thus, the education research field is full of fads and frauds. Serious people flee it after a few years of failure to Close the Gap.
Basically, the Establishment program for education – Close the Gap – is the equivalent of an 18th Century Manhattan Project to invent the perpetual motion machine. I propose that instead of trying to invent the perpetual motion machine, we try to invent the steam engine. It wasn’t as cool as the perpetual motion machine in theory, but in reality it did more for people.
So what goal do I propose instead of Closing The Gap?
My goal would be to raise the average performance of all racial groups by half a standard deviation.
In other words, both goals are intended to improve the national average by half a standard deviation—but the Gates-Obama-Bush-Kennedy consensus wants to do it entirely by raising the scores of the minority half.
Which objective sounds more achievable?
Mine, obviously, for two reasons:
1. Diminishing marginal returns: a one standard deviation improvement is not merely twice as hard to accomplish as a half-standard deviation improvement, it’s much harder.
2. Real improvements tend to better everybody’s performance. For example, I can drive a golf ball farther off the tee than I could 15 years ago because driver technology has significantly improved. (Clubheads are approaching the size of toasters, so you can now take a wild swipe at the ball without fear of whiffing). But then, Phil Mickelson can hit the ball farther, too. So the pro-hacker gap in driving distance hasn’t closed.
In summary: my aim is more achievable, more fair, and more sensible than the Gates-Obama-Bush-Kennedy consensus.
And therefore, of course, it’s also much more unmentionable.
— Steve Sailer · Aug 4, 09:37 PM · #
“One also has to wonder why a pharmaceutical company would go to the expense of manufacturing and marketing a dangerous, ineffective drug if no one will sell or buy it, and it also makes them liable for damages.”
Ever heard of ephedra? It would still be on the market if the FDA hadn’t banned it. People will do anything to lose weight, including losing their health. Since it was not technically a pharmaceutical, they didn’t have to do trials, the results of which would’ve been critical to any lawsuit.
— Derek Scruggs · Aug 5, 12:18 AM · #
“It would still be on the market if the FDA hadn’t banned it.”
And you know this how? In the absence of the FDA, and with a court system which responds to a society intolerant of harmful drugs put on the market (they would have to be to protect themselves), a company putting out such a drug would have to be suicidal to even try this type of thing. The earlier comment suggesting a company would go to the expense of manufacturing a harmful drug, that physicians would prescribe it, that drug stores would stock it all for a quick profit, is ludicrous. But aside from this, desperate people will try anything, FDA or no FDA. But just because government isn’t providing oversight doesn’t mean in a free market there wouldn’t be oversight. There would likely be an issue of trust involved with all new medicines, so there would be a need for a company such as the Joint Commission for Hospital Accreditation – drug stores would want a seal of approval to advertise in order to gain trust, and a way of showing the public it’s safe – the public would likely demand it – so drug companies would pay for a seal of approval to market their safe, tested products. In fear of crippling lawsuits and the death of their reputation, companies would gladly pay for this seal of approval, especially if business taxes are non-existent, due to cutting the government down to a very inexpensive size that can be supported through other indirect taxes.
— Mike Farmer · Aug 5, 12:54 AM · #
Mike, you are more or less describing the genesis and workings of the MPAA. I would be very interested to hear your take on movie regulation in the US versus other countries (most of which have government run and mandated regulatory regimes.)
— Tony Comstock · Aug 5, 08:51 AM · #
Mike,
I think the classic book that makes your basic argument (though in less absolute terms – e.g., there is a trade-off between higher rates of “bad” drugs, and a faster rate of innovation of good drugs) is Sam Peltzman’s Regulation of Pharmaceutical Innovation (from memory). It was a key impetus for the late 1970s / 80s wave of FDA regulation changes. It is a very persuasive book.
— Jim Manzi · Aug 5, 10:37 AM · #
MPAA analogy works insofar as harmful images equals harmful drugs. The analogy weakens on the issue of expertise.
There is no expert method of censorship and ratings (at least not yet). Thus, the argument for MPAA is far weaker than the argument for FDA, even if we assume arguendo no distinction between physical and psychological harm.
Jim, did you get through that article yet? I’d be interested to hear how you strive with Quinean underdeterminism and cooperative naturalized epistemology.
— KVS · Aug 5, 01:49 PM · #
@KVS
Harm? What’s harm got to do with it?
— Tony Comstock · Aug 5, 03:22 PM · #
Tony, movie regulations, such as? Ratings? Restrictions on admission for R and X? I imagine the industry will police itself if it doesn’t want to upset those paying, the parents. A good question, though, is how much sex and violence is harmful to kids say from 13 to 16? At 13 there was not much that shocked me after sneaking into hoochy-koochy shows at state fairs, looking at smut magazines and living in an area of Atlanta that was pornographic by nature. But I’m not sure what you’re asking specifically — what if the state didn’t regulate the movie industry?
— Mike Farmer · Aug 5, 04:00 PM · #
Mike —
Perhaps I’m misunderstanding you, but your reply seems like a mishmash of common misapprehensions about what the MPAA is and does.
Again, the MPAA operates in almost exactly the way you’ve described your theoretical drug industry self regulation scheme.
There is even an MPAA seal; in fact my own film “Marie and Jack: A Hardcore Love Story” carries an MPAA seal, and in exchange we are bound by certain rules about our packaging and advertising.
This stands in contrast to most industrialized nations, which have government mandated and run regulating bodies.
Given your libertarian bona fides, I’d be interested in your take on why self-regulation took hold in the American film industry, but government regulation is the order of the day elsewhere; and then perhaps loop that back to why self-regulation has not been widely adopted for drugs.
— Tony Comstock · Aug 6, 09:13 AM · #
I’m late to this, I know, but doesn’t your take on “causal densities” bite back against arguments you’ve advanced in other areas? For example, I seem to remember you arguing that we should be suspicious of torture because torture-endorsing regimes generally lose wars against non-torturing states. I mean, talk about alternative causalities!
— Will · Aug 12, 10:19 AM · #