Wednesday, February 3, 2016

Multiple Regression Analysis

To get my degree from the University of Minnesota, I was required to take four quarters of upper-class and graduate-level statistics.  I was NOT happy about this requirement, but it only took a couple of classes for me to change my mind from "Do I hafta?" to "This is the future of knowledge expansion; this is probably the most important subject I will take."  So what changed my mind?

The absolute pinnacle of the social sciences at UM was the psychology department.  In 1939, it produced a testing device called the Minnesota Multiphasic Personality Inventory (MMPI) which was soon put to use by the War Department.  It is still in use and copyrighted by the University of Minnesota.  What made the MMPI groundbreaking was that it was the first big test that had used a math method in its design.  That method was called Multiple Regression Analysis.  It was very tedious and labor intensive to use, but a small army of diligent grad students wielding slide rules had been employed to produce the original version of the MMPI.

By the time I got to my stats class in 1972, they still taught the wisdom and power of Multiple Regression Analysis, only by now the social sciences had their very own IBM 360, so there was no more doing this tedious math by hand.  I was caught up in the excitement.  My teachers not only believed in Multiple Regression Analysis, they believed that by using high-powered computing, they were on the verge of a new golden age of human knowledge.  It also helped that for a mere $5 lab fee, I was able to play with one of the most advanced computers on earth.

Along the way, I began to see folks abusing Multiple Regression Analysis.  They would ask almost any questions they could imagine were remotely relevant, run the math, and circle the high statistical correlations on the printout.  Only then would they look to see what had influenced what.  When I saw that happen, I became even more interested in learning statistical methods because it was quite apparent that knowing how stats are generated is perhaps the most important skill one could have in an era where stats would show up in almost every important science and public policy debate.
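The "circle the high correlations" habit described above can be sketched as a toy simulation. This is only an illustration under invented assumptions: 200 survey questions, 50 respondents, every answer pure random noise, and a cutoff of 0.28, which is roughly the correlation that passes a conventional 5 percent significance test at that sample size. The function names are hypothetical and nothing here reconstructs how anyone actually ran their printouts.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_spurious_hits(n_questions=200, n_people=50, threshold=0.28, seed=1):
    """Count noise-only questions whose correlation with a noise-only
    outcome is large enough to get circled on the printout."""
    rng = random.Random(seed)
    # The outcome is random noise, so no question can truly "influence" it.
    outcome = [rng.gauss(0, 1) for _ in range(n_people)]
    hits = 0
    for _ in range(n_questions):
        answers = [rng.gauss(0, 1) for _ in range(n_people)]
        if abs(pearson_r(answers, outcome)) > threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    print(count_spurious_hits(), "of 200 pure-noise questions look 'significant'")
```

With a 5 percent cutoff, something on the order of one question in twenty should clear the bar by chance alone, which is why looking for an explanation only after circling the winners is so misleading.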

So now I see a really smart guy point out the abuses of Multiple Regression Analysis and suggest that studies that use this method are probably so flawed, they should come with warning labels.  There is absolutely nothing in my education that would make me believe he is wrong.  In fact, I have long suspected that even the "sainted" MMPI was garbage.  This is a method that probably should have been strangled in its cradle.

The Crusade Against Multiple Regression Analysis

A Conversation With Richard Nisbett [1.21.16]
A huge range of science projects are done with multiple regression analysis. The results are often somewhere between meaningless and quite damaging. ...
I hope that in the future, if I’m successful in communicating with people about this, that there’ll be a kind of upfront warning in New York Times articles: These data are based on multiple regression analysis. This would be a sign that you probably shouldn’t read the article because you’re quite likely to get non-information or misinformation.

RICHARD NISBETT is a professor of psychology and co-director of the Culture and Cognition Program at the University of Michigan. He is the author of Mindware: Tools for Smart Thinking; and The Geography of Thought. Richard Nisbett's Edge Bio Page.


The thing I’m most interested in right now has become a kind of crusade against correlational statistical analysis—in particular, what’s called multiple regression analysis. Say you want to find out whether taking Vitamin E is associated with lower prostate cancer risk. You look at the correlational evidence and indeed it turns out that men who take Vitamin E have lower risk for prostate cancer. Then someone says, "Well, let’s see if we do the actual experiment, what happens." And what happens when you do the experiment is that Vitamin E contributes to the likelihood of prostate cancer. How could there be differences? These happen a lot. The correlational—the observational—evidence tells you one thing, the experimental evidence tells you something completely different.

In the case of health data, the big problem is something that’s come to be called the healthy user bias, because the guy who’s taking Vitamin E is also doing everything else right. A doctor or an article has told him to take Vitamin E, so he does that, but he’s also the guy who’s watching his weight and his cholesterol, gets plenty of exercise, drinks alcohol in moderation, doesn’t smoke, has a high level of education, and a high income. All of these things are likely to make you live longer, to make you less subject to morbidity and mortality risks of all kinds. You pull one thing out of that correlate and it’s going to look like Vitamin E is terrific because it’s dragging all these other good things along with it.
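The healthy user bias described above can be illustrated with a small simulation. It is a sketch only: every number in it (how common health-conscious people are, how likely each group is to take the vitamin, the baseline risks, and a small harmful effect of the vitamin) is invented for illustration and comes from no real study. The function name is hypothetical too.

```python
import random


def risk_difference(n, randomized, seed=0):
    """Return (illness rate among takers) minus (rate among non-takers).

    Assumed, made-up world: the vitamin RAISES risk by 3 points, but
    health-conscious people have much lower baseline risk and, in the
    observational case, are far more likely to take the vitamin.
    """
    rng = random.Random(seed)
    cases = {True: 0, False: 0}
    totals = {True: 0, False: 0}
    for _ in range(n):
        health_conscious = rng.random() < 0.5
        if randomized:
            # Experiment: a coin flip decides who gets the vitamin.
            takes = rng.random() < 0.5
        else:
            # Observation: people choose for themselves (self-selection).
            takes = rng.random() < (0.8 if health_conscious else 0.2)
        risk = 0.10 if health_conscious else 0.30
        if takes:
            risk += 0.03  # the assumed true effect of the vitamin is harmful
        ill = rng.random() < risk
        totals[takes] += 1
        cases[takes] += ill
    return cases[True] / totals[True] - cases[False] / totals[False]


if __name__ == "__main__":
    print("observational:", round(risk_difference(200_000, randomized=False), 3))
    print("experimental: ", round(risk_difference(200_000, randomized=True), 3))
```

Under these assumptions the observational comparison makes the vitamin look protective, because takers are mostly the health-conscious group, while the randomized comparison recovers the harmful effect that was built into the simulation.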

This is not, by any means, limited to health issues. A while back, I read a government report in The New York Times on the safety of automobiles. The measure that they used was the deaths per million drivers of each of these autos. It turns out that, for example, there are enormously more deaths per million drivers who drive Ford F150 pickups than for people who drive Volvo station wagons. Most people’s reaction, and certainly my initial reaction to it was, "Well, it sort of figures—everybody knows that Volvos are safe."

Let’s describe two people and you tell me who you think is more likely to be driving the Volvo and who is more likely to be driving the pickup: a suburban matron in the New York area and a twenty-five-year-old cowboy in Oklahoma. It’s obvious that people are not assigned their cars. We don’t say, "Billy, you’ll be driving a powder blue Volvo station wagon." Because of this self-selection problem, you simply can’t interpret data like that. You know virtually nothing about the relative safety of cars based on that study.

I saw in The New York Times recently an article by a respected writer reporting that people who have elaborate weddings tend to have marriages that last longer. How would that be? Maybe it’s just all the darned expense and bother—you don’t want to get divorced. It’s a cognitive dissonance thing.

Let’s think about who makes elaborate plans for expensive weddings: people who are better off financially, which is by itself a good prognosis for marriage; people who are more educated, also a better prognosis; people who are richer; people who are older—the later you get married, the more likelihood that the marriage will last, and so on.

The truth is you’ve learned nothing. It’s like saying men who are a somebody III or IV have longer-lasting marriages. Is it because of the suffix there? No, it’s because those people are the types who have a good prognosis for a lengthy marriage.

A huge range of science projects are done with multiple regression analysis. The results are often somewhere between meaningless and quite damaging.

I find that my fellow social psychologists, the very smartest ones, will do these silly multiple regression studies, showing, for example, that the more basketball team members touch each other the better the record of wins.

I hope that in the future, if I’m successful in communicating with people about this, there’ll be a kind of upfront warning in New York Times articles: These data are based on multiple regression analysis. This would be a sign that you probably shouldn’t read the article because you’re quite likely to get non-information or misinformation.

Knowing that the technique is terribly flawed and asking yourself—which you shouldn’t have to do because you ought to be told by the journalist what generated these data—if the study is subject to self-selection effects or confounded variable effects, and if it is, you should probably ignore them. What I most want to do is blow the whistle on this and stop scientists from doing this kind of thing. As I say, many of the very best social psychologists don’t understand this point.

I want to do an article that will describe, similar to the way I have done now, what the problem is. I’m going to work with a statistician who can do all the formal stuff, and hopefully we’ll be published in some outlet that will reach scientists in all fields and also act as a kind of "buyer beware" for the general reader, so they understand when a technique is deeply flawed and can be alert to the possibility that the study they're reading has the self-selection or confounded-variable problems that are characteristic of multiple regression.

Health statistics in general, you should be extremely dubious about, unless it’s explicitly stated that it’s an experimental study. The consequences of this junk research are enormous. I’m trying to find ways to get people to stop doing it and to make the general reader aware that they have to ask themselves, "Do I think that this is a correlational study or is it an actual experiment?"

At the beginning, social psychology was about attitudes and groups. Those were the main things that we studied for a decade or two. About the beginning of the cognitive revolution, people who called themselves cognitive scientists weren’t really dealing with human cognition; they were dealing with memory and perception and so on—hardnosed type of stuff. So, there was this field—the study of how people think—that had been abandoned. Cognitive social psychology became dominant in the field—how people think, how they reason, how they make sense of other people.

About the same time that social psychology was moving cognitive, it was also moving into the territory occupied by personality psychologists and was basically saying that the claims by personality psychologists of consistent powerful individual differences are simply mistaken. The great breaking point in all of this was actually not by a social psychologist; it was by Walter Mischel, who was a clinical psychologist. He pointed out how very weak our predictions can be, even from having, let’s say, a personality profile of someone based on lots of questions (how extroverted is Joe versus Jim?). The predictability runs, at most, to a correlation of about .3, which is not a very strong relationship at all. At the level of one situation to another, how extroverted is Bill in Situation A and how extroverted is he in Situation B—that correlation runs about .1, which means it’s about a 54 percent chance that you would get it right as to whether the more extroverted person in Situation 1 would be the more extroverted person in Situation 2. That's up from the 50 percent you get with a coin flip.
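The step from a correlation to a "chance of getting it right" can be made concrete. The transcript does not say how the 54 percent figure was derived, so the following is an assumption: if the two measures are treated as bivariate normal, a standard result gives the probability that the person who scores higher in one situation also scores higher in the other as 1/2 + arcsin(r)/pi. The function name is hypothetical.

```python
import math


def chance_same_ordering(r):
    """Probability that two randomly chosen people are ranked the same way
    on both measures, assuming bivariate normal scores with correlation r."""
    return 0.5 + math.asin(r) / math.pi


if __name__ == "__main__":
    for r in (0.0, 0.1, 0.3):
        print(f"r = {r:.1f}: {chance_same_ordering(r):.1%}")
```

Under that assumption a correlation of .1 works out to roughly 53 percent, in the same neighborhood as the 54 percent quoted above, and a correlation of .3 to roughly 60 percent. Either way, the improvement over a coin flip is small.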

That was a huge battle. Social psychologists took up the Mischel position. We confronted the personality view, which is primarily the way that the layperson understands human behavior, by saying, "No, it’s primarily situational factors. It’s context. And it’s the cognitive interpretation of the context you find yourself in that’s driving behavior."

For a decade, many social psychologists were focusing on the power of the situation—trivial seeming things that turned out to be completely determinative of a person’s behavior. Of late, some of the situational factors that people examine are even more trivial. They make us look silly. For example, if I ask you to read a persuasive communication while you’re standing on a wharf or in a seafood store smelling fish, you’re less persuaded by the persuasive communication because it seems "fishy" to you. This, as it turns out, only works in cultures where there is the expression, "This looks fishy to me." It doesn’t work in Denmark, where the expression is, "I smell a rat."

There is something of a crisis in science in general with the question of replicability; there are claims that seem to be impossible. 85 percent of all medical experiments don’t replicate. Take the pill, don’t take the pill and it doesn’t replicate? I don’t understand that.

In my field there’s the claim by the people who have been doing the replication to find out exactly how bad the problem in social psychology is. They find that two-thirds of the studies don’t replicate. Well, I’m completely baffled by that. I don’t know what it means. But I’d note first of all that a lot of the studies they looked at were these studies that have some trivial-seeming manipulation; it’s embarrassing that it would have an effect on us, but you would expect them to be somewhat unstable.

Take an example of one that seems not to be very highly replicable—the granddaddy of them all by John Bargh: Students hear the words "cane," "Florida," "gray," and then they walk more slowly out of a laboratory. That sometimes replicates and sometimes doesn’t. But the whole point about a trivial fleeting stimulus that might be powerful is that, in a slightly different context, or when the person’s attention is directed slightly differently, you may not get the effect.

I challenged my colleagues—and I’ve done this to two dozen now—I said, "Tell me a study in social psychology that, when it came out, seemed interesting and important and then it didn’t turn out to replicate."

Nobody gives you one just right off the top of their heads. They'd say, "Well, how about Daryl Bem’s truth light/lie light study?" That apparently doesn’t replicate, but it seemed interesting and important.

We now have dozens of experiments that make the same general point—that you pick up a cue, in this case, you’ve been telling lies or telling the truth depending on what color light is on. When the experiment is over and you’re asked to express an attitude and one of those lights is on you tend not to be as confident about what you’re saying if it was the "lie light" that was on.

That, as I say, doesn’t replicate, but the general point is that some minimal seeming cue, which gets into the stream of behavior and has an impact—we have hundreds of those studies, many individual ones that are not highly replicable. Physiologists sometimes will go to somebody else’s lab for months to find out how you get the preparation right so that you can find some effect.

I have a text in social psychology and when this replication concern came out, I just started thumbing through the text and seeing how many of the assertions we made about human behavior in that text rest on a single experiment. Nothing. We may sometimes describe a particular experiment as an example of some point, and that particular experiment might not replicate, but the theory that the experiment exemplifies has been established in any number of different experimental contexts.

There’s tremendous redundancy in what we’re telling the public about human behavior. Though it may not be the case that you can tell a bunch of college students words that remind them of old folks and they’ll be walking more slowly, there are many studies making a point similar to that.

Take an example of a cute study—I don’t know whether it replicates or not but it wouldn’t be surprising if it didn’t. You put three dots over the coffee urn and the dots are in the position two up, one down, which is vaguely reminiscent of the human face: two eyes and a mouth. You watch to see how many people leave money in the honest box when those dots are there. People leave more money in the honest box when the dots that are reminiscent of a face looking at them are there. If you invert it—two dots on the bottom and one dot on top—it has no effect.

Now, would that always have an effect? I don’t know. If you have loud music playing and many people are in the room, that might not have any effect because people don’t notice it or it has a minimal effect. If that failed to replicate, it wouldn’t be surprising and it wouldn’t shake your faith in the idea that minimal cues, which ought to have no part in determining your behavior, do have an impact.

The single thing I’ve done that has gotten the most notice was my work with Tim Wilson showing that we have very little access to the cognitive processes that go on that produce our behavior. We are constantly being influenced by things that we don’t recognize have had an influence, and which are sometimes embarrassing to know. That isn’t why we’re unaware of them. We’re unaware of them because we don’t have access to our cognitive process. We claim that we do. You ask me why I do something, I’ll give you an answer, although you’ll probably believe it more than I will because I’m so aware of the extent to which we’re unaware of what goes on.

I’ll give an example of the experiment we did. We have two experiments, and in the first experiment—a learning experiment—people memorize word pairs. There might be a pair like "ocean-moon" for some subjects. Then we’re through with that experiment: "Thank you very much." The other fellow has this other experiment—on free association, where you are to give an example of something in a category we give you. So, one of the categories we give is, "Name a laundry detergent." People are much more likely to mention Tide, if they’ve seen that ocean-moon pair. You ask them, "Why did you come up with that? Why did you say Tide?" "Well, that’s what my mother uses," or "I like the Tide box." And you say, "Do you remember learning that word pair—ocean-moon?" "Oh, yeah." "Do you suppose that could’ve had an influence?" "No. I don’t think so."

There’s nothing hothouse or unusual about these studies. In fact, that’s life. You give people an orange pen to answer a consumer survey about what their preferences are, and they’ll circle more of the orange products. Would you obtain that in all circumstances? I don’t know whether you would or not.

I'll give you an example of two of the things that I’ve done that have gotten some notice and that were an attempt to bottle a personal experience. When I was in college, I had bad insomnia. I just couldn’t get to sleep. What would happen is that I’d lie in bed and I’d start worrying about my relationship with someone, or about the exam, and then I’d get more and more worked up about it. Eventually I’m hot and throwing off all the covers, tossing and turning.

Very early in my career I wondered if you couldn’t break this escalating pattern of worrying about something. The arousal you experience is evidence of how worked up you are about this thing. If we gave insomniacs a pill to take at bedtime and we said, "We’re interested in people’s dream content and we want to see how it’ll be affected by this pill we’re going to have you take, which will cause your heart to pound a little more irregularly, a little faster, your breathing to become more shallow and more rapid. You might find yourself getting warm, sweaty palms, and so on." (In other words, the physiological symptoms of arousal.) People who take a pill and are given those instructions get to sleep more quickly. It turns out that you can interrupt the vicious cycle by getting people to attribute the arousal to something non-emotional. That’s just a personal experience, understanding what’s going on with me.

Incidentally, this study turns out to be quite interesting in the context of non-replicability because this study was done and it was instantly recognized: "Gee, that’s interesting and it’s potentially important. It has therapeutic implications for how placebos should be treated," and so on. In fact, federal regulations, after this study was done, changed how you have to describe symptom effects. But nobody could replicate it. This astonished me because—this was back when I used to comb over data—there was no question in my mind that what I thought was going on was going on, that the effect was real. But there it is.

After several years somebody said, "I know why your insomnia experiment doesn’t replicate." And I said, "Why is that?" He had looked at people’s need for cognition. He said, "Some people like to think. Insomniacs who have high need for cognition show your effect, and those who don’t have high need for cognition don’t show the effect." The initial study had been done at Yale. All of the subsequent studies were done at places where you’d have people that you wouldn’t expect to be quite so interested in thinking.

Another example: I didn’t get a terribly good K-12 education. I grew up in El Paso, Texas. I’ve always regarded this as a strength for me because I didn’t have to do a lot of homework, so I would read books when I got home. I would sit in class and fantasize about the books I was going to read. In college I took an American literature course. I thought, "This is great. I’ll get to get credit for stuff that I would do anyway." After a while I’m reading these books—early American literature: Moby Dick, oh my god. As Dorothy Parker said, "More than you really want to know about whales." I’m not enjoying myself; it’s work. I said, "This is crazy. Why am I not enjoying myself?" It’s because I have defined what I am doing as work. If I was defining it as recreation it would’ve been much more fun.

The question then becomes, can you get people to do something in a mood where they’re thinking that it’s essentially work—it’s something they’re doing in order to get something else, versus something that they’re doing as an expressive thing? They’re doing it because it’s fun.

With Mark Lepper, we looked at nursery school kids. We put out what, at the time, was a special kind of pen—Magic Markers—that the kids had not seen before. We put that out on a table one day and we watched how much each kid played with the Magic Markers. Then, a week or so after that, a nice man comes up to each of the kids and says, "Do you remember the Magic Markers that were here a week or so ago? I would like to see what kinds of pictures kids would draw with these, and if you would be willing to draw some pictures for me with these Magic Markers, I could give you this Good Player Award. See, it has a place for your name, and a gold seal, and a blue ribbon. Would you like to have a chance to get that?" All of them do, of course. Or we don’t ask them to draw or offer them anything.

We come back a couple of weeks after that. The Magic Markers are out again. They haven’t been there during the two weeks. We’d look at how much kids draw with the Magic Markers. The ones who have contracted to draw with the Magic Markers, who’ve made a bargain that they can get this other thing if they play with the Magic Markers, play half as much as the kids who are not offered this contract. It isn’t just something about the certificate, because some kids we didn’t make a contract with, we’d just say, "Oh, thanks, those are great. I want to give you this award. See, there’s a place for your name," and so on. Those kids were just like the control kids. It wasn't that there’s something bad about getting the award; it’s that something’s bad about contracting, about framing the thing as work.

Interestingly, about three or four other people had the idea at the same time. I would be very interested to know what their intuitions were that made them come up with this idea. The basic idea is you can undermine intrinsic motivation with reward. And this has become a staple in education. Educators understand this, a lot of them—that you probably want to get behavior without resorting to reward if you can. There’s a lot of literature on all this.

Now, there are exceptions to it. If you, for example, give your reward to people who wouldn’t have engaged in an activity to begin with, though it’s something that would have intrinsic merit if they tried it—you can trigger a rewarding process in getting people to do something. They sort of like it; you don’t have to keep up the rewards. At any rate, there’s lots of additional information, but the basic idea has stood up.

By the way, as an example of these experiments, you look at that and everybody says, "Gee, that’s interesting. And if it’s true, it’s important." I don’t know of any failures to replicate that study. There are by now dozens of studies showing you can turn play into work. The study that I was talking about earlier about insomnia—that turns out to be a highly unstable effect. It depends on a particular type of person to get it. I couldn’t have known that and it turns out to be the case.

The general point that people can misattribute their physiological arousal and become more or less emotional as a result of that misattribution, that’s been shown scores of times in dozens of different contexts. My favorite is a study having guys answer a survey with a very attractive female interviewer. They either do that standing on a swaying suspension bridge over a thousand-foot deep gorge or they do it on terra firma. The dependent measure is, do they try to date this woman. If they answered the questionnaire standing on the swaying bridge, they find this woman extremely attractive and they’re much more likely to want to try to date her. The arousal that’s produced by this terror gets misattributed to the woman.

Apropos of money and extrinsic reward, I was at a World Economic Forum meeting and I was on a panel of people who were to think of ways to make people behave in their own interests and in society’s interests. There were economists, psychologists, political scientists, and physicians, and so on. One word kept coming up over and over again: incentivize. You incentivize, and usually it was money. I hear the word incentivize, I say, "If imagination fails, incentivize."

There are so many more ways of getting people to do what’s in their own interests and society’s interests. Absolutely the most powerful way that we have is to create social influence situations where you see what it is that other people are doing and that’s what you do. I took up tennis decades ago and it turned out that most of my friends had taken up tennis. I dropped it a few years later and it turned out that the tennis courts were empty. I took up cross-country skiing, and how about that, these other people do it. Then we lost interest in it, and find out our friends don’t do that anymore.

How about minivans and fondue parties? You do things because other people do them. And one very practical important consequence of this was worked out by Debbie Prentice and her colleague at Princeton, [Dale Miller]. Princeton has a reputation of being a heavy drinking school. Ivy League schools, like other schools, I guess Midwestern state universities too, some are drinking schools, some are not drinking schools, but at Princeton it was regarded as a serious problem.

Prentice and Miller had the idea to find out how much drinking goes on. They had the strong intuition that less drinking goes on than people think goes on, because on Monday a kid comes in and says, "I was stoned all weekend," when in actuality he was studying all Sunday for the exam. In a setting where people are drinking a lot, you get prestige for drinking a lot. If you get good grades despite the fact that you’re drinking a lot, then that makes you look smarter.

They found out how much people are actually drinking, and then they fed this information back to students and said, "This is how much drinking goes on." Drinking plummeted down to something closer to the level of what was actually going on.

Here's something, done in California by a social psychology team led by Bob Cialdini, that saved hundreds of millions of dollars and kept millions of tons of carbon dioxide from being poured into the atmosphere. He hangs tags on people’s doors if they’re using more electricity than their neighbors saying, "You’re using more electricity than your neighbors," and that sends electricity usage down. However, you shouldn’t hang a tag on their door saying, "You’re using less electricity than your neighbors" because then people start using more electricity, unless you put a smiley face on the bottom. You’re using less electricity than your neighbors and a smiley face ... oh, that’s a good thing, I’ll keep it up.
