
Whose Bad Guess is More Bad? Difficult Comparisons

October 29, 2014
Jay Livingston

How do you compare percentages that are very different?

A recent Guardian/Ipsos poll asked people in fourteen wealthy nations to estimate certain demographics. What percent of the population of your country are immigrants? Muslim? Christian?

People overestimated the number of immigrants and Muslims, and underestimated the number of Christians. But the size of the error varied.  Here is the chart on immigration that the Guardian published (here).


Italy, the US, Belgium, and France are way off. The average guess was 18-23 percentage points higher than the true percentage.  People in Japan, South Korea, Sweden, and Australia were off by only 7-8 percentage points.

But is that a fair comparison? The underlying question is this: which country is better at estimating these demographics? Japan and South Korea have only 2% immigrants. People estimated that it was 10%, a difference of eight percentage points. But looked at another way, their estimate was five times the actual number. The US estimate was only 2½ times the true number.

The Guardian ranks Hungary, Poland, and Canada together since they all have errors of 14 points. But I would say that Canada’s 35% vs. 21% is a better estimate than Hungary’s 16% vs. 2%.  Yet I do not know a statistic or statistical technique that factors in this difference and allows us to compare countries with very few immigrants and those with far more immigrants.* 
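
For what it’s worth, here’s a rough way to see the problem – put the Guardian’s percentage-point gap next to the ratio of guess to reality. A minimal sketch in Python, using only the figures quoted above (the log of the ratio is one candidate for a scale that treats over- and underestimation symmetrically):

# Two ways to score a wrong guess, using the figures quoted in this post:
# the percentage-point gap (the Guardian's measure) and the ratio of
# guess to reality.
import math

countries = {
    "Japan":       (2, 10),    # (actual %, average guess %)
    "South Korea": (2, 10),
    "Hungary":     (2, 16),
    "Canada":      (21, 35),
}

for name, (actual, guess) in countries.items():
    gap = guess - actual
    ratio = guess / actual
    print(f"{name:12}  gap = {gap:2d} pts   ratio = {ratio:4.1f}x   "
          f"log ratio = {math.log(ratio):+.2f}")

Hungary and Canada tie at 14 points, but Hungary’s guess is eight times reality while Canada’s is less than double – which is the difference the point gap hides.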

My brother suggested that the Guardian’s readers could get a better picture of the differences if the chart ordered the countries by the immigrant percentage rather than by the percentage-point gap.


This makes it clearer that the 7-point overestimate in Sweden and Australia is a much different sort of error than the 8-point overestimate in South Korea and Japan. But I’m still uncertain as to the best way to make these comparisons.


-------------------------------
* Saying that I know of no such statistic is not saying much. Perhaps others who are more familiar with statistics will know how to solve this problem.

Naming Variables

July 21, 2014
Posted by Jay Livingston

Variable labels – not the sort of problem that should excite much debate. Still, it’s important to identify your variables as what they really are. If I’m comparing, say, New Yorkers with Clevelanders, should I call my independent variable “Sophistication” (Gothamites, as we all know, are more sophisticated)? Or should it be “City” (or “City of residence”)? “Sophistication” would be sexier; “City” would be more accurate.

Dan Ariely does experiments about cheating.  In a recent experiment, he compared East Germans and West Germans and found that East Germans cheated more. 

we found evidence that East Germans who were exposed to socialism cheat more than West Germans who were exposed to capitalism.

Yes, East Germany was a socialist state. But it was also dominated by another nation (the USSR, which appropriated much of East Germany’s wealth) and had a totalitarian government that ruled by fear and mistrust.  For Ariely to write up his results and call his independent variable “Socialism/Capitalism,” he must either ignore all those other aspects of East Germany or else assume that they are inherent in socialism.*

The title of the paper is worth noting: “The (True) Legacy of Two Really Existing Economic Systems.” (You can find it here.)

The paper has been well received among mainstream conservatives (e.g., The Economist), who, rather than looking carefully at the variables, are glad to conflate socialism with totalitarian evils.

Mark Kleiman at the Reality Based Community makes an analogy with Chile under socialist Allende and capitalist Pinochet.

Imagine that the results had come out the other way: say, showing that Chileans became less honest while Pinochet was having his minions gouge out their opponents’ eyeballs and Milton Friedman was gushing about the “miracle of Chile”? How do you think the paper would read, and what do you think the Economist, Marginal Revolution, and AEI would have had to say about its methods?

--------------------
* A couple of commas might have made it clearer that other East-West differences might have been at work. Ariely should have written, “we found evidence that East Germans, who were exposed to socialism, cheat more than West Germans, who were exposed to capitalism.”

Replication and Bullshit

July 9, 2014
Posted by Jay Livingston

A bet is a tax on bullshit, says Marginal Revolution’s Alex Tabarrok (here).  So is replication.

Here’s one of my favorite examples of both – the cold-open scene from “The Hustler” (1961). Eddie proposes the replication; without it, Charlie considers the effect to be random variation.



It’s a great three minutes of film, but to spare you the time, here’s the relevant exchange.

CHARLIE
    You ought to take up crap shooting. Talk about luck!

         EDDIE
    Luck! Whaddya mean, luck?

         CHARLIE
    You know what I mean. You couldn't make that shot again in a million years.

       EDDIE
    I couldn’t, huh? Okay. Go ahead. Set ’em up the way they were before.

         CHARLIE
    Why?

         EDDIE
    Go ahead. Set ’em up the way they were before. Bet ya twenty bucks. Make that shot just the way I made it before.

         CHARLIE
    Nobody can make that shot and you know it. Not even a lucky lush.


After some by-play and betting and a deliberate miss, Eddie (aka Fast Eddie) replicates the effect, and we segue to the opening credits* confident that the results are indeed not random variation but a true indicator of Eddie’s skill.

But now Jason Mitchell, a psychologist at Harvard, has published a long throw-down against replication. (The essay is here.) Psychologists shouldn’t try to replicate others’ experiments, he says. And if they do replicate and find no effect, the results shouldn’t be published.  Experiments are delicate mechanisms, and you have to do everything just right. The failure to replicate results means only that someone messed up.

Because experiments can be undermined by a vast number of practical mistakes, the likeliest explanation for any failed replication will always be that the replicator bungled something along the way.  Unless direct replications are conducted by flawless experimenters, nothing interesting can be learned from them.


L. J. Zigerell, in a comment at Scatterplot, thinks that Mitchell may have gotten it switched around. Zigerell begins by quoting Mitchell:

“When an experiment succeeds, we can celebrate that the phenomenon survived these all-too-frequent shortcomings.”

But, actually, when an experiment succeeds, we can only wallow in uncertainty about whether a phenomenon exists, or whether a phenomenon appears to exist only because a researcher invented the data, because the research report revealed a non-representative selection of results, because the research design biased results away from the null, or because the researcher performed the experiment in a context in which the effect size for some reason appeared much larger than the true effect size.

It would probably be more accurate to say that replication is not so much a tax on bullshit as a tax on those other factors Zigerell mentions. But he left out one other possibility: that the experimenter hadn’t taken all the relevant variables into account.  The best-known of these unincluded variables is the experimenter himself or herself, even in this post-Rosenthal world. But Zigerell’s comment reminded me of my own experience in an experimental psych lab. A full description is here, but in brief, here’s what happened. The experimenters claimed that a monkey watching the face of another monkey on a small black-and-white TV monitor could read the other monkey’s facial expressions.  Their publications made no mention of something that should have been clear to anyone in the lab: that the monkey was responding to the shrieks and pounding of the other monkey – auditory signals that could be clearly heard even though the monkeys were in different rooms.

Imagine another researcher trying to replicate the experiment. She puts the monkeys in rooms where they cannot hear each other, and what they have is a failure to communicate. Should a journal publish her results? Should she have even tried to replicate in the first place?  In response, here are Mitchell’s general principles:


    •    failed replications do not provide meaningful information if they closely follow original methodology;
    •    replication efforts appear to reflect strong prior expectations that published findings are not reliable, and as such, do not constitute scientific output;
    •    the field of social psychology can be improved, but not by the publication of negative findings;
    •    authors and editors of failed replications are publicly impugning the scientific integrity of their colleagues.


Mitchell makes research sound like a zero-sum game, with “mean-spirited” replicators out to win some easy money from “a lucky lush.” But often, the attempt to replicate is not motivated by skepticism and envy. Just the opposite. You hear about some finding, and you want to see where the underlying idea might lead.** So as a first step, to see if you’ve got it right, you try to imitate the original research. And if you fail to get similar results, you usually question your own methods.

My guess is that the arrogance Mitchell attributes to the replicators is more common among those who have gotten positive findings.  How often do they reflect on their experiments and wonder if it might have been luck or some other element not in their model?

----
* Those credits can be seen here – with the correct aspect ratio and a saxophone on the soundtrack that has to be Phil Woods. 

** (Update, July 10) DrugMonkey, a bio-medical research scientist, says something similar:
Trying to replicate another paper's effects is a compliment! Failing to do so is not an attack on the authors’ “integrity.” It is how science advances.  

Tide and Time

June 4, 2014
Posted by Jay Livingston

Survey questions, even those that seem simple and straightforward, can be tricky and yield incorrect answers.  Social desirability can skew the answers to questions about what you would do – “Would you vote for a woman for president . . . ?” – and even factual questions about what you did do.  “Don’t ask, ‘How many books did you read last year?’” said the professor in my undergraduate methods course. “Ask, ‘Did you read a book last week?’” There’s no shame in having been too busy to read a book in a seven-day period. Besides, people’s recall will be more accurate.  Or will it? Is even a week’s time enough to distort memory?

Leif Nelson (Berkeley, Business School) asked shoppers, “Did you buy laundry detergent the last time you went to the store?” Forty-two percent said yes.



Nelson doesn’t question the 42% figure. He’s interested in something else:  the “false consensus effect” – the tendency to think that others are more like us than they really are.

So he asks, “What percentage of shoppers do you think will buy laundry detergent?” and he also asks, “Did you buy laundry detergent?” Sure enough, those who said they bought detergent give higher estimates of detergent buying by others. (Nelson’s blog post, with other interesting findings, is here.)

But did 42% of those shoppers really buy detergent last time they were in the store? Andrew Gelman is “stunned” and skeptical. So am I.

The average family does 7-8 washes a week. Let’s round that up to 10.  They typically do serious shopping once a week with a few other quick express-lane trips during the week.  This 50 oz. jug of Tide will do 32 loads – three weeks’ worth of washing.



That means only 33% of customers should have said yes.  And that 33% is a very high estimate since most families buy in bulk, especially with items like detergent. Tide also comes in 100-oz. and 150-oz. jugs.

If you prefer powder, how about this 10-lb. box of Cheer? It’s good for 120 loads. 

A family should need to buy this one in only one out of 12 trips. Even at double the average washing, that’s six weeks of detergent. The true proportion of shoppers buying detergent should be well below 20%.
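
The arithmetic behind these figures is simple enough to script. A minimal sketch, using the post’s own assumptions – ten loads a week, one serious shopping trip a week:

# Back-of-the-envelope: what fraction of weekly shopping trips should
# include a detergent purchase, given the load counts quoted above?
loads_per_week = 10    # the post rounds the 7-8 weekly washes up to 10

for product, loads in [("50-oz jug of Tide", 32), ("10-lb box of Cheer", 120)]:
    weeks_per_purchase = loads / loads_per_week    # how long one purchase lasts
    share_of_trips = 1 / weeks_per_purchase        # trips that include detergent
    print(f"{product}: lasts {weeks_per_purchase:.1f} weeks -> "
          f"bought on {share_of_trips:.0%} of weekly trips")

# -> roughly 31% for the jug, 8% for the box - both well short of the
#    42% of shoppers who said yes.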

Why then do people think they buy detergent so much more frequently?  I’m puzzled.  Maybe if washing clothes is part of the daily routine, something you’re always doing, buying detergent seems like part of the weekly shopping trip. Still, if we can’t rely on people’s answers about whether they bought detergent, what does that mean for other seemingly innocuous survey questions?

Sell It! – American (Psychology) Hustle

May 23, 2014
Posted by Jay Livingston

The Rangers crushed the Canadiens convincingly in game one: 7-2. The question was whether that result could be replicated . . . three more times.

Replication is hard (as the Rangers and their fans discovered in overtime at the Garden last night). That’s true in social science too. The difference is that the results of the Rangers’ failure to replicate were published.

Social psychologists are now paying more attention to the replication question. In the Reproducibility Project, Brian Nosek and others have set about trying to replicate most of the studies published in three top journals since 2008.  The first round of results was encouraging – of thirteen attempts, ten were consistent with the original findings. In one case, an “anchoring” study by Daniel Kahneman, the effect was stronger than in the original.

What failed to replicate? Mostly, experiments involving “priming,” where subliminal cues affect people’s ideas or behavior. In the best known and now most controversial of these, participants were primed by words suggesting old age (wrinkles, bingo, alone, Florida). They were then surreptitiously timed as they walked down the hall. In the original study by John Bargh (the priming primus inter pares), participants who were primed walked more slowly than did the controls.*

Many people have tried to replicate this study, and the results are mixed. One problem might be a “Rosenthal” effect, where the experimenters unintentionally and unknowingly influence the participants’ behavior so that it conforms with their expectations. Double-blind experiments, where the experimenters don’t know which participants have been primed, do not produce significant differences. (More here.)

I had a different explanation:  some guys can prime; some can’t. 

Maybe John Bargh and his assistants are really good at priming. Somehow, when they give participants those words mixed in among others, the subjects get a strong but still subliminal mental image of wrinkled retirees in Miami. But other psychologists at other labs haven’t got the same touch. Unfortunately, the researchers did not use an independent measure of how effective the priming was, so we can’t know.

I was delighted to see that Daniel Kahneman (quoted here ) had the same idea.

The conduct of subtle experiments has much in common with the direction of a theatre performance . . . you must tweak the situation just so, to make the manipulation strong enough to work, but not salient enough to attract even a little attention . . . .Bargh has a knack that not all of us have.

Many social psychology experiments involve a manipulation that the participant must be unaware of. If the person catches on to the priming (“Hey, all these sentences have words with a geezer theme”), it blows the con. Some experiments require more blatant deceptions (think Milgram), and not all psychologists are good deceivers.

What reminded me of this was Elliot Aronson’s memoir Not by Chance Alone. Aronson is one of the godfathers of social psychology experiments, and one of the most famous he had a hand in is the one-dollar-twenty-dollar lie, more widely known as Festinger and Carlsmith, 1959.  Carlsmith was J. Merrill Carlsmith.  The name seemed like something from central casting, and so did the man – a polite WASP who prepped at Andover, etc.

In the experiment, the subject was given a boring task to do – taking spools out of a rack and then putting them back, again and again, while Carlsmith as experimenter stood there with a stopwatch pretending to time him.  The next step was to convince the subject to help the experimenter.

[Merrill] would explain that he was testing the hypothesis that people work faster if they are told in advance that the task is incredibly interesting than if they are told nothing and informed, “You were in the control condition. That is why you were told nothing.”

At this point Merrill would say that the guy who was supposed to give the ecstatic description to the next subject had just phoned in to say he couldn’t make it. Merrill would beg the “control” subject to do him a favor and play the role, offering him a dollar (or twenty dollars) to do it. Once the subject agreed, Merrill was to give him the money and a sheet listing the main things to say praising the experiment and leave him alone for a few minutes to prepare.

But Carlsmith could not do a credible job. Subjects immediately became suspicious.

It was crystal clear why the subjects weren’t buying it: He wasn’t selling it. Leon [Festinger] said to me, “Train him.”

Sell it.  If you’ve seen “American Hustle,” you might remember the scene where Irving Rosenfeld (Christian Bale) is trying to show the FBI agent disguised as an Arab prince how to give a gift to the politician they are setting up. (The relevant part starts at 0:12 and ends at about 0:38)



Here is the script:


Aronson had to do something similar, and he had the qualifications. As a teenager, he had worked at a Fascination booth on the boardwalk in Revere, Massachusetts, reeling off a spiel to draw strollers in to try their luck.

Walk right in, sit in, get a seat, get a ball. Play poker for a nickel. . . You get five rubber balls. You roll them nice and easy . . . Any three of a kind or better poker hand, and you are a winner. So walk in, sit in, play poker for a nickel. Five cents. Hey! There's three jacks on table number 27. Payoff that lucky winner!

Twenty years later, Aronson still had the knack, and he could impart it to others. Like Kahneman, he thinks of the experiment as theater.

I gave Merrill a crash course in acting. “You don’t simply say that the assistant hasn’t shown up,” I said. “You fidget, you sweat, you pace up and down, you wring your hands, you convey to the subject that you are in real trouble here. And then, you act as if you just now got an idea. You look at the subject, and you brighten up. ‘You! You can do this for me. I can even pay you.’”

The deception worked, and the experiment worked.  When asked to say how interesting the task was, the $1 subjects gave it higher ratings than did the $20 subjects.  Less pay for lying, more attitude shift. The experiment is now part of the cognitive dissonance canon. Surely, others have tried to replicate it.  I just don’t know what the results have been.

--------------------

* An earlier post on Bargh and replication is here.

Know Your Sample

April 22, 2014
Posted by Jay Livingston


Tim Huelskamp is a Congressman representing the Kansas first district. He’s a conservative Republican, and a pugnacious one (or is that a redundancy?). Civility, at least in his tweets, is not his long suit. He refers to “King Obama” and invariably calls the Affordable Care Act “ObamaScare.” Pretty clever, huh?

He’s also not a very careful reader.  Either that or he does not understand the first thing about sampling. Tonight he tweeted.

(Click on a graphic for a larger view.)

Since polls also show that Americans support gay marriage, I clicked on the link.  The report is brief in the extreme. It gives data on only two questions and has this introduction.


The outrage might come from liberals. More likely it will come from people who think that members of the US Congress ought to be able to read.

Or maybe in Huelskamp’s view, only Republicans count as Americans.

Wonks Nix Pic Survey

February 18, 2014
Posted by Jay Livingston

“How could we get evidence for this?” I often ask students. And the answer, almost always is, “Do a survey.” The word survey has magical power; anything designated by that name wears a cloak of infallibility.

“Survey just means asking a bunch of people a bunch of questions,” I’ll say. “Whether it has any value depends on how good the bunch of people is and how good the questions are.”  My hope is that a few examples of bad sampling and bad questions will demystify the word.

For example, Variety



Here’s the lede:
Despite its Biblical inspiration, Paramount’s upcoming “Noah” may face some rough seas with religious audiences, according to a new survey by Faith Driven Consumers.
The data to confirm that idea:
The religious organization found in a survey that 98% of its supporters were not “satisfied” with Hollywood’s take on religious stories such as “Noah,” which focuses on Biblical figure Noah.
The sample:
Faith Driven Consumers surveyed its supporters over several days and based the results on a collected 5,000+ responses.
And (I’m saving the best till last) here’s the crucial survey question:
As a Faith Driven Consumer, are you satisfied with a Biblically themed movie – designed to appeal to you – which replaces the Bible’s core message with one created by Hollywood?
As if the part about “replacing the Bible’s core message” weren’t enough, the item reminds the respondent of her or his identity as a Faith Driven Consumer. It does make you wonder about that 2% who either were fine with the Hollywood* message or didn’t know. 

You can’t really fault Faith Driven Consumer too much for this shoddy “research.” They’re not in business to find the sociological facts. What’s appalling is that Variety accepts it at face value and without comment.

----------------------
* The director of “Noah” is Darren Aronofsky; the script is credited to him and Ari Handel.  For the Faith Driven Consumer, “Hollywood” may carry connotations in addition to that of industry and location – perhaps something similar to “New York sense of humor” in this clip from “The West Wing” (the whole six minutes is worth watching, but you’ll get the idea if you push the pointer to 2:20 or so and watch for the next 45 seconds). Or look at this L.A. Times column by Joel Stein.

(HT: @BrendanNyhan retweeted by Gabriel Rossman)

What Never? No, Never.

January 31, 2014
Posted by Jay Livingston

A survey question is only as good as its choices. Sometimes an important choice has been left off the menu.

I was Gallup polled once, long ago. I’ve always felt that they didn’t get my real opinion.
“What’d they ask?” said my brother when I mentioned it to him.
“You know, they asked whether I approved of the way the President was doing his job.”  Nixon – this was in 1969.
“What’d you say?”
“I said I disapproved of his entire existential being.”

I was exaggerating my opinion, and I didn’t actually say that to the pollster.  But even if I had, my opinion would have been coded as “disapprove.” 

For many years the American National Election Study (ANES) has asked:
How much of the time do you think you can trust the government in Washington to do what is right – just about always, most of the time or only some of the time?
The trouble with these choices is that they exclude the truly disaffected. The worst you can say about the federal government is that it can be trusted “only some of the time.”  A few ornery souls say they don’t trust the federal government at all. But because that view is a write-in candidate, it usually gets only one or two percent of the vote. 

This year the ANES included “Never” in the options read to respondents.  Putting “No-way, no-how” right there on the ballot makes a big difference. And as you’d expect, there were party differences:


Over half of Republicans say that the federal government can NEVER be trusted.

The graph appears in this Monkey Cage post by Marc Hetherington and Thomas Rudolph. Of course, some of those “never” Republicans don’t really mean “never ever.”  If a Republican becomes president, they’ll become more trusting, and the “never-trust” Democrat tide will rise.  Here’s the Hetherington-Rudolph graph tracking changes in the percent of people who do trust Washington during different administrations.


This one seems to show three things:
  1. Trust took a dive in the 1960s and 70s and never really recovered.
  2. Republican trust is much more volatile, with greater fluctuations depending on which party is in the White House.
  3. Republicans really, really hate President Obama.

Get a Spouse (sha-na-na-na. . . )

January 11, 2014
Posted by Jay Livingston

A bumper sticker I used to occasionally see said, “I fight poverty. I work.”

In this fiftieth anniversary of the War on Poverty, we should remember the difference between individual solutions to individual problems and societal or governmental solutions to social problems.  Yes, you’re less likely to be poor if you have a job. But exhorting the unemployed to go out and get a job is unlikely to have much effect on overall rates of poverty. 

The same can be said of marriage. In a recent speech, Sen. Marco Rubio offered the conservative approach to poverty.  The Rubio bumper sticker would say, “I fight poverty. I have a spouse.”  Here’s what he said:
 the greatest tool to lift people, to lift children and families from poverty, is one that decreases the probability of child poverty by 82 percent. But it isn't a government program. It's called marriage.
His evidence was drawn from a Heritage Foundation paper by Robert Rector.  Rector used Census data showing that poverty rates among single-parent families were much higher than among two-parent families – 37.1% vs. 6.8%.  “Being raised in a married family reduced child’s probability of living in poverty by about 82 percent.”
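
Where does the 82 percent come from? It is nothing more exotic than the relative gap between the two Census rates quoted above – a two-line calculation:

# The "82 percent" is just the relative difference between the two
# poverty rates Rector cites.
single_parent_rate = 0.371
two_parent_rate = 0.068

reduction = (single_parent_rate - two_parent_rate) / single_parent_rate
print(f"{reduction:.0%}")    # -> 82%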

As Philip Cohen (here) pointed out, the same logic applies even more so to employment.
The median weekly earnings of full-time, year-round workers is $771 per week, which is $40,000 per year more than people with no jobs earn.
Philip apparently thought that this analogy would make the fallacy of the Rubio-Rector claim obvious, for he didn’t bother to spell it out. The point is that singling out marriage or employment as a cause ignores all the reasons why people don’t have jobs or spouses. It also implies that a job is a job and a spouse is a spouse, and that there is no difference between those of the middle class and those of the poor.  (Philip should have spelled out the obvious. These logical problems did not bother PolitiFact, which rated Rubio’s claim as “mostly true.”)


According to Rubio, Rector, and PolitiFact, if all poor women with children got married, the child-poverty rate in the US would decrease by 82%.  Or at the individual level, if a poor single woman got married, her children would be nearly certain (93.2% likely) to be un-poor.

To illustrate the society-wide impact of marriage on poverty, Rubio-Rector look at the increase in out-of-wedlock births.  Here is a graph from Rector’s article.



The rate rises from about 7% in 1959 to 40-41% today.  If Rubio is right, rates of child poverty should have risen steadily right along with this increase (almost invariably referred to as the “alarming” increase) in out-of-wedlock births.  The graph below shows poverty rates for families with children under 18.



Both show a large decrease in poverty in the first decade or so of the War on Poverty – between 1959 and 1974, the rate for all families was cut in half.  Since then the rate has remained between 9% and 12%.  The line for unmarried mothers shows something else that Rubio and Rector ignore: the effects of forces that individuals have no power over, things like the overall economy.  In the good years of the 1990s, the chance that a single mother would be below the poverty line fell from nearly half (47%) to one-third.  Her marital status did not change, but her chances of being in poverty did.  The number of families in poverty fell from 6.7 million to 5.1 million – despite the increase in population and despite the increase in percentage of children born out of wedlock. There were more single mothers, but fewer of them were in poverty.

Addendum, January 12:  The title of this post refers to the classic oldie “Get a Job” (Silhouettes, 1957). The final lines of that song could, with only some slight editing, apply to Sen. Rubio and his colleagues:

In the Senate and the House
I hear the right-wing mouths,
Preachin’ and a cryin’
Tell me that I’m lyin’
’Bout a spouse
That I never could find.
(Sha-na-na-na, sha-na-na-na-na.)

How to Misread a Graph (It’s Not Easy, but The Heritage Foundation Finds a Way)

September 20, 2013
Posted by Jay Livingston

My post of a few days ago (here) showed The Heritage Foundation presenting a graph and deliberately drawing a conclusion that the graph clearly showed to be wrong.  Apparently, that’s something of a specialty at The Heritage Foundation.

Here’s their graphic purporting to show that preschool programs don’t work. (The original is here.)


The problem in the Oklahoma graph is the lag time between cause and effect.  For example, the baby boom began in 1947, but we would not look for its effects on healthcare and Social Security costs until much, much later.

Most people know this, but Heritage seems to be lagging behind. “Fourth grade reading achievement scores in Oklahoma have actually declined.” True, they are lower now than in 1998, when universal preschool started. But is that the year we should use as the starting point for data on fourth-grade reading scores?

Pre-school kids are three or four years old.  They don’t take the fourth-grade reading test until six or seven years later – in Oklahoma, that would be 2005 for the first cohort.  Amazingly (amazing to Heritage, I guess), that was the year reading scores began to increase, and despite a slight dip last year, they are still above that level.

As for the Georgia graph, anyone glancing at it (anyone except for people at The Heritage Foundation) would see this: reading scores in Georgia began increasing in 1995, two years after universal preschool began, and continued to rise when the first preschoolers reached fourth grade; scores have continued to rise faster than the national average.  Georgia was behind, now it’s ahead. Something good has been happening.

Heritage, however, manages not to see this and instead complains about how long it took Georgia to reach that point. (“Georgia’s program was in place for 13 years before scores caught up to the U.S. average.”)

A simple graph of scores is not really an adequate assessment of universal preschool. Those assessments, which include many other relevant variables,* have been done, and they generally conclude that the programs do work.  But that’s not the point.  The point is that Heritage is again misreading its own graph. So again I repeat, “Who you gonna believe, the Heritage Foundation or your lyin’ eyes?”

HT: Philip Cohen, who apparently thinks the Heritage Foundation’s deliberate obtuseness is so obvious as to be unworthy of elaboration.

-----------

* These include the usual demographics, especially to see if preschool effects are different for different groups. But there’s also the problem of post-preschool education. A state might have great preschools, but if it also has lousy primary schools, the benefits of preschool will be eroded away by the time the kids are in fourth grade.


Anecdotal Evidence – One More Time

June 14, 2013
Posted by Jay Livingston

Anecdotal evidence seems more convincing, I tell my students in Week One, but if you want to find out general truths, you need systematic evidence.  The New York Times today provides my example for next semester.

The Times had run an op-ed  last week about only children. The author, Lauren Sandler, referred to results from “hundreds of studies” showing that only children are generally no different from those with siblings on variables like “leadership, maturity, extroversion, social participation, popularity, generosity, cooperativeness, flexibility, emotional stability, contentment.” Nor were they more self-involved or lonelier.  And they score higher on measures of intelligence and achievement.   

Today, the Times printed a letter challenging these conclusions.  
 Another problem with these studies is that they put families in boxes: the only-child box, the divorced-parent box, the single-mother box — all of which I am in. They oversimplify family situations. I have seen the offspring of single divorced mothers grow up happy and successful, and I have seen children of two-parent families turn out disastrously.

Regarding the precocity of only children, my granddaughter at 2, like Ms. Sandler's daughter, could tell the difference between the crayon colors magenta and pink, and she is not an only child. So much for boxes.
Or as a student will usually ask, “But doesn’t it depend on the individual?”

Yes, I say.  But scientific generalizations do not apply 100% to everyone in that box.  Are men taller than women?  Are smokers less healthy than non-smokers?   Of course. Yes, there’s Maria Sharapova and the WNBA, and there are no doubt thousands of pack-a-day octogenarians.  Does that mean we should throw categories (i.e., boxes) like Sex and Smoking in the trash?

As the letter writer says, categories simplify. They overlook differences. But categories are inevitable. Pineapple is a category. We know that not all pineapples are alike, and yet we talk about pineapples.  And men.  And smokers. And divorced mothers and only children.

I’m not surprised that my students – 18-year old freshmen or transfers from the community colleges – need this brief reminder. But the New York Times?

In any case, the concern over the problems of only children seems to be fading, though I'm not sure how to interpret that.  The Google n-grams graph of the phrase in books looks like this: 



The first decline in the phrase only children runs parallel to the baby boom (though it starts a few years earlier) and the burgeoning of multi-child families.  But the second decline comes in a period when multi-child families are decreasing.  Perhaps there is less concern because single-child families have become frequent rather than freakish. 

Wanted – Bad Research

April 22, 2013
Posted by Jay Livingston

I’m not a research director.  But if I were, I hope I wouldn’t write questions that are obviously designed to bias the results.*  And if I did ask such questions, I wouldn’t boast about it in the newspaper, especially if my stacking of the deck got barely a majority to give the answer I wanted. 

But then, I’m not Michael Saltsman, research director for the Employment Policies Institute, whose letter to the Record (formerly known as The Bergen Record) was published today.
Regarding "Most favor minimum wage hike" (Page L-7, April 18):

The recent Rutgers-Eagleton poll finding that 76 percent of New Jerseyans support a minimum wage increase only proves that incomplete poll questions yield misleading results.

My organization commissioned ORC International to conduct a similar poll regarding an increase in the minimum wage. When respondents were informed of the unintended consequences of minimum wage hikes — particularly how such hikes make it more difficult for the least-skilled to find work— 70 percent support flipped to 56 percent opposition. [emphasis added]

This consequence isn't a hypothetical: Fully 85 percent of the most credible economic studies from the past two decades indicate a loss of job opportunities following a wage hike.

Michael Saltsman
Washington, D.C. , April 18
As for the facts on the effects of an increase in the minimum wage, Saltsman’s literature review is on a par with his questionnaire construction.  Apparently he missed John Schmitt’s CEPR article from two months ago (here).    The title pretty much sums it up:
Why Does the Minimum Wage Have No Discernible Effect on Employment?
Schmitt includes this graph of minimum-wage effects from a meta-analysis.


Hristos Doucouliagos and T. D. Stanley (2009) conducted a meta-study of 64 minimum-wage studies published between 1972 and 2007 measuring the impact of minimum wages on teenage employment in the United States. When they graphed every employment estimate contained in these studies (over 1,000 in total), weighing each estimate by its statistical precision, they found that the most precise estimates were heavily clustered at or near zero employment effects.
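
To make “weighing each estimate by its statistical precision” concrete, here is a minimal sketch of an inverse-variance (precision-weighted) average – the standard meta-analysis move – with made-up numbers standing in for the actual 1,000+ estimates:

# Precision weighting: each study's effect estimate is weighted by
# 1 / SE^2, so noisy estimates count for less. The (effect, standard
# error) pairs below are fabricated for illustration.
estimates = [(-0.10, 0.08), (0.02, 0.01), (-0.01, 0.02), (0.15, 0.20)]

weights = [1 / se**2 for _, se in estimates]
pooled = sum(w * effect for (effect, _), w in zip(estimates, weights)) / sum(weights)
print(f"precision-weighted effect: {pooled:+.3f}")  # dominated by the precise, near-zero studies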
Schmitt offers several guesses as to why employers don’t cut jobs when the minimum wage rises – maybe they raise prices, or accept a lower profit margin, or reduce the wages of better-paid employees; or maybe the increased minimum wage brings more customers, and so on.**

But regardless of the findings on minimum wage, Saltsman’s letter carries a more important if depressing message.  We try to teach our students to design good research.  We tell them that good research skills might help them get jobs.  Yet here is an example of a research-director job that depends on designing bad surveys and doing bad research. 
                                                           
------------------------------
*In his methods course, my colleague Chris Donoghue uses a made-up abortion item for teaching items that introduce bias:
“Every year in the US, over a million babies are killed by abortion. Do you agree that laws should make it more difficult to get an abortion?”

** Brad Plumer at WaPo’s WonkBlog has more on this, including a fuller discussion of Schmitt’s paper (here).

What Would You Do?

December 27, 2012
Posted by Jay Livingston

When you ask a “what if” question, can you take people’s responses at face value?

A student sent me a link to a study that asked whether Americans or Turks were more likely to act on principles of universalism as opposed to particularism.

I had talked in class about universalism (apply general rules to everyone) and particularism (decide based on the needs, desires, abilities, etc. of the actual people in some real situation).  My five-cent definition was this: With particularism, if the rules don’t fit the people, too bad for the rules.  With universalism, if the rules don’t fit the people, too bad for the people. 

One of the examples I used to illustrate the difference was shopping.  For most items, we prefer universalism – a fixed price.  Everyone pays the amount marked on the price tag. You have only two options: buy it or leave it.  In Mediterranean cultures, buyers and sellers are much more likely to haggle, arriving at a price based on the unique utility curves and bargaining skills of the buyer and seller.  This winds up with different people paying different prices for the same item.

The researchers asked American and Turkish students about a “hypothetical situation”:
You are a professional journalist who writes a restaurant review column for a major newspaper. A close friend of yours has invested all her savings in her new restaurant. You have dined there and think the restaurant is not much good. Does your friend have some right to expect you to hedge your review or does your friend have no right to expect this at all?
I assumed that the study would find Americans to be more universalistic.  But I was wrong, at least according to this study.
                    Turkish      American     Total
   Particularistic   8  (19%)    85  (65%)      93
   Universalistic   34  (81%)    45  (35%)      79
   Total            42          130            172


Four out of five Turkish students said they would write their review according to universalistic principles.  Two-thirds of the Americans said they’d give their friend a break even if that meant departing from the standards of restaurant reviewing.

I was surprised.  So was my colleague Yasemin Besen-Cassino. Not only is she Turkish (though very much a global cosmopolitan), but she sometimes teaches a section of our methods course.  She added, “I am not a fan of hypotheticals on surveys.”

And oh boy, is this hypothetical.

  • IF you were a reviewer for a major paper and
  • IF the restaurant were bad and
  • IF the owner were your friend and
  • IF she had invested all her money in the place
    what kind of review would you write?
The more hypothetical the situation, the more I question people’s ability to know what they would do.   “IF the election were held today, who would you vote for?” probably works.  The situation – voting – is a familiar one, and there’s not all that much difference between saying the name of a candidate to an interviewer and choosing that name on a ballot.   But how many of us have experience writing reviews of friends’ restaurants? 

Nearly all my students say that if they were in the Milgram experiment, they’d have no trouble telling the experimenter to take a hike.  And all those concealed-carrying NRA members are sure that when a mass murderer in a crowd started firing his AR-15, they would coolly identify the killer and bring him down.  But for novel and unusual situations, we’re not very good at predicting what we would do. 

When I present the Milgram set-up and ask, “What would you do?”  sometimes a student will say, “I don’t know.”  That’s the right answer.

Surveys — Questions and Answers

December 10, 2012
Posted by Jay Livingston

Neil Caren at Scatterplot  lifts up the rock that is the New Family Structures Study (NFSS) – the basis of Mark Regnerus’s controversial research on children of gay parents – and discovers some strange creatures wriggling about underneath: 

. . .   85 people reported living at least four months with their “mother’s girlfriend/partner.” However—and this is where it gets tricky—a different question (S8) asked, “Did you ever live with your mother while she was in a romantic relationship with another woman?” Eight people who reported in the calendar that they lived with their mother’s girlfriend answered no to this question.
So ten percent of the people who said they lived with the mother’s girlfriend also said on a different question that they did not live with the mother’s girlfriend.
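
Checks like Caren’s are easy to script. A minimal sketch in pandas – the column names are hypothetical stand-ins, not the actual NFSS variable names:

# Flag respondents whose calendar answer contradicts question S8.
# Column names are made up for illustration.
import pandas as pd

df = pd.DataFrame({
    "calendar_lived_with_partner": [1, 1, 0, 1],    # from the residence calendar
    "S8_mother_same_sex_rel":      ["yes", "no", "no", "yes"],
})

inconsistent = df[(df["calendar_lived_with_partner"] == 1)
                  & (df["S8_mother_same_sex_rel"] == "no")]
print(f"{len(inconsistent)} of {df['calendar_lived_with_partner'].sum()} "
      f"calendar yeses contradict S8")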
                   
We all rely on surveys – pollsters, social scientists, market researchers, government agencies, businesses. We try to make our questions straightforward.  But the question we ask is not always the question people answer.  And people’s answers – about what they think and what they did – are influenced by external factors we might not have considered.  Especially if the survey is a one-off (unlike the GSS and other surveys whose questions have been asked many times before), we have to be cautious about taking the results at face value.

(Previous posts on this problem are here and here.)

Prediction Methodology – Not Too Swift

November 6, 2012
Posted by Jay Livingston

One of the first things I try to get students to understand is the difference between systematic evidence and anecdotal and impressionistic evidence.  Or no evidence, which usually takes the form of “We don’t need studies to know that . . . .” or “Common sense tells us . . . .”

So in one corner we have Nate Silver (known in some circles as Nate the Great at FiveThirtyEight), systematically weighing the data from polls and other sources.  He sees Obama as the likely winner.


And then there’s Peggy Noonan at The Wall Street Journal.
Is it possible this whole thing is playing out before our eyes and we’re not really noticing because we’re too busy looking at data on paper instead of what’s in front of us?
In front of her eyes is victory for Romney.  Here are some more excerpts that show the evidence she uses as the basis for her prediction.
Among the wisest words spoken this cycle were by John Dickerson of CBS News and Slate, who said, in a conversation the night before the last presidential debate, that he thought maybe the American people were quietly cooking something up, something we don’t know about.

I think they are and I think it’s this: a Romney win.

There is no denying the Republicans have the passion now, the enthusiasm.
it feels like a lot of Republicans have gone from anti-Obama to pro-Romney.
And there is Obama, out there seeming tired and wan, showing up through sheer self discipline.

All the vibrations are right. A person who is helping him who is not a longtime Romneyite told me, yesterday: “I joined because I was anti Obama—I’m a patriot, I’ll join up. But now I am pro-Romney.”

And there’s the thing about the yard signs. In Florida a few weeks ago I saw Romney signs, not Obama ones. From Ohio I hear the same. From tony Northwest Washington, D.C., I hear the same.
I imagine going to the World Series.  The guy at the hot dog stand says he thinks the Tigers are about to make a move.  I see Detroit players’ faces, full of passion and enthusiasm; the Giants look tired and wan. The Tigers are getting hits.  They even had a home run.  Their pitchers are tall and strong.  And then there’s the thing about caps – all those Detroit caps with the old English D.  I see them everywhere.

It all points to a big win by the Tigers. Clearly, the Giants are toast.

And then some nerd – “a man of very small stature, a thin and effeminate man with a soft-sounding voice, a poster child for the New Castrati”* – taps me on the arm and points to the scoreboard which posts the number of runs that each team has actually scored and the number of games that they have won.

Yes, Romney could win.  But remember Damon Runyon’s riff on Ecclesiastes: “The race is not always to the swift, nor the battle to the strong, but that's how the smart money bets.”

And they’re betting Obama.  At Intrade, a $100 bet would bring Ms. Noonan $300, and somewhat more if she bet in the UK.  My own hunch is that betting a bundle on Romney right now is not too swift.

UPDATE:  Another Republican speechwriter-turned-columnist, Michael Gerson, is yapping at Nate Silver.  John Sides at The Monkey Cage offers an excellent critique of Gerson and a defense of data-based social science.  (It is kind of depressing – Gerson and Noonan and the rest are intelligent people, and yet they force you to defend the radical idea of using systematic evidence.  But then again, their party is the standard-bearer for people who think that global warming is a myth and that the earth is only 7,000 years old.)

------------------------
 * Yes, this is what someone at the right-wing examiner.com actually wrote about Nate Silver.  I am not making this up.

Surveys and Sequence

July 8, 2012
Posted by Jay Livingston

Push polls are an extreme example of the problems inherent in surveys, even surveys that are not apparently tendentious.  You ask a seemingly straightforward question, but respondents may not be answering the question you think you asked.  That’s why I tend to distrust one-shot surveys with questions that have never been used before.  (Earlier posts on this are here and here.)

Good surveys also vary the sequence of questions since Question #1 may set the framework a person then uses to think about Question #2. 

“Yes, Prime Minister” offers a useful example – exaggerated, but useful in the research methods course nevertheless.




HT: Keith Humphreys

When the Going Gets Tough – Lipstick and Evolution

June 28, 2012
Posted by Jay Livingston

L’Oreal did not lose sales during the current recession.  And psychologist Sarah Hill says that this resilience is part of a more general trend – “the lipstick effect.”  In a recession, women cut back on other stuff, but not cosmetics.

What makes L’Oreal worth it, even when times get tough, according to Hill, is evolutionary psychology.  (Hill’s new JPSP article  is here. She also has a shorter, more general write-up at Scientific American.)  It’s all about “reproductive strategy” – how to get your genes strewn about as much as possible. 
Human ancestors regularly went through cycles of abundance and famine, each of which favors different reproductive strategies. While periods of abundance favor strategies associated with postponing reproduction in favor of one’s own development (e.g., by pursuing an education), periods of scarcity favor more immediate reproduction. The latter strategy is more successful during times of resource scarcity because it decreases the likelihood that one will perish before having the chance to reproduce.

Got it? In good times, our human ancestors would try to get an education.  In hard times, they would try to get laid. 

Hill elaborates on the special problems for women.
For women, periods of scarcity also decrease the availability of quality mates, as women’s mate preferences reliably prioritize resource access.
“Reliably prioritize resource access” is from the SciAm blogpost, presumably the venue that’s reader-friendly for the general public.  What the sentence means, I think, is this:  A recession reduces the number of guys with enough money to take care of a family.

Those well-off guys, I mean males, thanks to evolution, are “men who seek in mates qualities related to fertility, such as youth and physical attractiveness.”  So a girl has to go even further in dolling herself up in order to snag one of them. 

It all makes sense, but it ignores one important factor – the economic inequality between men and women.  The evol-psych explanation takes as a given that women must rely on men for “resource access” (which I think is roughly what you and I call “money”).  What if women knew that their chances of getting a decent job were as good as a man’s, or better?  Would hard times still send them to the cosmetics counter?

Hill did include a measure of resource access, and found that it was not significantly related to the lipstick effect, at least not in the lab experiments.  Here was the set-up: Subjects read an article that was either about the recession (“Worst Economic Crisis Since ’30s With No End in Sight”) or about “current architecture.” Then they were asked which products they preferred.  Women who read about the recession were more likely to go for (in the words of evolutionary psychologist C. Berry) “tight dresses and lipstick.”*  The “resource access” measure did not significantly alter that effect.  Rich girls and poor girls alike switched their preference to L’Oreal.

As for the guys, reading about the recession did not affect them in this way.  Their desire for “attractiveness products” was unchanged.

I never know what to make of psychology experiments.  Their elaborate contrivance gives them enviable control over the variables, but it also raises questions about their link to the real world.  In Hill’s experiments, as is typical, the subjects were “unmarried female university students” – what we used to call “college girls” (plus, in one of the experiments, college boys).  It would be interesting to see if actual recessions lead to lipstick-buying across the socio-economic landscape.  Evol-psych would predict that the effect should be most visible in places where the recession hits hardest.

It’s also worth noting that L’Oreal might have been the exception this time around.  Sales in the industry as a whole suffered in the recession and did not reach pre-recession levels till 2010, and much of the increase came from bargain hunters.  (An industry report is here.) That contradicts Hill’s lab experiment results showing that “the lipstick effect applies specifically to products that enhance beauty, even when those products are more expensive.”   The larger increase in cosmetics sales came in 2011, especially for nail products (up 59%, go figure).

The experiment’s “priming” with newspaper stories is also a problem.  I’m puzzled about the use of that “current architecture” article as a control.  Why not an article that was upsetting but had nothing to do with economics – something like “How Hackers Easily Get Your Phone Messages”?  Maybe any disturbing article would have the same lipstick effect, even though cell phone privacy has nothing to do with a woman’s ability to pass along her genes.  As the t-shirt says, “When the going gets tough, the tough go shopping.” Maybe it doesn’t matter whether the tough-going is economic or something else.

Finally, I wonder about those guys.  If recessions make women but not men worry about their genes, asking college guys about face cream and tight polo shirts might not be the best way to operationalize the variable.  Why not ask about things that most guys think make them more attractive to women – probably consumer goods that signal cultural and economic capital?  Maybe college boys who read the recession article would shift their preference from video games to dress shirts and ties; or maybe the change would go the other way.  Whatever the outcome, I'm sure evol-psych would have an explanation. 

-----------------------
* I am not making this up: “The three attractiveness-enhancing products were (a) form-fitting jeans, (b) form-fitting black dress (women) / form-fitting polo shirt (men), and (c) lipstick (women) / men’s facial cream (men).”  And, as noted above, these were college girls, not all that much older than sweet little sixteen.

Free Samples

June 23, 2012
Posted by Jay Livingston

Google has nGrams for quick content analysis of words and phrases in “lots of books.”  Google also has Correlate, which allows you to trace search strings across time and place and to discover correlations between search strings. 

Facebook too makes information on its users available, though its motive is not so selfless as Google’s.  It does this so that advertisers can narrow their target.  Planet Money had a story recently about a pizza joint in New Orleans that used FB’s data to select the target audience for its ads.
Their first idea was to target the friends of people who already liked Pizza Delicious on Facebook. But that wound up targeting 74 percent of people in New Orleans on Facebook — 224,000 people. They needed something narrower.

The Pizza Delicious guys really wanted to find people jonesing for real New York pizza. So they tried to target people who had other New York likes — the Jets, the Knicks, Notorious B.I.G. Making the New York connection cut the reach of the ad down to 15,000.

Seemed perfect. But 12 hours later, Michael called us. “It was all zeroes across the board,”  he said. Facebook doesn't make money till people click on the ad. If nobody clicks, Facebook turns the ad off. They'd struck out.

So they changed the target to New Orleans fans of Italian food: mozzarella, gnocchi, espresso. This time they were targeting 30,000 people.

Those ads went viral. They got twice the usual number of click-throughs, on average. The ad showed up more than 700,000 times. Basically, everyone in New Orleans on Facebook saw it. Twice.
To get access to the data, you don’t really have to be an advertiser; you just have to play one on Facebook.  Neal Caren at UNC tells you how.  He used Facebook to compare rates of same-sex and hetero preferences across age groups and states.  His instructional post is here.

(HT: Philip Cohen)

Whose Kids Are All Right?

June 22, 2012
Posted by Jay Livingston

Miscellaneous thoughts on the Regnerus study.

1.    Oranges and apples.  This study is not about the effects of gay marriage.  Opponents of gay marriage trying to cram it into that cubbyhole apparently have not read the title: “How different are the adult children of parents who have same-sex relationships? Findings from the New Family Structures Study.” [emphasis added]

Who are these “parents who have same-sex relationships”?  They are not gay couples (there were only two of those in the sample, both female).  The image I get is the closeted homosexual trying to do the right thing, maybe even “cure” himself, by getting married.  The cure doesn’t work and he is now in an unhappy, unfulfilling marriage, but he stays because of the kids.  Eventually, he gives in to his desires, has a “same-sex relationship,” and maybe leaves his family.  

Is this scenario common in Regnerus’s sample?  I don’t know.  But to make gay-parent vs. straight-parent comparisons on the basis of a sample with only two gay couples is to compare these unhappily married oranges with Ozzie-and-Harriet apples.  As Regnerus’s defenders delicately put it, “This is not an ideal comparison.”

2.    Secondary deviance.  Edwin Lemert coined this term to refer to deviance that arises as a reaction to the social or legal stigma that comes with the primary deviance.   The crime is primary, the coverup is secondary.  The coverup occurs only because the original act is criminal.  The same applies to non-criminal forms of deviance and to social sanctions rather than legal ones.

Again, the Regnerus defense team: “This instability may well be an artifact of the social stigma and marginalization that often faced gay and lesbian couples during the time (extending back to the 1970s, in some cases) that many of these young adults came of age.” 

3.    Rights and Research.  As Ilana Yurkiewicz at Scientific American says, even if good, relevant research on the topic of gay marriage (which the Regnerus study is not) showed that kids from gay marriages do worse than kids from straight marriages, that’s no reason to deny people the right to marry.

Research has already found such differences between other categories of people – poor vs rich, for example.  Should we deny poor people the right to marry because their kids are less likely to do well in school or more likely to have run-ins with the law?  I would not be surprised if back in the mid-20th century, research would have shown (or perhaps did show) that the children of interracial marriages did not do as well on several variables as did Ozzie-and-Harriet or Cosby-show offspring.  Would that have been a valid reason to uphold laws banning interracial marriage?

4.    Etc.  Philip Cohen is much more qualified than I am to offer criticisms and comments on the study.  You should read his as yet unpublished op-ed.

They Work Hard for a Ton of Money

May 21, 2012
Posted by Jay Livingston

I’m not very good at looking at a scatterplot and estimating the correlation. 

This morning’s Wall Street Journal had a front-page story  about CEO pay.  Here’s the lede:
Chief executives increasingly are being paid based on their companies' financial results and share prices, according to a Wall Street Journal analysis.
The WSJ even had an outside source check their calculations and conclusions.
Pay was “highly correlated with performance,” says Steven Kaplan, a professor of finance at the University of Chicago's Booth School of Business who reviewed the Journal calculations.
Here’s the scatterplot showing the 300 largest companies:



(Click on the chart for a larger view.  Those wedge-shaped lines point to very large photographs of individual CEOs, which I cropped out.)

I guess “highly correlated” is a term of art.  Unfortunately, the WSJ does not provide a regression line or correlation coefficient, but apparently the slope is +0.6.
On average, for every additional 1% a company returned to shareholders between 2009 and 2011, the CEO was paid 0.6% more last year, the analysis found. For every 1% decline in shareholder return, the CEO was paid 0.6% less.
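
That 0.6 is an elasticity – the slope you would get from regressing log pay on log return. A minimal sketch on fabricated data (not the Journal’s 300 firms) of how such a slope is estimated:

# Estimate a pay-performance elasticity from a log-log regression.
# The data below are fabricated with a true elasticity of 0.6.
import numpy as np

rng = np.random.default_rng(0)
total_return = rng.uniform(0.5, 2.0, 300)    # shareholder return as a growth factor
pay = 10 * total_return**0.6 * rng.lognormal(0.0, 0.5, 300)    # pay in $M, plus noise

slope, intercept = np.polyfit(np.log(total_return), np.log(pay), 1)
print(f"estimated elasticity: {slope:.2f}")  # comes back near 0.6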
I like that idea of considering the profitable CEOs separately from the CEOs whose firms lost money.  Here is the same scatterplot split down the middle. 




If you divide the Pay axis at $20 million, the relation becomes clear.  For every $20M+ CEO in a losing company, there are three in profitable companies. 

But here’s where my inability to look at the dots and estimate correlations messes me up.  To me, it looks as though among the losing firms, there’s no relation between CEO pay and how well the company did (i.e., how small its losses).  Same thing on the profit side, especially if you ignore the three $60M+ outliers.  (Timothy Cook of Apple, at $378M, lies out so far he’s not even on the chart.)  
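
My eyeball reading is a familiar statistical pattern: a difference in pay levels between the two groups can produce an overall association even when pay is pure noise within each group. A minimal sketch with fabricated numbers:

# Give losing firms one pay level and profitable firms a higher one,
# with pay unrelated to performance inside each group, and an overall
# correlation still appears.
import numpy as np

rng = np.random.default_rng(1)
losing_return = rng.uniform(-30, 0, 150)    # shareholder return, %
profit_return = rng.uniform(0, 30, 150)
losing_pay = rng.uniform(5, 15, 150)        # $M - noise, no within-group relation
profit_pay = rng.uniform(10, 25, 150)       # $M - noise at a higher level

all_return = np.concatenate([losing_return, profit_return])
all_pay = np.concatenate([losing_pay, profit_pay])
print(np.corrcoef(losing_return, losing_pay)[0, 1])   # near zero
print(np.corrcoef(profit_return, profit_pay)[0, 1])   # near zero
print(np.corrcoef(all_return, all_pay)[0, 1])         # clearly positive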

I’m not sure who to believe – the Wall Street Journal or my lyin’ eyes. 
The WSJ site has a chart listing the compensation of all 300 – from Apple down to Whole Foods, whose CEO didn’t even snag $1 million.

The story also heralds 2011 as showing huge improvement over the previous year in rationality, or at least in the proportionality of pay to profits:
In 2010, there was no correlation; for every 1% decrease in shareholder return, the average CEO was paid 0.02% more.
Yes, you read that correctly.  The correlation was negative  – the smaller the profit (or larger the loss), the higher the CEO pay.