(Update: Andrew Gelman responded via email, and his response has been pasted at the bottom of this article.)
One of the more surprising things about the most-talked-about academic paper of 2015 is what it doesn’t say. When the Proceedings of the National Academy of Sciences published a paper co-authored by the Princeton economists Anne Case and Angus Deaton late last month, the focus was, understandably, on their alarming headline finding: While the overall death rate for just about every other group of 45- to 54-year-olds in the wealthy developed world ticked downward between 1999 and 2013, for non-Hispanic American whites it actually increased. As the statistician Andrew Gelman has argued, the actual story may be a bit more complex than the paper indicated, with important differences in these trajectories before and after 2005, but no one is disputing the importance of the finding overall, given how widespread the drop in mortality was in other groups of middle-aged people.
But the husband-and-wife duo also wrote that they didn’t notice meaningful differences between the death-rate trajectories of men and women over that period, a noteworthy detail given the fact that the recession — seen as likely an important contributing factor to these dire trends — hit men harder than women, and the fact that men and women tend to die from different things at different rates more generally. It’s interesting, in other words, that the fates of middle-aged white men and women have been so tightly linked during this period.
On Tuesday, researchers from the Urban Institute published a post on the Health Affairs Blog claiming to refute this aspect of Case and Deaton’s findings, arguing that the economists glossed over the fact that women, in fact, were harder hit during this period (that same day, Gelman made a more nuanced case that since 2005, mortality has been going up for middle-aged white women, but down for men, and that before then it had been rising for both groups since 1999). Science of Us reached out to Case and Deaton for a response to the Health Affairs Blog post, and in an ensuing interview Case offered revealing new details about the gender question, the broader process of putting together this important research, and the trickiness of debating complicated scientific questions in a hyperkinetic media environment.
The co-authors of the Urban Institute post — Laudan Aron, Lisa Dubay, Elaine Waxman, and Steven Martin — write that they analyzed the same data Case and Deaton used and found that “the average increase in age-specific mortality rates for whites age 45-54 was more than three times higher for women than men. More specifically, between 1999 and 2013, age-specific mortality rates for US white women age 45-54 increased by 26.8 deaths per 100,000 population, while the corresponding increase for men was 7.7 deaths.”
Examining the graph Case and Deaton used in their paper to demonstrate the rising death rates of middle-aged white Americans, you can see how different an increase of 26.8 deaths would look as compared to an increase of 7.7:
In other words, if the Urban Institute researchers’ numbers are right, then had Case and Deaton broken out men and women separately, this difference would have popped out, and it would have been a weird thing to have ignored.
Case said that she didn’t buy this argument. “We spent a year working on this paper, sweating out every number, sweating out over what we were doing, and then to see people blogging about it in real time — that’s not the way science really gets done,” she said. “And so it’s a little hard for us to respond to all of the blog posts that are coming out.” But the Urban Institute researchers’ claim is an important one, Case said, and worth responding to.
According to Case, there are two issues here: the Urban Institute’s estimates of the death-rate increases, and the question of gender differences in lung-cancer deaths. First, Urban is working with different numbers for the increase in mortality rates per 100,000 people between 1999 and 2013. Urban claims increases of 26.8 and 7.7 for women and men, respectively; Case said they saw increases of about 40 and 28 — higher numbers, but less of a gap. The discrepancy stems from the fact that Case and Deaton’s numbers aren’t age-adjusted, while Urban’s are. Age adjustment, as Gelman explained in his post, is a way to get around the fact that when you look at a single age group over time, its age composition changes — maybe by the end of the period in question, 54-year-olds represent a much higher percentage of the 45-to-54 group in question than they did at the beginning. Since 54-year-olds are also more likely to die, this shift in the age composition could give the false impression that the group’s mortality rate is going up for external reasons, when in reality it’s just getting older on average.
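To make the age-composition point concrete, here is a minimal sketch in Python. Every number in it is invented for illustration and none comes from Case and Deaton, Gelman, or the Urban Institute; the function names are hypothetical too. It uses direct standardization against a fixed reference age mix, which is only one of several possible ways to age-adjust.

```python
# Hypothetical illustration: age-specific death rates are held constant
# over time; only the age mix of the 45-to-54 group changes.

AGES = list(range(45, 55))

# Invented age-specific death rates (per 100,000), rising with age.
RATES = {age: 300 + 30 * (age - 45) for age in AGES}


def crude_rate(population, rates):
    """Deaths per 100,000 for the whole group, weighted by its actual age mix."""
    total_people = sum(population.values())
    expected_deaths = sum(population[a] * rates[a] / 100_000 for a in population)
    return expected_deaths / total_people * 100_000


def age_adjusted_rate(rates, standard_population):
    """Direct standardization: apply the rates to a fixed reference age mix."""
    return crude_rate(standard_population, rates)


# Start of period: people spread evenly across ages (invented).
pop_start = {age: 1_000_000 for age in AGES}
# End of period: the group has shifted toward its older ages (invented).
pop_end = {age: 800_000 + 40_000 * (age - 45) for age in AGES}

crude_start = crude_rate(pop_start, RATES)
crude_end = crude_rate(pop_end, RATES)

# Both adjusted figures use the same reference mix (here, the starting one).
adjusted_start = age_adjusted_rate(RATES, pop_start)
adjusted_end = age_adjusted_rate(RATES, pop_start)

print(f"crude:    {crude_start:.1f} -> {crude_end:.1f}")
print(f"adjusted: {adjusted_start:.1f} -> {adjusted_end:.1f}")
```

In this toy setup nobody's risk of dying at a given age changes, yet the crude rate rises simply because the group skews older by the end. The adjusted rate stays put. Picking a different reference population would give different adjusted levels, which is the flexibility Case describes.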
Case and Deaton have accepted Gelman’s critique on this front: they agree they should have applied some degree of age adjustment. That was the source of an initial debate, sparked by Gelman, over whether it was fairer to say the middle-aged white mortality rate was going up or had merely flattened (which would still be newsworthy given, again, the drop everywhere else). But Case said there’s flexibility in how age adjustment can be done. “We don’t want people to take the Urban Institute numbers as though they were God-given,” she said, “because there are a very large number of ways someone can age-adjust this cohort.” In a follow-up email she said it’s important to realize that each method comes “with its own implicit assumptions, and that each answers a different question.” (In an emailed response forwarded on by an Urban spokeswoman, the authors defended their age-adjustment technique, writing: “We chose a simple straightforward method and then confirmed it using another approach to ensure the integrity of our results.”)
But according to Case, even her and Deaton’s own, non-age-adjusted numbers show a gap between women and men. So why didn’t they report this in their paper, instead of relegating the gender discussion mostly to a single parenthetical noting that “Patterns are similar for men and women when analyzed separately”? Case explained that she and Deaton did in fact closely look at potential gender differences. “When we were working on this paper, we did everything by gender, and then everything was so similar that we put them together — one, for space reasons, and two because audiences become fatigued if you show them too many numbers,” she said.
In looking closely at the data, Case and Deaton discovered what they viewed as a simple (relatively speaking) explanation for the 12-deaths-per-100,000 difference in the increase of mortality between men and women: smoking. Case explained that while Americans on the whole have started smoking a lot less in recent decades, women, on average, began cutting back significantly later than men. Researchers aren’t sure why, but the consequences are stark: Over the 1999–2013 period in question, men enjoyed a decline in lung-cancer deaths of almost 9 per 100,000, she said, while rates remained flat for women, and when you factor this in, the broader gender differences largely (but not entirely) disappear. And since in so many other areas Case and Deaton saw mortality trends that were similar between the two genders, they decided to collapse the categories together; it was a calculated decision. (Case and Deaton don’t explain this lung-cancer discrepancy in the paper itself, and don’t note the different rates for men and women.) (Gelman didn’t immediately respond to a request for comment, but I’ll update this post if he does.)
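The arithmetic behind that claim can be sketched in a few lines of Python. The inputs are the rounded figures Case cited in the interview: increases of about 40 and 28 per 100,000, and a lung-cancer decline of almost 9 for men. Treating "almost 9" as 9 and "flat" as 0 is an assumption made here for illustration, as are the variable names and the simple subtraction; this is not the authors' actual method.

```python
# Rounded figures quoted by Case (changes in deaths per 100,000, 1999-2013).
# Treating "almost 9" as 9 and "flat" as 0 is an assumption for illustration.
increase_all_causes = {"women": 40, "men": 28}
change_lung_cancer = {"women": 0, "men": -9}

# Change in mortality from everything except lung cancer.
increase_excluding_lung = {
    sex: increase_all_causes[sex] - change_lung_cancer[sex]
    for sex in increase_all_causes
}

gap_all_causes = increase_all_causes["women"] - increase_all_causes["men"]
gap_excluding_lung = (
    increase_excluding_lung["women"] - increase_excluding_lung["men"]
)

print(gap_all_causes, gap_excluding_lung)
```

On these rounded inputs the gap between women and men shrinks from 12 to about 3 per 100,000 once lung cancer is set aside, which fits Case's description of the difference as largely but not entirely disappearing.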
More broadly, the Urban Institute authors argue that “By lumping women and men together, the study also missed the important point that the increases in mortality are affecting women of reproductive and childrearing ages, a finding that has huge implications for children, families, and communities.” Turning to other data, they provide a graph showing how U.S. women have fared as compared to women in a clump of the other richest countries on Earth when it comes to the likelihood they’ll reach 50 years old. They didn’t include a comparison graph for men, so I asked them if they could generate one, and they sent me back this combination of the graph already in their post (on the right) and a corresponding one for men (on the left):
Case said these graphs don’t really affect her and Deaton’s analysis: “The probability of surviving to age 50 isn’t relevant here — that metric is dominated by what happens to infant mortality and has close to nothing to do with what we are reporting,” to which two of the Urban authors (Martin and Aron) responded that their general life-expectancy claim holds even when setting aside the issue of infant mortality.
But in a sense the argument is moot: For one thing, these graphs, as the authors note in the blog post, include all racial groups, whereas the significance of Case and Deaton’s findings has to do with their focus on the trajectory of a population that’s generally seen as advantaged as compared to racial minorities. Moreover, the graphs clearly show that both men and women are being left behind, life-expectancy-wise, as compared to the rest of the rich world, which reinforces the general theme being sounded by Case and Deaton. Setting aside one outlier on the men’s graph, in fact, men and women even hit the bottom ranking at approximately the same time, around 1995. (There’s also the fact that since men are starting from a lower rate in the first place, they have more room to go up, further confounding direct comparisons.)
Case wanted to be clear that she was very happy that the PNAS paper had generated so much discussion. But she also saw a conflict between the norms prevalent in peer-reviewed academic publishing versus in academic blogging and other, less “formal” sorts of critique and analysis. “In a peer-reviewed paper,” Case said, “there’s a referee, there’s an arbiter who’s going to say, ‘This makes sense, this doesn’t.’ But with a blog, the blogger always has the last word. And if this is all people shooting from the hip, I don’t think that’s any way to move science forward, to move the research forward.”
That doesn’t mean people shouldn’t be aggressively digging into the numbers, she said. “There are a lot of different ways to cut the data, and all of those are probably going to reveal something interesting.” But she argued that it’s a matter of context: “It’s sometimes the case that when bloggers are responding in real time, they’re looking at one small slice of it without looking into the one bigger picture in which it fits.”
In an emailed response, Andrew Gelman said he disagrees with Case’s argument about the Urban Institute researchers’ methods: the question of how to age-adjust the data, he said, “will make very little difference in the results.” What matters more is that the data do need to be age-adjusted.
Here’s a slightly edited and (where indicated) condensed version of the rest of his note:
I don’t know who elected Anne Case as the authority on “how science gets done.” I think Case and Deaton’s paper was excellent, and I would still think it was excellent if it were published on a blog. I think my analyses are useful, and I think they would be no less and no more useful if they were published in the Proceedings of the National Academy of Sciences.
Here’s what I wrote yesterday in Slate:
Post-publication review is a wonderful thing. A blog commenter alerted me to the possibility of age-aggregation bias, Angus Deaton pointed me to the relevant CDC website, and I was able to dive into the data, perform some calculations, and make some graphs. The classical peer-review system is painfully inefficient: Once an article appears in a journal, I could submit a letter of correction, that letter would have to go through a review process and would be severely limited in length, then the original authors could reply, and so on. All at the speed of the U.S. mail circa 1775. Real-time feedback gets us there much faster.
I think it would be ludicrous to ask people to wait however long it takes for me to publish a letter of correction. Why keep age adjustment a secret? Why keep the diverging patterns for men and women a secret? Because Anne Case thinks this isn’t “how science gets done”? That’s not a good enough reason for me!
It’s too bad that Case and Deaton made the mistake of writing in their paper that “Patterns are similar for men and women when analyzed separately.” ‘Cos they’re not. Your interview with Case is helpful in that it seems that they would’ve liked to write something like, “Patterns for men and women are quite different when analyzed separately. But we interpret those differences as resulting from different trends in smoking patterns between the sexes.”
Unfortunately, it seems that because of space limitations in the journal, they were not able to go from one sentence to two sentences. This would’ve cleared up so much confusion.
Really too bad. But that’s a problem with peer-reviewed journals: arbitrary space limitations. Thank goodness we have non-peer-reviewed blogs where we can show as many graphs as we’d like and thank goodness we have non-peer-reviewed magazines where people like Anne Case and Angus Deaton can tell us what they really think, without being reduced to incomprehensible terseness by the combination of space limitations and the need to avoid upsetting any referees.
I find it ironic and upsetting when Case says, “In a peer-reviewed paper, there’s a referee, there’s an arbiter who’s going to say, ‘This makes sense, this doesn’t.’ But with a blog, the blogger always has the last word. And if this is all people shooting from the hip, I don’t think that’s any way to move science forward, to move the research forward.” Especially since she admits that her paper had errors that were pointed out in blogs. If there were no such thing as post-publication review and no such thing as blogs, we’d still be here thinking that the death rate among middle-aged non-Hispanic whites was increasing, while it actually has been steady since 2005.
Case and Deaton did great work that happened to appear in a peer-reviewed journal. But their work wasn’t perfect. In social science, nothing is. They forgot to age adjust, naively thinking that 10-year bins were narrow enough that no adjustment was necessary. Fine, nobody’s perfect. It’s too bad the journal reviewers did not notice the age-aggregation bias, but reviews aren’t perfect. Luckily there’s this thing called post-publication review: people like me and the group at the Urban Institute can report our corrections right away, we don’t have to wait for permission from Anne Case or anyone else.
I have a great respect for the work of Case and Deaton, and I don’t care where it’s published. I think they should show the same respect for the rest of us, bloggers and otherwise. Forget the gatekeepers — let’s do science.