Is Tan et al. The End of Social Science Genomics?

What happens when everything is unconfounded?

Apr 04, 2025

Tan et al (2024) is a remarkable piece of work, and the first thing I need to say is that I am not here to criticize it, at least for 90% of this post. I don’t know the first author, who I assume is a junior person, but the typically long author list is a who’s who of statistical genetics (Augustine Kong, Alex Young, Peter Visscher), BGA people (Dorret Boomsma, Nick Martin, Elliot Tucker-Drob) and SSGAC (Dan Benjamin, Patrick Turley). I am leaving a lot of people out. The author list will be important at the end.

To understand what this study is about, you have to back up a little. When GWAS for complex human traits first came online around 2010, there was a sense in which it was mostly a recapitulation of what we already knew from twin and family studies. Twin studies had shown that everything is heritable. One way to say what that means is that, averaged across loci, genotypic differences are correlated with phenotypic differences. Unless you are the sort of person with a radical disbelief in twin studies, once you accept that, it had to be true that there were correlations to be found in the SNPs. Once the samples got big enough, it turned out that there were, in the form of individual SNP “hits”, SNP heritabilities, and p<.05 polygenic scores.

Early GWAS was like twin studies in another way— the relevant genetic variation was all between families, not within them. Look at it this way: Suppose you wanted to come up with a “genetic” prediction of a behavioral phenotype, say IQ. One way to do it would be to measure the IQs of as many first and second degree relatives as you could find, and compute an average of their IQs, maybe weighting by genetic relatedness. You could actually do that in the big Scandinavian population databases. That prediction would work pretty damn well.

But wait a minute. Such a prediction is not really “genetic” because those correlations with relatives are confounded by a million things, most importantly family environment. In BG terms, you are predicting from A+C, not just from A. Another way to say the same thing is that a shortcoming of this prediction is that it necessarily makes the same prediction for all siblings raised in the same family, because they have the same first- and second-degree relatives. A third, more technical way to say it is that classical ACE model twin studies can’t identify differences between within- and between-pair genetic variance, so they are fixed to be the same in twin models.
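To make the relatives-based predictor concrete, here is a minimal sketch in Python. The IQ values and the choice of relatives are invented for illustration, and the weighting (a simple relatedness-weighted mean) is one assumption about how such a prediction might be built, not a method taken from any particular study.

```python
# Hypothetical relatives shared by every child in one family:
# (measured IQ, coefficient of genetic relatedness to the child).
# All numbers are made up for illustration.
shared_relatives = [
    (110, 0.5),   # mother
    (100, 0.5),   # father
    (120, 0.25),  # grandparent
    (90, 0.25),   # aunt
]


def family_prediction(relatives):
    """Relatedness-weighted mean of the relatives' IQs."""
    total_weight = sum(weight for _, weight in relatives)
    return sum(iq * weight for iq, weight in relatives) / total_weight


# Two siblings raised together have the same parents, grandparents, and aunts,
# so the inputs are identical and so are the predictions.
sibling_a = family_prediction(shared_relatives)
sibling_b = family_prediction(shared_relatives)
print(sibling_a, sibling_b)
```

Because nothing in the input distinguishes one sibling from another, the predictor can only ever capture between-family differences, which is exactly where genetic and family-environment effects are mixed together.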

The greatest contribution of complex-trait GWAS was to solve this problem. Because parents, children, siblings, and other non-MZ relatives only share some of their genes, quantitative geneticists have learned how to conduct GWAS within families as well as between them. It has been a remarkable and laudable effort to take something that was just starting to look pretty promising— SNP heritabilities and polygenic scores, which Plomin among many others had declared as a “game changer” for social science— and figure out what portion of their effects were actually the result of culture or other family-level confounds.

Tan et al is the apotheosis of this effort. I will not put a lot of effort into describing the technical aspects of the report, which are well-described in the paper, and in some excellent posts by Sasha Gusev, see HERE and HERE. Tan et al conduct an analysis of 34 phenotypes across 17 cohorts. The phenotypes are divided between complex medical traits like age at menarche and blood pressure, and a set of standard SSGAC behavioral traits:

ADHD, Age at first birth, Cannabis, Cigarettes per day, cognitive performance (5 measures), depression (2), drinks per week, educational attainment (5), ever smoke, extraversion, income (2), morning person, neuroticism, number of children, self-rated health, and subjective well being. (Not all of the tables break things out quite this finely.)

For all of the measures, heritabilities and PGI performance were evaluated both at the population level, including both “direct genetic effects” (DGE) plus all confounds, and then again using within-family information to estimate heritability and PGI performance based on DGE only. (For the uninitiated, heritability is an abstract estimate of the total amount of phenotypic variance that is associated with genotypic variance, whereas a PGI is a real world instantiation of genetic differences, a number assigned to each participant that can be used to predict their phenotype. Heritability places an upper limit on the performance of a PGI.)

So what happened? As expected, the unconfounded estimates were almost all lower than the confounded ones. I am not going to focus much on the confounded estimates or the degree of reduction, since Sasha has already talked about that, and my main interest is with the absolute magnitude of the unconfounded effects.

Supplemental Table 5 lists the heritabilities, both in the population and the unconfounded DGE ones. The median DGE heritability for behavioral phenotypes is .048. Let that sink in for a second. How different would the modern history of behavior genetics be if back in the 80s one study after another had shown that the heritability of behavior was around .05? When Arthur Jensen wrote about IQ, he usually used a figure of .8 for the heritability of intelligence. I know that the relationship between twin heritabilities and SNP heritabilities is complicated, and in fact the DGE heritability of ability is one of the higher ones, at .233. But still, it seems to me that the appropriate conclusion from these results is that among people who don’t have an identical twin, genomic information is a statistically non-zero but all in all relatively minor contributor to behavioral differences.

It gets worse. In many ways, the performance of PGIs is really what is at issue here. This is what Plomin referred to as the “game changer,” the fortune teller that was going to reveal to us who we really are. PGIs are the basis for all the crazy, unethical enthusiasm for commercial genomic information and embryo selection. PGIs are the basis for the big hopes for precision education and psychiatry. How do the PGIs perform once they are fully unconfounded?

These results are in Supplemental Table 9. The median R² value for the behavioral phenotypes is .001. That is one-tenth of one percent, which is to say (don’t bother me with statistical significance) zero. Two EA phenotypes, at .011 and .018, were the only ones to make it over 1% of the variance.
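For readers who think in correlations rather than variance explained, the arithmetic can be sketched in a few lines of Python. The three R² values are the ones quoted above; the labels and the helper name `implied_correlation` are mine, and the conversion assumes a single predictor, where the correlation is the square root of R².

```python
import math

# R-squared values quoted in the text (Supplemental Table 9 of Tan et al).
# The dictionary labels are informal descriptions, not names from the paper.
r2_values = {
    "median behavioral phenotype": 0.001,
    "EA phenotype (lower)": 0.011,
    "EA phenotype (higher)": 0.018,
}


def implied_correlation(r2):
    """Correlation between a single predictor and the outcome, given R-squared."""
    return math.sqrt(r2)


for label, r2 in r2_values.items():
    percent = 100 * r2
    r = implied_correlation(r2)
    print(f"{label}: {percent:.1f}% of variance, r = {r:.3f}")
```

Under that assumption, the median value corresponds to a correlation of roughly .03 between the score and the phenotype.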

Holy mackerel, as my grandson likes to say. At the end of a long (though pretty fast) research program from gene-finding GWAS, to GCTA and SNP heritability, to PGI, to within family heritability and PGI, and now finally to precise estimation of unconfounded direct genetic effects, the field has reached a remarkable conclusion: polygenic scores for complex human behavior don’t work, at least not if you care about the basis of the associations that underlie them.

Now I get to the only part where I need to be a bit critical. I should be clear at the outset that there is nothing sneaky about the way the data were reported. All of the results are mentioned in the text, illustrated in the figures, and listed in the supplemental tables. All I had to do was copy them out.

My issue, at bottom, is with the discussion section. The authors of this paper are not just quant genetic technicians. They are working scientists, many of them working behavioral scientists, many of them public enthusiasts for the utility of GWAS-based genomics for understanding complex human behavior. Many of them, not to personalize things, have found occasions to let me know that I am “too pessimistic” about the possibilities of applying PGI in the real world. So you would think that when all was said and done they might say something about the fact that unconfounded genetic effects turned out to be very small at the level of heritabilities, and zero at the level of PGI, but there is nothing. The actual results of the study are simply ignored.

There is, as I have said elsewhere, an unrecognized effect size crisis in GWAS-based behavior genetics. The response of the field has been to ignore it instead of grappling with it. The Spit for Science project, which I wrote about last year, is a similar but different expression of the same problem. S4S wasn’t a big technical exercise like this one; it was a smaller (n=12,000) effort to apply genomics in a real-world setting on a college campus. But exactly like this report, S4S spent a decade reporting thoroughly null results, while studiously avoiding the implications of its own findings.

Of course, there are many things that might be said about all this. Those twin heritabilities are still out there, and maybe one way or another SNP-based methods will eventually live up to them. Or maybe, Plomin-style, we should just ignore all the confounding and do what we can to maximize the prediction, willy-nilly. Maybe the old promises of better results in even larger samples will eventually come true. But what won’t work is ignoring the problem, relying on public enthusiasm as a cover for null results.

David Hugh-Jones

Apr 6

The heritabilities in Supp Table 5 are SNP-heritabilities, i.e., they only consider the variation found in SNP array data, not all genetic variation. Everyone in the field knows this, but outsiders might not. It is surely misleading not to mention this.

Steve Pittelli, MD

Apr 4 (edited)

Great points, Eric. I’d point out, though, that some of the public enthusiasm for this was generated by scientists’ overly optimistic claims, which date well before the GWAS era (coupled of course with movie and soap opera twin plots and genetic themes in science fiction). Moreover, taking at face value the paltry .05 heritability you discuss ignores the possibility that even more confounding will be recognized on the horizon, as well as the possibility that some of this tiny heritability has more to do with physical traits (height, attractiveness, skin color, etc.), which would likely have a small influence on any “behavioral” trait. I don’t know if it is the end of social science genomics, but it should certainly be the end of attributing significant genetic influence to behavioral traits (despite the recent scientist-generated cartoons touting genes for “income”).

Eric Turkheimer - Gloomy Prospect Blog
