In this episode, Matt and Kevin stick with the brain, this time looking at the immunological implications of a variant of a gene called Apolipoprotein E that has been linked to Alzheimer's and other neurodegenerative diseases.
It's a long one, and we get snarky, folks! Buckle up!
We spend a fair amount of time talking about the statistical implications of these methods, and Kevin says things like:
"If I generated a bunch of random vectors, we would still see the same thing."
But is that really true? Kevin wrote some code to find out! You can find the link to the actual code below if that's your jam, but to give a brief summary, Kevin
Here's what he saw:

So it's not literally true that completely random data will segregate based on this analysis - mea culpa. But if you add even a tiny bit of underlying correlation structure (some directional noise on 1% of samples), we see the following:

Which is much closer to what it looks like in the paper. We discuss these results and the code at around minute 68, and the link to the code is below.
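If you'd like a feel for the experiment without opening the linked code, here is a minimal sketch of the idea. To be clear, this is not Kevin's actual code. It assumes PCA as a stand-in for the analysis in the paper, and every name and parameter in it (`simulate`, `top_pc_variance_ratio`, the 1000 x 50 matrix, the shift size of 10) is made up for illustration. It builds one matrix of pure Gaussian noise and a copy where 1% of samples are nudged along a single random direction, then compares how much variance the first principal component picks up in each.

```python
import numpy as np


def top_pc_variance_ratio(data):
    """Fraction of total variance captured by the first principal component."""
    centered = data - data.mean(axis=0)
    # Singular values of the centered matrix come back sorted in descending
    # order; their squares are proportional to the variance along each PC.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances[0] / variances.sum()


def simulate(n_samples=1000, n_features=50, frac_shifted=0.01,
             shift_size=10.0, seed=0):
    """Return (pure_noise, structured, shifted_idx).

    All defaults are hypothetical values chosen for illustration only.
    """
    rng = np.random.default_rng(seed)

    # Completely random data: independent standard normal entries.
    pure_noise = rng.standard_normal((n_samples, n_features))

    # One random unit vector to act as the shared "direction".
    direction = rng.standard_normal(n_features)
    direction /= np.linalg.norm(direction)

    # Copy the noise and push a small fraction of samples along that direction.
    structured = pure_noise.copy()
    n_shifted = max(1, int(frac_shifted * n_samples))
    shifted_idx = rng.choice(n_samples, size=n_shifted, replace=False)
    structured[shifted_idx] += shift_size * direction

    return pure_noise, structured, shifted_idx


if __name__ == "__main__":
    pure_noise, structured, shifted_idx = simulate()
    print("PC1 share, pure noise:      ", top_pc_variance_ratio(pure_noise))
    print("PC1 share, 1% nudged samples:", top_pc_variance_ratio(structured))
```

With these made-up settings, the first component should take a visibly larger share of the variance in the nudged copy than in the pure noise, which is the general flavor of the result described above. How strong the effect looks depends heavily on the shift size and matrix dimensions, so treat the numbers as a toy and go to the linked code for the real thing.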