Hi, I’m Wendy Zukerman and you’re listening to Science Vs from Gimlet.

This is the show that pits facts against phenotypes. Nerd Alert. Today on the show: DNA test kits - can you trust what they’re telling you?

Millions[1] of people are sending off their DNA to companies like Ancestry.com and 23andMe, and they’re hoping to find out about where they come from and what diseases they might end up with… These companies say for a mere hundred bucks[2] or so they can reveal these amazing things about you and your family… which can feel really exciting… like it did for some of our colleagues at Gimlet.

LM: there’s this hole in my family tree I would love to fill it

AB I’m just curious you know how I fit into the rest of humanity

GL No one in our family seems to know for certain where we come from

Two of our colleagues…in particular have a rather mysterious ancestry… there’s Alice, who looks a lot like her Irish American mum and nothing like her Indonesian dad.

AK like my hair is kinda red and I have green eyes and look like i'm from ireland.

Like  I present completely white but my dad does not

And there’s Gabe.. who knows he’s got some Irish and Puerto Rican in his family… but then there are these big gaps he’s very curious about.

GL We realized that we don’t know a ton about our family history, outside the stories we tell each other…I do think there’s an element there that I feel like I would have a better sense of myself if knowing where I came from..

And because there’s blanks in his past… Gabe knows very little about what diseases might be lurking in his DNA…

GL there’s just like a lot of questions around what is our medical history? 

WZ Are you excited about knowing or is it scary?

GL It is a little scary, it’s like I would rather know than not know


The promise of DNA tests is that you can have answers to all these questions and more ... all you have to do is ... is spit in a little tube… Science Vs Producer Rose Rimler walked Gabe and Alice through it. 

While some people can’t wait to send off their spit and get some answers… Others are skeptical… they’re wondering how accurate are these results?… and… hang on a minute… is it OK to be giving something as personal as our DNA to a private company?[3]

NR: It just feels a little weird to me

LM this makes me ask myself is it worth it?

So that’s what we’re going to find out today… is it worth it? By answering the following questions:

  1. Can these tests really tell you where your family is from?
  2. Can they reveal diseases hiding in your DNA?
  3. What can these companies do with your DNA once they have it? Can they sell it to the highest bidder -- perhaps your insurance company?

When it comes to DNA kits there’s lots of

spitting very gracefully

But then there’s science.


Science vs DNA kits ... is coming up… after the break.


Welcome back. Today we’re tackling DNA testing kits… to find out: can you trust them? We’ve come to expect a lot from DNA -  when cops are chasing bad guys on TV, DNA evidence always clinches the case.. ...but when we send off our DNA to sites like 23andme and Ancestry.com… they come back with these curious results..and it’s harder to know what to make of them…

So let’s start with one of the biggest reasons[4] [5] people do these tests: to find out, where do I come from?

AK how it’s going

RR Hi 

Our colleagues Alice, who looks like she belongs on stage with Lord of the Dance despite having an Indonesian dad, and Gabe, who had some gaps in his family history… couldn’t wait to bust open their results.

RR So you got an email that was like ding

GL Yeah It was so hard not to open by myself

RR Let’s open it up

GL Yay

RR Ok flip through the story of your DNA


RR Gabriel welcome to you!

GL Oh wow they really go there don’t they

GL yahhhh so 46.5% british and Irish...ties to ten other populations

GL sub saharan african 11.1 percent.

RR not something you predicted

GL no

GL I never knew that… Totally wild.

Gabe was surprised that a chunk of his ancestry was from sub saharan Africa … while Alice found something surprising too… even though her dad is Indonesian

AK I am a quarter East Asian but only 2.8% of that is Indonesian. That’s really interesting, I’m going to have to ask my dad about that, do you even know who your family is?

So there was a lot of excitement in the studio… but are these results accurate? To find out, we talked to Jonathan Marks, a professor of anthropology at the University of North Carolina at Charlotte[6] …

JM so yeah so in between spitting into the tube and receiving your results what exactly is happening there

And first off he says… that once these companies get your DNA… they don’t scour[7]… through every single part of your genome, instead they zoom in on the chunks of DNA that make each of us unique. They compare those chunks to this big DNA database that they have…

they've collected DNA from people all over the world [8][9]

by recruiting people...or using customers…[10]  and even publicly available sets of DNA[11].

So if you get a result saying you're 35% german...

JM what they're telling you is that from their sample of DNA from different countries you have 35% match to their samples from germany, so it’s a measure of how similar you are to the DNAs they have on file


Jon and about a half dozen other experts we spoke to said that what these companies are doing is a pretty reasonable way to take a stab at someone’s genetic family tree.  OK, great. So of the DNA they’re looking at, 35% matches the Germans[12].  But the big question is: Who are these Germans? Like, whose DNA gets to represent German-ness in the database? Well, here’s where things get messy.   

Because it’s not like there are tiny little bratwursts in the DNA of people who are “German”.  So sites like 23andme and Ancestry.com have to decide who’s German. Here’s how they do it…

Often, to be considered “German” or “French” or “Indonesian”[13]… at least… BOTH sets of grandparents... had to be born in that country.[14] [15] [16] [17]

And the thing is that… sometimes, even if they find someone who fits their Granny criteria… all their grandparents… born in Germany… BUT that person’s DNA looks different to the other “German people” in their database… then these companies will throw them out of the pile![18] [19] [20] They say…

<<HK One day you’re in, the next day, you’re out. Auf Wiedersehen!>>
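For anyone reading along who wants to see the Granny criteria spelled out, here is a rough Python sketch of that two-step filtering, under loud assumptions. The companies’ own write-ups describe using principal component analysis to spot outliers; this toy version swaps in a single made-up number per person (`genetic_score`) and a made-up cutoff (`tolerance`), and the function name `build_reference_panel` is hypothetical too. It illustrates the idea and is not either company’s actual method.

```python
from statistics import median


def build_reference_panel(candidates, tolerance=1.0):
    """Toy reference-panel builder (illustrative assumptions throughout)."""
    # Step 1: the "Granny criteria". Keep only people who report four
    # grandparents, all born in the same country.
    by_country = {}
    for person in candidates:
        grandparents = person["grandparent_countries"]
        if len(grandparents) == 4 and len(set(grandparents)) == 1:
            by_country.setdefault(grandparents[0], []).append(person)

    # Step 2: throw out anyone whose DNA looks unlike the rest of their group.
    # ASSUMPTION: "genetic_score" is a stand-in for real genetic data, and
    # distance from the group median is a stand-in for a PCA outlier check.
    panel = {}
    for country, people in by_country.items():
        typical = median(p["genetic_score"] for p in people)
        panel[country] = [
            p for p in people
            if abs(p["genetic_score"] - typical) <= tolerance
        ]
    return panel


# Hypothetical candidates, invented for this example.
candidates = [
    {"name": "Anna", "grandparent_countries": ["Germany"] * 4, "genetic_score": 1.0},
    {"name": "Bernd", "grandparent_countries": ["Germany"] * 4, "genetic_score": 1.1},
    {"name": "Clara", "grandparent_countries": ["Germany"] * 4, "genetic_score": 0.9},
    {"name": "Dieter", "grandparent_countries": ["Germany"] * 4, "genetic_score": 5.0},
    {"name": "Emil",
     "grandparent_countries": ["Germany", "Germany", "France", "Germany"],
     "genetic_score": 1.0},
]

panel = build_reference_panel(candidates)
print(sorted(p["name"] for p in panel["Germany"]))
```

In this made-up data, Emil is dropped at step 1 because one grandparent was born in France, and Dieter is dropped at step 2 even though all four of his grandparents were born in Germany, because his score sits far from everyone else’s.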

Which brings us to Jon’s first big gripe about these ancestry tests… because these companies are basically creating a world of model Germans or model Frenchies…

but there are no such people

JM geneticists like to imagine the purity of people they're working with; it simplifies their calculations

Jon says that we’re all mixed up. We don’t fit into neat little boxes for a really obvious reason. Sex. Humans have been travelling across the globe having a lot of sex, with a lot of different people, for a very long time. And it means our DNA is all mixed up. Take Europe, for example…

JM The Vandals[21], Visigoths[22], Huns[23] and the Vikings[24], all these migrations and all of these movements of people. Where there’s migrations, movements, there’s gonna be gene flow, a polite term for anything from marrying your neighbors to mass rape. So there’s always genetic contact between populations and there’s always movement of populations, so the idea that people just sat still for prolonged periods of time is just a fallacy

On top of this frustrating fallacy… most[25] [26] of these DNA databases are filled with customers who have European ancestry… so for a tonne of countries… they don’t have a big sample to work with. And this can have real consequences for people’s results.

Take Indonesia. Alice’s dad is Indonesian, but her results said only a fraction of her DNA was from there. Well, it turns out that… 23andMe has very few people in their database from Indonesia. In fact, they combine Indonesia[27] with Cambodia[28], Thailand[29], Myanmar[30] and Malaysia.[31] [32] And guess how many people are in this database for all five countries? 124 people. 124! To represent a region with more than 400 million people.[33] [34] Now, the big testing companies know this is a problem and say they’re working on it, trying to make their databases more diverse[35]. But bottom line… maybe Alice’s family IS Indonesian… but with just 124 people in its database, the test can’t tell her that yet. We talked to Alice about this…

AK 124 people 


Whoa. That’s it? That’s crazy!

AK So Who I am… What am I? Where do I come from?

WZ We don’t know! We still don’t know.

AK Noooo

WZ So it’ll be interesting as the database grows… I imagine your results are going to change quite a lot

AK wild. WOW.

So there are some problems with these databases… which might mean they get your ancestry wrong… particularly if your family’s not European…

But there's another place on their results page where things might be a bit off... and it's in those precise numbers they give you... like Gabe finding out he was 11.1% sub saharan African. Rose asked Jon about this…

RR Someone at Gimlet, one of our colleagues, he took a test from one of these companies it said he’s 11.1% sub-saharan African. What does that mean?

JM Ha ha ha. The answer is, it’s not really clear. Ah they’re giving you a number that’s very precise but you don’t know how accurate it is.

Basically, don’t take these precise numbers as gospel… And here’s why. When these companies compare your DNA to their less than perfect database… they are basically using an algorithm that takes lots of chunks of DNA and asks… what’s the chance that this chunk of DNA comes from Nigeria? Or, say, Germany? What about this OTHER piece of DNA? And you, DNA, over there!... You look a little French… but you could be British..? The algorithm then crunches the numbers… making its best guess… on what the devil those chunks of DNA are[36] and it repeats this for thousands of pieces of DNA[37]... until it gives you these seemingly precise numbers.

But because the world is messy… and we’ve all got each other’s DNA… every time the algorithm makes a call, there’s a chance it gets it wrong and puts you in the wrong box… so you can get results that look really precise… but they’re not.[38][39][40]

JM This is a very very noisy system they’re playing with here because human ancestry is noisy
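To see how a noisy chunk-by-chunk guess can turn into a very precise-looking number, here is a small Python simulation. It is only a sketch under stated assumptions: the real algorithms weigh probabilities for each chunk, whereas this toy version simply flips a weighted coin, and the function name `estimate_ancestry`, the 2% error rate and the population list are all invented for illustration.

```python
import random
from collections import Counter


def estimate_ancestry(true_labels, populations, error_rate, seed=0):
    """Label each DNA chunk, occasionally putting it in the wrong box.

    ASSUMPTION: every chunk has the same flat chance (error_rate) of being
    misclassified. Real classifiers are far more sophisticated than this.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    calls = Counter()
    for truth in true_labels:
        if rng.random() < error_rate:
            # Wrong call: pick one of the other populations at random.
            wrong_options = [p for p in populations if p != truth]
            calls[rng.choice(wrong_options)] += 1
        else:
            calls[truth] += 1
    total = len(true_labels)
    # Report to one decimal place, the way the result pages do.
    return {pop: round(100 * calls[pop] / total, 1) for pop in populations}


# A hypothetical person whose 1,000 chunks are all truly "British".
populations = ["British", "French", "Korean"]
report = estimate_ancestry(["British"] * 1000, populations, error_rate=0.02)
print(report)
```

With any error rate above zero, this imaginary all-British person will usually come back with a sliver of something else, reported to a decimal place. The number is precise, but it comes from the coin flips and not from an ancestor.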

This nerd gripe… becomes a real issue when you get results… saying you’re some tiny percentage of something... y’know… 2% Italian, 1% Brazilian…

Does that mean anything?

No. Hahahaha…

Tell me more…  tell me more

 I came out 1% Korean

Oh! Are you 1% Korean?

Uhh No. I think what that means is there are just some regions of DNA that they’ve misclassified

In fact… when the company later updated his report… that 1% Korean disappeared[41]… but... Jon says that when we get results like this... it’s so easy for us to start telling ourselves stories about… who this Korean ancestor might have been… but that’s what these companies are kind of doing…

JM What they’re selling you is a story and it’s up to you to decide whether you like that story reject that story whether there are other stories available that you like better

And Jon says there’s something comforting about having a story about where you come from… he gets it. In fact, he’s got one for you.

JM Let me give a, let me tell you a story. My dad took 4 machine gun bullets in the leg during WW2, he was in the army hospital and they wanted to amputate his leg

But a Scottish surgeon managed to save it. and from then on, Jon’s dad always wanted to be Scottish. A few years ago, Jon was in Scotland and passed a shop that sold Scottish ancestry certificates for tourists. So he went in and talked to the shopkeeper.

JM I told him a story, a family legend: our family began in the highlands of Scotland, emigrated to Russia, then the US

RR Did you make that story up?

JM Absolutely

RR It was a total lie?

JM Yeah that’s how I got my certificate

RR Does your dad know that?

JM Yes he was delighted to receive it

RR Even though it was based on nothing?

JM Now he’s Scottish he always wanted to be Scottish

So… when it comes to your ancestry… generally speaking, the science behind these tests makes sense… but they’re far from perfect. If it comes back saying that half your DNA is Irish, yeah, you’ve probably got some Riverdance in you ... but for that exciting 1% Korean in your DNA? .. well, it's a bit like sweet-talking a Scottish shop-keeper into selling you a phony certificate.

After the break… we raise the stakes… Because one DNA company promises to do more than tell you about your ancestry… they say they can peek into your DNA… to reveal diseases you could get … But can you trust them?


Welcome back… Today we’re tackling DNA kits, y’know, that Chrissy present… still gathering dust under the dying Christmas tree. We’re finding out… whether you should finally spit in the tube, and send it off. So far… we’ve learned that when it comes to telling us about our ancestry… these tests aren’t bogus… but take them with a grain of salt…

Now we’re diving into our health… to find out: what can these results tell you about whether you’re going to get sick? There’s only one of these DNA testing companies that is FDA-approved[42] [43] [44] to send you health info[45] [46] without you going to a medical professional… and that’s 23andMe. Now this company says they can tell you important information about your risk of getting conditions like Alzheimer’s, type 2 diabetes, Parkinson’s, and even breast cancer[47] [48] [49] [50] [51]. Our colleagues Gabe and Alice were nervous about these results… particularly Alice… who talked it through with producer Rose Rimler.

RR Do you want to look at the health stuff?

AK That’s so scary, I don’t know,

AK I do… Uhh..  yeah I’m scared

Alice was so freaked out about what these results might tell her… that she wouldn’t look…

AK Do I really want to know that about myself?

But Gabe decided to open Pandora’s box and read his genetic health report… and one thing stood out: Alzheimer’s.

GL Ohhh ok, one variant detected in the APOE4 gene. Umm… People with this variant have a slightly increased risk of late-onset Alzheimer’s disease. Lifestyle and other factors can also affect your risk

RR so what do you think?

GL i don't know! It is a little nerve wracking I'll say that.

Gabe’s 23andme report said that he has a version of a gene that doubles his risk of getting Alzheimer’s by the time he’s 85[52][53]… which sounds scary… so we took it to an expert who’s really familiar with these tests...

AR 23andme take this stuff really really seriously. they have proper scientists and proper clinical geneticists

This is Adam Rutherford… he’s a British geneticist affiliated with University College London[54]. And Adam told us these results aren’t rubbish… to get them… 23andMe scans your DNA… looking for genes that studies have found increase your risk of certain diseases[55] [56] [57] [58] [59].

You look at a particular gene which is associated with breast cancer,  a bunch of other stuff, so they only look at positions in your genome which are known to be interesting for the types of characteristics that people are interested in

So that version of the Alzheimer’s gene that Gabe has? Well, it bumps up the risk of getting Alzheimer’s by quite a bit[60] [61] [62]... So, from the best data we have… a man’s risk of getting Alzheimer’s by the time he’s 85 is around 10%. For men with this version of the gene, it’s a little over 20%. And studies in women have found that having it carries a little more risk[63]… So Adam says at first glance, this seems like a big deal.

AR: I'd imagine that would be absolutely terrifying to discover that you’ve got double the risk of getting Alzheimer’s.

Adam doesn’t have to imagine.

AR That is me, in fact, because 23andMe reports I’m in a higher risk category for Alzheimer’s,

WZ were you a little worried?

AR i don't want to sound all macho but the answer is just no. i went huh, how about that?

It turns out that Adam is super chill about this result for a couple of reasons… the first is that he, and Gabe, only have one copy of this bung gene[64]. If you have two copies, your risk of Alzheimer’s goes up even more. But more broadly… Adam says that even though having this version of the gene creates a real risk… it’s far from a sure thing[65]…

AR Thing is It's perfectly possible for me to have that version of the gene and not get Alzheimer's. Or I probably won’t get Alzheimer’s regardless of whether I have that version of the gene. and it's perfectly possible to not have that version of the gene and get Alzheimer's…

So whatever the results are… they are not a fate sealed in your DNA.
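For anyone who wants to check the arithmetic behind “double the risk” versus “far from a sure thing”, here is a tiny Python sketch using the figures quoted above for men by age 85. Rounding “a little over 20%” down to 0.20 is an assumption made for simplicity, and the function name `risk_summary` is hypothetical.

```python
def risk_summary(baseline_risk, variant_risk):
    """Compare two risks expressed as fractions between 0 and 1."""
    return {
        # How many times higher the risk is with the variant.
        "relative_risk": variant_risk / baseline_risk,
        # How many percentage points the variant adds.
        "absolute_increase_points": (variant_risk - baseline_risk) * 100,
        # Chance of NOT getting the disease even with the variant.
        "chance_of_no_disease_with_variant": 1 - variant_risk,
    }


# Figures from the episode: about 10% baseline, roughly 20% with one copy.
# ASSUMPTION: "a little over 20%" is rounded to exactly 0.20 here.
summary = risk_summary(baseline_risk=0.10, variant_risk=0.20)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

On these rounded numbers the relative risk is about 2, which is the scary-sounding “double”, but the absolute increase is about 10 percentage points, and roughly 8 in 10 men with the variant would still not have Alzheimer’s by 85.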

Basically, when it comes to the types of diseases that 23andMe can tell you about… like Alzheimer’s[66]… or breast cancer[67] [68] [69] [70] [71]… they are just showing you one piece of the puzzle here… And if you’re going to get these diseases, there are so many pieces that have to fit together: including other genetic mutations that 23andMe doesn’t test for… and even our environment… things like our diet and how much exercise we do[72] [73] [74] [75]… And all these things come together in these really complicated ways

AR that is what most human genetics is like… It’s a mess, it’s a really interesting mess, but dear god I wish I did physics cos it’s so much easier. All I’m saying this stuff is wickedly complex, and we don’t understand enough of it to say definitive things

All of this kind of makes it feel like these tests are totally useless. But they’re not. Because even though we don't know everything about the link between genes and these yucky diseases... these tests can still give you some information that might be useful. You may find out that you have a version of a gene that increases your risk of breast cancer[76]… and then you can see a doctor about it. Or maybe you find out, hey, you don't have this version of one risky gene, which is nice to know

There was an interesting thing that came out of those same results,

So Adam’s wife has a family history of emphysema, and Adam saw on his test that he didn’t have a gene that would up his risk of getting it[77] [78].

AR it was reassuring me to see that I wasn't a carrier, on the grounds that it reduces the probability that any of our children would have that disease. So in that sense I found that slightly reassuring.

So bottomline Adam says these tests can give you a little bit of information, but you have to be really careful about how much you expect from them.

AR The shrugging of my shoulder’s is based on 25 years of studying this, most people don’t have that, that bugs me.

WZ if you were a professor and someone gave you a site like 23andMe as an assignment what grade would you give it?

AR i would tell them to think about it a bit more harder.

WZ That’s like C+. That’s C+.

AR I’d say listen it’s an interesting project. it doesn't really qualify for my course. So you’re going to have to do it again. Better.

Ok… so it’s a passing grade on the disease front… not a science slam dunk. Still though, it feels like these DNA tests are like a fun science adventure… getting some fun mail with pretty pie charts so we had to wonder… what are the risks here?

Well… when we asked Gimlet folks if they wanted to do these tests for the show, a lot of people said they were interested and got on board. But when the spit hit the fan...they were like….actually… maybe not[79][80].

NR It just feels weird to me… yeah it just feels like sensitive information to give away to a private company

STT That’s how they get you, that’s how they get you.

STT What if that data ends up in the wrong hands say later on in life, like insurance companies get a hold of that data

RR So are you gonna do it?

STT Hell no!


LM I think I’m gonna pass


Part of the feeling here is that this is the most intimate data we have... it's our building blocks, our DNA… And once we send it out into the world, we can't get it back… So the big question that we wanted to know is: what can these companies do with it?

We’re hearing stories about these companies handing over our DNA to Big Pharma[81]… and even the cops… So, we looked into it. It turns out… 23andMe basically lets you say ‘no don’t give my data to Big Pharma’... and as for the fuzz[82]? For now, the major DNA companies say they won’t share your data with police unless there’s a court order.[83][84] But… these folks aren’t the only ones who might want to get their hands on your DNA… what about employers and health insurance companies? If you find out you have a gene that increases your risk of breast cancer… can they get a hold of that? We talked to Adam about this.

AR It’s not a complete Wild West, that would mischaracterize this situation There are laws in place, especially in the States.

One of the big laws here is called the Genetic Information Nondiscrimination Act, or GINA. And it basically says that health insurance companies can’t use your genetic info to decide how much to charge you. It also says employers can’t ask about that genetic info… to make decisions like hiring or firing you.[85] And this act... it should[86] [87] apply for the stuff you learn from sites like 23andMe… Science Vs producer Rose Rimler

Hey Rose

RR Hey

Has been going down the rabbit hole on this one… so Rose, what is going on here... how good is GINA at protecting us?

RR: So I didn’t find a case where an employer or health insurance company tried to use information from 23andMe or Ancestry or other sites, but there is this one amazing case I did find, that does show this law seems to be holding up, at least when it comes to your employer[88] [89] [90]… and I want to tell you about it. And just a heads up, if you’ve got kids listening, this does involve some bad words. FYI

WZ  What happened in this case…

RR OK so it started in 2012. There was this warehouse company in Atlanta. The problem was someone at the company was doing a shit job. Literally, they were taking shits and leaving them in piles all around the workplace[91].

RR Just like, random shits

WZ Why? Would you do that?

RR That’s what the bosses wanted to know. Or more importantly they wanted to know … who was doing it…

WZ: The motivation wasn’t important?

RR probably secondary.

RR And they got cheek swabs from a bunch of employees, to find the culprit

WZ It was the ultimate crap down!

RR: Yeah they were crapping down on the illicit pooping…  It was a very high-tech, CSI type investigation if you will

WZ: Wow yeah… …

RR So two of the people they requested DNA from, they were exonerated[92]-- And shortly after that happened … these guys were like you can’t request my DNA, my employer cannot request my DNA, and they brought a case under GINA and they won! The judge was like - you’re right!

WZ: Like, this case will now become precedent for the power of GINA?

RR This case is brought up all the time, among scholars or journalists who talk, write about this issue, they’re like “well, there was the case of the devious defecator.”[93]

WZ: Devious defecator?

RR: Which is what the judge called the case


RR: when she issued her decision! The mystery of the devious defecator…. So thank God somebody felt the need to stink up the warehouse.

WZ: Do you think this case of the devious defecator and this law, does this protect us enough?

RR: Well GINA protects you if you’re shitting in the warehouse and you don’t want your boss to find out it was you. So I mean there are definitely some limitations. And one big one that people bring up a lot is that GINA does not cover things like life insurance or disability insurance[94][95], so there are other things that might come up where your genetic info might matter, but GINA doesn’t cover them.

WZ:  All right… so neither of us have gotten tested, but now we’ve done this research, would you do this in the future?  

RR: I probably wouldn’t mostly because I feel the benefits are small and the risks are small. But if the benefits are small why take any of those risks?

On the other hand I have heard stories of people who were really glad they did this maybe they found out something that was meaningful to them, I know Gabe and Alice they don’t regret doing it, they had fun. But for me right now, I don’t want to do it…

WZ: Right

RR: Is that how you feel?

WZ: Yeah I think so… I would not do this. Because I think I was swayed by geneticist Adam Rutherford, who we just heard from..  And we were talking about the wider implications of where this might go. He quoted a great philosopher of our time...and I think that really convinced me that this wasn’t the right thing to do for me … here it is

AR  Jeff Goldblum.  Jurassic Park[96] saying DNA is most powerful tool in the known universe and yet we wield it like a child who’s found his dad's gun.


AR That’s right we talk about this stuff all the time, we wave it around on court cases and Ancestry tests, but we are at the beginning of discovering how our genomes work, and just so much caution is required.


That’s Science Vs DNA kits.

Next week Science Vs Race… With all this talk of our genetic differences, we had to ask: how does race fit into this? Biologically, does race exist?

DR Human beings regardless of race are more than 99% the same the problem though is that the 0.1% is a lot of genetic variation

WZ It’s the end of the episode as we know it

RR Na Na Nah

WZ which means it’s time to find out how many citations in this week’s episode, Rose Rimler?

RR About a hundred

WZ ABOUT a hundred? You’re not going to give me specifics?

RR There’s 96 citations in this week’s script

WZ 96 citations?

RR That’s right.

WZ And if people want to read these citations where should they go?

RR They can check out our show notes where they listen to the podcast and or our website, sciencevs.show

WZ Thanks Rose

RR You’re welcome


This episode was produced by Rose Rimler, with senior producer Kaitlyn Sawrey… with help from me, Wendy Zukerman, Meryl Horn and Michelle Dang. We’re edited by Blythe Terrell. Fact checking by Michelle Harris and Michelle Dang. Mix and sound design by Peter Leonard. Music by Peter Leonard, Frank Lopez, Emma Munger and Bobby Lord. Recording assistance from Cole del Charco, Madeline Taylor, Carmen Baskauf, Ian Cross and Marijke Peters. A huge thanks to everyone who spat in a tube for us, especially Toni Magyar and Alex Blumberg, and to all the researchers we got in touch with for this episode, including Dr. Wendy Roth, Dr Deborah Bolnick, Dr Celeste Karch, Professor Nancy Wexler, Dr. Robert Green, Dr Catharine Wang, and others. Thanks also to the teams at Ancestry.com, 23andMe, and MyHeritage.

Finally, thanks to the Zukerman Family and Joseph Lavelle Wilson.

I’m Wendy Zukerman. Fact you next time.

[1] https://www.technologyreview.com/s/610233/2017-was-the-year-consumer-dna-testing-blew-up/ 

[2] 23andMe Ancestry kit $99;  23andMe Health+Ancestry $199; Ancestry.com AncestryDNA kit $99; MyHeritage $69-79; NatGeo $69.95-$99.95 (as of April 2019)

[3] "23andMe chooses to use all practical legal and administrative resources to resist requests from law enforcement, and we do not share customer data with any public databases, or with entities that may increase the risk of law enforcement access." https://www.23andme.com/law-enforcement-guide/

[4] DTC-PGT consumers were as interested in ancestry (74% very interested) and trait information (72%) as they were in disease risks.

[5] 46.41% gave ancestry as their first option, followed by health (41.56%) whereas contribution to research trailed third (14.35%).

[6] https://anthropology.uncc.edu/node/131

[7] Once our lab receives your sample, DNA is extracted from cells contained in your saliva. The lab then copies the DNA many times—a process called amplification—duplicating the tiny amount extracted from your saliva until there is enough to be genotyped.

In order to be genotyped, the amplified DNA is “cut” into smaller pieces, which are then applied to our DNA chip (also known as a microarray), a small glass slide with millions of microscopic “beads” on its surface. Each bead is attached to a “probe," a bit of DNA that matches one of the genetic variants that we test. The cut pieces of your DNA stick to the matching DNA probes. A fluorescent label on each probe identifies which version of that genetic variant your DNA corresponds to. https://customercare.23andme.com/hc/en-us/articles/202904610-How-does-23andMe-genotype-my-DNA- 

[8] The updated AncestryDNA ethnicity estimation reference panel contains 16,638 samples carefully selected as described to represent 43 overlapping global regions (Table 2.1), https://www.ancestrycdn.com/dna/static/images/ethnicity/help/WhitePaper_Final_091118dbs.pdf 

[9] https://blog.23andme.com/ancestry/30-new-reports-and-1000-more-regions-included-in-23andmes-latest-ancestry-composition-update/ In January 2019, 23andMe says they have over 1000 regions

[10]  Customers comprise the lion's share of the reference datasets used by Ancestry Composition. 

[11] We also draw from public reference datasets, including the Human Genome Diversity Project and the 1000 Genomes project. Finally, we incorporate data from 23andMe-sponsored projects, typically collaborations with academic researchers. We perform the same filtering on these public and collaboration reference data that we do on the 23andMe customer data.

[12] The analysis we perform is called genotyping. Genotyping looks at specific locations in your DNA and identifies variations. These variations make you unique.

In choosing these specific locations, we focus on the variations that are known to be associated with important health conditions, ancestry and traits. Genotyping is a great way to start understanding how your genetics can impact your life.

[13] Customers comprise the lion's share of the reference datasets used by Ancestry Composition.When a 23andMe research participant tells us that they have four grandparents all born in the same country — and the country isn't a colonial nation like the US, Canada, or Australia — that person becomes a candidate for inclusion in the reference data. We filter out all but one of any set of closely related people, since including closely related relatives can distort the results. And we remove outliers, people whose genetic ancestry doesn't seem to match up with their survey answers. To ensure a representative dataset, we filter aggressively — nearly ten percent of reference dataset candidates don't make the cut.

We also draw from public reference datasets, including the Human Genome Diversity Project and the 1000 Genomes project. Finally, we incorporate data from 23andMe-sponsored projects, typically collaborations with academic researchers. We perform the same filtering on these public and collaboration reference data that we do on the 23andMe customer data.

[14] Ideally, we’d use people with all of their grandparents from the same country, but due to low numbers for some countries we sometimes use parents or even the customer’s birth location. https://www.ancestrycdn.com/dna/static/images/ethnicity/help/WhitePaper_Final_091118dbs.pdf

[15] To create the ethnicity estimate, we compare a customer’s DNA to a panel of DNA from people with known origins (referred to as the reference panel) and look to see which parts of the customer’s DNA are similar to those from people represented in groups in the reference panel. If, for example, a section of a customer’s DNA looks most similar to DNA in the reference panel of people from Sweden, that section of the customer’s DNA is assigned to Sweden. The end result is a portrait of a customer’s DNA made up of percentages of the 43 ethnicities contained in the reference panel. https://www.ancestrycdn.com/dna/static/images/ethnicity/help/WhitePaper_Final_091118dbs.pdf

[16] 23andMe: The starting point is self-reported birth locations of and languages spoken by individuals' four grandparents. We then conduct a series of dimension reduction analyses to identify robust and cohesive genetic clusters and to identify outliers or potentially incorrectly labeled individuals. We then iteratively train Ancestry Composition, assess performance, and re-define the Ancestry Composition populations until we achieve satisfactory performance on hold-out test data consisting of both unadmixed individuals and simulated admixed individuals.

[17] If a poor proxy is used for one ancestral population, the method might compensate by adding admixture from other ancestral populations. Consider genetic ancestry testing performed on an individual we will call Joe, whose eight great-grandparents were from southern Europe. The HapMap populations are used as references for testing Joe's genetic ancestry. The HapMap's European samples consist of “northern” Europeans. In regions of Joe's genome that vary between northern and southern Europe (such regions might include the lactase gene, LCT [MIM #603202]), the genetic ancestry test using the HapMap reference populations is likely to incorrectly assign the ancestry of that portion of the genome to a non-European population because that genomic region will appear to be more similar to the HapMap's Yoruba or Han samples than to its (northern) European samples.

[18] Next, we remove samples from the reference panel candidate set when the genetic data about ethnicity disagrees with what that person has reported about their ethnicity--when underlying genetic information disagrees with the pedigree data. We use principal component analysis (PCA) to identify these outliers. ...We first remove outliers at the global level (all samples together), then at the continental level (e.g., outliers in a PCA using only European samples), then at the regional level (e.g., outliers in a PCA of all Scandinavian samples), and finally at the population level (e.g., outliers from a PCA of Norway). https://www.ancestrycdn.com/dna/static/images/ethnicity/help/WhitePaper_Final_091118dbs.pdf 

[19] we remove outliers, people whose genetic ancestry doesn't seem to match up with their survey answers. To ensure a representative dataset, we filter aggressively — nearly ten percent of reference dataset candidates don't make the cut. Prep 1: The Datasets - 23andme

[20] 2014 23andMe White Paper: We trained the local classifier on more than 9700 individuals divided into K = 25 reference populations. We denote N the number of training individuals. The reference populations are a combination of countries and broader geographic regions. We required that a population contain at least 25 individuals. When a country did not contain enough individuals, it was grouped with countries from the same geographic region. The process was repeated until each population contained enough individuals. Next, we describe the steps involved in building the training set. We started by building a dataset of N∗ = 10699 unrelated individuals with known/self-reported ancestry. Out of the 10699 individuals, 1793 came from publicly available datasets (1000 Genomes: 765, CEPHHGDP: 941, HapMap3: 87). The remaining 8906 were research-consented 23andMe members who reported via a survey on the 23andMe website that their four grandparents were born in the same country. The text of the questions asked was “In which country was your (mother’s mother) born?”, for each of the four grandparents. The answer was chosen from a list of countries, which included the option “I don’t know”.
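The counts quoted in [20] can be cross-checked with a little arithmetic. This is only a sanity check on the numbers as quoted from the white paper; the variable names are illustrative and do not come from 23andMe.

```python
# Counts as quoted in footnote [20] (2014 23andMe white paper).
public_sources = {"1000 Genomes": 765, "CEPHHGDP": 941, "HapMap3": 87}
customers = 8906          # research-consented 23andMe members
reported_total = 10699    # N* in the white paper

public_total = sum(public_sources.values())
computed_total = public_total + customers
customer_share = customers / computed_total

print(public_total)                      # 1793
print(computed_total == reported_total)  # True
print(f"{customer_share:.1%} of the starting dataset came from customers")
```

The public datasets add up to the 1793 quoted, the grand total matches 10699, and customers make up roughly 83% of the starting dataset.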

[21] https://www.taylorfrancis.com/books/e/9781351876117/chapters/10.4324/9781315235127-11

[22] Visigoths

[23] https://www.taylorfrancis.com/books/e/9781351873680/chapters/10.4324/9781315234311-4

[24] https://www.tandfonline.com/doi/abs/10.1080/00291956808551851

[25] A large proportion of 23andMe customers have unmixed European ancestry, we have the most reference data from European populations.

[26] AncestryDNA see table 3.1

[27] 23andMe: 124 reference individuals for Indonesian, Cambodian, Thai, Myanma and Malaysian populations https://docs.google.com/document/d/e/2PACX-1vTmnLn8cUNatSo_ui6g3Mw3FQrm8QEuGpNv-THysbx-gaFnetbMprI5bCp8ObZ9kaqMngJoII0lt7Z2/pub

[28] Population of Cambodia: 16,246,000 http://data.un.org/en/iso/kh.html

[29] Population of Thailand: 69,183,000 http://data.un.org/en/iso/th.html

[30] Population of Myanmar: 53,856,000 http://data.un.org/en/iso/mm.html

[31] Population of Malaysia: 32,042,000 http://data.un.org/en/iso/my.html

[32] Indonesian, Thai, Khmer & Myanma

124 reference population for Indonesian, Cambodian, Thai, Myanma, Malaysian

[33] Sum of populations: 438,122,000 (Cambodia, Thailand, Myanmar, Malaysia, Indonesia)

[34] http://data.un.org/en/iso/id.html 266,795,000 for Indonesia. http://data.un.org/en/iso/us.html 326,767,000 for USA
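The sum in [33] and the comparison with the USA in [34] can be reproduced from the UN figures quoted in [28]-[31] and [34]. The sketch below assumes the 124 in [27] counts reference individuals; the "people per reference individual" figure is our own illustration, not a number from 23andMe.

```python
# Populations as quoted in footnotes [28]-[31] and [34] (UN data).
populations = {
    "Cambodia": 16_246_000,
    "Thailand": 69_183_000,
    "Myanmar": 53_856_000,
    "Malaysia": 32_042_000,
    "Indonesia": 266_795_000,
}
usa_population = 326_767_000
reference_individuals = 124  # footnote [27]; assumed to be a count of people

total = sum(populations.values())
people_per_reference = total / reference_individuals

print(f"{total:,}")            # 438,122,000
print(total > usa_population)  # True
print(f"about {people_per_reference / 1e6:.1f} million people per reference individual")
```

On these numbers, the five countries together have more people than the USA, with one reference individual standing in for about 3.5 million people.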

[35] Confronting Bias: Historically, biomedical research has disproportionately focused on participants of European descent. Due to this bias, and to the fact that a large proportion of 23andMe customers have unmixed European ancestry, we have the most reference data from European populations, and we are able to distinguish as many sub-populations from Europe as across all of Asia.

In light of this inequity, the 23andMe Research team is constantly working to acquire new data from diverse populations. We have worked proactively to reduce bias in genetics research by initiating projects like the African Genetics Project and our NIH-funded genetic health resource for African Americans.

[36] We measure between 7,400 and 45,000 markers per chromosome, which translates to 24 to 149 windows, depending on the chromosome's length. We then take each of those windows in turn and compare them against the DNA from reference individuals (who represent various populations) to determine what population your DNA most likely came from. https://customercare.23andme.com/hc/en-us/articles/115004339467-How-Ancestry-Composition-Works 
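The figures in [36] imply a rough window size. The sketch below assumes the low end of each range describes the shortest chromosome and the high end the longest; that pairing is our assumption, not something 23andMe states.

```python
# Figures quoted in footnote [36].
markers_low, markers_high = 7_400, 45_000
windows_low, windows_high = 24, 149

# Assumption: the chromosome with the fewest markers also has the fewest windows,
# and the one with the most markers has the most windows.
per_window_short = markers_low / windows_low
per_window_long = markers_high / windows_high

print(round(per_window_short), round(per_window_long))  # 308 302
```

Either way it works out to roughly 300 markers per window.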

[37] AncestryDNA 1.3 we have developed software similar to GERMLINE that allows us to quickly detect matches in hundreds of thousands of phased genotypes, as well as quickly identify matches as new customers enter the database each day

[38] Because the probability of a specific pair of alleles appearing at a certain position in the DNA varies for each of our 43 regions, we can use that information to tell us which region a stretch of DNA most likely came from. For example, if AA at a particular position is more common in people from Spain, someone with AA at that location might have a higher chance of having Spanish ancestry. It is important to keep in mind that an AA at this particular position just makes it more likely the DNA comes from Spain. Plenty of people from Portugal, France, or even Korea might have AA at this position as well. https://www.ancestrycdn.com/dna/static/images/ethnicity/help/WhitePaper_Final_091118dbs.pdf 

[39] Most of these previously mentioned methods have been demonstrated to be highly accurate for the case of two way admixtures such as in African Americans (Seldin et al., 2011). However, the accuracy of such methods declines for more complicated scenarios such as the admixture of three ancestral populations in case of Latinos (European, African, and Native American). The presence of closely related populations in multi-way admixtures (e.g., Europeans and Native Americans) further increases the difficulty of inference. Many existing methods either cannot handle these scenarios or are prone to high error rates making it hard to reliably study LA [local ancestry] in such cases. Keeping these issues in mind, several new approaches were developed in the last few years to more effectively handle multi-way admixtures. 

[40] We know that because these companies often let you see which results they’re most confident about. When our colleague Gabe looked only at the results 23andme was most certain about, the amount of his DNA that was Spanish and Portuguese plummeted …  and they weren’t sure about where a fifth of his DNA came from.

[41] JM: I've changed from 99% Ashkenazi Jewish to 100% Ashkenazi Jewish. RR: Korean went away. JM: Korean went away.


[43] 23andMe was the only company approved by the FDA to market health-related tests in the United States. P47 (pdf pg 13)

[44] RR email: "Is it true that 23andMe is the only direct-to-consumer genetic testing company that has FDA approval to offer consumers medical health information?" FDA: Yes

[45] Other companies like Helix, Color, Genos, Veritas require a physician or genetic counsellor to be involved.

[46] The premise of Helix is a “sequence-once-query often” model that stores genomic information in a central database and allows business partners to develop DTC testing strategies that interrogate portions of these genomic data sets for its customers. Color Genomics, which focuses on the BRCA gene test, experimented with a test delivery model in which a physician working for Color Genomics would order genetic testing at the request of a consumer, with genetic counseling provided at no extra charge. Similarly, the carrier screening companies Good Start Genetics and Counsyl have their own consumer-facing websites and advertise directly to consumers but work closely with a network of affiliated clinical providers who order their tests. P5 & 6 (pg 117-118 in article)

[47] https://www.23andme.com/dna-health-ancestry/ 

Type 2 Diabetes, BRCA1/BRCA2, Age-Related Macular Degeneration, Alpha-1 Antitrypsin Deficiency, Celiac Disease, Familial Hypercholesterolemia, G6PD Deficiency, Hereditary Hemochromatosis (HFE‑Related), Hereditary Thrombophilia, Late-Onset Alzheimer's Disease, Parkinson's Disease.

[48] https://www.23andme.com/dna-health-ancestry/#all-reports-list 

[49] Only 20–25% of the general population carries one or more ɛ4 alleles, where 40–65% of AD patients are ɛ4 carriers. ….The effect of APOE ɛ4 accounts for 27.3% of the estimated disease heritability of 80%.23 The part of the heritability that was yet unaccounted for has been the driving force behind decades of continued search for genetic risk factors.

[50] Presently, the largest meta-analytic genome-wide association study (GWAS) for LOAD employed a two-stage study design. First, 17,008 cases were compared to 37,154 controls. A total of 11,632 single-nucleotide polymorphisms (SNPs) with P < 1 × 10⁻³ from this meta-analysis were included in the second stage that compared 8,572 cases to 11,312 controls. P1

[51] Success using the GWAS model depends on genetic risk being determined by shared stretches of DNA carried with different frequencies in cases and controls, inherited from ancient ancestors, termed the “common disease–common variant” hypothesis. Not all disease risk is caused by common variants, however, and thus GWAS will not detect all variants involved.

[52] General population: men, age 65: 1%, age 75: 3%, age 85: 11%.

One copy of ε4 variant: men, age 65: 1%, age 75: 4-7%, age 85: 20-23% (Gabe’s 23andMe report)

[53] [Note: 23andMe doesn’t tell you exactly which variants of APOE you have, but Gabe’s results are consistent with having APOE34 (one copy of 3, one copy of 4), as this doubles men’s risk compared to the general population, as per this paper] “At the age of 85 the LTR of AD without reference to APOE genotype was 11% in males and 14% in females. At the same age, this risk ranged from 51% for APOE44 male carriers to 60% for APOE44 female carriers, and from 23% for APOE34 male carriers to 30% for APOE34 female carriers, consistent with semi-dominant inheritance of a moderately penetrant gene.”
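The "doubles men's risk" reading in [53] can be checked by dividing each carrier's lifetime risk by the baseline for the same sex. This is a rough sketch using only the percentages quoted above; the dictionary layout and names are ours.

```python
# Lifetime risk (%) of Alzheimer's disease at age 85, as quoted in footnote [53].
baseline = {"male": 11, "female": 14}
carriers = {
    ("APOE34", "male"): 23,
    ("APOE34", "female"): 30,
    ("APOE44", "male"): 51,
    ("APOE44", "female"): 60,
}

# Ratio of carrier risk to the general-population risk for the same sex.
risk_ratio = {key: risk / baseline[key[1]] for key, risk in carriers.items()}

for (genotype, sex), ratio in risk_ratio.items():
    print(f"{genotype} {sex}: {ratio:.1f}x")
```

These are ratios of lifetime risk, so they are not the same measure as the odds ratios quoted in [61], which compare against the APOE33 genotype rather than the general population.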

[54] http://adamrutherford.com/

[55] Only 20–25% of the general population carries one or more ɛ4 alleles, where 40–65% of AD patients are ɛ4 carriers. ….The effect of APOE ɛ4 accounts for 27.3% of the estimated disease heritability of 80%.23 The part of the heritability that was yet unaccounted for has been the driving force behind decades of continued search for genetic risk factors.

[56] Presently, the largest meta-analytic genome-wide association study (GWAS) for LOAD employed a two-stage study design. First, 17,008 cases were compared to 37,154 controls. A total of 11,632 single-nucleotide polymorphisms (SNPs) with P < 1 × 10⁻³ from this meta-analysis were included in the second stage that compared 8,572 cases to 11,312 controls. P1

[57] Success using the GWAS model depends on genetic risk being determined by shared stretches of DNA carried with different frequencies in cases and controls, inherited from ancient ancestors, termed the “common disease–common variant” hypothesis. Not all disease risk is caused by common variants, however, and thus GWAS will not detect all variants involved.

[58] At the age of 85 the LTR of AD without reference to APOE genotype was 11% in males and 14% in females. At the same age, this risk ranged from 51% for APOE44 male carriers to 60% for APOE44 female carriers.

[59] the penetrance of breast cancer (in Ashkenazi Jewish women) at age 70 among BRCA1 mutation carriers is estimated to be 46% (95% confidence, 31%–80%) rising to 59% (95% confidence, 40%–93%) at age 80. For BRCA2, the relative risks in the same three age categories were estimated to be 3.3, 3.3, and 4.6, respectively, resulting in a penetrance at age 70 of 26% (95% confidence, 14%–50%) rising to 38% (95% confidence, 20%–68%) at age 80.

[60] The APOE ɛ4 allele increases risk in familial and sporadic early-onset and late-onset AD, but it is not sufficient to cause disease.18,19,20 The risk effect is estimated to be threefold for heterozygous carriers (APOE ɛ34) and 15-fold for ɛ4 homozygous carriers (APOE ɛ44), and has a dose-dependent effect on onset age.18,19 https://www.nature.com/articles/gim2015117 

[61] Taking as a basis the most frequent genotype (APOE 33), the [odds ratios] are estimated to be 3.2 for APOE34 and 14.9 for APOE 44 … https://www.nature.com/articles/mp201152

[62] Disclaimers by 23andMe:

[63] At the age of 85 the LTR of AD without reference to APOE genotype was 11% in males and 14% in females. At the same age, this risk ranged from 51% for APOE44 male carriers to 60% for APOE44 female carriers, and from 23% for APOE34 male carriers to 30% for APOE34 female carriers https://www.nature.com/articles/mp201152 

[64] Confirmed over email: one copy of APOE4

[65] Only 20–25% of the general population carries one or more ɛ4 alleles, where 40–65% of AD patients are ɛ4 carriers

[66] We confirm 20 previous LOAD (Late Onset AD) risk loci and identify five new genome-wide loci. 

[67] More than 1,000 variants in the BRCA1 and BRCA2 genes are known to increase cancer risk. Our report focuses on three that are among the most studied and best understood. These three variants are most common in people of Ashkenazi Jewish descent and are much less common in people of other ethnicities. https://www.23andme.com/brca/ 

[68] The cohort was composed specifically of Ashkenazi Jewish patients, because this population harbors three ancient BRCA1 and BRCA2 mutant alleles with a combined population frequency of 2.5%. Variants tested: BRCA1.185delAG, BRCA1.5382ins, BRCA2.6174delT

[69] Patients of Ashkenazi Jewish (AJ) descent have a 2.5% risk of carrying 1 of 3 BRCA1/2 founder mutations (BRCA1 185delAG [c.68_69delAG], BRCA1 5382insC [c.5266dupC], or BRCA2 6174delT [c.5946delT]).

[70] https://brcaexchange.org/variants?pageLength=100&orderBy=Pathogenicity_expert&order=descending

BRCA Exchange database of all variants, ordered by clinical significance

[71] Individuals of Jewish descent have an increased chance of carrying a BRCA mutation at a population frequency of 1/40 (2.5%) vs. 1/500 (0.2%) in the western European population. p4

[72] For breast cancer, if healthy lifestyle choices were preferentially targeted to and employed by women in the top decile of genetic risk, an estimated ~20% of all preventable breast cancer cases would be avoided. p7

[73] The models estimated combinations of several baseline factors including age, smoking status, hypertension, night vision score, and Age-Related Eye Disease Study (AREDS) simple severity scale score to predict GA.

[74] Across four studies involving 55,685 participants, genetic and lifestyle factors were independently associated with susceptibility to coronary artery disease. Among participants at high genetic risk, a favorable lifestyle was associated with a nearly 50% lower relative risk of coronary artery disease than was an unfavorable lifestyle.

[75] First, adequate evidence is present for vascular risk factors, including midlife hypertension, diabetes mellitus, smoking, midlife obesity, stroke and cardiovascular disease (Table 1) [25]. Some of these factors increase dementia risk when present in midlife, emphasizing the importance of applying a life-course perspective when examining risk factors and implementing preventive interventions [26-30]. Second, protective nutritional components include omega-3 fatty acids and unsaturated fats, antioxidants, vitamins and moderate alcohol consumption [31-35]. The importance of dietary patterns (e.g. Mediterranean diet) is also recognized, as nutritional components interact to produce synergistic effects (for a review see [36]). Third, lifestyle and psychosocial factors can also modify dementia/AD risk. For example, while living alone, having feelings of loneliness, depression, social isolation and psychosocial stress can increase dementia/AD risk, higher levels of education, engaging in exercise, and cognitively and socially stimulating activities are protective.

[76] More than 1,000 variants in the BRCA1 and BRCA2 genes are known to increase cancer risk. Our report focuses on three that are among the most studied and best understood. These three variants are most common in people of Ashkenazi Jewish descent and are much less common in people of other ethnicities. https://www.23andme.com/brca/ 

[77] 23andMe tests for Alpha-1 Antitrypsin Deficiency (select “See all reports” under Health Predisposition reports)

[78] https://ghr.nlm.nih.gov/condition/alpha-1-antitrypsin-deficiency People with alpha-1 antitrypsin deficiency usually develop the first signs and symptoms of lung disease between ages 20 and 50. … Affected individuals often develop emphysema, which is a lung disease caused by damage to the small air sacs in the lungs (alveoli)

[79] People varied widely in how much control they wanted over the use of data. They were more concerned about use by employers, insurers, and the government than they were about researchers and commercial entities. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0204417 

[80]  survey data suggest that a significant number of people fear that they will suffer from genetic discrimination if they allow their genetic material to be sampled and analyzed http://sci-hub.tw/https://link.springer.com/article/10.1007/s11606-012-1988-6 

[81] https://www.forbes.com/sites/matthewherper/2018/07/25/23andme-gets-300-million-boost-from-glaxo-to-develop-new-drugs/#1a57b4af3213

[82] https://www.23andme.com/transparency-report/ (government requests)


PRODUCED 0

Ancestry received 10 valid law enforcement requests for user information in 2018. We provided information in response to 7 of those 10 requests. All the 2018 requests relate to investigations involving credit card misuse, fraud, and identity theft. We refused numerous inquiries on the basis that the requestor failed to obtain the appropriate legal process. We received no valid requests for information related to genetic information of any Ancestry member, and we did not disclose any such information to law enforcement.

[83] We may share your Personal Information if we believe it is reasonably necessary to:

Comply with valid legal process (e.g., subpoenas, warrants);

Enforce or apply the Ancestry Terms and Conditions;

Protect the security or integrity of the Services; or

Protect the rights, property, or safety, of Ancestry, our employees or users.

If we are compelled to disclose your Personal Information to law enforcement, we will do our best to provide you with advance notice, unless we are prohibited under the law from doing so. In the interest of transparency, Ancestry produces a Transparency Report where we list the number of valid law enforcement requests for user data across all our sites. https://www.ancestry.com/cs/legal/privacystatement

[84] Under certain circumstances Personal Information may be subject to disclosure pursuant to judicial or other government subpoenas, warrants, or orders, or in coordination with regulatory authorities. However, we use all practical legal and administrative resources to resist such requests. In the event we are required by law to make a disclosure, we will notify you in advance, unless doing so would violate the law or a court order. https://www.23andme.com/transparency-report/ 

[85] http://sci-hub.tw/https://doi.org/10.1002/hast.847

[86] GINA prohibits only the discriminatory use of genetic information by employers and health insurance companies, and the vast majority of DTC-GT companies do not qualify as “covered entities” under HIPAA (Health Insurance Portability and Accountability Act of 1996). p6

[87] A legal paper that analysed the privacy documents of many direct-to-consumer genetic testing outfits in the United States found that most genetic testing companies that mentioned GINA “generally provided warnings of its uncertain scope”.

[88] https://www.reuters.com/article/us-verdict-dna-defecator/georgia-workers-win-2-2-million-in-devious-defecator-case-idUSKBN0P31TP20150623

[89] https://www.leagle.com/decision/inadvfdco160219000191

[90] https://www.nature.com/news/why-the-devious-defecator-case-is-a-landmark-for-us-genetic-privacy-law-1.17857

[91] In 2012, Atlas Logistics, a transportation and storage servicer for the grocery industry, began an internal investigation after numerous instances of human feces were found in one of its warehouse facilities.6 Atlas’s Loss Prevention Manager narrowed down the list of possible suspects by “comparing employee work schedules to the timing and location” of the feces. p5

[92] Within weeks of collecting DNA samples, the forensic laboratory determined that neither Plaintiff matched the fecal sample tested. p5

[93] The case, nicknamed the 'mystery of the devious defecator' by US district court judge Amy Totenberg, is the first brought under GINA to go to trial.

[94] The Genetic Information Nondiscrimination Act (GINA) was intended to protect individuals in the USA from discrimination based on their genetic data, but does not apply to life, long-term care or disability insurance.

[95] But GINA has serious limitations, including its lack of application to life insurance and long-term care insurance and to employers of fewer than 15 employees. In addition, GINA places the burden on victims of genetic discrimination to prove that their information was misused — which is as plausible an explanation for the dearth of successful GINA cases as the possibility that the law has effectively discouraged such discrimination. This flawed mechanism, though well-intentioned, is hardly adequate to balance complex competing interests that might arise in DNA testing http://sci-hub.tw/10.1056/NEJMp1805870 

[96] Dr. Ian Malcolm: Don't you see the danger, John, inherent in what you're doing here? Genetic power is the most awesome force the planet's ever seen, but you wield it like a kid that's found his dad's gun. IMDB