So, this one's likely pretty niche, but I'm hoping someone here might know the answer.
I got genotype data for myself from 23andMe (don't worry, I made them delete it before the acquisition) and AncestryDNA years ago, and more recently I've been looking into things like SNPs. I write code for a living, so I can do some cool things with a little code and the raw data to check what interesting SNPs I might have.
Something I've noticed recently is that for some SNPs, I've got alleles that aren't listed as a possibility anywhere on the internet that I can find.
Just to take a random example, rs3746544, part of the SNAP25 gene. According to SNPedia, the available alleles are A and C with A being the major allele and C being the minor. So what is my genotype for that SNP?
[tootsweet@computer genome_raw_data]$ grep rs3746544 23andme_raw_data.txt ancestrydna_raw_data.txt
23andme_raw_data.txt:rs3746544 20 10287084 TT
ancestrydna_raw_data.txt:rs3746544 20 10287084 T T
[tootsweet@computer genome_raw_data]$
TT? There's zero mention of "T" being an allele that you can have for rs3746544.
rs3746544 is very much not the only example. Just a few more among many:
- SNPedia says rs807701 has alleles C and T, but I have AA.
- SNPedia says rs25532 has alleles C and T, but I have AG.
- SNPedia says rs6265 has alleles A and G, but I have TT.
I'm hoping some of you folks know enough about genes to know what might be up with these examples. I'm sure it's just something I don't yet understand about genetics. Thanks in advance!
Edit: So I had a bit of a brain fart after writing this in a comment:
(Side note: oddly, in the 23 "mismatch" examples I mentioned, my genotype doesn't have a single allele in common with the documented possible alleles for the SNP. For example, I don't have any ATs where the documented genotypes are AA, AC, and CC. My genotype either matches the documented alleles or has no alleles in common with them. Which seems even stranger.)
A's pair with T's and C's with G's. I'm guessing that when I get a "mismatch" like what I'm talking about, what 23andMe or AncestryDNA is giving me is the complementary bases. So if I see a CT where the documented options are AA, AG, and GG, I should just consider my CT to be equivalent to an AG. (Because the T pairs with an A and the C pairs with a G.)
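The complement idea above can be checked mechanically. Here's a minimal Python sketch of that check, nothing official from 23andMe or SNPedia: the names `COMPLEMENT`, `complement`, and `classify` are made up for illustration, and the documented alleles are typed in by hand from the examples in this post.

```python
# Base pairing: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(genotype):
    """Return the genotype as it would read on the opposite strand."""
    return "".join(COMPLEMENT[base] for base in genotype)


def classify(genotype, documented_alleles):
    """Say whether a genotype fits the documented alleles as reported,
    only after complementing, or not at all."""
    if set(genotype) <= documented_alleles:
        return "same strand"
    if set(complement(genotype)) <= documented_alleles:
        return "opposite strand"
    return "unexplained"


# Genotypes and documented alleles copied from the examples in this post.
print(classify("TT", {"A", "C"}))  # rs3746544 -> opposite strand
print(classify("AA", {"C", "T"}))  # rs807701  -> opposite strand
print(classify("AG", {"C", "T"}))  # rs25532   -> opposite strand
print(classify("TT", {"A", "G"}))  # rs6265    -> opposite strand
```

One caveat with this kind of check: for SNPs whose documented alleles are A/T or C/G, both strands read as the same pair of letters, so it can't tell which strand was reported.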
So I guess that means that sometimes the equipment that 23andMe and AncestryDNA use reads the other side of the DNA strand from the one that's documented in the literature. (This only seems to happen in about 16.5% of cases or thereabouts, at least by my napkin math. In most cases, what 23andMe and AncestryDNA report in the raw data matches and thus must be measuring/reading/reporting the "same side" of the double helix as the literature talks about.)
At least that theory seems consistent with what I'm seeing. If anybody knows better, I definitely would appreciate any further input!
That said, it does seem kind of odd that any time 23andMe reads the "other side" of the DNA molecule, so does AncestryDNA and vice versa. That is, there don't seem to be any cases where they disagree on my genotype for a given SNP. At least I haven't seen any examples of that so far. I might have to do some searching now.
Edit 2: I've done a little more googling based on the first edit above and found this page. It seems 23andMe always reports the so-called "+ strand" of the "Genome Reference Consortium Human Build 37" human reference genome. So maybe the 23 examples I've found so far are cases where at least some of the literature (or at least SNPedia and EUPedia, if not "the literature") reports what Build 37 considers the "- strand". Or maybe "the literature" (and/or SNPedia/EUPedia) uses a different reference genome? All this is still just a theory, but I definitely know more than I did a few minutes ago.
Edit 3: Some folks are suggesting that 23andMe and AncestryDNA may just not be accurate. As in, 23andMe and AncestryDNA may have a very high error rate when reading my genetic data. If that were the case, I wouldn't expect the inaccuracies to "match" between the two raw data files. So, to test that hypothesis, I wrote a script to check my 23andMe raw data against my AncestryDNA data and see how often they disagree. The script is quite slow, but at the moment it's checked over 35,000 SNPs that are measured by both services and found 12 that disagree, for a disagreement rate of roughly 0.0343%. In another comment, I mentioned that the "mismatch" instances I've found make up about 16.5% of the SNPs I've checked. So read errors don't seem to account for a very large percentage of them. I'm still leaning pretty heavily toward the "other strand" theory. Thanks again for everyone's input!
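For anyone who wants to try the same comparison, here's a rough Python sketch of one way to do it. It is not the script mentioned above. It assumes both files are whitespace-separated with the rsid first and the genotype after the position (as in the grep output earlier), that comment lines start with `#`, that a header row starts with `rsid`, and that anything other than A/C/G/T is a no-call to skip. The names `load_genotypes` and `compare` are made up, and the `rs_example_*` rows are placeholders, not real data.

```python
VALID_BASES = set("ACGT")


def load_genotypes(lines):
    """Map rsid -> genotype with alleles sorted, so 'CT' and 'T C' compare equal."""
    genotypes = {}
    for line in lines:
        fields = line.split()
        # Skip blank lines, '#' comments, and a header row (assumed to start with "rsid").
        if len(fields) < 4 or fields[0].startswith("#") or fields[0] == "rsid":
            continue
        # Join everything after the position: 'TT' (one column) or 'T', 'T' (two columns).
        genotype = "".join(sorted("".join(fields[3:])))
        # Assumption: anything that isn't plain A/C/G/T is a no-call and is ignored.
        if set(genotype) <= VALID_BASES:
            genotypes[fields[0]] = genotype
    return genotypes


def compare(a, b):
    """Return (number of shared SNPs, rsids that disagree, disagreement rate in percent)."""
    shared = a.keys() & b.keys()
    mismatches = sorted(rsid for rsid in shared if a[rsid] != b[rsid])
    rate = 100 * len(mismatches) / len(shared) if shared else 0.0
    return len(shared), mismatches, rate


# Tiny inline sample; with real files, pass open("23andme_raw_data.txt") etc. instead.
sample_23andme = [
    "# comment line",
    "rs3746544\t20\t10287084\tTT",
    "rs_example_1\t1\t100\tCT",  # placeholder row
    "rs_example_2\t1\t200\t--",  # placeholder no-call
]
sample_ancestry = [
    "rsid\tchromosome\tposition\tallele1\tallele2",
    "rs3746544\t20\t10287084\tT\tT",
    "rs_example_1\t1\t100\tT\tC",  # same genotype, alleles in the other order
    "rs_example_2\t1\t200\t0\t0",  # placeholder no-call
]
print(compare(load_genotypes(sample_23andme), load_genotypes(sample_ancestry)))

# The napkin math from the edit above: 12 disagreements out of 35,000 shared SNPs.
print(f"{100 * 12 / 35000:.4f}%")  # 0.0343%
```

Loading each file into a dict means one pass per file and a key lookup per SNP, which should stay fast even with hundreds of thousands of rows.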


This means you basically have more errors. Do they tell you how many copies of each sequence they sequenced? If it's less than like... 10 then I'd take that with a grain of salt.
Complements might mean you're looking at the wrong strand.
Maybe it's just not documented.
Your body has tons of spontaneous mutations, so could just be one of those. Most are harmless.