Democratizing Genetic Analysis

Within the last two decades, the world of medicine has witnessed remarkable advancements in the field of genetic health testing, changing our understanding of individual health. One of the most significant breakthroughs in this area is the study of Single Nucleotide Polymorphisms (SNPs). These tiny variations in our DNA hold valuable insights into our genetic makeup and play a crucial role in personalized medicine.

The scientific study of SNPs is largely due to the successful completion of the Human Genome Project (HGP), an extremely ambitious international effort by scientists at twenty universities and research institutions across the world to sequence the entirety of all genetic content of the human organism's chromosomes.

SNPs occur when a single nucleotide within the genome differs from what is considered the standard reference sequence. At their core, SNPs are single base pair substitutions that occur when one nucleotide, such as adenine (A), is replaced by another, like thymine (T), cytosine (C), or guanine (G). These seemingly minor alterations hold great significance as they play a pivotal role in shaping our individual traits, health predispositions, and drug responses.

At one point, genetic health testing, particularly SNP analysis, was once a long and expensive process. However, in recent years, ancestry and genealogy tests utilizing SNP analysis have become common. Usually, these tests only display exclusively genealogy-related data. Despite this, once you take a test, you are entitled to receive a copy of what is known as "raw data". Raw data is essentially an extremely long text file that looks like this:

rs4477212	1	82154	AA
rs3094315	1	752566	AA
rs3131972	1	752721	AG
rs12124819	1	776546	AA
rs11240777	1	798959	AA
rs6681049	1	838555	AA
rs4970383	1	846808	AA

I had taken one of these tests in 2017, less for the ancestry data, which was still interesting, but by no means could compare to the data I would receive through the raw data.

One of the most popular ways of analyzing raw data generated through these tests is a tool called Promethease. For $12, Promethease would analyze your raw data file and provide you with a huge report on it. Naturally, I uploaded my raw data and was shortly presented with an extensive report of my genetic predispositions to disease, my potential resistence and sensitivity to certain medications, my personality traits, and more. The amount of information was honestly staggering considering it all came from some spit in a tube.

However, I had a problem. Sometime after Promethease was acquired by MyHeritage in 2019, the data in my report completely vanished. To this day, I have no clue what happened and have chalked it up to a server migration error.

Oh well. But I still really wanted access to my report. Rather than spend another $12 on one, I decided to build my own.

I began by looking at my raw data file. It consisted of nearly 700,000 lines, each containing an SNP, way more than I could analyze manually. Luckily, I'm pretty good at tricking sand into thinking using nothing other than a keyboard and lots of Google.

Firstly, I needed to find a way to convert the data into something easily readable by a computer. Luckily, genome.js had already made a cool package that processes raw data files and converts them into JSON, So I used that.

Around this time, I realized that I could make this more than just a one-off analysis of my raw data and create a tool free of use, that provides the same SNP analysis that Promethease does for $12.

Originally, I was running genome.js's JSON package in my terminal, but I soon implemented it into a simple file upload form, and voila! I had a way to upload my raw data file and convert it into JSON. Now I just needed to figure out what to do with it.

To make the JSON data accessible to whatever process I was going to use to analyze it, it had to get stored somewhere. Now, this was a little bit of a problem since I didn't exactly want to be uploading people's genetic data to a server and become responsible for it, so I decided to use IndexedDB. IndexedDB is a browser-based database that allows you to store large quantities of data locally. It's not the most elegant solution, but it works. So I implemented that, and now I had a way to store the data.

Next, I needed to figure out how to analyze the data. I should probably mention at this point that Promethease was created by a geneticist, and I am not a geneticist. I'm a software engineer and a community college student. Anyways, I had no idea how to analyze the data, so I did what any good software engineer would do: I Googled it. Turns out Promethease relies on a tool called SNPedia to analyze the data. SNPedia is a wiki that contains a massive amount of information about SNPs and their effects on humans. It's a great resource, so I decided to use this as well.

Now that I had a way to analyze the data, I needed to figure out how to display it. I decided to use Next.js, a React framework, to build the frontend. I should also point out that this was the first major project I had ever built on my own using either React or Next.js, so I was learning as I went. I built a simple web application with three pages. One page for uploading the raw data file and showing all of the scary warnings and disclaimers, one page for viewing the report, and one page for viewing a sample report consisting of random data. I also added a way to download the report as a PDF in a cool formatted document, because why not?

Sparing a lot of the weird bugs I came across and the insane amount of testing I had to do, that was pretty much it. I completely rebuilt a service that I had already paid for because I didn't want to pay for it again. I'm not sure if that makes me a genius or an idiot, but I'm leaning towards the latter. Anyways, the story doesn't end there (unfortunately). I decided to share my creation with the world, so I posted it on Product Hunt and a few other places. It got a decent amount of attention, and I was pretty happy with it. But then, a few days later, someone named Mike followed me on Twitter. Mike had written "formerly http://promethease.com" and "formerly http://SNPedia.com" in his bio. I thought that was pretty cool, but it turns out that Mike is actually the co-founder of both platforms. Given the fact that I had just rebuilt his entire business, I'm not sure if I should be expecting a cease and desist letter or a job offer. I guess we'll find out.

screenshot

If you want to check out the project, you can find it here. If you want to check out the source code, you can find it here. Thanks for reading!