2022-02-26

Scientists at the Big Data Institute at the University of Oxford in the UK have taken a major step towards mapping the entirety of genetic relationships between humans. They have built the largest human family tree to date, in a study published on Friday, February 25, in the scientific journal Science.

In the last two decades, extraordinary advances have been made in human genetic research, making it possible to identify and store the genomic data of hundreds of thousands of individuals, including prehistoric people.

Visualizing inferred human ancestral lineages across time and space. Each line represents an ancestor-descendant relationship in the genealogy inferred from modern and ancient genomes. The width of a line corresponds to how many times the relationship is observed, and its color reflects the estimated age of the ancestor. Image: Yan Wong and team via Science

This raises the prospect of tracing the origins of human genetic diversity and producing a complete map of how individuals across the world are related to one another.

Two major challenges stood in the way: finding a method to combine genome sequences from many different databases, and developing algorithms that can handle that much data. The new approach, however, combines data from multiple sources with ease and scales to accommodate millions of genome sequences.

Yan Wong, an evolutionary geneticist at the Big Data Institute, is one of the lead authors of the study. “We basically built a huge family tree, a genealogy for all of humanity, that models exactly the history that generated all the genetic variation that we find in humans today,” he says. “This genealogy allows us to see how each person’s genetic sequence relates to all others, along all points in the genome.”

Since individual genomic regions are inherited from only one parent, either the mother or the father, the ancestry of each point in the genome can be thought of as a tree. The set of trees, known as a “tree sequence” or “ancestral recombination graph,” links genetic regions back in time to ancestors where genetic variation first appeared.
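The idea of one tree per genomic region can be illustrated with a minimal Python sketch. It is not the study's software or data format: the node names, coordinates, and the helper functions `tree_at`, `lineage`, and `common_ancestor` are all hypothetical, chosen only to show how the same samples can have different ancestors at different points along the genome.

```python
from bisect import bisect_right

# Hypothetical toy data: each entry is (left, right, parent_of) for the
# half-open genome interval [left, right). Names and coordinates are invented.
TREES = [
    (0, 40, {"A": "X", "B": "X", "C": "Y", "X": "Y"}),
    (40, 100, {"A": "X", "B": "Z", "C": "Z", "X": "R", "Z": "R"}),
]


def tree_at(position):
    """Return the child -> parent map of the local tree covering `position`."""
    starts = [left for left, _, _ in TREES]
    index = bisect_right(starts, position) - 1
    if index < 0:
        raise ValueError("position lies before the start of the genome")
    left, right, parent_of = TREES[index]
    if not (left <= position < right):
        raise ValueError("position lies beyond the end of the genome")
    return parent_of


def lineage(sample, position):
    """List the ancestors of `sample` at `position`, nearest ancestor first."""
    parent_of = tree_at(position)
    path = []
    node = sample
    while node in parent_of:
        node = parent_of[node]
        path.append(node)
    return path


def common_ancestor(a, b, position):
    """Return the most recent common ancestor of two samples at `position`."""
    ancestors_of_a = [a] + lineage(a, position)
    for node in [b] + lineage(b, position):
        if node in ancestors_of_a:
            return node
    return None


# The same pair of samples meets in a different ancestor in each region.
print(common_ancestor("A", "B", 10))
print(common_ancestor("A", "B", 70))
```

In this toy example, samples A and B share ancestor X in the first region but only meet at the root R in the second, which is the kind of region-by-region difference a tree sequence records.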

“Essentially, we are reconstructing the genomes of our ancestors and using them to form a vast network of relationships. We can then estimate when and where these ancestors lived,” said Anthony Wilder Wohns, lead author, who did the research as part of his PhD at the Big Data Institute and is now a postdoctoral researcher at Harvard. “The power of our approach is that it makes very few assumptions about the underlying data and can also include both modern and ancient DNA samples.”

The study integrated data on modern and ancient human genomes from eight different databases and included a total of 3,609 individual genome sequences from 215 populations.

Largest family tree in history goes back over 100,000 years

Among the ancient genomes were samples found around the world ranging in age from 1,000 to over 100,000 years. The algorithms predicted where common ancestors must be present in evolutionary trees to explain patterns of genetic variation. The resulting network contained nearly 27 million ancestors.

After adding location data from these sample genomes, the authors used the network to estimate where predicted common ancestors had lived. The results successfully recaptured key events in human evolutionary history, including migration out of Africa.
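The article does not describe how the locations were estimated. As a loose illustration only, the sketch below assumes one simple rule: place each ancestor at the average position of its children, working upward from the sampled genomes. The tree, the coordinates, and the function `estimate_locations` are hypothetical, and averaging latitude and longitude directly ignores the curvature of the Earth.

```python
# Hypothetical toy tree (child -> parent) and sample coordinates (lat, lon).
# All names and numbers are invented for illustration.
PARENT_OF = {"s1": "a1", "s2": "a1", "s3": "a2", "a1": "a2"}
SAMPLE_LOCATIONS = {"s1": (10.0, 20.0), "s2": (30.0, 40.0), "s3": (50.0, 0.0)}


def estimate_locations(parent_of, sample_locations):
    """Assign each ancestor the mean coordinates of its children (assumed rule)."""
    # Invert the child -> parent map into parent -> list of children.
    children = {}
    for child, parent in parent_of.items():
        children.setdefault(parent, []).append(child)

    locations = dict(sample_locations)

    def locate(node):
        # Samples already have a location; ancestors are computed on demand.
        if node not in locations:
            points = [locate(child) for child in children[node]]
            lat = sum(p[0] for p in points) / len(points)
            lon = sum(p[1] for p in points) / len(points)
            locations[node] = (lat, lon)
        return locations[node]

    for node in children:
        locate(node)
    return locations


locations = estimate_locations(PARENT_OF, SAMPLE_LOCATIONS)
print(locations["a1"], locations["a2"])
```

Because each ancestor depends only on its children, older ancestors end up placed by the combined positions of all their descendants, which is how broad patterns such as a migration route could emerge from sample locations alone.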

While the genealogical map is already an extremely rich resource, the research team plans to make it even more comprehensive by continuing to incorporate genetic data as it becomes available.

Because tree sequences store information very efficiently, the database could easily accommodate millions of additional genomes. “This study is laying the groundwork for the next generation of DNA sequencing. As the quality of genome sequences from modern and ancient DNA samples improves, the trees will become even more accurate and eventually we will be able to generate a single unified map that explains the descent of all the human genetic variation we see today,” says Wong.

