How ChatGPT solved an Ancestry DNA mystery for me and my long-lost cousin


How I used ChatGPT to decode an Ancestry DNA mystery

ZDNET

In 2017, I sent DNA samples to Ancestry, as well as to two other DNA companies. My parents had recently passed away, and I had some questions about my family background that I hoped the DNA might reveal.

As it turned out, that DNA reveal sparked a fairly long and painful story.

Ever since then, I’ve kind of dabbled with my family tree. I enjoy digging through documents and connections, following clues, and updating charts.

But then, a few weeks ago, I was contacted by one of my DNA matches. It was an odd sort of connection.

Based on the DNA data, I knew roughly how closely related we were (about third cousins, with about 1% shared DNA). But I didn’t (and still don’t) know the person’s gender or name. The contact used an Ancestry username, which didn’t indicate either gender or first name. I also know that the person is close to my age, because they told me their age in the message.

And then things started to get interesting. My cousin (for I know the person is my cousin, even if I don’t know their name) asked ChatGPT to provide insights into our possible relationship based on the DNA data. That included average lifespans and birth and death periods of our shared ancestors.

I asked this mystery cousin’s permission to tell you about their ChatGPT use, which they granted. Based on the transcript of their session, along with some of my own questions, ChatGPT was able to shed some light on the family connection.

In this article, I’m going to show you how I used ChatGPT (and, by extension, how you can use it) to explore genealogy connections between DNA relatives. I’ll show you the prompts, but in most cases, I’ll just summarize the responses, because those can get quite long.

How are we related?

My starting point was the DNA data itself. According to Ancestry:

  • Shared DNA: 95 cM across 10 segments on my maternal side
  • Unweighted shared DNA: 95 cM
  • Longest segment: 16 cM
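To see how a centimorgan figure maps onto the "about 1%" shared-DNA figure, here is a minimal Python sketch. The total of 6,800 cM is an assumption (a commonly quoted rule of thumb, not a number from Ancestry or from this article), and the function name is purely illustrative. Testing companies count the total differently, so treat the result as a ballpark.

```python
# Assumption: about 6,800 cM of total comparable DNA. The real figure
# varies by testing company, so the result is only a ballpark.
ASSUMED_TOTAL_CM = 6800.0


def shared_percent(shared_cm: float, total_cm: float = ASSUMED_TOTAL_CM) -> float:
    """Return shared DNA as a percentage of the assumed total."""
    if total_cm <= 0:
        raise ValueError("total_cm must be positive")
    return shared_cm / total_cm * 100


if __name__ == "__main__":
    # 95 cM is the shared amount reported for this match.
    print(f"{shared_percent(95):.1f}% shared")
```

Under that assumption, 95 cM works out to a little over 1%, which is in line with the percentage shown for this match.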

Ancestry predicted that we were “half 2nd cousin 1x removed,” but the shared DNA quantity doesn’t necessarily place the relationship on a family tree. It just tells you how many jumps away one person is from the other. So those jumps can go equally all the way up and down the tree, or partially up on one side and down an extra generation on the other, or some variety of the two.

I started asking ChatGPT about the DNA data. I asked:

What does this mean? Shared DNA: 95 cM across 10 segments Unweighted shared DNA: 95 cM Longest segment 16 cM

ChatGPT told me that cM (centimorgan) is a unit of measurement for genetic linkage. It measures the length of DNA shared between two individuals. A value of 95 cM points to a relationship at the second-cousin level or more distant. DNA is shared in blocks or segments. The more segments, the closer the relationship. Larger segments indicate closer relationships, while smaller segments indicate more distant relationships.

Our shared DNA had few shared segments, and those segments were pretty small. All together, that put us about eight generational hops from each other.

What kind of cousins?

I knew my cousin and I are about the same age, so I asked:

If both parties are of similar ages, would they be more likely third cousins or second cousins once removed?

In this case, we would more likely be third cousins. The phrase “x removed” indicates a difference in generations. Since we are both about the same age, our generational label would not include “removed.” Instead, we’d be more likely third cousins.
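The naming rule behind that answer can be written down in a few lines. This is a minimal Python sketch, not anything ChatGPT or Ancestry produced; the function name and the idea of counting generations up to the shared ancestor (2 for a grandparent, 4 for a great-great-grandparent) are illustrative conventions.

```python
def cousin_relationship(gens_a: int, gens_b: int) -> tuple[int, int, int]:
    """Return (degree, removed, hops) for two descendants of a shared ancestor.

    gens_a and gens_b count generations from each person up to the shared
    ancestor: 2 = grandparent, 3 = great-grandparent, and so on.
    """
    if gens_a < 2 or gens_b < 2:
        raise ValueError("cousins need a shared ancestor at grandparent level or higher")
    degree = min(gens_a, gens_b) - 1  # 1 = first cousin, 3 = third cousin
    removed = abs(gens_a - gens_b)    # generation gap between the two people
    hops = gens_a + gens_b            # steps up one branch and down the other
    return degree, removed, hops


if __name__ == "__main__":
    print(cousin_relationship(4, 4))  # third cousins: same generation
    print(cousin_relationship(3, 4))  # second cousins once removed
```

Two people who are each four generations below the shared ancestor come out as third cousins with no removal and eight hops between them. If one of them is only three generations below, the label becomes second cousins once removed, with seven hops.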

Draw me a diagram

I had trouble visualizing this, so I asked ChatGPT to give me a diagram. My first prompt was, “I would like a visualization of this. Please use DALL·E.” I got back whatever this is supposed to be.

[Image: DALL·E output. Screenshot by David Gewirtz/ZDNET]

Then I tried, “Please create a visualization using a diagram rather than a picture.” I got back a diagram that listed “great-great-grandparent” at every node.

So, I corrected ChatGPT with, “That diagram does not seem right. You have labeled great-grandparents on every node.” That resulted in this diagram, which makes the relationship to my cousin fairly clear, if it’s right. I did look elsewhere for corroboration, and it seems correct.

[Diagram: corrected family tree showing the third-cousin connection. Screenshot by David Gewirtz/ZDNET]

So, now I could see that our families connected via my grandparent’s grandparent. That makes it difficult for us to see family links because I’ve only tentatively identified one great-great-grandparent in my entire tree.

How many grandparents?

That led me to another question: How many great-great-grandparents are there in the ancestral pool that my cousin and I share? Here’s what I asked ChatGPT:

At the third cousin level, how large is the pool of great-grandparents?

The AI responded that at the third cousin level, each of us has a pool of 16 great-great-grandparents. We share one pair of great-great-grandparents, which, according to the AI, means each of us also has 15 great-great-grandparents who are unique to us. (Strictly, sharing a pair leaves 14; 15 would be right only if we shared a single ancestor.)
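The pool sizes follow from simple doubling: every generation back has twice as many ancestors as the one before. A minimal Python sketch, with function names that are purely illustrative, counts the pool at a given generation and how many ancestors stay unshared for a given number of shared ones.

```python
def ancestors_at(generations_back: int) -> int:
    """Number of ancestors at a given generation (1 = parents, 4 = great-great-grandparents)."""
    if generations_back < 1:
        raise ValueError("generations_back must be at least 1")
    return 2 ** generations_back


def unshared_ancestors(generations_back: int, shared: int) -> int:
    """Ancestors at that generation that are not shared with the other person."""
    total = ancestors_at(generations_back)
    if not 0 <= shared <= total:
        raise ValueError("shared must be between 0 and the pool size")
    return total - shared


if __name__ == "__main__":
    print(ancestors_at(4))           # pool of great-great-grandparents
    print(unshared_ancestors(4, 2))  # a shared couple
    print(unshared_ancestors(4, 1))  # a single shared ancestor (half relationship)
```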

I have only identified one great-great-grandparent in my entire tree. I’ve even had difficulty confirming who my great-grandparents are (apparently “Poppy,” which is the only way my mother ever referred to her grandfather, isn’t a good search term). So the odds are fairly long that the person I’ve identified (or may have identified, because the data is shaky) is the shared great-great-grandparent.

Generational questions

In a short conversation via Ancestry’s messaging interface, my cousin described ChatGPT as “my new best friend.” They used ChatGPT to try to find out when our mutual ancestor might have lived. Because my cousin said that “our shared ancestor likely would have lived in Russia,” I’m guessing we’re working with my maternal grandmother’s tree, since her family came from Russia.

Armed with the above information, I slightly modified my cousin’s prompt and fed the following to ChatGPT:

I am trying to identify the possible birth and death years of a shared ancestor. My cousin shares 1% of my DNA and we previously determined we’re most probably third cousins. We are also of similar ages, born in the 1960s.

I know my maternal grandmother’s parents came from Ravna, which is about halfway between Moscow and St. Petersburg in Russia.

My maternal grandmother’s father arrived in America in 1902 at about 21 years of age. His wife arrived in either 1898 or 1900 (depending on which source you believe), but they got married in 1905. She was 28 when they got married. He was 24.

My cousin’s family arrived around 1880. Based on average lifetimes in the ancestor’s era and country of origin, what would the ancestor’s likely birth and death years be?

The AI broke the answer up into four elements: identifying the likely generation of the shared ancestor, determining birth years, estimating death years, and cross-referencing with migration data. In the first run, ChatGPT estimated our shared ancestors were born between 1847 and 1861 and died between 1870 and 1921.

ChatGPT then asked, “Would you like me to refine this further with additional historical context or explore other aspects of this estimate?” to which I replied, “Yes.”

It took another look at the family timelines, factoring in migration details. From that, it revised the range of birth years to 1835-1861 (wider than the first estimate) and narrowed the range of death years to 1870-1880.
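The arithmetic behind this kind of estimate can be sketched from the dates in my prompt. This is not ChatGPT's method, just a minimal Python illustration. It assumes, as my own rough guess rather than anything from the records, that a parent was between 20 and 40 years old when a child was born. Because I don't know which great-grandparent carries the link, it considers both.

```python
def birth_year(event_year: int, age_at_event: int) -> int:
    """Approximate birth year from a dated event and the person's age at the time."""
    return event_year - age_at_event


def parent_birth_range(
    child_birth_years: list[int],
    min_parent_age: int = 20,  # assumption: youngest likely age at a child's birth
    max_parent_age: int = 40,  # assumption: oldest likely age at a child's birth
) -> tuple[int, int]:
    """Earliest and latest plausible birth years for a parent of any listed child."""
    earliest = min(child_birth_years) - max_parent_age
    latest = max(child_birth_years) - min_parent_age
    return earliest, latest


if __name__ == "__main__":
    great_grandfather = birth_year(1902, 21)  # arrived in 1902 at about 21
    great_grandmother = birth_year(1905, 28)  # married in 1905 at 28
    print(parent_birth_range([great_grandfather, great_grandmother]))
```

Changing the assumed parental ages shifts the range directly, which is one reason estimates like these move around as more context is added.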

Then it asked, “Would you like additional insights, such as potential cultural or regional factors that could further narrow this range?” In this case, I answered, “Both families were Jewish.”

ChatGPT correctly recognized this detail might change the estimates, because “Jewish families in 19th-century Russia experienced unique demographic, cultural, and migratory patterns.” Life wasn’t easy for our ancestors, with pogroms, forced residency in ethnic ghettos, and the distinct community structure of Russian Jews in the late 1800s.

From this, ChatGPT determined:

  • Birth year range: ~1820–1840 (depending on generational timing).
  • Death year range: ~1870–1900 (possibly closer to ~1880, if they passed before or during the emigration of their children).

The DNA connection

I find some of this oddly fascinating. The human body contains roughly 200-250 grams of DNA, about the weight of a medium-sized apple. The amount of DNA my cousin and I share is about 1% of that, or about the weight of a small paperclip.

That “paperclip” is made from sugar and phosphate groups, with the code carried by adenine and thymine pairs joined by two hydrogen bonds, and cytosine and guanine pairs joined by three hydrogen bonds. Each of these four molecules contains nitrogen atoms.

From that, we’re able to find out that a person I’ve never met and I share a paperclip’s worth of code, which identifies us as descendants of two people who lived in Russia at the same time as America was having its Civil War.

We don’t know those two people. We don’t know their stories. We don’t know their names. Yet, we exist because something brought those two ancestors together, and a series of improbable and unknowable events throughout the last 150 years led to two strangers being born on the opposite side of the globe from where our great-great-grandparents lived.

We don’t speak the language they spoke, and the planet we live on is vastly different from the one they lived on. And yet, we are here — and you are reading this — solely because of them.

Do you have an interesting DNA story? Have you tried ChatGPT as a tool for researching your heritage? Let us know in the comments below.

