Find and Fix Family Tree Errors Automatically

Every family tree of any real size contains errors: birth dates after death dates, parents younger than their children, the same ancestor entered twice with slightly different spellings, marriages without a child link, individuals who aren't connected to anyone. Manually hunting these down in genealogy software is slow and frustrating, and errors are easy to miss, especially in a tree you've inherited or one assembled from many merges.

GEDminer's family tree error checker scans your entire GEDCOM file in seconds and produces a categorised report of every issue, with the exact record reference so you can fix it in your main software. There is no server, no signup, and no upload — your file is parsed in the browser and the report is yours to keep.

Below is the full taxonomy of error types it detects, why each one matters for the integrity of your research, how the underlying logic works, and which guide to follow to clean each category up. The long-tail FAQ at the bottom answers the questions researchers most often ask before running their first scan.

How GEDminer solves it

Impossible dates: deaths before births, future events, ages over 120.

Logical date validation flags every contradiction with a one-click jump to the offending record and the underlying GEDCOM line.

Date Consistency Checker →

Duplicate individuals merged from multiple sources.

Soundex + Levenshtein matching with a tunable confidence threshold finds duplicates other tools miss, including spelling variants and abbreviated names.

Duplicate Finder →

People with no parents, no children, or no family link at all.

The Unlinked Individuals report lists every floating record so you can connect or remove them.

Unlinked Individuals →

Vague vital records like "abt 1850" or "Yorkshire".

The Vital Sharpener prioritises imprecise dates and places by impact so you fix the high-value ones first.

Vital Sharpener →

Inconsistent character encoding, with "Müller" showing as "MÃ¼ller".

A multi-pass decoder (UTF-8, Windows-1252, Mac Roman, Latin-1) recovers garbled accented names automatically.

Encoding Recovery →

Parents younger than their children or impossibly old at marriage.

Cross-fact age validation flags biologically implausible parent-child age gaps and marriage ages outside the historical norm.

Relationship Age Checks →

The seven categories of family tree error

Most family tree errors fall into one of seven categories. GEDminer detects all of them:

  1. Date contradictions: death before birth, a child born more than 9 months after the father's death, marriage before either partner's birth, future events, ages over 120.
  2. Duplicate individuals: the same ancestor entered more than once because of a merge, an import, or two contributors working independently.
  3. Broken structural links: a family points to a spouse or child ID that doesn't exist; an individual claims a family that doesn't reference them back.
  4. Unlinked individuals: people in the file with no parents, no children, no spouses, and no events tying them to anyone else.
  5. Encoding corruption: non-ASCII characters mangled by an old export pipeline (UTF-8 read as Windows-1252 is the classic case).
  6. Place inconsistencies: the same location written multiple ways, frustrating searches and mapping.
  7. Sourcing gaps: facts asserted with no citation, or citations that point to an empty source record.

Each of these has its own report in the analyzer with copy-and-paste record IDs so the fix can happen in your main editor with no guesswork.
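
To make category 3 concrete, here is a minimal Python sketch of a structural link check. It is an illustration only, not GEDminer's actual code: the function name `find_broken_links` and the "dangling" / "one-way" labels are hypothetical, and the parser reads only level-0 records and their level-1 `FAMS`, `FAMC`, `HUSB`, `WIFE` and `CHIL` pointers.

```python
import re

# level, optional @XREF@, tag, optional value
LINE = re.compile(r"^(\d+)\s+(?:(@[^@]+@)\s+)?(\w+)(?:\s+(.*))?$")


def find_broken_links(gedcom_text):
    """Return (kind, from_id, to_id) tuples for dangling or one-way links."""
    individuals, families = set(), set()
    indi_links = {}   # individual id -> family ids named by FAMS/FAMC
    fam_links = {}    # family id -> individual ids named by HUSB/WIFE/CHIL
    current = None    # (record tag, record id) of the enclosing level-0 record

    for raw in gedcom_text.splitlines():
        m = LINE.match(raw.strip())
        if not m:
            continue
        level, xref, tag, value = m.groups()
        if level == "0":
            current = (tag, xref) if xref else None
            if xref and tag == "INDI":
                individuals.add(xref)
            elif xref and tag == "FAM":
                families.add(xref)
        elif level == "1" and current and value:
            kind, rec = current
            if kind == "INDI" and tag in ("FAMS", "FAMC"):
                indi_links.setdefault(rec, set()).add(value)
            elif kind == "FAM" and tag in ("HUSB", "WIFE", "CHIL"):
                fam_links.setdefault(rec, set()).add(value)

    problems = []
    # Family -> individual pointers: target must exist and point back.
    for fam, members in fam_links.items():
        for person in sorted(members):
            if person not in individuals:
                problems.append(("dangling", fam, person))
            elif fam not in indi_links.get(person, ()):
                problems.append(("one-way", fam, person))
    # Individual -> family pointers: same two checks in the other direction.
    for person, fams in indi_links.items():
        for fam in sorted(fams):
            if fam not in families:
                problems.append(("dangling", person, fam))
            elif person not in fam_links.get(fam, ()):
                problems.append(("one-way", person, fam))
    return problems
```

A "dangling" result means the pointer targets a record that is not in the file; a "one-way" result means the target exists but does not reference the source record back.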

How the date and age checker actually works

The date checker normalises every GEDCOM date into a canonical form (year, month, day, plus precision flags for qualifiers such as "abt", "bef" and "aft" and for date ranges). It then runs a series of cross-fact comparisons:

  • Self-consistency: birth ≤ death; death ≤ today; ages within plausible bounds.
  • Marriage logic: both partners born before the marriage; both alive at the time; neither already married to the same partner.
  • Parent-child logic: father at least ~14 years older than child and alive at conception; mother at least ~12 years older and alive at birth (with sensible upper bounds).
  • Sibling spacing: large gaps between recorded siblings are surfaced separately in the Incomplete Families report rather than flagged as errors.
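
The self-consistency and parent-child comparisons above can be sketched in Python as follows. This is a simplified illustration working at year precision only: the `Person` class and function names are hypothetical, the 14 and 12 year gaps and the 120-year limit come from the text, and the one-year allowance for a father who dies before the birth is an assumption standing in for the 9-month rule.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

MAX_AGE = 120          # ages over 120 are flagged
MIN_FATHER_GAP = 14    # approximate minimum, in years
MIN_MOTHER_GAP = 12    # approximate minimum, in years


@dataclass
class Person:
    id: str
    birth_year: Optional[int] = None
    death_year: Optional[int] = None


def self_checks(p, this_year=None):
    """Contradictions within one person's own facts."""
    this_year = this_year if this_year is not None else date.today().year
    issues = []
    if p.birth_year is not None and p.death_year is not None:
        if p.death_year < p.birth_year:
            issues.append("death before birth")
        elif p.death_year - p.birth_year > MAX_AGE:
            issues.append("age over 120")
    if p.death_year is not None and p.death_year > this_year:
        issues.append("death in the future")
    return issues


def parent_child_checks(parent, child, is_father):
    """Contradictions between a parent's and a child's facts."""
    issues = []
    if parent.birth_year is None or child.birth_year is None:
        return issues  # nothing to compare
    min_gap = MIN_FATHER_GAP if is_father else MIN_MOTHER_GAP
    if child.birth_year - parent.birth_year < min_gap:
        issues.append("parent too young")
    if parent.death_year is not None:
        # Assumption: at year precision, a father may die the year before
        # the birth (alive at conception); a mother must be alive at birth.
        latest_birth = parent.death_year + (1 if is_father else 0)
        if child.birth_year > latest_birth:
            issues.append("child born after parent's death")
    return issues
```

Records with a missing birth or death year simply produce no finding here, which keeps incomplete data from being reported as a contradiction.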

Approximate dates ("abt 1850") are not flagged as errors — they're flagged as opportunities by the Vital Sharpener, ranked by how much your tree would improve if they were tightened.
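
As a rough sketch of the normalisation step, the Python below parses a simple GEDCOM date into year, month, day and a qualifier, and marks vague dates as sharpening candidates rather than errors. It is an assumption-laden illustration: `NormalisedDate`, `parse_gedcom_date` and `needs_sharpening` are hypothetical names, and date ranges and other calendars are deliberately left out.

```python
import re
from dataclasses import dataclass
from typing import Optional

MONTHS = {name: number for number, name in enumerate(
    "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1)}

# optional qualifier, optional day, optional month, required year
_DATE = re.compile(
    r"^(?:(ABT|BEF|AFT)\.?\s+)?(?:(\d{1,2})\s+)?(?:([A-Z]{3})\s+)?(\d{3,4})$")


@dataclass(frozen=True)
class NormalisedDate:
    year: int
    month: Optional[int] = None
    day: Optional[int] = None
    qualifier: Optional[str] = None  # "ABT", "BEF", "AFT", or None if exact


def parse_gedcom_date(text):
    """Return a NormalisedDate, or None if the text is not a simple date."""
    m = _DATE.match(text.strip().upper())
    if not m:
        return None
    qualifier, day, month, year = m.groups()
    if month is not None and month not in MONTHS:
        return None  # three letters that are not a month abbreviation
    if day is not None and month is None:
        return None  # a day without a month is not a valid date
    return NormalisedDate(
        year=int(year),
        month=MONTHS.get(month),
        day=int(day) if day else None,
        qualifier=qualifier,
    )


def needs_sharpening(d):
    """A vague date is an improvement opportunity, not an error."""
    return d.qualifier is not None or d.month is None or d.day is None
```

Keeping the qualifier as a separate field means the contradiction checks can still compare years while the precision information survives for ranking.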

Why duplicate detection is harder than it looks

Naive duplicate detection ("two people with the same name and birth year") misses most real duplicates and produces too many false positives. GEDminer combines several signals:

  • Soundex phonetic codes so "McDonald" matches "Macdonald" and "Stewart" matches "Stuart".
  • Levenshtein edit distance on full names so "Mary Eliz Smith" matches "Mary Elizabeth Smith".
  • Vital-date overlap with tolerance for approximate dates ("abt 1850" matches "1849–1851").
  • Parent overlap as a strong corroborating signal, or a strong contradicting one: different parents usually mean the pair is not a duplicate.
  • Place agreement as a secondary signal.

Each pair gets a single normalised confidence score. Pairs above the 0.55 threshold are shown ranked by score, with the supporting evidence visible so you can make the call. The threshold is tunable in settings.
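
The Python sketch below shows one way such signals could be combined into a single score. Soundex and Levenshtein are the standard algorithms; everything else is an assumption for illustration, including the `Candidate` class, the weights, the one-year date tolerance and the averaging rule. Only the 0.55 threshold comes from the text, and place agreement is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional

_GROUPS = (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
           ("L", "4"), ("MN", "5"), ("R", "6"))
_CODES = {ch: digit for group, digit in _GROUPS for ch in group}


def soundex(name):
    """American Soundex: first letter plus three digits."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    out = [letters[0]]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":        # H and W do not separate equal codes
            continue
        code = _CODES.get(ch, "")
        if code and code != prev:
            out.append(code)
        prev = code           # a vowel resets prev to ""
    return ("".join(out) + "000")[:4]


def levenshtein(a, b):
    """Edit distance using a rolling row of the dynamic-programming table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


@dataclass
class Candidate:
    full_name: str
    surname: str
    birth_year: Optional[int] = None
    parent_ids: frozenset = frozenset()


# Hypothetical weights, chosen only for this example.
WEIGHTS = {"name": 0.4, "phonetic": 0.2, "birth": 0.2, "parents": 0.2}
THRESHOLD = 0.55       # threshold quoted in the text
YEAR_TOLERANCE = 1     # assumed tolerance for approximate dates


def duplicate_score(a, b):
    """Weighted average of the signals that are available for this pair."""
    signals = {}
    longest = max(len(a.full_name), len(b.full_name)) or 1
    distance = levenshtein(a.full_name.lower(), b.full_name.lower())
    signals["name"] = 1 - distance / longest
    signals["phonetic"] = float(soundex(a.surname) == soundex(b.surname))
    if a.birth_year is not None and b.birth_year is not None:
        signals["birth"] = float(abs(a.birth_year - b.birth_year) <= YEAR_TOLERANCE)
    if a.parent_ids and b.parent_ids:
        shared = len(a.parent_ids & b.parent_ids)
        signals["parents"] = shared / len(a.parent_ids | b.parent_ids)
    total = sum(WEIGHTS[key] for key in signals)
    return sum(WEIGHTS[key] * value for key, value in signals.items()) / total
```

Missing facts are left out of the average instead of counting as disagreement, so a sparse record is not penalised for what it does not say, while known but different parents pull the score down.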

Encoding corruption: the silent error

Encoding corruption is the most common error in old GEDCOM files and the easiest to miss because it's invisible if you only look at one record at a time. The classic symptom: "Müller" appears as "MÃ¼ller", "O’Brien" appears as "Oâ€™Brien", or accented vowels appear as random punctuation.

This happens when a file is written in one encoding (typically UTF-8) and read in another (typically Windows-1252 or Latin-1). GEDminer's parser tries each encoding in turn and scores the result by how many recognisable words appear, picking the cleanest decoding automatically. If your file has been corrupted at multiple points (re-saved through a different encoding twice), the parser will surface what it can and flag the rest.

Once the parse is clean, the encoding fix is permanent — you re-export from your editor and the corruption is gone.
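
A multi-pass decoder of this kind can be sketched in a few lines of Python. This is a simplified stand-in, not GEDminer's parser: instead of counting recognisable words it counts typical mojibake markers, and the names `decode_best` and `repair_mojibake` are hypothetical. The codec names are the ones Python's standard library uses.

```python
# Candidate encodings, tried in order; ties go to the earlier entry.
CANDIDATES = ("utf-8", "cp1252", "mac_roman", "latin-1")

# Character sequences that typically appear when UTF-8 is read as
# Windows-1252 or Latin-1, plus the Unicode replacement character.
SUSPECT = ("Ã", "Â", "â€", "\ufffd")


def garble_score(text):
    """Lower is cleaner. Assumption: a marker count replaces word scoring."""
    return sum(text.count(marker) for marker in SUSPECT)


def decode_best(raw):
    """Decode bytes with each candidate and keep the cleanest result."""
    best = None
    for encoding in CANDIDATES:
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this encoding
        score = garble_score(text)
        if best is None or score < best[0]:
            best = (score, encoding, text)
    # latin-1 accepts every byte sequence, so best is never None here
    return best[1], best[2]


def repair_mojibake(text):
    """Undo one round of UTF-8 text that was decoded as Windows-1252."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text  # not this kind of corruption; leave it unchanged
```

`decode_best` covers a file whose bytes are intact but whose encoding is unknown; `repair_mojibake` covers text that was already saved in its garbled form, which is the multi-step corruption case described above.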

Reading the error report and prioritising fixes

The error report is structured by severity and by effort. High-severity items (broken links, impossible dates) come first because they actively break the tree. Medium-severity items (duplicates, encoding) come next because they distort searches and analysis. Low-severity items (vague dates, missing sources) are surfaced separately by the Vital Sharpener as improvement opportunities rather than outright errors.

A practical fix order:

  1. Fix broken structural links first — they cascade.
  2. Resolve duplicates next — fixing other things in a duplicate is wasted effort.
  3. Fix encoding — re-export once and many downstream issues vanish.
  4. Resolve impossible dates record by record.
  5. Use the Vital Sharpener to tighten the highest-impact vague facts.
  6. Use the Sourcing report to add citations to the most-referenced unsourced facts.

Re-export and re-run the analyzer after each pass — the Tree Health Score will tell you whether the work moved the needle.

Step-by-step guides

How to Find and Fix GEDCOM Errors

A practical guide to detecting impossible dates, missing records, duplicate entries, and other data quality issues in your GEDCOM family tree file.

Finding and Merging Duplicate Individuals

Find potential duplicate individuals in your tree using smart matching, compare their records side-by-side, and learn best practices for merging them.

10 Common Genealogy Mistakes and How to Avoid Them

Even experienced researchers make these mistakes. Learn the 10 most common genealogy errors — from unsourced facts to name assumptions — and how GEDminer helps you catch them.

Finding and Connecting Unlinked Individuals

Discover individuals in your tree who aren't connected to any family. Learn why they're isolated and strategies for reconnecting them.

Using the Vital Sharpener to Improve Date Precision

The Vital Sharpener helps you identify estimated, incomplete, or missing vital records and prioritise which ones to research first for maximum impact.

How to Recover a Corrupted or Broken GEDCOM File

A GEDCOM that won’t load can feel catastrophic. Most "corrupted" files are actually salvageable with a few targeted fixes. Here is the diagnostic workflow.

Fix Garbled Names in GEDCOM Files: Character Encoding Guide

If accented characters, apostrophes, or non-Latin scripts look broken in your family tree, the problem is almost always character encoding. Here is how to diagnose and fix it.

Understanding Your Tree Health and Data Quality Score

Learn how GEDminer evaluates your tree's data quality, what the health score means, and practical steps to improve your tree's completeness and accuracy.

Frequently asked questions

What kind of family tree errors can GEDminer detect?

Impossible dates (death before birth, future events, ages over 120), duplicate individuals, broken parent-child links, unlinked records, missing vital facts, encoding corruption, inconsistent place names, parent-child age contradictions, and sourcing gaps — among others.

Will GEDminer change my GEDCOM file?

No. The error checker is strictly read-only. It tells you what is wrong and where; you fix it in whatever genealogy software you use, then re-export and re-analyse to confirm.

How accurate is the duplicate detection?

Duplicate matching uses a combined phonetic + fuzzy-string score against names, vital dates and parent overlap, and only flags pairs above a 0.55 confidence threshold. False positives are rare, but every match is shown with its score so you can decide.

Can I see the GEDCOM line numbers for each error?

Yes. Each error references the individual ID and the relevant fact, and date contradictions also link to the underlying GEDCOM line, so locating it in your editor or genealogy program is straightforward.

Should I fix everything the checker reports?

Not necessarily. Some flagged "errors" may be intentional (e.g. estimated dates marked "abt"). Treat the report as a prioritised review list, not an autocorrect.

Why does the same ancestor appear several times in my GEDCOM?

Almost always because two GEDCOMs were merged without de-duplication, or a contributor entered the same person twice with slightly different name spellings. The Duplicate Finder is built specifically to detect both cases.

My ancestor names contain garbled characters — can the error checker fix this?

Yes. The multi-pass encoding decoder will usually recover the original characters automatically when the file is parsed. If the decoder cannot resolve the encoding, you will see a warning explaining the likely cause and the recommended re-export setting.

How do I export the error report?

Every error category can be exported as CSV or XLSX. The export includes the individual ID, the error type, the offending fact and a short description so you can work through the list outside the browser.

Does the error checker work for GEDCOM files in languages other than English?

Yes. Date parsing supports the standard GEDCOM date grammar in all languages, names render correctly thanks to the encoding decoder, and place-name comparisons are case- and accent-insensitive.

What's the difference between an "error" and a "suggestion"?

An error is a logical contradiction — something that cannot be true (e.g. death before birth). A suggestion is an opportunity to improve the tree — a vague date, a missing source, or a likely census record you have not added. Errors appear in the Integrity report; suggestions appear in the Research and Vital Sharpener reports.

Can the error checker detect circular relationships (an ancestor who is also a descendant)?

Yes. Circular ancestry is treated as a high-severity structural error because it breaks ancestor traversal. The report shows the loop members and the family records involved.
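
For readers curious how such a loop can be found, here is a minimal Python sketch using a depth-first search over child-to-parent links. It is an illustration under assumptions: the function name and the `parents_of` mapping (individual ID to a list of parent IDs) are hypothetical, and the recursion assumes ancestor chains shorter than Python's default recursion limit.

```python
def find_ancestry_loop(parents_of):
    """Return the IDs forming one ancestry loop, or None if there is none."""
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {}

    def visit(person, path):
        state[person] = IN_PROGRESS
        path.append(person)
        for parent in parents_of.get(person, ()):
            status = state.get(parent, UNSEEN)
            if status == IN_PROGRESS:
                # The parent is already on the current path: a loop.
                return path[path.index(parent):]
            if status == UNSEEN:
                loop = visit(parent, path)
                if loop:
                    return loop
        path.pop()
        state[person] = DONE
        return None

    for person in parents_of:
        if state.get(person, UNSEEN) == UNSEEN:
            loop = visit(person, [])
            if loop:
                return loop
    return None
```

Marking finished records as done means each person is explored once, so the check stays fast even on large trees.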

Ready to analyse your tree?

Drop your .ged file into GEDminer and get a full diagnostic in seconds. Your file never leaves your browser.