Family Tree Data Quality: Score, Audit and Improve Your Genealogy Research
Most family trees grow organically over years of research. Facts get added from many sources - censuses, BMD records, hints, cousin trees, AI suggestions - and rarely get a systematic quality review. The result: a tree that *feels* impressive in scale but has dozens of unsourced parents, vague "abt 1840" dates, half-finished families and impossible date contradictions hiding in plain sight.
Data quality is the single biggest predictor of how useful a family tree is for further research, for cousin matching, and for handing on to the next generation. A small, well-sourced tree is far more valuable than a large, unsourced one. But until now there has been no quick, objective way to *score* your tree's quality and track whether your work is improving it.
GEDminer's data quality system gives every GEDCOM file a single weighted score out of 100 - Completeness (40%) + Sourcing (30%) + Consistency (30%) - plus an optional community percentile so you can see how your tree compares with others. Below is how the scoring works, what the audit covers, and how to use it as a permanent research feedback loop.
How GEDminer solves it
You don't know whether your tree is "good" or not.
A single weighted Tree Health Score gives you an objective number to track over time and improve.
Tree Health Score →
You want to compare against other genealogists' trees.
Optional community percentile (anonymised, opt-in) shows where your tree sits among all submitted trees.
Community Percentile →
You don't know which problems to fix first.
The audit ranks every issue by impact so you spend time on the high-value fixes first.
Vital Sharpener →
Unsourced facts you can't track down.
Sourcing audit lists every fact without a citation, ranked by importance.
Sourcing Audit →
Hidden contradictions across thousands of records.
The integrity scanner finds every logical inconsistency in seconds.
Error Detection →
Vague vital records like "Yorkshire" or "abt 1850".
Vital Sharpener ranks imprecise dates and places by impact so you fix the most important ones first.
Vital Sharpener →
How the Tree Health Score is calculated
The score is a weighted average of three sub-scores, each out of 100:
- Completeness (40%) - what fraction of expected vital facts (birth date, birth place, death date, death place, parents, marriage) are present for individuals in the tree, weighted by genealogical importance.
- Sourcing (30%) - *(Total Sources / Total Facts) × 100*, capped at 100. A fully-sourced tree scores the maximum here even with relatively few facts.
- Consistency (30%) - fraction of facts that pass logical validation: dates in order, ages plausible, lifespans within human range, no impossible parent-child gaps.
The three sub-scores combine into a single number from 0 to 100. Most amateur trees score in the 35-55 range on first analysis. Well-maintained, properly-sourced trees can reach 75+.
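The weighting above can be written out as a short calculation. This is a minimal sketch in Python, not GEDminer's actual code: the function names and the example inputs are assumptions for illustration, and only the 40/30/30 weights and the capped sourcing ratio come from the description above.

```python
def sourcing_score(total_sources: int, total_facts: int) -> float:
    """(Total Sources / Total Facts) x 100, capped at 100."""
    if total_facts == 0:
        return 0.0  # assumption: a tree with no facts earns no sourcing credit
    return min(100.0, total_sources / total_facts * 100)


def tree_health_score(completeness: float, sourcing: float, consistency: float) -> float:
    """Weighted average of three sub-scores, each on a 0-100 scale."""
    return 0.40 * completeness + 0.30 * sourcing + 0.30 * consistency


# Hypothetical tree: 1,200 facts, 420 sources, 55% complete, 90% consistent.
sourcing = sourcing_score(total_sources=420, total_facts=1200)
score = tree_health_score(completeness=55.0, sourcing=sourcing, consistency=90.0)
print(round(sourcing, 1), round(score, 1))
```

With these made-up inputs the sourcing sub-score is 35 and the overall score is 59.5 (22 + 10.5 + 27). Because sourcing is capped, adding a second citation to an already-sourced fact cannot push that sub-score past 100.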
What the audit covers
The full audit looks at:
- Vital records - birth/death/marriage facts, completeness and precision
- Sourcing - citations on facts, weighted by fact importance
- Logical consistency - date order, ages, lifespans, parent-child gaps
- Structural integrity - orphan records, broken family links, duplicate IDs
- Place-name standardisation - consistent format, historically accurate jurisdictions
- Living-person handling - privacy-aware exclusions for people likely still living
Every flagged issue links back to the affected individuals and shows the underlying GEDCOM record so you can fix it in your main software.
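To make the logical-consistency part of the audit concrete, here is a rough sketch of the kind of checks involved. It is an illustration only: the `Person` record, the function names, and the thresholds (120-year lifespan, minimum parent age of 12, one year's grace after a parent's death) are all assumptions, not GEDminer's real rules, and it works on years alone rather than full GEDCOM dates.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

MAX_LIFESPAN = 120    # assumed upper bound on a human lifespan, in years
MIN_PARENT_AGE = 12   # assumed minimum plausible age at a child's birth


@dataclass
class Person:
    name: str
    birth_year: Optional[int] = None
    death_year: Optional[int] = None


def check_person(p: Person) -> list[str]:
    """Flag date-order and lifespan problems for one individual."""
    issues = []
    if p.birth_year is not None and p.death_year is not None:
        if p.death_year < p.birth_year:
            issues.append(f"{p.name}: death before birth")
        elif p.death_year - p.birth_year > MAX_LIFESPAN:
            issues.append(f"{p.name}: lifespan over {MAX_LIFESPAN} years")
    return issues


def check_parent_child(parent: Person, child: Person) -> list[str]:
    """Flag implausible gaps between a parent and a child."""
    issues = []
    if parent.birth_year is None or child.birth_year is None:
        return issues  # nothing to compare
    if child.birth_year - parent.birth_year < MIN_PARENT_AGE:
        issues.append(f"{child.name}: born when {parent.name} was under {MIN_PARENT_AGE}")
    if parent.death_year is not None and child.birth_year > parent.death_year + 1:
        issues.append(f"{child.name}: born more than a year after {parent.name} died")
    return issues


# Hypothetical records for illustration.
ann = Person("Ann Holt", birth_year=1820, death_year=1851)
tom = Person("Tom Holt", birth_year=1855)
for issue in check_person(ann) + check_parent_child(ann, tom):
    print(issue)
```

A consistency sub-score could then be derived from the share of checked facts that raise no issue, though how GEDminer counts that share is not specified here.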
Using the score as a research feedback loop
The score is most useful as a *trend*, not a single number:
- Run the audit on your current GEDCOM and note the score.
- Pick one category to improve (sourcing is usually the highest-leverage).
- Spend a research session adding citations or sharpening dates.
- Re-export and re-run the audit. The score will move.
This turns abstract "improving your tree" into a measurable feedback loop. Many users find that just *seeing* the score motivates more rigorous research habits.
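If you keep your own note of each audit run, the trend is easy to compute. The sketch below assumes a hand-kept log of dates and overall scores; the values are invented and the `score_changes` helper is hypothetical, not a GEDminer feature.

```python
from datetime import date

# Hypothetical audit log: (date of run, overall score). The values are made up.
history = [
    (date(2024, 1, 6), 41.0),
    (date(2024, 2, 3), 44.5),
    (date(2024, 3, 2), 43.8),
    (date(2024, 4, 6), 49.2),
]


def score_changes(runs):
    """Return (date, score, change since previous run) for each run after the first."""
    ordered = sorted(runs)
    return [
        (day, score, round(score - prev_score, 1))
        for (_, prev_score), (day, score) in zip(ordered, ordered[1:])
    ]


for day, score, change in score_changes(history):
    print(f"{day.isoformat()}  {score:5.1f}  {change:+.1f}")
```

A dip between runs is not necessarily bad news: importing a batch of unsourced hints can lower the score until those facts are cited.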
Privacy and the community percentile
The optional community percentile compares your tree against the anonymised distribution of all trees that have submitted scores. Submitting a score sends only the three sub-scores and a coarse record-count fingerprint of the tree - never the names, dates, or any GEDCOM content. You opt in by signing in and submitting a score, and you can opt out at any time.
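As a rough picture of what such a submission might contain and how a percentile could be derived from it, consider the sketch below. Everything in it is an assumption made for illustration: the field names, the size bands used to coarsen the record count, and the "share of scores strictly below yours" definition of percentile are not taken from GEDminer.

```python
from bisect import bisect_left


def record_count_bucket(n_records: int) -> str:
    """Coarsen an exact record count into a broad band (the bands are an assumption)."""
    for upper in (100, 500, 1_000, 5_000, 10_000, 50_000):
        if n_records < upper:
            return f"<{upper}"
    return "50000+"


def build_submission(completeness: float, sourcing: float,
                     consistency: float, n_records: int) -> dict:
    """Only aggregate numbers go in: no names, dates, or GEDCOM content."""
    return {
        "completeness": completeness,
        "sourcing": sourcing,
        "consistency": consistency,
        "size_band": record_count_bucket(n_records),
    }


def percentile(score: float, submitted_scores: list) -> float:
    """Percentage of submitted scores strictly below `score`."""
    ordered = sorted(submitted_scores)
    if not ordered:
        return 0.0
    return 100 * bisect_left(ordered, score) / len(ordered)


payload = build_submission(55.0, 35.0, 90.0, n_records=1234)
print(payload)
print(percentile(60, [30, 40, 50, 60, 70]))
```

The point of banding the record count is that an exact figure could help identify a specific tree, whereas a broad band cannot.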
Step-by-step guides
Understanding Your Tree Health and Data Quality Score
Learn how GEDminer evaluates your tree's data quality, what the health score means, and practical steps to improve your tree's completeness and accuracy.
How to Find and Fix GEDCOM Errors
A practical guide to detecting impossible dates, missing records, duplicate entries, and other data quality issues in your GEDCOM family tree file.
GEDCOM Error Propagation: How Bad Family Tree Data Spreads
One wrong date, duplicate person, or bad merge can spread through a GEDCOM file and corrupt hundreds of downstream facts. Learn how genealogy errors propagate - and how to audit your tree before importing or sharing.
10 Common Genealogy Mistakes and How to Avoid Them
Even experienced researchers make these mistakes. Learn the 10 most common genealogy errors - from unsourced facts to name assumptions - and how GEDminer helps you catch them.
How to Cite Genealogy Sources Properly
Source citations turn a family tree into evidence. Learn the standard format, how to record citations efficiently, and how GEDminer measures your sourcing.
Finding and Merging Duplicate Individuals
Find potential duplicate individuals in your tree using smart matching, compare their records side-by-side, and learn best practices for merging them.
Using the Vital Sharpener to Improve Date Precision
The Vital Sharpener helps you identify estimated, incomplete, or missing vital records and prioritise which ones to research first for maximum impact.
Frequently asked questions
How is the Tree Health Score calculated?
It is a weighted average of three sub-scores: Completeness (40%) - fraction of expected vital facts present; Sourcing (30%) - total sources divided by total facts, expressed as a percentage and capped at 100; and Consistency (30%) - fraction of facts that pass logical validation. The combined score runs from 0 to 100.
What is a "good" Tree Health Score?
Most amateur trees score 35-55 on first analysis. Well-maintained trees with consistent sourcing typically score 65+. Very serious research trees with citations on every fact can reach 80+. The score is most useful as a trend over time, not as an absolute target.
How does the community percentile work?
When you opt in (by signing in and submitting), GEDminer sends the three sub-scores and a coarse record-count fingerprint of the tree to the server. The server returns the percentile of your score against the anonymised distribution of all submitted trees. No names, dates, or GEDCOM content are ever transmitted.
Will improving the score actually make my tree better?
Improving the underlying metrics - sourcing facts, sharpening vague dates, fixing logical inconsistencies - directly improves the research value of your tree. The score is a proxy for those metrics, so yes, optimising for the score does improve the tree in real terms.
Why is sourcing weighted so heavily?
Because an unsourced fact is essentially a guess until proved otherwise. Sourcing is what separates research from a pretty diagram, so it carries 30% of the total score even though it's structurally simpler than completeness.
Does the score account for living people?
Yes. Individuals who are likely still living (no death recorded and born within the last ~100 years) are excluded from the completeness calculation, since some of their facts do not exist yet and others should not be public. They are still counted for structural integrity.
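The living-person rule can be sketched in a few lines. This is an approximation under stated assumptions: the function name is hypothetical, the cutoff is taken as exactly 100 years, and a person with no birth year is treated as not living because the rule as stated needs a birth year (a more privacy-cautious tool might decide the other way).

```python
from datetime import date
from typing import Optional

LIVING_WINDOW_YEARS = 100  # the "~100 years" in the rule; the exact cutoff is an assumption


def likely_living(birth_year: Optional[int], death_recorded: bool,
                  today: Optional[date] = None) -> bool:
    """True if no death is recorded and the birth falls within the last ~100 years."""
    if death_recorded or birth_year is None:
        return False  # assumption: without a birth year the rule cannot apply
    current_year = (today or date.today()).year
    return current_year - birth_year <= LIVING_WINDOW_YEARS


reference = date(2024, 1, 1)  # fixed date so the example is reproducible
print(likely_living(1950, death_recorded=False, today=reference))  # no death, born 74 years earlier
print(likely_living(1900, death_recorded=False, today=reference))  # no death, born 124 years earlier
```

Individuals for whom this returns true would be left out of the completeness denominator, so a missing death date for a living relative does not count against the tree.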
How often should I re-run the audit?
A useful rhythm is after every significant research session, or whenever you complete a planned cleanup pass. Watching the trend line move in response to specific work is the most motivating part of the system.
Can I export the audit findings?
Yes. Errors, missing facts, sourcing gaps and duplicates can all be exported as CSV or XLSX so you can work through them offline or paste them into a research log.
Related tools
Find and Fix Family Tree Errors Automatically
Detect impossible dates, duplicate ancestors, missing parents and broken relationships in your family tree in seconds. Free, browser-based GEDCOM error checker.
Free GEDCOM Validator: Check Your Family Tree File for Errors Online
Validate any GEDCOM file (.ged, .gedcom) for impossible dates, broken family links, duplicates, encoding bugs and structural issues.
Clean Up Your Family Tree: Remove Duplicates, Fix Dates and Standardise Places
Tidy up an inherited or messy family tree: merge duplicates, standardise place names, fix encoding issues and surface missing facts. Free, browser-based.
Free GEDCOM Analyzer: Inspect, Validate and Visualise Your Family Tree Online
Upload a .ged file and get instant analysis: errors, duplicates, missing dates, migration maps, census gaps and a data quality score.
Ready to analyse your tree?
Drop your .ged file into GEDminer and get a full diagnostic in seconds. Your file never leaves your browser.