Family Tree Data Quality: Score, Audit and Improve Your Genealogy Research

Most family trees grow organically over years of research. Facts get added from many sources - censuses, BMD records, hints, cousin trees, AI suggestions - and rarely get a systematic quality review. The result: a tree that *feels* impressive in scale but has dozens of unsourced parents, vague "abt 1840" dates, half-finished families and impossible date contradictions hiding in plain sight.

Data quality largely determines how useful a family tree is for further research, for cousin matching, and for handing on to the next generation. A small, well-sourced tree is far more valuable than a large, unsourced one. But until now there has been no quick, objective way to *score* your tree's quality and track whether your work is improving it.

GEDminer's data quality system gives every GEDCOM file a single weighted score out of 100 - Completeness (40%) + Sourcing (30%) + Consistency (30%) - plus an optional community percentile so you can see how your tree compares with others. Below is how the scoring works, what the audit covers, and how to use it as a permanent research feedback loop.


How GEDminer solves it

You don't know whether your tree is "good" or not.

A single weighted Tree Health Score gives you an objective number to track over time and improve.

Tree Health Score →

You want to compare against other genealogists' trees.

Optional community percentile (anonymised, opt-in) shows where your tree sits among all submitted trees.

Community Percentile →

You don't know which problems to fix first.

The audit ranks every issue by impact so you spend time on the high-value fixes first.

Vital Sharpener →

Unsourced facts you can't track down.

Sourcing audit lists every fact without a citation, ranked by importance.

Sourcing Audit →

Hidden contradictions across thousands of records.

The integrity scanner finds every logical inconsistency in seconds.

Error Detection →

Vague vital records like "Yorkshire" or "abt 1850".

Vital Sharpener ranks imprecise dates and places by impact so you fix the most important ones first.

Vital Sharpener →
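To make the "abt 1850" problem concrete, here is a minimal Python sketch of how a date string might be graded for precision. It relies on the standard GEDCOM date qualifiers (ABT, EST, CAL, BEF, AFT, BET ... AND, FROM ... TO). The function name `date_precision` and the category labels are illustrative assumptions, not GEDminer's actual implementation.

```python
import re

# Standard GEDCOM date qualifiers, grouped by how they weaken a date.
APPROXIMATE = {"ABT", "EST", "CAL"}            # about / estimated / calculated
RANGE = {"BEF", "AFT", "BET", "FROM", "TO"}    # before / after / between / period


def date_precision(gedcom_date: str) -> str:
    """Classify a GEDCOM-style date string by how precise it is (labels are hypothetical)."""
    text = " ".join(gedcom_date.upper().split())  # normalise case and whitespace
    if not text:
        return "missing"
    first_word = text.split()[0].rstrip(".")
    if first_word in APPROXIMATE:
        return "approximate"
    if first_word in RANGE:
        return "range"
    if re.fullmatch(r"\d{1,2} [A-Z]{3} \d{3,4}", text):
        return "exact"       # day, month and year, e.g. 12 MAR 1850
    if re.fullmatch(r"[A-Z]{3} \d{3,4}", text):
        return "month"       # month and year only
    if re.fullmatch(r"\d{3,4}", text):
        return "year"        # year only
    return "unrecognised"


for raw in ["12 MAR 1850", "abt 1850", "BET 1840 AND 1850", "1850", ""]:
    print(repr(raw), "->", date_precision(raw))
```

A tool could then sort individuals so that "approximate", "range" and "year" dates on close ancestors surface first, which is one plausible reading of "ranked by impact".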

How the Tree Health Score is calculated

The score is a weighted average of three sub-scores, each out of 100:

  • Completeness (40%) - what fraction of expected vital facts (birth date, birth place, death date, death place, parents, marriage) are present for individuals in the tree, weighted by genealogical importance.
  • Sourcing (30%) - *(Total Sources / Total Facts) × 100*, capped at 100. A fully sourced tree scores the maximum here even with relatively few facts.
  • Consistency (30%) - fraction of facts that pass logical validation: dates in order, ages plausible, lifespans within human range, no impossible parent-child gaps.

The three sub-scores combine into a single number from 0 to 100. Most amateur trees score in the 35-55 range on first analysis. Well-maintained trees with consistent sourcing typically score 65+, and rigorously researched trees with citations on every fact can reach 80+.
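The arithmetic can be sketched in a few lines of Python. The weights and the sourcing formula come from the description above. The function names, the input validation and the handling of an empty tree are assumptions made for illustration, not GEDminer's code.

```python
# Weights as stated above: Completeness 40%, Sourcing 30%, Consistency 30%.
WEIGHTS = {"completeness": 0.40, "sourcing": 0.30, "consistency": 0.30}


def sourcing_score(total_sources: int, total_facts: int) -> float:
    """(Total Sources / Total Facts) x 100, capped at 100."""
    if total_facts <= 0:
        return 0.0  # assumption: a tree with no facts scores 0 rather than raising
    return min(100.0, 100.0 * total_sources / total_facts)


def tree_health_score(completeness: float, sourcing: float, consistency: float) -> float:
    """Weighted average of the three sub-scores, each on a 0-100 scale."""
    subs = {"completeness": completeness, "sourcing": sourcing, "consistency": consistency}
    for name, value in subs.items():
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
    return sum(WEIGHTS[name] * value for name, value in subs.items())


# Hypothetical tree: 1,500 facts, 450 of them cited.
sourcing = sourcing_score(total_sources=450, total_facts=1500)
score = tree_health_score(completeness=60.0, sourcing=sourcing, consistency=90.0)
print(round(score, 1))  # 0.4*60 + 0.3*30 + 0.3*90 = 60.0
```

In this example sourcing is the weakest sub-score, so raising it from 30 to 60 would add 9 points to the total, more than any equally sized gain in the already-high consistency score is likely to offer.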

What the audit covers

The full audit looks at:

  • Vital records - birth/death/marriage facts, completeness and precision
  • Sourcing - citations on facts, weighted by fact importance
  • Logical consistency - date order, ages, lifespans, parent-child gaps
  • Structural integrity - orphan records, broken family links, duplicate IDs
  • Place-name standardisation - consistent format, historically accurate jurisdictions
  • Living-person handling - privacy-aware exclusions for people likely still living

Every flagged issue links back to the affected individuals and shows the underlying GEDCOM record so you can fix it in your main software.
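As an illustration of the logical-consistency category, the sketch below checks date order, lifespan and parent-child age gaps on a simplified record. The `Person` class and the three thresholds are hypothetical; the source does not state the limits GEDminer applies.

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative thresholds only; not taken from GEDminer.
MAX_LIFESPAN_YEARS = 120
MIN_PARENT_AGE = 12
MAX_PARENT_AGE = 70


@dataclass
class Person:
    name: str
    birth_year: Optional[int] = None
    death_year: Optional[int] = None


def check_person(p: Person) -> List[str]:
    """Date-order and lifespan checks for one individual."""
    if p.birth_year is None or p.death_year is None:
        return []  # nothing to validate without both years
    if p.death_year < p.birth_year:
        return [f"{p.name}: death before birth"]
    if p.death_year - p.birth_year > MAX_LIFESPAN_YEARS:
        return [f"{p.name}: lifespan over {MAX_LIFESPAN_YEARS} years"]
    return []


def check_parent_child(parent: Person, child: Person) -> List[str]:
    """Flag implausible parent ages at the child's birth."""
    if parent.birth_year is None or child.birth_year is None:
        return []
    age_at_birth = child.birth_year - parent.birth_year
    if age_at_birth < MIN_PARENT_AGE:
        return [f"{parent.name} was {age_at_birth} when {child.name} was born"]
    if age_at_birth > MAX_PARENT_AGE:
        return [f"{parent.name} was {age_at_birth} when {child.name} was born"]
    return []


print(check_person(Person("Mary Holt", 1840, 1835)))
print(check_parent_child(Person("Ann Holt", 1840), Person("John Holt", 1848)))
```

Each returned string names the affected individual, which mirrors the idea of linking every issue back to a record you can fix in your main software.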

Using the score as a research feedback loop

The score is most useful as a *trend*, not a single number:

  1. Run the audit on your current GEDCOM and note the score.
  2. Pick one category to improve (sourcing is usually the highest-leverage).
  3. Spend a research session adding citations or sharpening dates.
  4. Re-export and re-run the audit. The score will move.

This turns abstract "improving your tree" into a measurable feedback loop. Many users find that just *seeing* the score motivates more rigorous research habits.
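If you keep your own research log, the trend is easy to track by hand or with a few lines of Python. The sketch below is a hypothetical log with invented dates and scores; it simply reports how much the score moved between consecutive audit runs.

```python
from datetime import date
from typing import List, Tuple

ScoreLog = List[Tuple[date, float]]


def score_changes(log: ScoreLog) -> List[Tuple[date, float]]:
    """Return (date, change since the previous run) for every run after the first."""
    ordered = sorted(log)  # oldest run first
    return [
        (later_day, round(later - earlier, 1))
        for (_, earlier), (later_day, later) in zip(ordered, ordered[1:])
    ]


# Invented example data: three audit runs after three research sessions.
log = [
    (date(2024, 1, 6), 42.0),
    (date(2024, 2, 3), 47.5),
    (date(2024, 3, 2), 46.0),
]

for day, change in score_changes(log):
    print(f"{day.isoformat()}: {change:+.1f}")
```

A drop, like the last entry here, is worth investigating: it may mean that newly added people arrived without sources, or that a merge introduced contradictory dates.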

Privacy and the community percentile

The optional community percentile compares your tree against the anonymised distribution of all trees that have submitted scores. Submitting a score sends only the three sub-scores and a coarse record-count fingerprint of the tree - never the names, dates, or any GEDCOM content. You opt in by signing in and submitting your score, and you can opt out at any time.
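To show how little data such a comparison needs, here is a minimal Python sketch of a submission and a percentile lookup. The field names, the bucket boundaries and the percentile definition (share of submitted scores at or below yours) are all assumptions; the source does not specify GEDminer's payload or server logic.

```python
from bisect import bisect_right
from typing import Dict, List, Union


def coarse_record_bucket(record_count: int) -> str:
    """Reduce an exact record count to a coarse size band (hypothetical boundaries)."""
    for upper in (100, 500, 1_000, 5_000, 10_000, 50_000):
        if record_count < upper:
            return f"<{upper}"
    return ">=50000"


def build_submission(completeness: float, sourcing: float, consistency: float,
                     record_count: int) -> Dict[str, Union[float, str]]:
    """Only sub-scores and a size band: no names, dates or GEDCOM content."""
    return {
        "completeness": completeness,
        "sourcing": sourcing,
        "consistency": consistency,
        "record_bucket": coarse_record_bucket(record_count),
    }


def percentile(score: float, submitted_scores: List[float]) -> float:
    """Percentage of submitted scores that are at or below `score`."""
    if not submitted_scores:
        return 0.0  # assumption: no comparison is possible without other trees
    ordered = sorted(submitted_scores)
    return 100.0 * bisect_right(ordered, score) / len(ordered)


print(build_submission(60.0, 30.0, 90.0, record_count=2314))
print(percentile(62.0, [35, 41, 48, 55, 62, 70, 81, 90]))  # 5 of 8 scores -> 62.5
```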

Frequently asked questions

How is the Tree Health Score calculated?

It is a weighted average of three sub-scores: Completeness (40%) - fraction of expected vital facts present; Sourcing (30%) - total sources divided by total facts; and Consistency (30%) - fraction of facts that pass logical validation. The combined score runs from 0 to 100.

What is a "good" Tree Health Score?

Most amateur trees score 35-55 on first analysis. Well-maintained trees with consistent sourcing typically score 65+. Very serious research trees with citations on every fact can reach 80+. The score is most useful as a trend over time, not as an absolute target.

How does the community percentile work?

When you opt in (by signing in and submitting), GEDminer sends the three sub-scores and a coarse record-count fingerprint of the tree to the server. The server returns the percentile of your score against the anonymised distribution of all submitted trees. No names, dates, or GEDCOM content are ever transmitted.

Will improving the score actually make my tree better?

Improving the underlying metrics - sourcing facts, sharpening vague dates, fixing logical inconsistencies - directly improves the research value of your tree. The score is a proxy for those metrics, so yes, optimising for the score does improve the tree in real terms.

Why is sourcing weighted so heavily?

Because an unsourced fact is essentially a guess until proved otherwise. Sourcing is what separates research from a pretty diagram, so it carries 30% of the total score even though it's structurally simpler than completeness.

Does the score account for living people?

Yes. Individuals who are likely still living (no death recorded and born within the last ~100 years) are excluded from the completeness calculation, because many of their facts are not, and should not be, on the public record. They are still counted for structural integrity.
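The rule described above is simple enough to sketch in Python. The 100-year window comes from the answer; the function name and the treatment of people with no birth year are assumptions.

```python
from datetime import date
from typing import Optional

LIVING_WINDOW_YEARS = 100  # "born within the last ~100 years"


def likely_living(birth_year: Optional[int], death_recorded: bool,
                  today: Optional[date] = None) -> bool:
    """True when no death is recorded and the birth falls inside the window."""
    if death_recorded:
        return False
    if birth_year is None:
        # Assumption: without a birth year the rule cannot be applied.
        # A privacy-first tool might choose to return True here instead.
        return False
    current_year = (today or date.today()).year
    return current_year - birth_year <= LIVING_WINDOW_YEARS


reference_day = date(2024, 1, 1)  # fixed date so the example is reproducible
print(likely_living(1950, death_recorded=False, today=reference_day))  # True
print(likely_living(1850, death_recorded=False, today=reference_day))  # False
print(likely_living(1950, death_recorded=True, today=reference_day))   # False
```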

How often should I re-run the audit?

A useful rhythm is after every significant research session, or whenever you complete a planned cleanup pass. Watching the trend line move in response to specific work is the most motivating part of the system.

Can I export the audit findings?

Yes. Errors, missing facts, sourcing gaps and duplicates can all be exported as CSV or XLSX so you can work through them offline or paste them into a research log.
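For readers who keep their own research log, the sketch below shows how findings of this kind can be written to CSV with Python's standard `csv` module. The column names and the sample rows are invented for illustration and do not describe GEDminer's actual export layout.

```python
import csv
import io
from typing import Dict, List

# Hypothetical columns; an actual export may use different ones.
FIELDNAMES = ["category", "individual", "issue"]


def findings_to_csv(findings: List[Dict[str, str]]) -> str:
    """Serialise audit findings to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(findings)
    return buffer.getvalue()


# Invented sample findings.
findings = [
    {"category": "sourcing", "individual": "Mary Holt", "issue": "Birth has no citation"},
    {"category": "consistency", "individual": "John Holt", "issue": "Born before parent was 12"},
]

output = findings_to_csv(findings)
print(output)
```

The resulting text opens directly in any spreadsheet, where a "done" column can be added to work through the list.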

Ready to analyse your tree?

Drop your .ged file into GEDminer and get a full diagnostic in seconds. Your file never leaves your browser.
