Genealogy Data Analysis: Statistics, Maps and Patterns from Your GEDCOM File
A GEDCOM file is a database of your family, but most genealogy software treats it as a list of names. The interesting questions ("When did my family arrive in Glasgow? Did the Smiths concentrate in one parish for two centuries? How did average lifespan change between my 18th- and 20th-century ancestors?") need real data analysis, not a tree view.
GEDminer turns your .ged file into an analytics dashboard: birth-year distributions, surname frequency, occupation breakdowns, place concentration centroids, migration flows, lifespan trends, family-size patterns, and a weighted data quality score with a community percentile. Every chart is interactive, every table is exportable, and nothing leaves your browser.
Below is each analytical view, the question it answers, and the guide that explains how to interpret the output. The deep-dive sections cover what the underlying maths actually does, and the FAQ at the bottom answers the long-tail questions one-name studiers and serious researchers most often ask.
How GEDminer solves it
Which centuries does my tree cover well, and which are thin?
A birth-year histogram shows individuals per decade, instantly revealing the eras with the most coverage gaps.
Overview Dashboard →
Which surnames dominate my tree, and how do they cluster?
Surname analysis ranks surnames by frequency and groups variants using Soundex + Levenshtein clustering.
Surname Distribution →
How did my family move across the country and the world?
Migration Analysis plots first international moves, peak migration decades and destination concentrations on an interactive map.
Migration Analysis →
What did my ancestors do for a living, and how did that change over time?
The Occupations tab categorises every recorded job into historical groups (Agriculture, Trades, Industrial, Professional, etc.) with a per-decade breakdown.
Occupations Analysis →
Are the dates and sourcing in my tree actually any good?
A weighted Tree Health Score (Completeness 40%, Sourcing 30%, Consistency 30%) plus a community percentile shows where you stand.
Tree Health Score →
Where did my ancestors live, geographically?
Top Birth Locations and per-country mini-maps show concentration centroids and let you drill into individual parishes.
Locations Dashboard →
Turning a GEDCOM into a dataset
Genealogy software treats your file as a tree to be navigated. Data analysis treats it as a dataset to be queried. The distinction matters because the most interesting questions about a family are statistical — patterns over time, geographic concentrations, demographic trends — and a tree view can't answer them.
GEDminer parses your GEDCOM into structured tables in memory: one row per individual, one per family, one per fact, one per source and one per place. Every analytical view is then a pivot over those tables. Birth-year histograms count rows per decade. Surname distributions group rows by family name. Migration maps plot edges between the places recorded for each individual (birth, marriage, residence and death). Occupation breakdowns sort the OCCU tag values into historical groups.
Because the analytics are computed from the dataset, every view stays consistent: the count of individuals on the dashboard equals the row count in the People tab equals the total in the surname histogram. There is no manual aggregation to drift out of date.
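Here is a minimal Python sketch of the "dataset, not tree" idea. Python is used only for illustration, and this is not GEDminer's own code. It builds a one-row-per-individual table and pivots it into the birth-year histogram. It assumes a simplified GEDCOM line grammar (level, optional xref, tag, value) and reads only NAME and BIRT/DATE. It takes the first 3- or 4-digit number in a date as the year, and it ignores CONC/CONT, families, sources and places. `Individual`, `parse_individuals` and `decade_histogram` are hypothetical names.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical row type: one row per individual (other tables omitted in this sketch).
@dataclass
class Individual:
    xref: str
    name: str = ""
    birth_year: Optional[int] = None

YEAR_RE = re.compile(r"\b(\d{3,4})\b")  # first 3- or 4-digit number in a DATE value

def parse_individuals(lines: List[str]) -> List[Individual]:
    """Build one Individual row per level-0 INDI record (simplified line grammar)."""
    people: List[Individual] = []
    current: Optional[Individual] = None
    context: Optional[str] = None  # level-1 tag currently open, e.g. "BIRT"
    for raw in lines:
        parts = raw.strip().lstrip("\ufeff").split(" ", 2)
        if len(parts) < 2 or not parts[0].isdigit():
            continue
        level = int(parts[0])
        if level == 0:
            context = None
            is_indi = len(parts) == 3 and parts[2] == "INDI"
            current = Individual(xref=parts[1]) if is_indi else None
            if current is not None:
                people.append(current)
            continue
        if current is None:
            continue  # inside HEAD, FAM, SOUR, etc.
        tag = parts[1]
        value = parts[2] if len(parts) == 3 else ""
        if level == 1:
            context = tag
            if tag == "NAME" and not current.name:
                current.name = " ".join(value.replace("/", " ").split())
        elif level == 2 and context == "BIRT" and tag == "DATE":
            match = YEAR_RE.search(value)
            if match:
                current.birth_year = int(match.group(1))
    return people

def decade_histogram(people: List[Individual]) -> Counter:
    """Pivot: count individuals per birth decade, skipping undated rows."""
    return Counter(p.birth_year // 10 * 10 for p in people if p.birth_year is not None)

sample = """0 HEAD
0 @I1@ INDI
1 NAME John /Smith/
1 BIRT
2 DATE 12 MAR 1850
0 @I2@ INDI
1 NAME Mary /Smyth/
1 BIRT
2 DATE ABT 1856
0 @I3@ INDI
1 NAME Ann /Smith/
1 BIRT
2 DATE 1872
1 DEAT
2 DATE 1901
0 TRLR""".splitlines()

people = parse_individuals(sample)
histogram = decade_histogram(people)
print(sorted(histogram.items()))  # [(1850, 2), (1870, 1)]
```

The histogram is a pure function of the rows, so its total always equals the number of dated individuals. This is the kind of consistency the paragraph above describes.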
Surname distribution and one-name studies
Anyone running a one-name study needs three things: an accurate frequency count for the surname (and its variants) across time, a geographic distribution, and a way to spot related branches that may not yet be merged.
GEDminer's surname tools provide all three:
- Frequency by decade shows how the surname's prevalence in your tree has changed over time.
- Variant clustering uses Soundex + Levenshtein with a 0.55 similarity threshold to group spellings (Smith / Smyth / Smithe) so the count reflects the true population, not the spelling fragmentation.
- Geographic concentration plots the surname onto a centroid map per decade, so you can watch a family migrate.
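The variant clustering can be sketched as follows. The source states only "Soundex + Levenshtein" with a 0.55 threshold, so the way the two measures are combined here is an assumption. The score averages the fraction of matching Soundex positions with a length-normalised Levenshtein similarity. Names scoring 0.55 or higher are then merged with union-find (single linkage). `soundex`, `combined_similarity` and `cluster_surnames` are hypothetical helpers, not GEDminer's API.

```python
from typing import Dict, List

# American Soundex digit groups; vowels, y, h and w get no digit.
_SOUNDEX_DIGITS: Dict[str, str] = {
    ch: digit
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"), ("l", "4"), ("mn", "5"), ("r", "6"))
    for ch in letters
}

def soundex(name: str) -> str:
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    previous = _SOUNDEX_DIGITS.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same digit
        digit = _SOUNDEX_DIGITS.get(ch, "")
        if digit and digit != previous:
            code += digit
            if len(code) == 4:
                break
        previous = digit  # a vowel resets `previous`, so a repeated digit after it is kept
    return code.ljust(4, "0")

def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insert, delete, substitute) with two-row dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]

def combined_similarity(a: str, b: str) -> float:
    """Assumed combination: mean of Soundex position agreement and normalised Levenshtein similarity."""
    sa, sb = soundex(a), soundex(b)
    phonetic = sum(x == y for x, y in zip(sa, sb)) / 4 if sa and sb else 0.0
    longest = max(len(a), len(b)) or 1
    spelling = 1 - levenshtein(a.lower(), b.lower()) / longest
    return (phonetic + spelling) / 2

def cluster_surnames(names: List[str], threshold: float = 0.55) -> List[List[str]]:
    """Single-linkage clustering with union-find: merge any pair scoring >= threshold."""
    parent = {name: name for name in names}

    def find(name: str) -> str:
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if combined_similarity(a, b) >= threshold:
                parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())

clusters = cluster_surnames(["Smith", "Smyth", "Smithe", "Jones", "Jonas"])
# [['Jonas', 'Jones'], ['Smith', 'Smithe', 'Smyth']]
```

Under single linkage, A and C can land in the same cluster through B even when A and C score below the threshold. That is one reason a manual exclusion list is worth having.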
Combined with the Migration Analysis and the Hidden Cousin Connector, this is enough to drive most of the analytical work in a serious one-name study.
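For the per-decade concentration map, one reasonable way to compute a centroid is the spherical mean. The source does not say which method GEDminer uses, so this is an assumption. Each geocoded place is converted to a 3D unit vector, the vectors are averaged, and the result is converted back. Unlike averaging raw latitude and longitude, this behaves sensibly near the antimeridian. `centroid` and `centroids_by_decade` are hypothetical names. The input is assumed to be already-geocoded (year, latitude, longitude) events for one surname.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def centroid(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Spherical mean of (latitude, longitude) pairs in degrees."""
    if not points:
        raise ValueError("centroid needs at least one point")
    x = y = z = 0.0
    for lat, lon in points:
        phi, lam = math.radians(lat), math.radians(lon)
        x += math.cos(phi) * math.cos(lam)
        y += math.cos(phi) * math.sin(lam)
        z += math.sin(phi)
    n = len(points)
    x, y, z = x / n, y / n, z / n
    # Antipodal inputs average to (near) zero, where the direction is undefined.
    return math.degrees(math.atan2(z, math.hypot(x, y))), math.degrees(math.atan2(y, x))

def centroids_by_decade(events: Iterable[Tuple[int, float, float]]) -> Dict[int, Tuple[float, float]]:
    """Group (year, lat, lon) events for one surname by decade and centre each group."""
    buckets: Dict[int, List[Tuple[float, float]]] = defaultdict(list)
    for year, lat, lon in events:
        buckets[year // 10 * 10].append((lat, lon))
    return {decade: centroid(points) for decade, points in sorted(buckets.items())}

by_decade = centroids_by_decade([(1851, 55.86, -4.25), (1858, 55.86, -4.25), (1872, 51.5, -0.13)])
```

Take two places on the equator at 179°E and 179°W. A naive mean of their longitudes gives 0°, on the opposite side of the planet, while the spherical mean gives 180°.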
How the migration map is computed
Migration is computed at two levels: per-individual and aggregated.
Per individual, GEDminer compares that person's birth, marriage, residence and death locations. Each pair of consecutive, different locations becomes a migration "leg" with a year (or year range) attached. Legs that cross a national border are flagged as international moves.
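Below is a sketch of the per-individual step. It assumes facts have already been reduced to (type, year, place). It also assumes the last comma-separated component of a place string is its country, which is a common but not universal GEDCOM PLAC convention. Undated facts are dropped, and places are compared as case-insensitive strings; both are simplifying assumptions. `Fact`, `Leg`, `country_of`, `same_place` and `migration_legs` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class Fact:
    kind: str            # e.g. "BIRT", "MARR", "RESI", "DEAT"
    year: Optional[int]
    place: str           # GEDCOM PLAC value, e.g. "Paisley, Renfrewshire, Scotland"

@dataclass(frozen=True)
class Leg:
    origin: str
    destination: str
    years: Tuple[int, int]  # (year at origin, year at destination)
    international: bool

def country_of(place: str) -> str:
    # Assumption: the last comma-separated jurisdiction is the country.
    return place.rsplit(",", 1)[-1].strip().casefold()

def same_place(a: str, b: str) -> bool:
    return a.strip().casefold() == b.strip().casefold()

def migration_legs(facts: List[Fact]) -> List[Leg]:
    """Turn one person's dated, placed facts into legs between consecutive different places."""
    dated = sorted((f for f in facts if f.place and f.year is not None), key=lambda f: f.year)
    legs: List[Leg] = []
    for before, after in zip(dated, dated[1:]):
        if same_place(before.place, after.place):
            continue  # staying put is not a move
        legs.append(Leg(
            origin=before.place,
            destination=after.place,
            years=(before.year, after.year),
            international=country_of(before.place) != country_of(after.place),
        ))
    return legs

facts = [
    Fact("BIRT", 1850, "Paisley, Renfrewshire, Scotland"),
    Fact("MARR", 1874, "Glasgow, Lanarkshire, Scotland"),
    Fact("RESI", 1881, "Glasgow, Lanarkshire, Scotland"),
    Fact("DEAT", 1902, "Toronto, Ontario, Canada"),
]
legs = migration_legs(facts)
```

In this example, Paisley to Glasgow (1850 to 1874) is an internal leg and Glasgow to Toronto (1881 to 1902) is an international one. The two Glasgow facts produce no leg.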
Aggregated, all legs are bucketed by decade and by origin/destination country, producing flow lines on the map and the per-decade peak-migration histogram. Internal moves (within a country) are shown separately from international ones so you can answer questions at either level.
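The aggregation step might look like this. It assumes each leg has already been reduced to (year, origin country, destination country) and is bucketed by the decade of its end year. Both the tuple shape and the bucketing rule are assumptions, and `aggregate_flows` and `peak_decade` are hypothetical helpers.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

def aggregate_flows(legs: Iterable[Tuple[int, str, str]]) -> Tuple[Counter, Counter]:
    """Bucket (end_year, origin_country, destination_country) legs by decade.

    Returns (international, internal). The international counter is keyed by
    (decade, origin, destination); the internal counter by (decade, country).
    """
    international: Counter = Counter()
    internal: Counter = Counter()
    for year, origin, destination in legs:
        decade = year // 10 * 10
        if origin == destination:
            internal[(decade, origin)] += 1
        else:
            international[(decade, origin, destination)] += 1
    return international, internal

def peak_decade(international: Counter) -> Optional[int]:
    """Decade with the most international legs; ties go to the earliest decade."""
    per_decade: Counter = Counter()
    for (decade, _origin, _destination), count in international.items():
        per_decade[decade] += count
    if not per_decade:
        return None
    return max(per_decade.items(), key=lambda item: (item[1], -item[0]))[0]

legs = [
    (1852, "Scotland", "Canada"),
    (1857, "Scotland", "Canada"),
    (1861, "Scotland", "Australia"),
    (1874, "Scotland", "Scotland"),
]
international, internal = aggregate_flows(legs)
```

Keeping internal and international moves in separate counters mirrors the separate views described above. Either counter can then be summed by decade for a histogram.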
The map deliberately strips most labels for visual clarity; you can drill into any country or region to see the individual ancestors driving the flow.
Occupations: how categorisation works
Raw GEDCOM occupation tags are messy: "ag lab", "agricultural labourer", "Ag. Lab.", "farm worker" and "labourer (farm)" are all the same thing recorded five different ways.
GEDminer normalises occupation strings (case folding, abbreviation expansion, trailing-punctuation stripping) and assigns each one to a historical category: Agriculture, Trades & Crafts, Industrial, Professional & Clerical, Domestic Service, Military, Maritime, Religious, Mining & Extractive, Transport, Retail & Commerce, and Unclassified.
Categories are mapped consistently per era, so you can compare an 18th-century "weaver" with a 19th-century "factory weaver" without losing the historical distinction. The breakdown is shown both as a flat distribution and as a stacked bar chart over time.
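Here is a minimal sketch of the normalise-then-categorise pipeline. The abbreviation table and keyword lists are tiny hypothetical samples; a real mapping would be far larger and era-aware. This tokeniser strips all punctuation rather than only trailing punctuation, and it keeps only ASCII letters. Categories are checked in order, so the more specific "Industrial" rule catches "factory weaver" while a plain "weaver" falls through to Trades & Crafts. `normalise_occupation` and `categorise` are hypothetical names.

```python
import re
from typing import List, Set, Tuple

# Hypothetical sample abbreviation table (a real one would be much larger).
ABBREVIATIONS = {"ag": "agricultural", "agric": "agricultural", "lab": "labourer", "labr": "labourer"}

# Hypothetical keyword lists, checked in order: more specific categories first.
CATEGORY_KEYWORDS: List[Tuple[str, Set[str]]] = [
    ("Industrial", {"factory", "mill", "foundry"}),
    ("Mining & Extractive", {"miner", "collier", "quarryman"}),
    ("Agriculture", {"agricultural", "farm", "farmer", "ploughman", "shepherd"}),
    ("Trades & Crafts", {"weaver", "blacksmith", "carpenter", "shoemaker", "mason"}),
    ("Domestic Service", {"servant", "housemaid", "cook"}),
    ("Maritime", {"mariner", "seaman", "fisherman"}),
    ("Retail & Commerce", {"grocer", "draper", "shopkeeper"}),
    ("Professional & Clerical", {"clerk", "teacher", "solicitor", "surgeon"}),
]

def normalise_occupation(raw: str) -> str:
    """Case-fold, drop punctuation (ASCII letters only) and expand known abbreviations."""
    tokens = re.findall(r"[a-z]+", raw.casefold())
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)

def categorise(raw: str) -> str:
    """Return the first category whose keywords overlap the normalised words."""
    words = set(normalise_occupation(raw).split())
    for category, keywords in CATEGORY_KEYWORDS:
        if words & keywords:
            return category
    return "Unclassified"

variants = ["ag lab", "agricultural labourer", "Ag. Lab.", "farm worker", "labourer (farm)"]
categories = {v: categorise(v) for v in variants}  # every variant maps to "Agriculture"
```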
Lifespan, family size and demographic patterns
Once you have birth and death dates for a critical mass of ancestors, demographic patterns fall out of the data. GEDminer surfaces several:
- Mean and median lifespan by birth decade — typically rising from the late 19th century onwards.
- Family size by marriage decade — typically falling after about 1870.
- Age at first marriage, which in north-western Europe was historically later than people often assume.
- Sibling spacing — useful for catching incomplete families.
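The first of these patterns reduces to a group-by. This sketch assumes each person is a (birth_year, death_year) pair with year-only precision, so ages can be off by one. Rows missing either year, or with death before birth, are skipped rather than guessed. `lifespan_by_birth_decade` is a hypothetical name.

```python
from collections import defaultdict
from statistics import mean, median
from typing import Dict, Iterable, List, Optional, Tuple

def lifespan_by_birth_decade(
    people: Iterable[Tuple[Optional[int], Optional[int]]]
) -> Dict[int, Tuple[float, float]]:
    """Mean and median age at death, keyed by birth decade, from (birth_year, death_year) pairs."""
    ages: Dict[int, List[int]] = defaultdict(list)
    for birth, death in people:
        if birth is None or death is None or death < birth:
            continue  # missing or impossible dates are a data-quality issue, not a data point
        ages[birth // 10 * 10].append(death - birth)
    return {decade: (mean(values), median(values)) for decade, values in sorted(ages.items())}

stats = lifespan_by_birth_decade([(1801, 1870), (1805, 1807), (1809, 1880), (1902, 1990), (1905, None)])
```

Reporting the median alongside the mean matters for historical data. A few infant or child deaths, like the two-year life in the sample, pull the mean down much more than the median.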
These are computed from the underlying data tables, so they update automatically as you add records. They're as much a sanity check as a research tool — anomalies (lifespans suddenly halving in a decade, family sizes spiking unrealistically) usually point to an underlying data quality issue worth investigating.
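The anomaly check can be as simple as comparing each decade with the previous decade that has data. The 0.5 and 2.0 ratios below are illustrative assumptions, not GEDminer's thresholds, and `flag_anomalies` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

def flag_anomalies(
    series: Dict[int, float], drop_ratio: float = 0.5, spike_ratio: float = 2.0
) -> List[Tuple[int, float]]:
    """Flag decades whose value is <= drop_ratio or >= spike_ratio times the previous decade with data.

    The thresholds are illustrative defaults, not tuned values.
    """
    flagged: List[Tuple[int, float]] = []
    decades = sorted(series)
    for previous, current in zip(decades, decades[1:]):
        before, after = series[previous], series[current]
        if before <= 0:
            continue  # a ratio against zero or a negative value is meaningless
        ratio = after / before
        if ratio <= drop_ratio or ratio >= spike_ratio:
            flagged.append((current, round(ratio, 2)))
    return flagged

median_lifespan = {1800: 62, 1810: 60, 1820: 28, 1830: 59}
flags = flag_anomalies(median_lifespan)  # [(1820, 0.47), (1830, 2.11)]
```

A sudden drop is usually followed by a flagged rebound, and both point at the same suspect decade. The same function applies to mean family size by marriage decade.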
Step-by-step guides
How to Analyse a GEDCOM File Online
A step-by-step walkthrough for uploading your GEDCOM file and getting instant insights into your family tree: demographics, errors, migration patterns, and research priorities.
Understanding Your Family Tree Dashboard
Master the Overview dashboard to quickly understand your tree's scope, completeness, and key patterns with demographic charts and statistics.
Understanding Your Tree Health and Data Quality Score
Learn how GEDminer evaluates your tree's data quality, what the health score means, and practical steps to improve your tree's completeness and accuracy.
Analysing Family Migration Patterns
Discover where your ancestors came from and where they went. Visualise migration flows, identify patterns, and understand the historical context of family movements.
Analysing Ancestor Occupations and Work History
Explore the working lives of your ancestors, from farmers to factory workers and professionals to tradespeople, and understand the social history of your family.
Surname Distribution Analysis and One-Name Studies
A surname is rarely just one name — spellings drift, branches scatter, and patterns emerge only when you map the whole picture. Here is how to use surname distribution analysis to drive deeper research.
Exploring Ancestor Locations on an Interactive Map
Discover where your ancestors lived, spot geographic patterns, and explore place hierarchies with the interactive Location Explorer map.
Discovering Family Events with On This Day
See which family events (births, marriages, and deaths) happened on today's date in history, connecting you to your ancestors in a personal way.
Understanding Family Tree Connectivity and Structure
Learn what connected components are, why your tree might have isolated individuals, and how to strengthen the structural integrity of your GEDCOM file.
Frequently asked questions
What kinds of charts and analyses does GEDminer produce?
Birth-year histograms, surname frequency and clustering, occupation category breakdowns, location concentration maps, migration flows, lifespan trends, family size distributions, and a weighted data quality score.
Can I export the analysis results?
Yes — every table can be exported as CSV or XLSX, and most charts have an underlying data view you can copy. Useful for research logs, blog posts, or one-name studies.
Does it support one-name studies?
Yes. The surname distribution view, location centroids, automatic surname-variant clustering and the migration map together cover the core analytical needs of a one-name study.
How are migration patterns calculated?
For each individual, GEDminer compares birth, marriage, residence and death locations to detect movement events, then aggregates them by decade and country to produce flow lines and peak-migration charts.
Are my ancestors' details visible to anyone else?
No. Analysis runs locally in your browser. If you save a tree to your optional account, the compressed parsed data is stored privately under row-level security and is only accessible to you.
How accurate are the surname-variant clusters?
The clustering uses combined Soundex + Levenshtein at a 0.55 threshold, which catches most genuine variants while keeping false positives low. The threshold and the manual exclusion list are both adjustable in settings.
Can I see how my tree compares to others statistically?
Yes. The Tree Health Score includes an optional community percentile so you can see how your tree's completeness, sourcing and consistency compare to other anonymised trees of similar size.
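The weights below come from the Tree Health Score description (Completeness 40%, Sourcing 30%, Consistency 30%). Everything else is an assumption. Each component is taken to be a 0 to 100 score, and the percentile is the share of community scores strictly below yours. Matching against trees "of similar size" is left out. `tree_health_score` and `percentile_rank` are hypothetical names.

```python
from bisect import bisect_left
from typing import Optional, Sequence

# Weights as stated for the Tree Health Score.
WEIGHTS = {"completeness": 0.40, "sourcing": 0.30, "consistency": 0.30}

def tree_health_score(completeness: float, sourcing: float, consistency: float) -> float:
    """Weighted score; assumes each component is already on a 0-100 scale."""
    return (
        WEIGHTS["completeness"] * completeness
        + WEIGHTS["sourcing"] * sourcing
        + WEIGHTS["consistency"] * consistency
    )

def percentile_rank(score: float, community_scores: Sequence[float]) -> Optional[float]:
    """Percentage of community scores strictly below `score` (one common percentile definition)."""
    if not community_scores:
        return None
    ranked = sorted(community_scores)
    return 100 * bisect_left(ranked, score) / len(ranked)

score = tree_health_score(80, 50, 90)  # 0.4*80 + 0.3*50 + 0.3*90 = 74
```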
How does the analysis handle living people for privacy?
Living individuals (no death recorded and born within the last ~100 years) are handled carefully: they're excluded from the data quality completeness calculation and partially obscured in Presenter Mode so you can share screens without exposing personal details.
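The living-person rule translates almost directly into a predicate. This sketch follows the rule literally, so a person with no recorded birth year is not presumed living; a more privacy-cautious tool might choose the opposite. The 100-year window is a parameter because the source gives it only approximately. `presumed_living` is a hypothetical name.

```python
from datetime import date
from typing import Optional

def presumed_living(
    birth_year: Optional[int],
    has_death: bool,
    current_year: Optional[int] = None,
    window: int = 100,
) -> bool:
    """No death recorded and born within the last `window` years (approximate rule)."""
    if current_year is None:
        current_year = date.today().year
    if has_death or birth_year is None:
        return False  # literal reading of the rule; a privacy-first tool might treat unknown births as living
    return current_year - birth_year < window

# Presumed-living people are left out of the completeness population.
people = [(1950, False), (1890, False), (1950, True), (None, False)]
completeness_population = [p for p in people if not presumed_living(p[0], p[1], current_year=2025)]
```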
Does the analysis cover non-Western naming and date conventions?
Names are handled as opaque strings, so any naming convention works. Dates use the standard GEDCOM date grammar, which supports Julian, Hebrew, French Republican and other historic calendar systems alongside Gregorian.
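In GEDCOM 5.5.1, a non-Gregorian date carries a calendar escape such as `@#DJULIAN@`, `@#DHEBREW@` or `@#DFRENCH R@`, which comes after any qualifier like `ABT`. A minimal sketch that separates those parts, without attempting calendar conversion, might look like this. It handles only single dates with an optional ABT/CAL/EST/BEF/AFT qualifier; ranges, periods and INT phrases are out of scope. `split_date` is a hypothetical helper.

```python
from typing import Optional, Tuple

# GEDCOM 5.5.1 calendar escapes; a date without one is Gregorian.
CALENDAR_ESCAPES = {
    "@#DGREGORIAN@": "gregorian",
    "@#DJULIAN@": "julian",
    "@#DHEBREW@": "hebrew",
    "@#DFRENCH R@": "french_republican",
    "@#DROMAN@": "roman",
    "@#DUNKNOWN@": "unknown",
}
QUALIFIERS = {"ABT", "CAL", "EST", "BEF", "AFT"}

def split_date(value: str) -> Tuple[str, Optional[str], str]:
    """Split a single GEDCOM date into (calendar, qualifier, remaining date text)."""
    text = value.strip()
    qualifier: Optional[str] = None
    head, _, rest = text.partition(" ")
    if head.upper() in QUALIFIERS:  # the qualifier precedes the calendar escape
        qualifier, text = head.upper(), rest.strip()
    calendar = "gregorian"
    for escape, name in CALENDAR_ESCAPES.items():
        if text.upper().startswith(escape):
            calendar, text = name, text[len(escape):].strip()
            break
    return calendar, qualifier, text
```

Keeping the calendar alongside the raw date text lets later steps decide whether, and how, to convert a Julian or Hebrew date before bucketing it by decade.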
Can I use the analysis output to write a family history book?
Yes — the per-ancestor profiles, migration flows, occupation breakdowns and location concentrations export cleanly into research notes, and the timeline view is well suited to per-individual chapters in a written history.
How often should I re-run the analysis?
A useful rhythm is to re-run after every significant research session — even a small batch of new records can change a centroid or a peak-migration decade. The Tree Health Score is the quickest way to check whether the latest work moved the needle.
Related tools
Free GEDCOM Analyzer: Inspect, Validate and Visualise Your Family Tree Online
Upload a .ged file and get instant analysis: errors, duplicates, missing dates, migration maps, census gaps and a data quality score. 100% in-browser, no signup required.
Find Missing Ancestors in Your Family Tree
Uncover missing ancestors, hidden cousins, and unrecorded generations in your family tree. Free analyzer that pinpoints exactly where your research has gaps.
Clean Up Your Family Tree: Remove Duplicates, Fix Dates and Standardise Places
Tidy up an inherited or messy family tree: merge duplicates, standardise place names, fix encoding issues and surface missing facts. Free, browser-based.
Ready to analyse your tree?
Drop your .ged file into GEDminer and get a full diagnostic in seconds. Your file never leaves your browser.