How-To · 7 min read · Updated 2026-04-24

Fix Garbled Names in GEDCOM Files: Character Encoding Guide

If accented characters, apostrophes, or non-Latin scripts look broken in your family tree, the problem is almost always character encoding. Here is how to diagnose and fix it.

Why names look broken in GEDCOM files

If you've ever opened a GEDCOM and seen names like MÃ¼ller instead of Müller, Oâ€™Brien instead of O’Brien, or boxes and question marks where letters should be, you're looking at a character encoding mismatch.

A GEDCOM file is plain text, but "plain text" can be stored in many different encodings. The two pieces of software that wrote and read the file disagreed on which encoding to use, so accented and special characters were misinterpreted.

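In most cases the data isn't lost; it is just being decoded with the wrong encoding. Once you identify the original encoding, the names snap back to normal. The exception is a file that was already saved with question marks or replacement boxes in place of the original letters: those characters are gone and have to come from a fresh export.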
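To see the mechanism concretely, here is a minimal Python sketch. It assumes the most common case, UTF-8 bytes misread as Windows-1252 (`cp1252` in Python); other encoding pairs garble in the same way but produce different symbols.

```python
# Simulate the mismatch: one program writes UTF-8, another reads Windows-1252.
original = "Müller"
utf8_bytes = original.encode("utf-8")   # b'M\xc3\xbcller'
garbled = utf8_bytes.decode("cp1252")   # two wrong characters replace the single "ü"

# The bytes themselves never changed, so undoing the wrong step restores the name.
repaired = garbled.encode("cp1252").decode("utf-8")

print(garbled)
print(repaired)
```

The garbled string is one character longer than the original because UTF-8 stores "ü" as two bytes and Windows-1252 shows each byte as its own character.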

The four encodings you will encounter

1. UTF-8 (modern standard) Supports every character in every language. Used by GEDCOM 7.0, modern web tools, and most current desktop software. The safest choice.

2. ANSEL (legacy genealogy standard) A specialist encoding used by older versions of PAF and some legacy programs. Handles European accents but not Asian or Cyrillic scripts. Recognisable by its declaration line: `1 CHAR ANSEL`

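3. Windows-1252 / ANSI (Windows legacy) Windows' default encoding for Western European text. Common in older exports from Windows-based programs. Byte-for-byte identical to UTF-8 for unaccented English text, but accented letters and curly quotes are stored differently, so they garble when the file is read as UTF-8.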

4. UTF-16 / Unicode (rare) Used by some Microsoft tools. Less common in genealogy.

GEDminer attempts UTF-8 first, then falls back through Windows-1252, Mac Roman, and Latin-1 automatically — so most files load correctly without any action from you.
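A fallback chain of this kind can be sketched in a few lines of Python. This illustrates the general technique and is not GEDminer's actual code; the function name `decode_with_fallback` and the codec list are assumptions made for the example.

```python
# Hypothetical helper: try strict UTF-8 first, then progressively looser legacy codecs.
FALLBACK_CODECS = ("utf-8", "cp1252", "mac_roman", "latin-1")

def decode_with_fallback(raw: bytes) -> tuple[str, str]:
    """Return (text, codec_name) for the first codec that decodes without error."""
    for codec in FALLBACK_CODECS:
        try:
            return raw.decode(codec), codec
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a character, so this line is not reached in practice.
    raise ValueError("no codec could decode the data")
```

UTF-8 goes first because it is strict: accented text in a legacy encoding almost never forms valid UTF-8, so a successful UTF-8 decode is strong evidence. The single-byte codecs accept almost any input, so a success there is only a best guess.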

How to identify the encoding of your GEDCOM

Open the .ged file in a text editor (Notepad, TextEdit, VS Code) and look at the header. The first 20 lines usually contain a line like:

```
1 CHAR UTF-8
```

or

```
1 CHAR ANSEL
```

or

```
1 CHAR ANSI
```

This is the encoding the file *says* it uses. Sometimes the declaration is wrong — a file may claim ANSI but actually contain UTF-8 — which is the root of most garbling.

If the declaration is missing or incorrect, the most reliable fix is to re-export from your source software with UTF-8 explicitly selected (Fix 1). If you can't re-export, convert the file manually (Fix 2).
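The header check described above, comparing what the file declares with what its bytes actually contain, can also be scripted. The following is a minimal Python sketch; the function names are illustrative, and it assumes a single-byte or UTF-8 file (a UTF-16 file would need to be decoded first before its header can be matched).

```python
import re
from typing import Optional

def declared_charset(raw: bytes, max_lines: int = 50) -> Optional[str]:
    """Return the value of the first '1 CHAR ...' line in the header, if any."""
    for line in raw.splitlines()[:max_lines]:
        match = re.match(rb"\s*1\s+CHAR\s+(\S+)", line)
        if match:
            return match.group(1).decode("ascii", errors="replace")
    return None

def is_valid_utf8(raw: bytes) -> bool:
    """True if the whole file decodes as strict UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

# A header that claims ANSI while the names are actually stored as UTF-8.
sample = b"0 HEAD\n1 CHAR ANSI\n0 @I1@ INDI\n1 NAME J\xc3\xbcrgen /M\xc3\xbcller/\n0 TRLR\n"
print(declared_charset(sample), is_valid_utf8(sample))
```

A file that declares ANSI or ANSEL but passes the UTF-8 check and contains accented names is very likely mislabelled. A file with no accented characters at all passes the check whatever it declares, so the result only tells you something when non-ASCII names are present.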

Fix 1: Re-export with UTF-8

The cleanest fix is always to re-export from the original software with UTF-8 explicitly selected. In most desktop genealogy programs the option appears in the GEDCOM export dialog as "Character set" or "Encoding" — choose UTF-8 (sometimes labelled "Unicode").

Major online tree platforms typically export in UTF-8 by default with no choice required.

Upload the new export to GEDminer and verify accented names render correctly.

Fix 2: Convert encoding manually

If you can't re-export (perhaps the original software is no longer available), convert the file manually:

On macOS / Linux using \iconv\:

```bash
iconv -f WINDOWS-1252 -t UTF-8 input.ged > output.ged
```

On Windows using PowerShell:

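```powershell
# Windows PowerShell 5.1: "Default" is the system ANSI code page (Windows-1252 on Western European systems)
Get-Content input.ged -Encoding Default | Set-Content output.ged -Encoding UTF8
```

In PowerShell 7 and later, `Default` already means UTF-8, so this command converts nothing. Name the source code page explicitly instead, for example `Get-Content input.ged -Encoding 1252`.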

After conversion, open the new file in a text editor and update the header line to `1 CHAR UTF-8` so downstream tools know what to expect.
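If neither command-line tool is available, the conversion and the header update can be done in one step with a short Python sketch. It assumes the source really is Windows-1252 (pass a different `source_encoding` otherwise); the function name is illustrative, and Python's standard library has no ANSEL codec, so this does not cover ANSEL files.

```python
import re

def convert_to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Decode with the assumed source encoding, fix the CHAR line, return UTF-8 bytes."""
    text = raw.decode(source_encoding)
    # Rewrite only the first "1 CHAR ..." line; line endings are left untouched.
    text = re.sub(r"(?m)^([ \t]*1[ \t]+CHAR[ \t]+)[^\r\n]*", r"\g<1>UTF-8", text, count=1)
    return text.encode("utf-8")

# Typical use (file names are placeholders):
# from pathlib import Path
# Path("output.ged").write_bytes(convert_to_utf8(Path("input.ged").read_bytes()))
```

Write the result to a new file rather than over the original, so a wrong guess about the source encoding can be retried.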

For ANSEL → UTF-8 conversion specifically, dedicated open-source GEDCOM tools that can import ANSEL and re-export as UTF-8 are the most reliable option.

Fix 3: Patch individual records

If only a handful of records are broken, it's often faster to fix them by hand in your genealogy software:

  1. Search for the garbled string (e.g. "Ã©") in your program
  2. Replace with the intended character ("é")
  3. Re-export and re-check

This is only practical for a small number of records — for tree-wide problems use Fix 1 or 2.
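To judge whether you have a handful of broken records or a tree-wide problem, a small script can list the affected lines. The sketch below is a heuristic, not a definitive detector: it assumes the garbling came from UTF-8 being read as Windows-1252, and the function name `find_mojibake` is made up for the example.

```python
def find_mojibake(lines):
    """Return (line_number, garbled, suggested) for lines that look like misread UTF-8."""
    hits = []
    for number, line in enumerate(lines, start=1):
        try:
            # Reverse the suspected mistake; correctly decoded text fails one of these steps.
            candidate = line.encode("cp1252").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue
        if candidate != line:
            hits.append((number, line, candidate))
    return hits

records = [
    "1 NAME Ren\u00c3\u00a9 /M\u00c3\u00bcller/",  # garbled
    "1 NAME Zo\u00eb /M\u00fcller/",               # correct accents
    "1 NAME John /Smith/",                         # plain ASCII
]
for number, garbled, suggested in find_mojibake(records):
    print(number, garbled, "->", suggested)
```

Only the first record is reported. Correctly accented names are skipped because their Windows-1252 bytes are not valid UTF-8, and plain ASCII lines come back unchanged.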

Preventing future encoding issues

To avoid encoding problems going forward:

  • Always export as UTF-8 when your software gives you the choice
  • Use GEDCOM 7.0 if your software supports it — UTF-8 is mandatory in v7
  • Keep one master tree in modern software that handles UTF-8 natively
  • Spot-check exports by uploading to GEDminer and searching for an ancestor with an accented name
  • Document non-Latin scripts with both transliterated and original-script versions where possible

Once your master tree is stored and exported as UTF-8, names keep rendering correctly as the file moves between tools.
