Compare CSV files and ignore spaces, accents and casing.
Find rows that are truly missing versus rows that differ only because of formatting. Enable cleaning rules to normalize values before comparing, then see only the real differences.
Why formatting differences hide real mismatches
Two rows that look identical to a human may not match in a plain diff because one has a trailing space, one uses an accented character, or casing is different between the systems that exported each file. A naive byte-level diff will tell you everything is different and bury the actual changes under hundreds of false positives. The interesting differences — a missing order, a changed price, a removed contact — disappear into the noise.
The cleaning rules in MessyMatch normalize values before comparing. You choose which normalizations to apply: trim whitespace, collapse spaces, strip accents, ignore casing, normalize emails, normalize phone numbers, strip leading zeros and more. The originals are preserved in the result; the normalized form is used only for matching.
How the cleaning rules pipeline works
Every cell in both files passes through the cleaning rules you enabled, in a fixed order. The output is a normalized string used as the comparison key. Two rows are considered a match when their normalized keys are equal, even if the raw values differ in spacing, casing or accents. The original cell values are still what you see in the export.
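As an illustration of the idea, here is a minimal Python sketch of a fixed-order pipeline that turns a raw cell into a comparison key. It is not MessyMatch's actual code: the rule names, the order of the rules and the `comparison_key` helper are all assumptions made for this example.

```python
import re
import unicodedata


def trim(value: str) -> str:
    # Drop leading and trailing whitespace.
    return value.strip()


def collapse_spaces(value: str) -> str:
    # Replace any run of whitespace with a single space.
    return re.sub(r"\s+", " ", value)


def ignore_case(value: str) -> str:
    # casefold() is a more aggressive lower() intended for caseless matching.
    return value.casefold()


def strip_accents(value: str) -> str:
    # Decompose characters, then drop the combining marks (the accents).
    decomposed = unicodedata.normalize("NFKD", value)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical fixed order: rules always run in this sequence,
# regardless of the order in which they were switched on.
PIPELINE = [
    ("trim", trim),
    ("collapse_spaces", collapse_spaces),
    ("ignore_case", ignore_case),
    ("strip_accents", strip_accents),
]


def comparison_key(value: str, enabled: set[str]) -> str:
    """Return the normalized key; the raw value itself is left untouched."""
    key = value
    for name, rule in PIPELINE:
        if name in enabled:
            key = rule(key)
    return key


ALL_RULES = {name for name, _ in PIPELINE}
raw_a, raw_b = "  José   GARCÍA ", "jose garcia"
same = comparison_key(raw_a, ALL_RULES) == comparison_key(raw_b, ALL_RULES)
```

With every rule enabled, `same` is true even though `raw_a` and `raw_b` differ byte for byte. With only `trim` enabled, the two keys would still differ.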
Rules are composable. You can stack 'trim whitespace + collapse spaces + ignore casing + ignore accents' for free-text fields and 'normalize emails' on the email column. The cleaning rules panel keeps the toggles obvious so you can dial in the right level of strictness for your data.
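To make the composition concrete, the following Python sketch assigns a different stack of rules to each column and then sorts rows into "matched" and "missing" buckets. The column names, the `COLUMN_RULES` mapping and the `compare` helper are hypothetical, and duplicate keys are ignored for brevity.

```python
import re
import unicodedata

# Hypothetical rule implementations, keyed by name.
RULES = {
    "trim": lambda v: v.strip(),
    "collapse_spaces": lambda v: re.sub(r"\s+", " ", v),
    "ignore_case": lambda v: v.casefold(),
    "ignore_accents": lambda v: "".join(
        c for c in unicodedata.normalize("NFKD", v) if not unicodedata.combining(c)
    ),
    # Assumes "strip brackets" means surrounding angle brackets.
    "normalize_email": lambda v: v.strip().strip("<>").lower(),
}

# Each column gets its own stack of rules (an assumed configuration).
COLUMN_RULES = {
    "name": ["trim", "collapse_spaces", "ignore_case", "ignore_accents"],
    "email": ["normalize_email"],
}


def row_key(row: dict[str, str]) -> tuple[str, ...]:
    """Build the comparison key for a row without changing the row."""
    parts = []
    for column, rule_names in COLUMN_RULES.items():
        value = row[column]
        for name in rule_names:
            value = RULES[name](value)
        parts.append(value)
    return tuple(parts)


def compare(rows_a, rows_b):
    """Return (matched pairs, rows only in A, rows only in B)."""
    index_b = {row_key(r): r for r in rows_b}
    matched, only_in_a, seen = [], [], set()
    for row in rows_a:
        key = row_key(row)
        if key in index_b:
            matched.append((row, index_b[key]))  # original rows, not keys
            seen.add(key)
        else:
            only_in_a.append(row)
    only_in_b = [r for r in rows_b if row_key(r) not in seen]
    return matched, only_in_a, only_in_b


rows_a = [
    {"name": "  José  García", "email": "<Jose@Example.com>"},
    {"name": "Ana Li", "email": "ana@example.com"},
]
rows_b = [
    {"name": "jose garcia", "email": "jose@example.com"},
    {"name": "Bo Chen", "email": "bo@example.com"},
]
matched, only_in_a, only_in_b = compare(rows_a, rows_b)
```

In this sample the José García rows pair up despite their formatting differences, while Ana Li and Bo Chen are reported as missing from the other file. The matched pair still carries the original, unmodified values.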
Available normalizations
- Trim trailing and leading spaces
- Collapse multiple spaces into one
- Remove invisible characters (zero-width spaces, non-breaking spaces)
- Ignore upper/lowercase — treat JOSE and jose as the same
- Ignore accents — treat Jose and José as the same
- Ignore punctuation — useful for codes and names
- Ignore separators (dashes, slashes) — AB-123 = AB123
- Normalize emails (lowercase, strip brackets)
- Normalize phone numbers (digits and leading + only)
- Strip leading zeros — 00123 = 123
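For readers who want to see what a few of the less obvious rules in this list could look like, here is a hedged Python sketch. The exact character sets are assumptions: which code points count as invisible, which separators are removed and which brackets are stripped from emails are not specified on this page.

```python
import re

# Assumed set of zero-width characters to delete.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))


def remove_invisible(value: str) -> str:
    # Delete zero-width characters; turn non-breaking spaces into plain
    # spaces (an assumption, so that words do not get glued together).
    return value.translate(ZERO_WIDTH).replace("\u00a0", " ")


def ignore_separators(value: str) -> str:
    # Drop dashes and slashes, so "AB-123" and "AB123" share a key.
    return re.sub(r"[-/]", "", value)


def ignore_punctuation(value: str) -> str:
    # Keep only word characters and whitespace.
    return re.sub(r"[^\w\s]", "", value)


def normalize_email(value: str) -> str:
    # Lowercase and strip surrounding angle brackets (assumed bracket type).
    return value.strip().strip("<>").lower()


def normalize_phone(value: str) -> str:
    # Keep digits only, plus a leading "+" if the value started with one.
    value = value.strip()
    digits = re.sub(r"\D", "", value)
    return "+" + digits if value.startswith("+") else digits


def strip_leading_zeros(value: str) -> str:
    # "00123" becomes "123"; an all-zero value keeps a single "0".
    if not value:
        return value
    return value.lstrip("0") or "0"
```

Each function takes a string and returns a new one, so any of them can be added to a per-column stack of rules.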
The 'almost match' bucket: where messy formatting goes
Cleaning rules cover most of the noise but not all of it. 'Johnatan' vs 'Jonathan', 'Acme Corp' vs 'Acme Corporation' — those are not exact matches even after stripping spaces and ignoring casing. For those, the fuzzy pass scores pairs with Jaro-Winkler similarity and flags anything above the threshold as an almost-match, complete with the score and the reason. You can read more about the fuzzy mode on the fuzzy match CSV page.
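For reference, Jaro-Winkler similarity can be sketched in a few lines of Python. This is a textbook version of the metric, not MessyMatch's implementation. The 0.9 threshold and the `almost_match` helper are assumptions chosen for the example, since the page does not state the threshold the tool uses.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (
        matches / len1 + matches / len2 + (matches - transpositions) / matches
    ) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1 - score)


def almost_match(a: str, b: str, threshold: float = 0.9) -> tuple[bool, float]:
    # The threshold is a hypothetical value, not the tool's setting.
    score = jaro_winkler(a, b)
    return score >= threshold, score
```

On the classic test pair 'MARTHA' and 'MARHTA' this gives roughly 0.961. The cleaned pair 'johnatan' and 'jonathan' also clears the assumed 0.9 threshold, so it would land in the almost-match bucket rather than being reported as two unrelated rows.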
Diagnostic mode: see what changed and why
When a row is flagged as an almost-match, the result panel shows which cleaning rules made the values converge and which difference remained. That diagnostic (accent stripped, case folded, leading zero removed) is what tells you whether to accept the match or treat it as a real difference. Nothing is hidden: every result is auditable.
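One way such a diagnostic could be produced is to apply the rules one at a time and record each rule that changes either value. The Python sketch below assumes exactly that approach. The rule labels, their order and the shape of the returned dictionary are hypothetical and do not describe the tool's actual output.

```python
import unicodedata


def strip_accents(value: str) -> str:
    decomposed = unicodedata.normalize("NFKD", value)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


# Hypothetical rule labels, applied in this order.
RULES = [
    ("whitespace trimmed", lambda v: v.strip()),
    ("case folded", lambda v: v.casefold()),
    ("accent stripped", strip_accents),
    ("leading zero removed", lambda v: v.lstrip("0") or v),
]


def explain(a: str, b: str) -> dict:
    """Trace which rules changed the values and what still differs."""
    applied = []
    for label, rule in RULES:
        if a == b:
            break  # already converged, later rules are not needed
        new_a, new_b = rule(a), rule(b)
        if (new_a, new_b) != (a, b):
            applied.append(label)
        a, b = new_a, new_b
    return {
        "converged": a == b,
        "rules_applied": applied,
        "remaining_difference": None if a == b else (a, b),
    }


clean = explain(" José ", "jose")
fuzzy = explain("Acme Corp", "ACME Corporation")
```

Here `clean` converges after trimming, case folding and accent stripping. In `fuzzy`, only case folding has an effect and the remaining difference is 'acme corp' versus 'acme corporation', which is the kind of pair left for the fuzzy pass.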
Originals are never modified
The cleaning rules are applied in memory during the comparison. The strings in the parsed dataset are never mutated, and the exports always carry the original values. If you want a cleaned version of the data, that is a different operation — this tool is about comparing reliably, not about rewriting your file.
Browser-first by design
The CSV contents are processed inside your browser by a Web Worker and are not transmitted to our servers. Our servers have no endpoint that ingests file contents: the Web Worker reads the file from disk, runs the comparison locally and hands the result back to the page. We record only metadata about the operation (row count, file size, format, elapsed time) to enforce abuse limits. See the privacy policy for the full list.
Frequently asked questions
What does 'ignore formatting' mean exactly?
Cleaning rules normalize each value before comparing: trim spaces, collapse internal whitespace, lower-case, strip accents, drop punctuation and remove invisible characters. The original values are kept and shown in the results.
Will it modify my files?
No. Normalization is applied only in memory during comparison. Originals are never altered.
Does it detect typos?
Yes, approximately. Fuzzy matching (Jaro-Winkler with blocking) finds values that look close to each other even when they are not byte-identical after cleaning.
Is my data uploaded?
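No. Files are parsed in your browser by a Web Worker and their contents are never sent to our servers. Only metadata about the operation (row count, file size, format, elapsed time) is recorded.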
Can I ignore leading zeros in IDs?
Yes. Turn on 'strip leading zeros' so '000123' and '123' are treated as the same value.
Can I compare phone numbers across different formats?
Yes. The 'normalize phones' rule strips spaces, dashes and country prefix variations before comparison.