Question 1

What is a fuzzy match in CSV comparison?

Accepted Answer

A fuzzy match, also called an approximate match, is a pair of rows that should refer to the same record but are not byte-identical even after normalising spaces, casing and accents. For example, 'Acme Corp' vs 'Acme Corporation', or 'Johnatan' vs 'Jonathan'. MessyMatch flags these in the Almost matches tab, showing the similarity score and the reason for each.
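To make the "even after normalising" part concrete, here is a minimal sketch of the cheap cleaning step that runs before any fuzzy scoring. The function names and the exact cleaning rules are assumptions for illustration, not MessyMatch's actual implementation.

```python
import unicodedata

def normalise(value: str) -> str:
    """Strip accents, fold case and collapse whitespace (illustrative rules)."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", value)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # split() with no arguments trims the ends and collapses runs of whitespace.
    return " ".join(without_accents.casefold().split())

def classify(a: str, b: str) -> str:
    """Say which kind of comparison is enough to pair two values."""
    if a == b:
        return "exact"
    if normalise(a) == normalise(b):
        return "exact after cleaning"
    # Only values that still differ here need similarity scoring.
    return "needs fuzzy scoring"

print(classify("  Café  ACME ", "cafe acme"))
print(classify("Acme Corp", "Acme Corporation"))
```

The first pair differs only in spacing, casing and an accent, so cleaning alone resolves it. The second pair still differs after cleaning, which is where a similarity score comes in.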

Question 2

What algorithm does the fuzzy matcher use?

Accepted Answer

Jaro-Winkler similarity with a configurable threshold (default 0.92). To stay fast on large files, the engine also uses blocking: only candidates that share a prefix or fall inside a length band are scored, so it does not compare every row in A against every row in B.
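For readers who want to see what the score measures, the following is a textbook sketch of Jaro-Winkler, not MessyMatch's own code. The prefix scale of 0.1 and the four-character prefix cap are the conventional Winkler parameters and are assumed here. The 0.92 default comes from the answer above, and the function names are hypothetical.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching if they are within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Walk the matched characters of both strings in order and count
    # positions where they disagree; half of that is the transposition count.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order / 2
    return (
        matches / len1
        + matches / len2
        + (matches - transpositions) / matches
    ) / 3

def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro score boosted for a shared prefix of up to four characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)

def is_almost_match(a: str, b: str, threshold: float = 0.92) -> bool:
    """True when two different values score at or above the threshold."""
    return a != b and jaro_winkler(a, b) >= threshold

print(round(jaro_winkler("Johnatan", "Jonathan"), 4))
print(is_almost_match("Johnatan", "Jonathan"))
```

Scores produced by this sketch will not necessarily equal the ones the tool reports, since the tool may clean values or tune parameters differently.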

Question 3

Can I tune how strict the fuzzy match is?

Accepted Answer

Yes. Lower the similarity threshold to catch looser matches (more results, but more false positives), or raise it to see only very close matches. The threshold lives in the compare settings panel, next to the cleaning rules.
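The trade-off can be sketched in a few lines. To keep the example short it uses Python's standard-library `difflib.SequenceMatcher` as a stand-in scorer, which is not Jaro-Winkler, so the numbers differ from what MessyMatch reports. The threshold values and the sample pairs are purely illustrative.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    # Stand-in scorer from the standard library, NOT Jaro-Winkler.
    return SequenceMatcher(None, a, b).ratio()

def almost_matches(pairs, threshold):
    """Keep non-identical pairs whose score reaches the threshold."""
    results = []
    for a, b in pairs:
        score = similarity(a, b)
        if a != b and score >= threshold:
            results.append((a, b, round(score, 3)))
    return results

pairs = [
    ("Jonathan", "Jonathon"),
    ("Jonathan", "Johnatan"),
    ("Acme Corp", "Apex Corp"),   # different companies: a likely false positive
    ("Acme Corp", "Zenith Ltd"),
]

strict = almost_matches(pairs, threshold=0.95)
loose = almost_matches(pairs, threshold=0.70)
print(len(strict), "pairs at the strict threshold")
print(len(loose), "pairs at the loose threshold")
```

Every pair kept at the strict setting is also kept at the loose one. Loosening the threshold only ever adds results, and some of the additions are pairs that merely look alike.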

Question 4

Will fuzzy matching find typos?

Accepted Answer

Yes, that is the primary use case. Single-letter typos, transposed characters and common misspellings of names and companies all surface as almost-matches. Each result shows both original values so you can decide whether to merge.

Question 5

Is fuzzy matching slower than exact compare?

Accepted Answer

Yes. It scores candidate pairs instead of hashing rows into a map. Blocking and a length pre-filter keep it tractable up to about 50,000 rows per side on a normal laptop. Past that, narrow the candidate set with a key-column compare first, then apply fuzzy matching only to the remaining mismatches.
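A rough sketch of the two ideas in this answer: remove exact matches first, then generate only the candidate pairs worth scoring. All names and defaults here (`prefix_len`, `length_band`) are hypothetical. The sketch also requires both a shared prefix and a similar length, which is an assumption, since the tool may combine the two filters differently.

```python
from collections import defaultdict

def split_exact(rows_a, rows_b):
    """Stage 1: drop values present on both sides; return what is left."""
    exact = set(rows_a) & set(rows_b)
    left_a = [r for r in rows_a if r not in exact]
    left_b = [r for r in rows_b if r not in exact]
    return left_a, left_b

def candidate_pairs(rows_a, rows_b, prefix_len=2, length_band=3):
    """Stage 2: yield only pairs that share a prefix and have similar length."""
    # Index side B by its leading characters so each A row visits one bucket
    # instead of every row in B.
    buckets = defaultdict(list)
    for b in rows_b:
        buckets[b[:prefix_len].casefold()].append(b)
    for a in rows_a:
        for b in buckets.get(a[:prefix_len].casefold(), []):
            if abs(len(a) - len(b)) <= length_band:
                yield a, b

rows_a = ["Acme Corp", "Jonathan Smith", "Zenith Ltd"]
rows_b = ["Acme Corporation", "Jonathon Smith", "Acme Corp", "Apex Ltd"]

left_a, left_b = split_exact(rows_a, rows_b)
to_score = list(candidate_pairs(left_a, left_b))
print(len(left_a) * len(left_b), "pairs without blocking")
print(len(to_score), "pairs to score with blocking")
```

The price of blocking is recall: a typo inside the prefix, or a large length difference such as an abbreviation, puts two values in different blocks and they are never scored against each other.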

Question 6

Are my files uploaded for fuzzy matching?

Accepted Answer

No. The Jaro-Winkler scoring runs inside your browser in a Web Worker, and the file contents are not transmitted to our servers.

Question 7

How is this different from VLOOKUP with TRUE?

Accepted Answer

VLOOKUP with TRUE assumes the lookup column is sorted ascending and returns the closest lower value, which is wrong for almost-equal text matching. MessyMatch uses string similarity scoring designed for messy human data such as names, companies and addresses, not for sorted numeric ranges.
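The "closest lower value" behaviour is easy to reproduce. The sketch below is a rough Python emulation of an approximate-match lookup over a sorted column, not spreadsheet code. It compares strings by code point, whereas a spreadsheet compares text case-insensitively, and the function name and sample names are made up for illustration.

```python
from bisect import bisect_right

def vlookup_approximate(key, sorted_keys):
    """Return the largest entry <= key in an ascending column, else None."""
    position = bisect_right(sorted_keys, key)
    return sorted_keys[position - 1] if position else None

names = sorted(["Joan", "Jonas", "Jonathan", "Jordan"])

# The misspelled key sorts between "Joan" and "Jonas", so the lookup
# lands on the entry just below it rather than the most similar name.
print(vlookup_approximate("Johnatan", names))
```

The lookup picks a neighbour in sort order, which for a misspelled name is usually an unrelated record. A similarity score ranks candidates by how alike the strings are, regardless of where they fall alphabetically.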

Fuzzy match CSV files online.

When exact matching is not enough

How the fuzzy matcher works

Always do the cheap normalisation first

Common use cases for fuzzy CSV matching

Honest limits of fuzzy matching

Browser-first by design

Related tools

Frequently asked questions