Find duplicates in a CSV file online.

Drop your CSV to spot repeated rows by key column — emails, IDs, order numbers, SKUs. Ignore casing, spaces and accents so messy formatting does not hide real duplicates. Runs in your browser.

Accepted formats for file A and file B: CSV, TSV, TXT or XLSX

When do you need to find duplicates in a CSV?

Duplicates creep into CSV files in dozens of small ways. An export that ran twice, a form that did not validate the email field, a join that fanned out, a manual paste from two sources. By the time the file reaches your CRM, ERP or email tool, the same customer appears three times under slightly different spellings and the import either fails loudly or — worse — succeeds and triples your contact count.

The fastest way to catch these is to scan the file for repeated values in the column that is supposed to be unique per row. That column is almost always obvious in retrospect: email for a subscriber list, customer_id for a CRM export, order_number for a sales report, SKU for a product feed.

  • Subscriber lists: the same email captured via web, form and import
  • CRM exports: duplicate contacts created by sync rules running twice
  • Sales reports: the same order showing up in two regions
  • Product catalogs: the same SKU listed by two suppliers
  • Survey responses: the same respondent submitting more than once
  • Migration files: old and new systems both dumped into one CSV

How the duplicate detection works

MessyMatch parses your file in the browser, normalises each value with the cleaning rules you enabled, then groups rows by the key column you picked. Any key that maps to more than one row is a duplicate group — the result panel shows every row in each group side by side so you can pick the canonical one.
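
The grouping step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not MessyMatch's actual code: the function name find_duplicate_groups, the sample data and the default normaliser (trim plus case-folding) are all assumptions made for the example.

```python
import csv
import io
from collections import defaultdict


def find_duplicate_groups(csv_text, key_column,
                          normalise=lambda v: v.strip().casefold()):
    """Return {normalised_key: [rows]} for keys shared by more than one row.

    `normalise` stands in for whichever cleaning rules are enabled
    (hypothetical default: trim whitespace and ignore casing).
    """
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Group on the cleaned key, but store the row exactly as it was read.
        groups[normalise(row[key_column])].append(row)
    return {key: rows for key, rows in groups.items() if len(rows) > 1}


sample = (
    "email,name\n"
    "ann@example.com,Ann\n"
    "ANN@example.com ,Ann B\n"
    "bob@example.com,Bob\n"
)
duplicates = find_duplicate_groups(sample, "email")
for key, rows in duplicates.items():
    print(key, [row["name"] for row in rows])
```

Keeping the untouched rows inside each group is what makes a side-by-side view possible: the cleaned key is used only for grouping.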

The compare engine surfaces duplicates as a dedicated tab in the result panel: Duplicates inside A and Duplicates inside B, separate from the cross-file diff. That separation matters: it lets you fix the duplicates inside one file before you reconcile against another system. Drop the same CSV in both slots if you want a pure intra-file dedupe and do not care about a cross-file comparison.

Ignore the formatting that fakes duplicates

Most CSV duplicates do not look like duplicates at first because one row has a trailing space, another has the name in upper case, a third uses an accented character. The cleaning rules in MessyMatch normalise values before grouping: trim whitespace, collapse internal spaces, ignore casing, strip accents, normalise emails and phone numbers, remove invisible characters. The original values are kept untouched in the result so you still see exactly what was in your file.
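
As a rough sketch of what such cleaning rules might look like, the Python function below applies five of them in sequence. The function name clean, its keyword flags and the list of invisible characters are assumptions for illustration. Email and phone normalisation are left out because their exact rules are not described here.

```python
import re
import unicodedata

# Zero-width and soft-hyphen characters to delete (an illustrative, not exhaustive, list).
INVISIBLE = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff\u00ad"))


def clean(value, *, trim=True, collapse_spaces=True, ignore_case=True,
          strip_accents=True, remove_invisible=True):
    """Return a comparison key for `value`; the caller keeps the original."""
    if remove_invisible:
        value = value.translate(INVISIBLE)
    if strip_accents:
        # Decompose accented letters, then drop the combining marks.
        decomposed = unicodedata.normalize("NFD", value)
        value = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    if collapse_spaces:
        value = re.sub(r"\s+", " ", value)
    if trim:
        value = value.strip()
    if ignore_case:
        value = value.casefold()
    return value


names = ["José García", "jose garcia", "JOSE GARCIA "]
keys = {clean(name) for name in names}
print(keys)
```

All three spellings collapse to a single key, while the list names still holds the values exactly as typed.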

Approximate duplicates: when keys do not match exactly

Sometimes the duplicate is not in the key column but in a name or company column that is typed slightly differently each time. For those, switch on fuzzy matching. The engine uses Jaro-Winkler similarity with blocking so 'Acme Corp', 'Acme Corporation' and 'ACME corp.' are grouped as near-duplicates in the Almost matches tab with the similarity score and reason.
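
For readers curious about the technique, here is a minimal Python sketch of Jaro-Winkler similarity combined with a simple blocking step. It is not MessyMatch's engine: the 0.85 threshold, the choice of the first character as the blocking key and the function names are assumptions picked for the example.

```python
from collections import defaultdict
from itertools import combinations


def jaro(a, b):
    """Jaro similarity between two strings, in the range 0.0 to 1.0."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit = [False] * len(a)
    b_hit = [False] * len(b)
    matches = 0
    for i, ch in enumerate(a):
        # Look for an unused equal character within the match window.
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_hit[j] and b[j] == ch:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    a_seq = [ch for ch, hit in zip(a, a_hit) if hit]
    b_seq = [ch for ch, hit in zip(b, b_hit) if hit]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
    return (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(a, b, prefix_scale=0.1):
    """Boost the Jaro score for a shared prefix of up to four characters."""
    score = jaro(a, b)
    prefix = 0
    for x, y in zip(a[:4], b[:4]):
        if x != y:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1 - score)


def near_duplicates(values, threshold=0.85, block_key=lambda v: v[:1]):
    """Compare values only within the same block to avoid all-pairs work."""
    blocks = defaultdict(list)
    for value in values:
        cleaned = value.strip().casefold()
        blocks[block_key(cleaned)].append((value, cleaned))
    pairs = []
    for members in blocks.values():
        for (raw_a, a), (raw_b, b) in combinations(members, 2):
            score = jaro_winkler(a, b)
            if score >= threshold:
                pairs.append((raw_a, raw_b, round(score, 3)))
    return pairs


companies = ["Acme Corp", "Acme Corporation", "ACME corp.", "Zenith Ltd"]
pairs = near_duplicates(companies)
for pair in pairs:
    print(pair)
```

Blocking trades a little recall for speed: values that land in different blocks are never compared, so the blocking key should be something that true duplicates are very likely to share.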

Export the deduped CSV

Each result section has its own export. Download the duplicates as CSV to feed back into your source system (most CRMs and ERPs accept a delete-by-ID file), or grab the unique rows as your new deduped dataset. Both exports preserve the original cell values — the cleaning rules are only used to detect the duplicates, never to rewrite your data.
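
The split between the two exports can be sketched as follows in Python. This is an illustration under assumptions, not the tool's export code: the helper names are hypothetical, and keeping the first row of each group as the survivor is a simplification (in the tool you pick the canonical row yourself).

```python
import csv
import io
from collections import defaultdict


def split_duplicates(rows, key_column,
                     normalise=lambda v: v.strip().casefold()):
    """Return (unique_rows, duplicate_rows) without altering any cell value."""
    groups = defaultdict(list)
    for row in rows:
        groups[normalise(row[key_column])].append(row)
    # Assumption: the first row seen for each key is treated as canonical.
    unique = [members[0] for members in groups.values()]
    duplicates = [row for members in groups.values() for row in members[1:]]
    return unique, duplicates


def to_csv(rows, fieldnames):
    """Serialise rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    {"id": "A1", "email": "ann@example.com"},
    {"id": "A2", "email": " ANN@example.com"},
    {"id": "B1", "email": "bob@example.com"},
]
unique, duplicates = split_duplicates(rows, "email")
unique_csv = to_csv(unique, ["id", "email"])
duplicates_csv = to_csv(duplicates, ["id", "email"])
print(duplicates_csv)
```

The duplicates file still contains the leading space and upper-case letters of the original cell, because the normalised value is used only as a grouping key.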

Browser-first by design

Your CSV contents are processed inside your browser by a Web Worker and are not transmitted to our servers. Our servers have no endpoint that ingests file contents: the Web Worker reads the file you selected, runs the dedupe locally and hands the result back to the page. We only record metadata about the operation (row count, file size, format, elapsed time) to enforce abuse limits. See the privacy policy for the full list.

Frequently asked questions

How do I find duplicates inside a single CSV file?

Drop your file into slot A. To focus only on intra-file duplicates, drop the same file into slot B and run the comparison — the Duplicates tab in the results lists every key that appears more than once inside file A and inside file B separately.

What counts as a duplicate row?

By default, two rows are duplicates if every cell matches once the cleaning rules you enabled have been applied. In key-column mode, rows are duplicates when the value in their key column repeats, even if other columns differ.

Can I ignore casing or trailing spaces when finding duplicates?

Yes. Turn on trim whitespace, ignore casing, ignore accents, normalise emails or normalise phone numbers in the cleaning rules. With trim whitespace, ignore casing and ignore accents enabled, 'José García', 'jose garcia' and 'JOSE GARCIA ' are grouped as one record.

Can I export the deduped list?

Yes. Each result section — including duplicates and the canonical 'in both' rows — has a CSV and XLSX export button. Download the deduped set or just the duplicates, depending on what you need.

Are my CSV files uploaded to find the duplicates?

No. Parsing and duplicate detection happen in your browser via a Web Worker, and the file contents are not transmitted to our servers.

How large a CSV can I dedupe?

Anonymous users can run the tool on files up to about 2,000 rows for free. Larger files use the pay-as-you-go tiers (from $3). See the pricing page for the full caps.

What about near-duplicates, values that look the same but are not byte-identical?

Use the fuzzy matching mode. It finds approximate duplicates ('Acme Corp' vs 'Acme Corporation' vs 'ACME corp') via Jaro-Winkler similarity. They are listed in the Almost matches tab with the reason.