← All posts

Why Excel breaks at 1 million rows (and what to do about it)

6 min read

Excel has a hard cap of 1,048,576 rows per worksheet — 220, which is what fits in a single 20-bit row index. If you open a CSV with more rows than that, Excel silently truncates the file and shows you a warning that most people miss. If the file has fewer rows but more than a few hundred thousand, Excel will open it — and then make you wait every time you touch a cell.

The cap is the famous part. The slowdown is the part that actually wrecks the workflow. Let us walk through both.

The hard cap: 1,048,576 rows

Every modern Excel file format (XLSX, XLSM, XLSB) inherits the same grid limit: 1,048,576 rows × 16,384 columns per sheet. That number is 220, chosen because Excel internally indexes rows with a 20-bit integer. The limit applies per sheet, not per workbook — so you can technically have many million-row sheets in one file, but no single sheet can grow past the cap.

What happens when you open a CSV with more rows than that? Excel shows a dialog ('File not loaded completely') and imports the first 1,048,576 rows, dropping the rest. The missing rows are gone from the spreadsheet but still on disk in the source CSV — which is the point at which most people save the truncated workbook over the original and lose data.

The soft cap: it hurts way before 1M

Long before the row limit, Excel becomes painful. The calculation engine recomputes the dependency graph on every edit; volatile functions (TODAY, NOW, OFFSET, INDIRECT, RAND) trigger a full recalc; conditional formatting evaluates cell-by-cell; and the file has to be re-written to disk on every save. At 200k rows with a few formula columns, every click stalls for a second. At 500k rows it stalls for several seconds. At 900k rows you can clearly hear the fan.

XLSX is also a zipped XML format. Opening the file means unzipping the entire archive into memory, parsing the XML into a grid and indexing every cell. A 200 MB XLSX can easily use 1.5–2 GB of RAM after parsing. On a laptop with 8 GB this is the moment the OS starts swapping and Excel feels frozen even when it is still doing work.

Why CSV is faster for large data

A CSV is plain text. You can read it line by line without holding the whole file in memory, which is exactly what every ETL tool, database loader and split utility does. The same file opened in Excel has to be fully parsed into a cell grid before you can do anything; opened as CSV by a streaming tool it can be processed in constant memory.

Practically, that means: if your data is past 100k rows, do not work with it as XLSX. Export to CSV and use tools built to stream — a database load, a Python script with csv.reader, a command-line splitter, or a browser-based tool that uses a Web Worker. The whole point is to avoid loading the file into a grid you do not need.

The practical workflow when your file is too big

When a CSV or XLSX is past what Excel can comfortably handle, the right move is almost always to split it into smaller import-ready partials. Each partial is small enough to open in Excel if you need to inspect it, small enough to import into whatever downstream system you are feeding, and the total data is preserved exactly. You do not lose anything by splitting — you just stop fighting the row cap.

That is the workflow MessyMatch is built for. The CSV splitter streams the file line by line and writes partials of the row count or byte size you specify. The Excel splitter does the same for XLSX with the caveat that XLSX parsing is heavier — for very large workbooks, exporting to CSV first and splitting that is faster and uses less RAM.

A short checklist

  • Past 100k rows in XLSX: export to CSV before doing anything heavy.
  • Past 500k rows in CSV: do not try to open in Excel at all. Use a CSV-aware tool.
  • Need to import into a system with a row cap (HubSpot, Salesforce, Mailchimp): split into partials of the cap size, keep headers in every partial.
  • Need to send the file: most providers cap attachments at 25 MB. Split by byte size, not row count, for that case.
  • Need to compare two big files: do not VLOOKUP. Diff them with a tool that streams. See the next post on VLOOKUP vs XLOOKUP vs diffing.

Related reading