Encrypt Online
Choose theme

CSV import traps: quotes, delimiters, embedded newlines, encoding, and BOMs

A practical CSV guide that explains why simple-looking files still fail once they move between spreadsheets, databases, APIs, and shell scripts.

Encrypt Online Editorial Team3 min readData Formats & Debugging
CSV import traps: quotes, delimiters, embedded newlines, encoding, and BOMs guide cover

Tip

Lint or format before comparing data, then check that cleanup did not change the fields, order, or values that matter.

CSV looks easy because the happy path is easy. Real CSV breaks when commas appear inside fields, line breaks live inside quoted text, spreadsheets emit a BOM, or one tool assumes semicolons while another assumes commas.

Once you accept that CSV is a format family with conventions rather than one trivial string shape, debugging gets much faster.

Summary

Definition: CSV is a delimited text format where quoting, delimiters, line endings, and encoding choices all affect how rows and fields are parsed.

Why it matters: Import failures are often caused by format expectations at the edges rather than by bad underlying data.

Pitfall: A file can look fine in a text editor and still parse differently across tools because of quoting rules, delimiter assumptions, or an invisible BOM.

The visible comma is not the whole story

Quoting determines whether a comma is a separator or just part of a field value. The same goes for line breaks inside quoted text. That means “it has one line per record” is not always true. A record can legally span multiple visual lines when a field contains an embedded newline.

This is where copy-paste debugging goes wrong. People inspect by sight and miss the parser rules.

Encoding and BOM issues are invisible until they are not

A UTF-8 BOM at the start of the file can turn the first header into a subtly wrong string for downstream tools. Delimiter expectations vary by locale and export source. Spreadsheet exports are especially good at surfacing these differences because they hide them behind a friendly UI.

A trustworthy CSV tool should show the parsed columns and the raw first bytes clearly enough that the invisible stops being invisible.

  • Quoted commas are field content, not separators.
  • Quoted newlines can keep one logical record across multiple physical lines.
  • A BOM can poison the first header name silently.
  • Delimiter assumptions vary by ecosystem.

Normalize once, then move on

The quickest workflow is usually to parse the file with a tool that shows you exactly how it was interpreted, normalize the delimiter and encoding if needed, and only then feed it into the target import path.

Quick example

Use this when a file that looks like two rows turns into three or four during import.

What to notice: That is one record with an embedded newline inside a quoted field, not two separate records.

CSV
name,notes
"alpha","line one
line two"

Practical check

  • Inspect quoting and delimiter assumptions before changing data values.
  • Check for a BOM when the first header behaves strangely.
  • Treat embedded newlines as a parser rule, not necessarily a corrupt file.

FAQ

Is CSV always comma-separated?

In practice, many ecosystems use semicolons or tab-like variants depending on locale and export settings.

Can a CSV field contain a newline?

Yes, if it is quoted according to the parser rules.

Developer workflow

Use this guide as a debugging pass before you paste structured data into an API, config file, or migration script.

  1. Keep one raw copy of the payload before any formatter touches it.
  2. Lint or format first, then compare important fields and ordering before converting.
  3. Save the final clean payload separately from notes, comments, and temporary examples.
Text
1. raw payload
2. lint/format without changing meaning
3. compare fields and ordering
4. convert only after validation passes

References