How to: Fix a corrupt CSV file

By the time you're actively working with data in CSV format, chances are you will run into issues when opening your file(s). The data may appear corrupt. This can especially be true if you received a file from another party and it's your first time opening it, or perhaps you downloaded an export from a database. Wherever it may have come from, the scenario looks the same: you open the file and it looks like a mess.

So what do we mean when we call the data corrupt?

To put it simply, it means that the data you are looking at is not in its true form. It's not clean. Something happened during the translation between when you received the file and when you opened it. Like if a mediocre app missed a few key words when it gave you the translation result and the result is pure gibberish...
The most common data corruption issues you'll face include:

  • incorrectly formatted numbers (perhaps account numbers magically transformed into dates...)
  • dropped leading zeroes
  • numbers appearing as scientific notation or other such similarly fun jargon
  • missing or crazy translation of special characters (cue Wingdings-esque madness)
When you see these types of results, you know something went awry. Of course your data did not start like that. It can be hard to narrow down the cause or know how to fix it, especially if you've never dealt with it before. And even more, it's just plain anxiety-inducing if you open your file and it looks so.dang.messy. Who wants to waste time troubleshooting that kind of nonsense?

Luckily there are mostly simple solutions to the common issues you'll encounter. We've also covered them in our article on CSV problem-solving, but we'll review them here. (Hint: most of the issues are a result of opening a CSV in Excel, without going through the import wizard!)
Troubleshoot your file by going through this checklist:

  1. Does your file have UTF-8 encoding? (Tip: here's how to do that)
  2. Does your data contain columns with numbers that exceed 15 digits? If so, use the import wizard to ensure Excel doesn't auto-format that as exponential numbers.
  3. Does your data contain columns with leading zeroes? If so, again, use import wizard to ensure those are kept intact.
  4. Do you see dates where there shouldn't be? Again, use import wizard to control the formatting.
  5. Does your file have delimiters within the text (such commas appearing in the text itself)? Make sure they are contained within quotes.
  6. Check your file for any extra spaces between delimiters, at the end of rows, etc.
  7. Check that there are no extra columns or rows that are empty
When you find yourself facing a massive amount of corrupt data - try this: open the CSV file in a Text Editor (you know, like simple Notepad) instead of Excel. Does your data still look corrupt? Find the issues there and correct them (if you find any). This can also help narrow down whether the real culprit is Excel.