How to deal with special characters in CSV

So, you feel confident you know how to work with CSV and even how to navigate through some of the most common pitfalls. Then, something as simple as a punctuation mark can throw the whole thing into disarray again. Why?

Special characters.

What are they?

Special characters may sound like something you probably won't ever encounter or have to deal with. You think, that is just for characters in foreign languages or mathematical symbols that I'll never use. Think again.
A special character (remember, this is in computer-language) is any character that is not alphabetic or numeric. That means the group "special characters" encompasses a HUGE spectrum of both everyday symbols and the more unique, specialized symbols. Special characters include (but are not limited to!):

  • Punctuation marks, such as question marks and exclamation points
  • Math symbols, such as the "=" symbol, brackets, +/- and so on
  • Emojis
  • Trademark, copyright and other legal symbols
  • Foreign language characters and logograms, such as Chinese symbols and tilde, etc.
Chances are, you will run into special characters within your data. So if you're working with CSV, what special characters are actually supported?

Technically, all of them!

There are no specified limits of what characters can be used in a CSV file. What limits you is how your file is encoded.

If you recall from our article on choosing the right encoding, UTF-8 is the recommended file encoding, due to its vast support of just about any character possible. If you don't opt for UTF-8, chances are you may run into the dreaded sprinkling of "??????" throughout your file. That's the computer's way of telling you it has no idea what the character you typed is supposed to be!
The trouble with special characters and encoding usually pops up when opening a CSV in another program (i.e. not your generic text editor). If you open a CSV in Excel, for example, you are bound to see that "?" rear its ugly head. As you've seen repeated in many of our articles by now, it's imperative to import (not open!) CSV files into Excel in order for the data to be displayed accurately. In fact, the very first step in the wizard provides you with a dropdown to select the file "origin". This is where you can specify UTF-8!
The same goes for saving data from Excel to CSV format. First and foremost, ensure you are saving as ".CSV (comma delimited)". There are other variations that may throw you off - CSV (Mac) or CSV (MS DOS) and others - unless you know for a fact you need to use that specific type of CSV, ignore them!

Secondly, ensure that you are selecting UTF-8 encoding, not just relying on whatever Excel defaults to. Do this when you are saving/naming your CSV file by navigating to "Tools" and then "Web options". Choose the encoding tab and then from the dropdown, select UTF-8.
If you take care to encode your file correctly, you shouldn't run into any bumps in the road when dealing with special characters within your data.