control characters

September 27, 2015

A large part of my job is making sure data gets from point A to point B, specifically by way of writing SQL scripts that pull that data from my company’s tables (we provide cloud HRIS software) and format it in such a way that some other company’s servers can parse it correctly and load it into their tables.  That involves a significant amount of learning about and being familiar with a number of different file layouts, some standardized (like the ANSI 834 file, which is used to communicate benefits coverage information to insurance companies) and some proprietary.  One such file I was dealing with recently was an ACH file — these are standard payroll files that send direct deposit information from various institutions like mine to banks.  The problem I was having was that we were sending the file to a particular bank, and they were giving me a cryptic error, saying that every line of the file had 2 extra characters at the end: ^M.  This was the first ACH file I had done, and I knew that those characters weren’t anywhere in the file I was sending, so at first I was thoroughly confused.

After some research, I found the culprit by finding a number of blog posts mentioning that on UNIX systems in certain cases, the standard Windows line ending (LF/CR, line feed/carriage return) shows up as ‘^M’.  This is because the standard ‘\n’ command to create a new line is actually a sequence of two control characters (control characters are just ASCII characters that you don’t actually see in WYSIWYG editors; they do things other than signify letters and numbers that involves controlling the flow of the document).  The first is the LF or line feed character, and the second is the CR or carriage return character.  These are also known as CHAR(10) and CHAR(13) respectively since these are their ASCII values.  In almost all situations this is what you want to end a line in a file, but on ACH files (and certain other contexts), the carriage return character causes a problem.  I confirmed the issue by opening my file in hex format (I use Sublime Text 2, but you can open a file in hex format in any number of text editors), and saw that every line ended with the 2 characters 0a0d, or 0x0a 0x0d as hex characters are often written.

The next step was to actually formulate a resolution.  At work I use an internal development framework for generating text files that automatically creates a lot of the SQL I use so that I don’t have to type it over and over again, so I had never actually specified the line ending in a BCP statement (the bulk copy statement that takes a SQL result table and outputs it to a file, such as a CSV file that can be opened in Excel).  But, there’s a first time for everything.  I specified that the line endings should be just the line feed character by using ‘-r 0x0a’ in the BCP command.  This tells SQL to use just the line feed character as opposed to the Windows standard LF/CR.  And presto — perfect file that was accepted by the bank.