Process CSV Streams into Codable Objects

I've been doing a lot with CSV lately. I started building a CSV parser in Swift several years ago, while actively using an Objective-C framework that I kept finding bugs in. That framework used pointers to extract fields, which made things really difficult to debug.

One objective is to allow incremental processing, as happens with web streams: in the past, I handled millions of records and it was obviously important to start showing users results as soon as a few records were decoded.
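To make the incremental idea concrete, here is a minimal sketch of turning arbitrary stream chunks into complete lines as they arrive. The `LineAccumulator` type is hypothetical and is not taken from the package. It also assumes records end in a newline and that quoted fields contain no embedded line breaks, which a real CSV parser has to handle.

```swift
// Hypothetical helper: buffers raw bytes and emits a line whenever one completes.
struct LineAccumulator {
    var pending: [UInt8] = []

    // Feed one chunk (for example, whatever a stream read returned).
    // Returns every line that was completed by this chunk.
    mutating func feed(_ chunk: [UInt8]) -> [String] {
        var lines: [String] = []
        for byte in chunk {
            if byte == 0x0A {                                    // "\n" ends a record
                if pending.last == 0x0D { pending.removeLast() } // drop the "\r" of CRLF
                lines.append(String(decoding: pending, as: UTF8.self))
                pending.removeAll(keepingCapacity: true)
            } else {
                pending.append(byte)
            }
        }
        return lines
    }

    // Call at end of stream to collect a final line with no trailing newline.
    mutating func finish() -> String? {
        guard !pending.isEmpty else { return nil }
        defer { pending.removeAll() }
        return String(decoding: pending, as: UTF8.self)
    }
}

var accumulator = LineAccumulator()
let first = accumulator.feed(Array("a,b\nc,".utf8))   // one complete line so far
let second = accumulator.feed(Array("d\r\ne".utf8))   // completes the split line
let last = accumulator.finish()                       // leftover partial line
```

Because bytes are buffered until a newline shows up, a chunk boundary that lands in the middle of a record (or in the middle of a multi-byte character) does not matter, and results can be handed to the UI after the very first chunk.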

As a semi-retired dev (not my choice!), I decided to upgrade the Swift code and “do it right”. The project is table-driven and uses a state machine. Unit tests probe many of the edge cases (e.g., an empty file, just one column and thus no delimiters, spaces before or after fields, etc.).
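For readers who haven't seen the table-driven style, here is a small sketch of the idea for a single CSV line. The states, character classes, and table contents are my own illustrative assumptions, not the package's actual tables, and it only handles commas and double-quoted fields.

```swift
// Illustrative states and character classes (assumptions, not the package's design).
enum State: Int { case fieldStart, unquoted, quoted, quoteInQuoted }
enum CharClass: Int { case comma, quote, other }
enum Action { case append, endField, skip }

// transitions[state][charClass] gives (next state, action to perform).
let transitions: [[(State, Action)]] = [
    // comma                    quote                     other
    [(.fieldStart, .endField), (.quoted, .skip),        (.unquoted, .append)], // fieldStart
    [(.fieldStart, .endField), (.unquoted, .append),    (.unquoted, .append)], // unquoted
    [(.quoted, .append),       (.quoteInQuoted, .skip), (.quoted, .append)],   // quoted
    [(.fieldStart, .endField), (.quoted, .append),      (.unquoted, .append)], // quoteInQuoted
]

func classify(_ c: Character) -> CharClass {
    switch c {
    case ",": return .comma
    case "\"": return .quote
    default: return .other
    }
}

// Splits one line into fields by walking the table; all logic lives in the data.
func parseLine(_ line: String) -> [String] {
    var fields: [String] = []
    var current = ""
    var state = State.fieldStart
    for c in line {
        let (next, action) = transitions[state.rawValue][classify(c).rawValue]
        switch action {
        case .append:
            current.append(c)
        case .endField:
            fields.append(current)
            current = ""
        case .skip:
            break
        }
        state = next
    }
    fields.append(current) // the last field has no trailing delimiter
    return fields
}

let sample = parseLine("a,\"b,c\",d")
```

The appeal of this layout is that an edge case such as a doubled quote inside a quoted field becomes one table entry (the `quoteInQuoted` row) instead of another nested `if`. This sketch is lenient about malformed input, for instance text following a closing quote is simply appended.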

It has taken a lot longer than I thought, and the unit tests took longer to write and debug than the original source! (Anyone telling you, after weeks of development, to write some unit tests in a day is totally disconnected from reality. This happened to me recently too!)

What I’ve been doing is modifying all my open source that deals with streams of data, making it conform to Apple’s Streams interface (InputStream and OutputStream). It turns out that with a Stream interface, it’s pretty easy to make the code work with Combine as an Operator (a combination of a Publisher and a Subscriber). The reason for being an Operator is that it accepts a Publisher’s raw data stream (as a Subscriber), then outputs something (as a Publisher). In the case of CSV, the code takes a CSV line and decodes it into a struct or class that complies with a simple protocol, CSVDecode. In practice, an app using it will direct a stream of data (file or network) into it and get a stream of objects in return.
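As a rough illustration of the "lines in, objects out" shape, here is a sketch using only standard Combine operators. The `RowDecodable` protocol, the `Person` type, and `decodeRow` are hypothetical stand-ins: the package's real CSVDecode protocol may have different requirements, and a naive comma split replaces the real parser. The Combine part only compiles on Apple platforms.

```swift
// Hypothetical stand-in for a protocol like CSVDecode (its real shape may differ).
protocol RowDecodable {
    init?(fields: [String])
}

struct Person: RowDecodable, Equatable {
    let name: String
    let age: Int

    init?(fields: [String]) {
        guard fields.count == 2, let age = Int(fields[1]) else { return nil }
        self.name = fields[0]
        self.age = age
    }
}

// Naive split on commas; a real parser would also handle quoting.
func decodeRow<T: RowDecodable>(_ line: String, as type: T.Type) -> T? {
    let fields = line.split(separator: ",", omittingEmptySubsequences: false).map(String.init)
    return T(fields: fields)
}

#if canImport(Combine)
import Combine

extension Publisher where Output == String {
    // Subscribes to upstream lines and republishes decoded objects,
    // silently dropping rows that fail to decode.
    func decodeRows<T: RowDecodable>(as type: T.Type) -> AnyPublisher<T, Failure> {
        return self.compactMap { decodeRow($0, as: type) }.eraseToAnyPublisher()
    }
}

let lines = PassthroughSubject<String, Never>()
var people: [Person] = []
let cancellable = lines
    .decodeRows(as: Person.self)
    .sink { people.append($0) }

lines.send("Ada,36")
lines.send("not a valid row")
lines.send("Grace,45")
#endif
```

In a real app the `PassthroughSubject` would be replaced by whatever publishes the file or network data, and the subscriber at the end only ever sees typed objects.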

In addition, the code is packaged as a Swift Package, recently supported in Xcode 11. Packages are great!

What really helped me was to create a diagram of how CSV works. There’s actually an RFC describing it, along with common extensions, all of which this code supports. A new program on the Mac App Store, Diagrams, was a huge help! It’s easy to use and inexpensive: it took me about 5 minutes to master it. For instance, this is what I created to let me code the comment line section:

[Screenshot: state diagram for the comment line section]

The overall state diagram took me a lot longer than I thought it would too! I kept finding edge cases.

Next, I’ll write a test app that lets people access a slew of CSV repositories on the web. If you want to see the source, click this link: SAX-CSV-Parser-Package.