Documented Data File

Because reading data should be the easy part...

Documented Data File

Independent Project

You’d think that once you’ve collected and analyzed data, saving and re-reading it would be the easy part. For many types of data such as images, documents, or executables, it is the easy part. So why is it so complicated for general data, such as a list of numbers?

You’re probably thinking this is a solved problem. “There are plenty of formats! CSV, NPY, PKL, HD5, need I go on?”. These formats all have their merits. Some are very effient at serializing the data, importing data to and from a specific programming language, or being universally readable. However, when a project or data changes hands, be it to a collaborator or a new generation of researchers, having the data be self evident is critical. Although there are ways to work around this with established formats, such as adding headers in CSVs, this is not standardized and required a customized import system for each format, which is cumbersome and time consuming. This is where DDF comes in.

Overview of Format

The purpose of DDF file format is to provide a way to save data that is:

  1. Human Readable: Saves data in plain text, so you can read or create this file type with just a text editor, should the need arise. This also makes creating functions for reading and writing DDF files for your programming language of choice much easier than if it were a binary standard. It uses a syntax that is designed to prioritize human-readability over file size.
  2. Self-Evident: The data saved in DDF are not just associated with a variable name, but comments can be provided for each variable so that the data can be understood on their own. The format also encourages users to provide a description for the whole file through the use of a header paragraph.

A syntax example is shown below. In this brief example, the syntax for adding a header and variables is shown. DDF supports three types of variables, doubles, booleans, and strings, in addition to 1D and 2D matrices of all these types. To declare a variable, the type is indicated by the first token d, b or s for double, boolean and string, respectively. To declare a matrix, place the matrix type in angle brackets, preceeded by an m. For example, a matrix of doubles would be declared with m<d>.

Additionally, you can declare variables using a vertical syntax. This allows lists to be easily added to without needing to read in the file and rewrite it for each addition. The target application of this is long-term data logging using computationally-limited embedded systems such as a microcontroller. An example is shown below.

I’ve written input/output classes for DDF in three popular languages: C++, Python, and MATLAB. You can get started on the project’s GitHub repo.