Jon Tavernier

Types of Data

How I think about types of data.

Batch Data

  1. Single Entities

  2. Sets of Data

Streaming Data

Streaming data is a never-ending flow of data. Examples include tweets on Twitter, hospital bed telemetry, and subway ticket scans. I like to further categorize the streaming data I see as follows:

  1. Application Logs

  2. Intra-application Messages

  3. Inter-application Messages

  4. Business Events

Application Logs

  • In this scenario, your application is writing logs so engineers can see what's going on and troubleshoot technical issues.

  • This data is meant for you and only you. It's not meant for business folks to use, so avoid providing it for analytical purposes.

  • Ensure you log discrete data, avoid logging sensitive data, and use appropriate logging levels.

  • Educate others on how to find your application's logs and what IDs can tie logs across applications.
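As a minimal sketch of the logging advice above, here is one way it could look with Python's standard `logging` module. The logger name `orders-service`, the field names `correlation_id` and `order_id`, and the `log_payment_attempt` function are all hypothetical, chosen only for illustration. Each value is logged as its own field, the card number is never written, and the DEBUG line is dropped because the logger runs at INFO.

```python
import io
import logging


def build_logger(stream):
    """Return a logger that writes one line per record with discrete fields."""
    logger = logging.getLogger("orders-service")  # hypothetical application name
    logger.setLevel(logging.INFO)  # DEBUG records are filtered out
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    # Each value gets its own key so it can be searched on directly.
    handler.setFormatter(logging.Formatter(
        "%(levelname)s correlation_id=%(correlation_id)s "
        "order_id=%(order_id)s %(message)s"
    ))
    logger.addHandler(handler)
    return logger


def log_payment_attempt(logger, correlation_id, order_id, card_number):
    """Log a payment attempt without ever writing the card number."""
    # correlation_id is the (assumed) ID that ties logs together across applications.
    fields = {"correlation_id": correlation_id, "order_id": order_id}
    # card_number is sensitive, so it is deliberately left out of every record.
    logger.debug("payment payload prepared", extra=fields)
    logger.info("payment attempt started", extra=fields)


if __name__ == "__main__":
    buffer = io.StringIO()
    log = build_logger(buffer)
    log_payment_attempt(log, "req-123", "ord-9", "4111111111111111")
    print(buffer.getvalue(), end="")
```

Searching every application's logs for the same `correlation_id` value is what lets someone follow a single request across systems.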

Intra-application Messages

  • In this scenario, your application has separate components that communicate with one another.

  • This data is meant for you and only you. It's not meant for business folks to use, so avoid providing it for analytical purposes.

Inter-application Messages

  • In this scenario, your application submits data to another application you do not control.

  • This data is meant for the receiving system only. It's not meant for business folks to use, so avoid providing it for analytical purposes.

Business Events

  • In this scenario, applications publish true fire-and-forget business events meant for others to consume, including the subscribers responsible for populating the analytical data store.

  • Example events might include Order Placed, New Customer Registration, Promo Code Used, and so on. The data format is crafted specifically for others to consume.

  • Avoid making breaking changes to your events. If a breaking change is needed, version your event and give folks time to move to the newer version (i.e. publish both versions for some time).
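To make the versioning advice concrete, here is a rough sketch of publishing two versions of an Order Placed event side by side during a migration window. Everything specific in it is an assumption for illustration: the payload fields, the topic names `order-placed.v1` and `order-placed.v2`, the envelope layout, and the `publish` callable that stands in for whatever message broker client you actually use. The breaking change shown is a hypothetical move from a float `total` to integer `total_cents` plus a `currency`.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class OrderPlacedV1:
    order_id: str
    customer_id: str
    total: float  # hypothetical v1 field: amount with no currency attached


@dataclass(frozen=True)
class OrderPlacedV2:
    order_id: str
    customer_id: str
    total_cents: int  # hypothetical breaking change: integer cents
    currency: str     # plus an explicit currency code


def envelope(event_type, version, payload):
    """Wrap a payload with its type and version so consumers can tell them apart."""
    return json.dumps(
        {"event_type": event_type, "version": version, "payload": asdict(payload)},
        sort_keys=True,
    )


def publish_order_placed(publish, order_id, customer_id, total_cents, currency):
    """Publish both versions so existing subscribers keep working while they migrate."""
    v1 = OrderPlacedV1(order_id, customer_id, total_cents / 100)
    v2 = OrderPlacedV2(order_id, customer_id, total_cents, currency)
    # Fire-and-forget: the publisher does not wait on or know about subscribers.
    publish("order-placed.v1", envelope("OrderPlaced", 1, v1))
    publish("order-placed.v2", envelope("OrderPlaced", 2, v2))


if __name__ == "__main__":
    # print stands in for a real broker client here.
    publish_order_placed(print, "o1", "c1", 1250, "USD")
```

Once every subscriber has moved to the newer version, the v1 publish call can be removed.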