The Lean Data System Lifecycle

Do we need more people … or a better workflow? Every team faces this question once in a while. There is the “official” development workflow, which may mix old-school waterfall techniques, newer agile techniques and several layers of quality control and approval. That process often relies on corporate tools, spreadsheets and analysts to make sense of the software to build. Behind it, to actually deliver software, there is the “real” workflow, in which efficient teams build a strong relationship with the data consumers.

Lean development embraces an open workflow based on communication and on delivering continuous value to the business. As a bonus, this flow leads you to build the documentation as a library of automated tests and models.


The data consumers add ideas (written as stories) to the backlog. Usually, those stories focus on the end deliverable, for example: “Publish the internet sales to Tableau”. They will rarely talk about data sources or about “how” to technically implement the story … and they should not. Many tools are now available to manage a backlog online and with remote teams: Trello, Pivotal Tracker, etc.

The consumers prioritize a few stories as the most valuable and feasible, and those stories are moved to the top of the backlog. The data engineering team helps with this prioritization, especially with the feasibility evaluation. It is important to note that prioritizing the whole backlog is a waste you can eliminate. We should prioritize just enough work to keep the development team going. That could mean picking only the next 3 or 4 stories that we judge as having the most value and that engineering deems feasible. The rest of the stories will most likely change, evolve or get pushed down by other stories, so spending time sequencing the whole backlog brings very little value to the development process.
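As a rough illustration of "just enough" prioritization, here is a minimal Python sketch. The `Story` class, the 1-to-5 `value` score and the `next_stories` helper are assumptions made for this example; they are not part of any backlog tool. The point is that only the next few stories are selected and ordered, while the rest of the backlog is left unsorted.

```python
from dataclasses import dataclass
import heapq


@dataclass
class Story:
    title: str
    value: int      # hypothetical 1-5 score given by the data consumers
    feasible: bool  # hypothetical yes/no assessment from engineering


def next_stories(backlog, n=3):
    """Return the n most valuable feasible stories; the rest stays unsorted."""
    candidates = [s for s in backlog if s.feasible]
    return heapq.nlargest(n, candidates, key=lambda s: s.value)


backlog = [
    Story("Publish the internet sales to Tableau", 5, True),
    Story("Forecast store traffic", 4, False),
    Story("Daily inventory snapshot", 3, True),
    Story("Customer churn report", 4, True),
    Story("Archive legacy exports", 1, True),
]

top = next_stories(backlog)
for story in top:
    print(story.title)
```

Every title except the first one is made up for the example. In practice the scores come out of the conversation between consumers and engineering, not out of a formula.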

The data engineers pick the story at the top of the backlog. This should be the most important thing to do next.

An open discussion about this specific story now takes place, and three scenarios can emerge from it:

  • We are not completely sure about the value or feasibility of the story -> we choose to build a prototype

  • We are sure that the story is valuable and feasible -> we develop it

  • We change our mind and think that the story is not as valuable as expected or it is not feasible at the current state of our business -> we return the story to the backlog

If we choose to do prototyping, we develop the prototype, the consumers review it and the story goes back to being discussed.
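The three outcomes of the discussion can be written down as a small decision function. This is only a sketch: the `Decision` enum and the `triage` function are hypothetical names, and `None` stands for "we are not sure yet".

```python
from enum import Enum


class Decision(Enum):
    PROTOTYPE = "build a prototype"
    DEVELOP = "develop the story"
    RETURN_TO_BACKLOG = "return the story to the backlog"


def triage(valuable, feasible):
    """Map the outcome of the discussion to the next step.

    Each argument is True, False, or None (not sure yet).
    """
    if valuable is False or feasible is False:
        return Decision.RETURN_TO_BACKLOG
    if valuable is None or feasible is None:
        return Decision.PROTOTYPE
    return Decision.DEVELOP


# Feasibility is still unknown, so we prototype first.
print(triage(valuable=True, feasible=None).value)
# After the consumers review the prototype, the story is discussed again.
print(triage(valuable=True, feasible=True).value)
```

Calling `triage` a second time with updated answers mirrors the loop described above, where a prototyped story goes back to being discussed.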

When the story is being developed:

  • Consumers and engineering work together to define the specs as automated tests. The tests are added to a library of existing tests that keeps growing over time. A test starts its life as a TDD (test-driven development) test, and when the feature is delivered it automatically becomes a regression test. The tests are kept forever and must all be executed to ensure we are not breaking any existing features. Alongside the tests, we develop an integration data model showing the relationships within our business data, a presentation data model showing the final data presented to our consumers, a workflow model showing how our processes will run, and the architecture diagram of the system.

  • Engineering develops the new features, writes code, until all the tests pass.

  • Engineering evaluates whether the code should be improved. If it should, the code is refactored until all the tests pass again.

  • When the code is good, engineering delivers the feature to the consumers. Good code is code that runs with minimal technical debt and makes all the tests pass. We are not after “perfect” unreadable code or overly clever code. We don’t want to create waste by over-coding or by coding for “just in case” scenarios.

  • The consumers review the feature:

    • If they approve the feature: engineering automates the feature and officially adds it to production

    • If they don’t approve the feature: consumers and engineering work together to update the specifications…and the development cycle restarts for this feature (code -> make the tests pass)
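To make "specs as automated tests" more concrete, here is a minimal Python sketch for a story like “Publish the internet sales to Tableau”. Everything in it is an assumption made for illustration: the `internet_sales_by_day` function, the `channel`, `date` and `amount` fields, and the sample orders. The test is written first with the consumers, engineering codes until it passes, and the same test then stays in the library as a regression test.

```python
from collections import defaultdict


def internet_sales_by_day(orders):
    """Sum order amounts per day, counting the 'internet' channel only."""
    totals = defaultdict(float)
    for order in orders:
        if order["channel"] == "internet":
            totals[order["date"]] += order["amount"]
    return dict(totals)


def test_only_internet_orders_are_counted():
    # The spec agreed with the consumers, expressed as data in and data out.
    orders = [
        {"date": "2017-10-16", "channel": "internet", "amount": 100.0},
        {"date": "2017-10-16", "channel": "store", "amount": 40.0},
        {"date": "2017-10-17", "channel": "internet", "amount": 25.5},
        {"date": "2017-10-17", "channel": "internet", "amount": 10.0},
    ]
    expected = {"2017-10-16": 100.0, "2017-10-17": 35.5}
    assert internet_sales_by_day(orders) == expected


def test_no_orders_gives_no_sales():
    assert internet_sales_by_day([]) == {}


test_only_internet_orders_are_counted()
test_no_orders_gives_no_sales()
```

A test runner such as pytest would collect the `test_` functions on its own; they are called directly here only to keep the sketch self-contained. If the consumers reject the feature, the expected values in the tests are what gets updated first, and the code follows.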

When you apply this workflow to your development process, you can support it with a tool like Trello. It will help your team focus on the most important things, and it will also make your progress visible to the data consumers.


Written on October 17, 2017