The Lean Data System Lifecycle

Do we need more people … or a better workflow? It is a question we all face once in a while. You have the “official” development workflow in place, which may consist of a mix of old-school waterfall techniques, newer agile techniques, and different layers of quality control and approval. The development process often relies on corporate tools, spreadsheets, and analysts to define the software to build. Behind this, to actually deliver software, you have the “real” workflow, where efficient teams build a strong relationship with the data consumers.

Read More

Recomputable Data Systems (RCDS)

The evolution happening in distributed storage, distributed processing, and in-memory processing opens the door to new ways of serving data for analytics. Instead of using complex incremental processes to serve data to your consumers, a Recomputable Data System (RCDS) recomputes your analytics datasets by reading ALL the raw data every time it runs. It also handles batch and real-time processing, presenting a current, consolidated view of your business whenever you need it.
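To make the idea concrete, here is a minimal sketch in Python, assuming raw events land as JSON-lines files under a raw/ directory (the layout and field names are hypothetical, not from the post): every run discards the previous result and rebuilds the serving dataset by scanning all raw events.

```python
# A minimal sketch of the recompute idea: no incremental state is kept;
# the serving dataset is rebuilt from scratch on every run.
import json
from collections import defaultdict
from pathlib import Path

def recompute_daily_totals(raw_dir: str) -> dict:
    """Rebuild the serving dataset by reading ALL raw events."""
    totals = defaultdict(float)
    for path in sorted(Path(raw_dir).glob("*.jsonl")):
        with path.open() as f:
            for line in f:
                event = json.loads(line)
                # Aggregate every event, every run: no watermarks,
                # no delta detection, no merge logic to maintain.
                totals[event["date"]] += event["amount"]
    return dict(totals)

if __name__ == "__main__":
    print(recompute_daily_totals("raw"))
```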

Read More

Agile BI, a story from the dirt road

So much has been written about agility in software development that I really wondered what I could add that would bring a little value or clarity about the process and its application to Business Intelligence and Data Science. Here is a little story …

Read More

Dirty Data Modeling

Data Modelers are sometimes introverted people who like sifting through mountains of database schemas and documentation. Data modeling is, to some extent, an intellectual undertaking where you almost have to reach a level of connection to the domain you study that resembles a Zen master’s connection to the universe.

Read More

Is a good data architect expensive?

Sometimes, trying to save money on the salary of the people building your foundational data architecture can have repercussions that are a lot more costly than the money you “save” by going cheap.

Read More

How to fill in missing history

In data warehousing, temporal data models and data flows have a real tendency to become complex very quickly. On top of this, you may have to handle multiple disparate data sources that do not merge very well. You may want to load the same type of business events from multiple sources and run into missing attributes that create blanks in your final serving tables.
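As a simple illustration, here is a hedged sketch in Python/pandas (the source names and columns are hypothetical): one common fix is to stack the sources, order each entity’s history chronologically, and carry the last known attribute value forward to fill the blanks left by the weaker source.

```python
import pandas as pd

# Two hypothetical sources delivering the same business events
# with different attribute coverage.
crm = pd.DataFrame({
    "customer_id": [1, 1],
    "event_date": pd.to_datetime(["2020-01-01", "2020-03-01"]),
    "segment": ["bronze", "silver"],
})
billing = pd.DataFrame({
    "customer_id": [1],
    "event_date": pd.to_datetime(["2020-02-01"]),
    "segment": [None],  # billing does not carry this attribute
})

# Stack the sources, order each customer's history, and forward-fill
# the missing attribute from the last known value.
history = (
    pd.concat([crm, billing], ignore_index=True)
      .sort_values(["customer_id", "event_date"])
)
history["segment"] = history.groupby("customer_id")["segment"].ffill()
print(history)
```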

Read More

The Lambda Architecture ... Speed and Agility

As experienced data architects processing data for data warehouses and business intelligence solutions, we are used to thinking “incrementally”. We have been creating complex data models and incremental load processes that are effectively workarounds for the limitations in storage and speed of our databases and ETL tools.
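For contrast, here is a minimal sketch of the Lambda idea in Python (all names and data are hypothetical): a batch view recomputed from the full raw history, a speed view covering only the events the batch layer has not absorbed yet, and a query that merges both for an up-to-date answer.

```python
from collections import defaultdict

raw_history = [("2020-01-01", 10.0), ("2020-01-02", 5.0)]  # master dataset
recent_events = [("2020-01-02", 2.5)]  # not yet in the batch view

def batch_view(events):
    """Slow path: rebuilt from ALL raw data on each batch run."""
    totals = defaultdict(float)
    for day, amount in events:
        totals[day] += amount
    return totals

def speed_view(events):
    """Fast path: totals for events the batch layer hasn't seen yet."""
    totals = defaultdict(float)
    for day, amount in events:
        totals[day] += amount
    return totals

def query(day):
    # Serving layer: merge both views at query time.
    return batch_view(raw_history)[day] + speed_view(recent_events)[day]

print(query("2020-01-02"))  # 7.5
```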

Read More

Why you need a PSA in your data architecture

In my current data systems, I usually create a persistent staging area (PSA) that receives raw, untransformed data and keeps all of it (all history) in the format we receive it (zip, JSON, CSV, etc.). That first layer is very important because anything else you generate from it implies human decisions that transform the data … and those decisions can (will) be wrong and will need to be changed. The PSA is the key to maintaining an agile data workflow where prototyping and experiments are possible.
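As an illustration, a PSA loader can be as simple as the following Python sketch (the directory layout and naming convention are assumptions, not a prescription): incoming files are copied as-is into arrival-date partitions and never overwritten or transformed.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path

def land_raw_file(incoming: str, psa_root: str = "psa") -> Path:
    """Copy a file into the persistent staging area, untouched."""
    received_at = datetime.now(timezone.utc)
    # Partition by arrival date so every delivery, including corrections
    # and duplicates, is preserved as part of the history.
    target_dir = Path(psa_root) / received_at.strftime("%Y/%m/%d")
    target_dir.mkdir(parents=True, exist_ok=True)
    # Keep the original name and format (zip, json, csv, ...);
    # the timestamp prefix prevents overwrites.
    target = target_dir / f"{received_at.strftime('%H%M%S')}_{Path(incoming).name}"
    shutil.copy2(incoming, target)
    return target

# Example (hypothetical path):
# land_raw_file("drops/orders_2020-01-02.csv.zip")
```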

Read More

A short history of Data Modeling

I discovered computers in the ’80s, when I was around 12 years old. It’s been an amazing ride through all the different evolutions of information computing, from those early “DATA” lines hard-coded in BASIC to today’s Big Data and NoSQL solutions, often hosted in the Cloud.

Read More