Hiring additional data engineers is a problem, not a solution

Karina

I was speaking to a client at a Fortune 50 firm the other day. He has a big budget and a large, talented data engineering team, and yet he is unable to get basic questions about his data answered in a timely and efficient manner.

The engineering team has done a great job of setting up reports that were scoped out eight months ago. But that was eight months ago. In an ever-changing data landscape, and despite all the talent at their disposal, things go awry as soon as a business team asks a question that deviates even slightly from the norm (i.e. anything not defined eight months ago).

The example above fits the profile of virtually every mid- to large-sized organisation, including those with decent-sized budgets. And if you’re a small business with limited funds, getting answers from your data is most definitely out of reach.

Adding more people creates complexity and inertia
While throwing more data scientists and engineers at the problem may seem sensible for companies that have the budget, that approach can actually make the problem worse. Why? Because the more engineers there are, the more layers of inefficiency stand between you and your data. Instead, more effort should go toward empowering knowledge workers / data owners.

Data owners know what they want out of their data but don’t know how to get it. Data engineers, on the other hand, know how, but don’t know why it is needed. This leads to constant back-and-forth and a high level of inefficiency.

These inefficiencies limit key decision makers’ ability to get answers to their questions in a timely manner. This creates inertia and, eventually, a cascade of negative outcomes, including:

a guesswork-based approach to decision-making
missed opportunities due to the inability to uncover new insights
falling behind the competition
So why do these problems arise? Why is there a tendency to throw more engineers at data problems? And how will a shift in mindset help alleviate these problems? I’d like to focus the rest of this article on these questions.

The many roots of the problem
Usually, there isn’t a single cause behind the barriers to data access. Let’s address some key ones:

High technical skills and fragmented technologies
Insights at the click of a button are always easier said than done. Data needs to be retrieved, stored, consolidated, cleaned and analysed before it yields any insights, and each stage requires a different skillset or technology.
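To make those stages a little more concrete, here is a minimal Python sketch of the retrieve, consolidate, clean and analyse steps using only the standard library. The two sources, their field names (`customer_id`, `region`, `amount`) and the cleaning rules are hypothetical assumptions for illustration. Storage is omitted, and a real pipeline would read from databases or APIs rather than in-memory lists.

```python
from collections import defaultdict

# Hypothetical sample records standing in for two separate systems
# (e.g. a CRM export and a billing export). Names and values are illustrative only.
CRM_ROWS = [
    {"customer_id": "C1", "region": " North "},
    {"customer_id": "C2", "region": "south"},
    {"customer_id": "C3", "region": "North"},
]
BILLING_ROWS = [
    {"customer_id": "C1", "amount": "120.50"},
    {"customer_id": "C2", "amount": ""},       # missing value
    {"customer_id": "C3", "amount": "79.50"},
    {"customer_id": "C3", "amount": "79.50"},  # exact duplicate row
]


def retrieve():
    """Retrieve: pull raw rows from each source (here, copies of in-memory lists)."""
    return list(CRM_ROWS), list(BILLING_ROWS)


def consolidate(crm, billing):
    """Consolidate: attach each billing row's region by joining on customer_id."""
    region_by_id = {row["customer_id"]: row["region"] for row in crm}
    return [{**bill, "region": region_by_id.get(bill["customer_id"])} for bill in billing]


def clean(rows):
    """Clean: normalise region labels, skip rows with no amount or region,
    and drop exact duplicates (an assumed rule; real duplicates may be legitimate)."""
    seen = set()
    cleaned = []
    for row in rows:
        if not row["amount"] or row["region"] is None:
            continue
        region = row["region"].strip().title()
        key = (row["customer_id"], region, row["amount"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(
            {"customer_id": row["customer_id"], "region": region, "amount": float(row["amount"])}
        )
    return cleaned


def analyse(rows):
    """Analyse: total the amount per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


if __name__ == "__main__":
    crm, billing = retrieve()
    print(analyse(clean(consolidate(crm, billing))))  # {'North': 200.0}
```

Even in this toy version, each stage encodes its own decisions (the join key, what counts as a duplicate, how to treat missing values), which hints at why each stage tends to call for different skills or tools.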

IT as a gatekeeper and bottleneck
Whether because of legacy company policy or a reluctance to give business departments too much control over data, the IT department sometimes limits, delays or blocks access to data. Data requests stuck in IT land for weeks are very common, and this often leads to unauthorised workarounds to get the job done, i.e. shadow IT.

Solution providers over-promise and under-deliver
It is a noisy world out there. Many companies promise to solve all your problems and end up falling woefully short. In many cases, solutions are sold to business analysts, but the final product can only be handled by a technical person.

A multi-faceted approach to removing the obstacles between the data and the decision-maker
Treat Data Capture and Data Reporting as separate projects
In most organisations, people tend to treat solving their data problems as a single initiative. Ideally, this should be broken into two parts: Data Capture and Data Reporting.

I’m using Data Capture as a general term that covers identifying key attributes, creating processes, and architecting solutions for data collection. Depending on the complexity of the project, this work is best handled by software or database engineers.

Data Reporting, on the other hand, should be managed by the knowledge workers or data owners, i.e. people who understand the data really well. They are able to spot trends and anomalies quickly and have a deeper understanding of the data needs of the various stakeholders.

Address the technical problems with non-technical solutions
In the quest to achieve insights, and keeping in mind that Data Reporting is best served by domain experts who are typically not technical, here are some common obstacles that need to be dealt with:

Access to large data
Consolidating disparate data
Cleaning up data
Automation and setting up data flows
Traditionally these tasks were handled by IT, but a whole suite of low-code and zero-code solutions now exists, designed to make people’s lives easier. Some of these are:

Mammoth Analytics (zero code)
Tableau Prep (zero/low code)
Alteryx (low code)
Stitch Data (low code)
Fivetran (low code)
Take a bite-sized, decentralised approach
A common mistake is to attempt to solve the problems with a monolithic, one-size-fits-all approach. This results in a delayed, costly and inefficient outcome, and what the best solution will look like eight months from now is incredibly hard to predict. Instead, analysts should pick out the essentials and break things down into mini projects. Using a decentralised set of technologies and processes will help achieve a better outcome.

Bottom Line
Organisations should constantly strive to remove the layers of friction between the knowledge worker/data owner and their data. If they are able to do so, the benefits are game-changing. The following are some examples:

Domain experts can make decisions more quickly
Decision makers can uncover new insights, which in turn helps them discover new opportunities and stay ahead of the competition
The engineering team is less inundated with data requests and can focus on larger, more complex, longer-term projects
I hope this article provides a bit of guidance and clarity in this ever-changing and noisy world of data. If you have thoughts on this post, I’d love to hear them. Feel free to leave comments below or reach out to me directly at gaurav@mammoth.io.
