I once was tasked with figuring out how to ‘democratize data’ for internal employees. No other instructions, solely a general pain point of ‘the data team is stuck doing ad-hoc tickets’ and ‘stakeholders want to get data on their own.’ After floundering for a while, I set out to figure out what data self-serve looked like at other companies. Seemed simple enough. But I quickly learned things aren’t that simple, and when are they for cross-functional data projects, anyway?
I want to share what I learned during an earnest effort to stand up data self-serve. I know others are struggling with this same project and its ambiguities and humongous scope. I spent time reading, thinking, attempting, failing, trying again, failing again, trying again and seeing success. Let’s dive in.
Executive Summary
Data rarely moves fast enough across companies to enable data-informed decisions. The data team is a bottleneck behind which many requests stack up in a queue. The data team drowns in questions and stakeholders become frustrated. Long wait times ensue, forcing the business into one of three choices:
(1) Wait to make the decision
(2) Make the decision without data
(3) Have departments hire their own data workers
The data velocity problem is not reasonably solved through sheer volume of hiring. Data workers are expensive and hard to find. Instead, data teams typically pivot to enabling the business via self-serve and data democratization. Ideally, this unlocks the data team to focus on strategic analyses and initiatives and the business is freed to find the data they need without submitting a ticket.
Effective data teams must pivot away from (or avoid entirely!) taking tickets and into partnership with the organization, focusing on building scalable data solutions from which others can self-serve.
Data Self-Serve Definition
Data self-serve is notoriously hard to define. Any definition is inevitably specific to a company and point in time. The definition below is therefore intentionally generic:
Ability for any employee to quickly find and leverage the data and insights they need for their role without funneling through the Data Team
Good Self-Service Always Looks Different
(and bad self-service always looks the same)
Data self-serve done well, by its very nature, looks very different from company to company. The tools, processes, and organizational structure supporting self-serve require tailoring to the organization rather than following a blueprint.
In contrast, data self-serve nearly always looks the same when done poorly. The data team is overwhelmed with tickets, business users can’t find data, and strategic analyses do not happen.
Put another way, bad data organizations tend to look the same, but working data organizations look very different from each other.
This puts data teams in a liberating but uncomfortable position. They’re able to freely define and create the self-service experience that their organization needs at this very moment, but at the same time there’s no blueprint for success. This requires an iterative approach to find the best solution for their company.
The focus must be on the stakeholders’ needs, not the data team’s needs. Data teams have a habit of building data self-serve in their own image instead of thinking about who they’re serving. The focus here will stay squarely on the data self-serve experience of coworkers.
Typical Hurdles to Self-Serve
It’s easy to conjure up a world where someone types or speaks a question into a machine and gets the data they need. This is the ultimate data self-serve utopia and one on full display in the excellent sci-fi show The Expanse. The main characters routinely verbally ask computers questions like “Pull up every ship within X distance which was made by Y company and left Z destination in the last week.”
There are many hurdles before that sort of world is possible. Let’s cover a few of them:
Data operated as a service
Many data teams operate with a “receive a ticket, answer a ticket” mindset. This limits the team’s output to reactive, short-term, smaller-scale asks and positions the team as a bottleneck between the business and data. This is a common problem within the data industry. Data service teams are typically overstretched and struggle to answer all the questions coming their way. The business inevitably concludes the data team isn’t a strategic partner because it doesn’t operate like one. If you operate like a service desk, you’ll be treated like one.
This team structure rarely scales. I recall a discussion with a C-Suite member who criticized the data team as “Getting me answers so late that I’ve forgotten my question by the time they reply.” Ouch.
In contrast, effective BI teams operate with a product-like mindset that focuses on scale and solutions. They partner closely with stakeholders to solve problems and prioritize ruthlessly based on business impact.
Data foundations not yet ready for self-serve
There is an immense amount of work required to get data ready for self-serve. Just cleaning up a few raw data tables isn’t enough. Each part of the business (Sales, Marketing, Product, etc.) needs different sets of data to answer its unique use cases.
Prepping data into the right shape requires close partnership and collaboration between the data team and its internal stakeholders. It involves steps like data ingestion and transformation, implementing tooling like Git and dbt, and building a team that can support the company’s data lifecycle.
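To make that concrete, here’s a minimal sketch of what a single transformation step might look like in dbt. All source, table, and column names (app_db, raw_orders, and so on) are purely illustrative assumptions; the point is that raw data gets renamed, typed, and filtered before anyone self-serves from it.

```sql
-- models/staging/stg_orders.sql  (illustrative dbt staging model)
-- Renames, types, and filters a hypothetical raw table so downstream
-- marts (and eventually self-serve users) never touch raw ingestion data.
select
    id                            as order_id,
    customer_id,
    lower(status)                 as order_status,
    cast(amount as numeric)       as order_amount_usd,
    cast(created_at as timestamp) as ordered_at
from {{ source('app_db', 'raw_orders') }}
where is_deleted = false          -- drop soft-deleted rows up front
```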
Lack of data literacy
Data literacy, much like self-serve, is a tough term to nail down. This deserves its own discussion entirely, but for now let’s go with a typically squishy definition along the lines of “How well stakeholders can interact with and understand data.”
Training for data literacy is immensely difficult. Even if you have the world’s best data models, data marts, and the Modern Data Stack™️, stakeholders will struggle to find value if they can’t grok the internal business data model or if they fall into common data pitfalls. This hurdle must be overcome no matter how well you do everything else.
Lack of data tools that enable self-serve
Typically there are two primary ways an individual can self-serve data: write SQL against a database or look at a data-team-created dashboard. SQL is great for technical individuals but is not an option for the majority of employees. Dashboards are usually widely available but lack customization. Generally dashboards are a “you get what you get” type of experience, with little to no drill-down capability and slow turnaround from the data team on enhancement requests.
Data teams must provide other options for non-SQL-savvy users to explore data in a more ad-hoc way, leveraging curated/enriched tables created for their department. This can look as simple as providing access to enriched data in Excel, or as “Reverse ETL,” where you send data back to source systems like Salesforce for direct consumption in those contexts.
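As one hedged illustration of that middle ground, the data team might expose a single wide, enriched view per department that a non-SQL user can pull into Excel or point a BI tool at, with no joins required. Every schema, table, and column name below is hypothetical:

```sql
-- Illustrative only: a wide, pre-joined view for the marketing team.
-- A non-SQL user filters or pivots this in Excel/BI rather than writing joins.
create or replace view marketing.campaign_performance as
select
    c.campaign_name,
    c.channel,
    c.launch_date,
    count(distinct o.opportunity_id) as opportunities,
    sum(o.amount_usd)                as pipeline_usd
from marts.campaigns          c
left join marts.opportunities o on o.campaign_id = c.campaign_id
group by c.campaign_name, c.channel, c.launch_date;
```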
Data privacy
Data privacy comes into play to different degrees depending on industry and company size, and whenever privacy is a factor, data access becomes more difficult. Typical lines in the sand are material non-public information (MNPI) when a company is publicly traded, or personally identifiable information (PII) that only specific people should have access to.
Data self-serve almost always runs into data privacy concerns and the height of this hurdle (or wall…) will depend on the company.
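One common pattern, sketched below with hypothetical table, column, and role names (exact masking functions and GRANT syntax vary by warehouse), is to route self-serve users through a view that hides or hashes PII while the underlying table stays restricted to a smaller, audited group:

```sql
-- Illustrative pattern: self-serve roles query a masked view;
-- the raw table with PII stays restricted to a narrower, audited role.
create or replace view marts.customers_masked as
select
    customer_id,
    signup_date,
    plan_tier,
    sha2(email, 256)     as email_hash,    -- hashing function varies by warehouse
    left(postal_code, 3) as postal_prefix  -- coarsen location instead of exposing it
from marts.customers;

grant select on marts.customers_masked to self_serve_analyst;
-- no grant on marts.customers itself for general self-serve users
```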
But…What Does Good Self-Serve Look Like?!
Even though I described at length that good self-serve usually looks different, there are still some guiding principles to shoot for. These may not be universal and may change depending on your company’s data maturity, but they should be helpful.
(1) Focus initial efforts on specific departments/teams
One common bugaboo is an attempt to boil the ocean. The data team is already spread thin, and pivoting everyone to self-serve for all departments at the same time will be too much. Instead, focus on a couple of teams or departments with clear self-serve needs. Assign an analyst or two who already understand that business domain and want to take on the challenge.
This both narrows the scope and increases the likelihood of close partnership with those teams. With partnership will come alignment on business value and understanding of pain points. Everyone wins.
(2) Create a roadmap in partnership with stakeholders
Self-serve must necessarily look different from department to department. The needs of Finance are wholly different from those of Product, Engineering, Field, Legal, or Marketing. This is why a focus on self-serve and a dedicated BI partner is so crucial. Requirement gathering and roadmap creation must be done in close collaboration between BI and each department. Examples of requirements to gather:
* Use cases
* Defining personas (technical/non-technical/etc)
* Tools needed
* Datasets
* Training/Enablement
(3) Build source of truth data marts
A data mart is a set of tables designed for ease of use by a department for their self-serve needs. These tables are specifically curated by the data team to make data easy to consume and understand for a particular department.
Just providing individuals with access to the entire database is inevitably overwhelming. There could be hundreds of billions of data points across thousands of columns and hundreds to thousands of tables. Many analysts need a year to become comfortable with data at its most granular state. Expecting non-analysts to just hop in and find value isn’t reasonable.
To avoid this steep learning curve, a curated data mart enables self-service without overwhelming stakeholders. This curated data mart must be built in close collaboration between the data partner and their stakeholders to find the sweet spot of “plenty of data” and “not confusing”.
Example: The sales team needs a few well-curated tables, such as Account, Opportunity, and Task, from which they can build almost any report they need.
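Sketching what one of those curated tables could look like (assuming dbt, with illustrative upstream model names), an Opportunity mart might pre-join account and owner context so sales stakeholders never have to learn the raw CRM schema:

```sql
-- models/marts/sales/opportunity.sql  (illustrative dbt mart model)
-- One row per opportunity, pre-joined to account and owner so sales
-- stakeholders can report on it without touching the raw CRM schema.
select
    o.opportunity_id,
    o.opportunity_name,
    o.stage,
    o.amount_usd,
    o.close_date,
    a.account_name,
    a.industry,
    a.segment,
    u.full_name as owner_name,
    u.team      as owner_team
from {{ ref('stg_salesforce__opportunities') }} o
left join {{ ref('stg_salesforce__accounts') }}  a on o.account_id = a.account_id
left join {{ ref('stg_salesforce__users') }}     u on o.owner_id   = u.user_id
```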
(4) Create an adoption and discoverability program
Data discoverability is an enormous challenge that must be tackled from several angles. The existence of data marts alone is not enough to drive adoption if individuals do not know that they exist or do not know how to leverage them.
To drive adoption, efforts must include:
* Training / Onboarding sessions for all stakeholders
* Clear documentation for all data marts, tools available, key reports
* Weekly office hours
* Monthly & quarterly prioritization meetings
* Deprecation process to clean out old/unused data products
Parting Notes
There’s much more to write, and I’ll follow up on defining internal stakeholder personas and choosing technologies that solve different aspects of data self-serve. For now, I hope the key message you took away from this is:
“I’m empowered to figure out how to best do self-serve at my organization.”
There’s an art to this task, and that’s why it’s so difficult to find anyone giving a blueprint. There really isn’t one. And you’ll never “arrive” at the conclusion of this project. You’ll just continually improve it, much like you do all your other data efforts. The fun is in the journey.