Is your data integrity framework just a glorified spreadsheet?

Nahla Davies examines what constitutes an appropriate framework for data integrity, and how inadequate frameworks harm data quality.
If you asked most companies whether they have a data integrity framework, they would answer yes without hesitation. They’ll point you to a shared drive, maybe a Confluence page, maybe a colorful spreadsheet with tabs labeled ‘Validation Rules’ and ‘Ownership Matrix’. It looks legitimate. It has a version number on it. Someone even added conditional formatting.
But here’s the thing: looking like a framework and actually functioning as one are two very different realities. Across industries, organizations confuse documentation with governance, and the gap between the two is where data quality quietly declines. The problem is not that people don’t care. It’s that they have convinced themselves the spreadsheet is sufficient.
The spreadsheet trap is more common than you think
There is a pattern that plays out in almost every mid-sized organization that has undergone some form of digital transformation in the past five years. Someone in data engineering or analytics is given the job of ‘building a data integrity framework’. They do their research, compile best practices, and create a document. Maybe it lives in Google Sheets, maybe in a Notion database, maybe as a PDF that was emailed around once and forgotten. Either way, it checks the box. Leadership sees it and feels reassured.
The problem begins when that document has to survive contact with reality. Data pipelines change. New sources are added. Team members rotate. And that spreadsheet? It does not update itself. It does not send alerts when a schema changes or when a sensitive field starts returning nulls at double the normal rate. It just sits there, frozen at the moment of its creation, slowly becoming a historical artifact rather than a functional tool.
The worst part is that people keep referring to it as if it were still accurate. Decisions are made based on validation rules that haven’t been updated in months. Ownership columns list people who have left the company. It’s like navigating with a map from 2019 and wondering why you keep getting lost.
And it’s not a niche problem. Gartner has found that poor data quality costs organizations an average of $12.9m per year. That number does not come from dramatic, headline-grabbing breaches. It comes from the slow, invisible accumulation of bad records, missed joins, and unexamined assumptions that a static document cannot capture.
What a real framework looks like
So what separates an effective data integrity framework from a well-formatted spreadsheet? It comes down to whether the thing can work without someone babysitting it. A real framework is embedded in your infrastructure. It is automated, observable and responsive.
That means validation checks run as part of your data pipeline, not as a quarterly audit that someone remembers to do in the last week of the quarter. It means data is properly profiled and monitored so that anomalies are flagged in real time, whether that’s a sudden spike in null values or a mismatch between source and destination counts. Tools like Great Expectations, Monte Carlo and dbt tests exist specifically to bring this kind of rigor to the workflow.
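To make that concrete, here is a minimal sketch of an in-pipeline null-rate check in plain Python with pandas. The table, column name and baseline rate are all hypothetical; in practice a tool such as Great Expectations or dbt tests would manage these thresholds and their history for you.

```python
import pandas as pd

def check_null_rate(df: pd.DataFrame, column: str, baseline_rate: float) -> None:
    """Halt the pipeline step if a column's null rate exceeds double its baseline."""
    current_rate = df[column].isna().mean()
    if current_rate > 2 * baseline_rate:
        raise ValueError(
            f"null rate for '{column}' is {current_rate:.1%}, "
            f"more than double the baseline of {baseline_rate:.1%}"
        )

# Hypothetical pipeline step: validate before loading downstream tables.
orders = pd.DataFrame({"customer_id": [1, None, 3, None, None, 6]})
try:
    check_null_rate(orders, "customer_id", baseline_rate=0.10)
except ValueError as err:
    print(f"ALERT: {err}")  # a real pipeline would page the asset's owner here
```

The point is not the specific threshold, but that the check runs on every load rather than whenever someone remembers to open the spreadsheet.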
It also means ownership is enforced through tooling, not just written on a tab. If a data asset has an owner registered in a data catalog, and that catalog connects to your alerting system, accountability becomes structural. It stops being something you have to chase people down about on Slack.
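The mechanics can be simple. The sketch below uses a hypothetical in-memory catalog with made-up asset and team names to show the idea; real catalogs such as DataHub or Amundsen hold the same ownership metadata, which an alerting system can query instead.

```python
# Hypothetical catalog: every asset registers an owner and an alert channel.
CATALOG = {
    "warehouse.orders": {"owner": "data-platform-team", "channel": "#orders-alerts"},
    "warehouse.customers": {"owner": "crm-team", "channel": "#crm-alerts"},
}

def route_alert(asset: str, message: str) -> None:
    """Send a data quality alert to the registered owner rather than a shared inbox."""
    entry = CATALOG.get(asset)
    if entry is None:
        # An unowned asset is itself a governance gap worth flagging.
        print(f"[governance] no owner registered for {asset}: {message}")
        return
    print(f"[{entry['channel']}] @{entry['owner']}: {message}")

route_alert("warehouse.orders", "null rate for customer_id doubled against baseline")
```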
There is a cultural component here, too. Organizations with mature data integrity practices treat data quality as a shared concern, and they are better prepared to establish proper AI governance. Product managers care about it. Analysts flag problems instead of working around them. Developers write data tests the same way they write code tests. That kind of culture doesn’t come from a spreadsheet. It comes from leadership making it clear that data integrity is a priority, not a side project that someone tends to when things are slow.
Companies that get this right tend to share a few characteristics. They have invested in visibility across their data stack. They treat schema changes as events that need to be communicated, not things that just happen silently. And they have moved past the idea that documentation alone equals governance.
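As an illustration of treating schema changes as events, here is a small sketch that diffs a table’s actual column types against an expected schema. The schema and table are hypothetical, and observability platforms automate this kind of detection, but the principle is the same: drift produces a notification, not silence.

```python
import pandas as pd

# Hypothetical expected schema for an orders table.
EXPECTED_SCHEMA = {"order_id": "int64", "customer_id": "int64", "total": "float64"}

def detect_schema_drift(df: pd.DataFrame) -> list[str]:
    """Compare a frame's actual dtypes against the expected schema and report drift."""
    actual = {col: str(dtype) for col, dtype in df.dtypes.items()}
    events = []
    for col, dtype in EXPECTED_SCHEMA.items():
        if col not in actual:
            events.append(f"column dropped: {col}")
        elif actual[col] != dtype:
            events.append(f"type changed: {col} expected {dtype}, got {actual[col]}")
    for col in actual.keys() - EXPECTED_SCHEMA.keys():
        events.append(f"column added: {col}")
    return events

# A load where 'total' arrived as strings and a new column appeared upstream.
df = pd.DataFrame({
    "order_id": [1, 2],
    "customer_id": [10, 20],
    "total": ["9.99", "5.00"],
    "discount": [0.0, 0.1],
})
for event in detect_schema_drift(df):
    print(event)  # each event should notify the registered owner, not pass silently
```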
Why it matters more now than it did five years ago
The stakes around data integrity have changed dramatically. Five years ago, a bad record on a reporting dashboard was annoying but manageable. Today, that bad record could be feeding a machine learning model that makes automated decisions about credit, hiring or patient care. The blast radius of poor data quality has widened because the systems that consume that data are more autonomous and more consequential.
Regulatory pressure is also increasing. Frameworks like the EU’s AI Act and evolving data privacy laws are putting more scrutiny on how organizations handle the data that powers their products. It is becoming increasingly difficult to dismiss data quality issues as ‘technical debt we’ll get to eventually’. Regulators want to see evidence of governance, and a spreadsheet last updated a year ago won’t cut it.
There is also a competitive angle. Companies that can trust their data move faster. They make decisions with more confidence. They spend less time reconciling conflicting reports and more time acting on them. Data integrity isn’t glamorous, but it’s one of those foundational things that quietly determines whether an organization can execute its strategy or just talk about it.
Final thoughts
The uncomfortable truth is that most data integrity frameworks were never designed to function as frameworks at all. They were designed to satisfy a request, check a compliance box, or give someone something to present at a meeting.
And that’s fine as a start. Every mature system started somewhere. But if your ‘framework’ is still a spreadsheet that no one has touched in six months, it’s time to be honest about what you really have.
Real data integrity requires automation, observability and cultural buy-in. A spreadsheet is never the destination. Treat it like the rough draft it always was, and start building something that can keep pace with your data.
Written by Nahla Davies
Nahla Davies is a software developer and technology writer. Before devoting her career full time to technical writing, she managed – among other interesting things – to serve as a lead programmer at an Inc. 5,000 experiential branding organization whose clients include Samsung, Time Warner, Netflix and Sony.

