This article was originally published in Forbes.
By Joe McCunney, CEO & President, Scalar Labs
When personal computers started making their way into homes in the late 1970s, the 5.25-inch floppy disk—the really big, bendy one—was the most common way to store data.
Today, a single digital photo taken with your mobile device is typically saved, by default, as a 1.5 to 5 MB file. So, theoretically, it would take at least five 360 KB 5.25-inch floppy disks to save your one photo.
The Evolution Of Data And What’s Ahead
From floppy disks to CD-ROMs and external drives to the cloud, our collective ability to generate and store data has grown at a mind-blowing pace over the last few decades. The modest 6.69 MB that Windows 3.1 needed to install pales in comparison to the 641 gigabytes that today's average household uses every month. That's over 1.7 million 5.25-inch disks!
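For readers who want to check the floppy disk arithmetic, here is a minimal sketch. It assumes decimal units (1 KB = 1,000 bytes) and a 360 KB disk capacity; the function name `floppies_needed` is illustrative only. Using binary units instead gives about 1.87 million disks for the household figure, which is still consistent with "over 1.7 million."

```python
FLOPPY_BYTES = 360 * 1_000  # one 360 KB disk, decimal units (assumption)


def floppies_needed(size_bytes: int) -> int:
    """Return how many 360 KB disks are needed to hold size_bytes."""
    # Integer ceiling division: a partially filled disk still counts as a disk.
    return -(-size_bytes // FLOPPY_BYTES)


photo_small = 1_500_000              # a 1.5 MB photo
photo_large = 5_000_000              # a 5 MB photo
household_month = 641 * 10**9        # 641 GB of monthly household usage

print(floppies_needed(photo_small))      # 5
print(floppies_needed(photo_large))      # 14
print(floppies_needed(household_month))  # 1780556
```

So the smallest photo needs five disks, the largest needs 14, and a month of household data needs roughly 1.78 million.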
Why is data growing so quickly? For one, it’s the natural evolution of data in the digital age, combined with our insatiable curiosity. Take the healthcare industry and AI.
Nearly 30% of the world’s data is generated by the healthcare industry, and medical knowledge doubles every 73 days. The emergence of generative AI is also having a profound impact on data growth. According to TIRIAS Research, tokens—roughly analogous to words or symbols—will increase 100X to over 1 quadrillion annually by the end of 2028, thanks to generative AI. For digital images, the increase is said to be more than 400X to over 10 trillion.
This data deluge is a testament to our human need and appetite for information and our ability to innovate on that information. Researchers have said that mapping a human brain would take a zettabyte (ZB), if not more. One ZB is equivalent to 1 trillion gigabytes. Yes—1 trillion!
The Value Of Data And Challenges Of Consistency
Brain mapping aside, here’s the dilemma with data: How do we store and manage this global explosion of data in a way that is both meaningful and reliable? For example, if you have the token 19:50, what does it mean? It can mean 19:50 in military time, 19:50 as a game score or 19:50 as a ratio.
What gives it true meaning and value is the context behind the number. Like a novel, every piece of data has a story attached to it. However, unlike novels, depending on who’s looking at the data, it could have a completely different plot and ending. This is where data consistency and reliability become critical factors.
On the other hand (here’s a brain twister), just because a piece of data "looks" the same doesn’t make it the same. We know that 19.5 degrees Celsius is not equal to 19.5 degrees Fahrenheit. As a temperature measurement, 19.5 degrees Celsius equals 67.1 degrees Fahrenheit. We know this because our brains, our human computing systems, already have this context built into us from our education. Or maybe, like me, you Googled it. Data, in its pure base form, does not have any context and is just a meaningless token.
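To make the temperature point concrete, here is a small sketch of what it looks like when the context (the unit) travels with the number. The `Temperature` class and `same_temperature` helper are hypothetical names chosen for illustration, not part of any particular product or library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Temperature:
    value: float
    unit: str  # "C" for Celsius or "F" for Fahrenheit

    def to_celsius(self) -> float:
        """Convert the reading to Celsius so readings can be compared."""
        if self.unit == "C":
            return self.value
        if self.unit == "F":
            return (self.value - 32) * 5 / 9
        raise ValueError(f"unknown unit: {self.unit!r}")


def same_temperature(a: Temperature, b: Temperature, tol: float = 0.05) -> bool:
    """Compare two readings by meaning (within a tolerance), not by raw number."""
    return abs(a.to_celsius() - b.to_celsius()) <= tol


# The raw numbers match, but the readings do not.
print(same_temperature(Temperature(19.5, "C"), Temperature(19.5, "F")))  # False
# The raw numbers differ, but the readings are the same temperature.
print(same_temperature(Temperature(19.5, "C"), Temperature(67.1, "F")))  # True
```

Without the unit field, the value 19.5 on its own could not be compared safely at all.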
So far, we’ve discussed numeric data, and now that I have you at the edge of your seat, the real fun begins! Imagine yourself as a secret agent hunting down the international spy, "John Doe." How do you weed through all the John Does to discover the one you’re searching for? Is "John Doe" the same person as "Johnny Doe" or "Jon Doe"? Maybe, maybe not. It all depends on the story of the data.
Like John Doe, many of us are living examples of unique inconsistencies. If your legal name is Madaline, you might use a shortened version of your name, like Maddie. To add to the inconsistencies, if you’re lucky enough to have Aussie friends, they might lovingly call you Mads or Madda. Whether Madaline, Maddie, Mads or Madda, you're still you. But in a data-scape that spans the globe, how do you identify which you is you?
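As a rough illustration of the name problem, the sketch below maps nicknames to a canonical first name before comparing records. The alias table is hand-written and hypothetical, and the surname "Smith" is invented for the example. A real system would need far richer context, because a match here only means two records might refer to the same person.

```python
# Hypothetical alias table: nickname -> canonical first name.
ALIASES = {
    "maddie": "madaline",
    "mads": "madaline",
    "madda": "madaline",
    "johnny": "john",
    "jon": "john",
}


def normalize(full_name: str) -> tuple[str, str]:
    """Lowercase a name and replace a known nickname with its canonical form."""
    parts = full_name.strip().lower().split()
    if not parts:
        return ("", "")
    first, rest = parts[0], parts[1:]
    return (ALIASES.get(first, first), " ".join(rest))


def is_candidate_match(a: str, b: str) -> bool:
    """True when two names normalize to the same value: a candidate, not proof."""
    return normalize(a) == normalize(b)


print(is_candidate_match("Johnny Doe", "Jon Doe"))         # True
print(is_candidate_match("Mads Smith", "Madaline Smith"))  # True
print(is_candidate_match("John Doe", "Jane Doe"))          # False
```

Even when the function returns True, only the story behind the data (a date of birth, an address, a passport number) can confirm which John Doe is the right one.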
One Inconsistency Can Cost Millions Across The Enterprise
A personal name change might be a mild inconvenience, but inconsistent data can cost you more than you think. Imagine operating a global network of factories ordering from multiple suppliers across many databases, with a linguistically diverse workforce inputting data in multiple languages. One minor inconsistency in the supplier's data set can have a reverberating effect, costing millions.
Data consistency goes beyond visually having the same data. To make sound business decisions, data needs to be in sync across diverse IT systems and databases, and maintaining such systems is quite the challenge.
A single enterprise may run hundreds of databases of many different types across its network, so choosing or migrating to the most suitable database or cloud vendor becomes highly complex. The challenges extend beyond cost; they include security, compliance, performance, interoperability and technical gaps. Then there’s the ever-present concern of vendor lock-in, where you can't easily migrate out once you choose a database or cloud provider. This severely limits an organization's freedom to adapt nimbly to change.
The New Era Of Data And Cloud Migrations, De-Risked
What’s needed is the concept of "your data now, everywhere." Treat all underlying databases as a single entity, providing data consistency and reliability across all platforms, whether AI, apps or data lakes. The magic lies in leveraging new technologies to facilitate communication between databases, across disparate on-premises or cloud-based systems, providing a seamless environment for building applications.
Much like creating a dialog between databases, it is crucial to have internal conversations among key business and technical stakeholders about your organization's data management strategy. Without aligned objectives, new technologies may solve your problem in the short term, but the underlying issues that caused it are likely to rear their ugly heads again down the road.
Ultimately, like negotiators of international policy, our job as data negotiators is to ensure that the data we create and manage is consistent, accurate, reliable and secure. In pursuit of this, de-risking data and cloud migrations must be a top business priority as data takes over the world.
As the CEO and President of Scalar Labs, Joe McCunney leads software solutions that simplify complex data challenges across the enterprise.