The Monster That Ate IT
By John Parkinson
Sometime this year, the world's aggregate digital storage will pass a Zettabyte.
That, for those of you reaching for Google to check, is 1024 Exabytes (passed that one a while back) or 2^20 Petabytes. In round numbers it's one sextillion bytes. And I remember when a Terabyte was a lot of data...
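For readers who want to check the arithmetic, the binary-prefix conversions work out as a quick sketch (using the power-of-two definitions the column uses, where each prefix step is a factor of 1024):

```python
# Back-of-the-envelope unit check, using binary (power-of-two) prefixes:
# 1 Zettabyte = 1024 Exabytes = 2**20 Petabytes = 2**70 bytes.
ZB_IN_BYTES = 2 ** 70
EB_IN_BYTES = 2 ** 60
PB_IN_BYTES = 2 ** 50

print(ZB_IN_BYTES // EB_IN_BYTES)  # Exabytes per Zettabyte → 1024
print(ZB_IN_BYTES // PB_IN_BYTES)  # Petabytes per Zettabyte → 1048576
print(ZB_IN_BYTES)                 # ~1.18 x 10^21 bytes, "one sextillion" in round numbers
```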
Sometimes I feel like I have to find a place to put it all.
Ever since we started storing digital data on magnetic media, storage density (think of it as the number of atoms needed to store one bit) has improved exponentially. Although the rate of progress in storage technology is showing some signs of slowing, we are on pace to store one bit per atom (assuming we figure out how) by around 2020.
That's the good news. In terms of improving capacity and performance, storage has actually outpaced even processor technology.
The bad news is that we are accumulating things to store so fast that by 2020, we won't have enough atoms available--even if we deploy global de-duplication to keep the number of unique items to an absolute minimum. Remember that if you want to be sure that you can always recover what you stored, you have to have some level of redundancy--and if you want to find anything reasonably fast, you have to devote some space to metadata or indexes. And then there are log files, audit trails, search contexts...not to mention latency factors that tend to increase the number of copies we need.
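The multiplication the paragraph describes is easy to sketch. The overhead factors below (three replicas, 5% for indexes and metadata, 2% for logs and audit trails) are illustrative assumptions, not measured values:

```python
def raw_bytes_needed(logical_bytes, replicas=3, metadata_overhead=0.05, log_overhead=0.02):
    """Estimate raw capacity behind a given amount of unique logical data.

    replicas: copies kept for durability (illustrative default of 3).
    metadata_overhead: fraction spent on indexes and metadata (assumed).
    log_overhead: fraction spent on logs and audit trails (assumed).
    """
    return logical_bytes * replicas * (1 + metadata_overhead + log_overhead)

# One Petabyte of unique data becomes roughly 3.2 PB of raw capacity
# under these assumptions.
ratio = raw_bytes_needed(2 ** 50) / 2 ** 50
print(round(ratio, 2))
```

Even before de-duplication enters the picture, the raw-to-logical ratio is a multiplier on every byte you keep, which is why the atom budget runs out faster than the density curve alone suggests.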
For almost as long as we have been storing data, we have been struggling to manage it. Almost all data has a set of "lifecycles" that influence where it should "reside" in terms of performance, cost, accessibility and so on. These lifecycles are not just determined by business workflows--the obvious lifecycle parameters--but might also have to take into account what the data is used for (reference versus transaction processing), how long it must be retained, what it links to, and so on. Many aspects of context become important.
Worst of all are sets of data that are needed only occasionally, but needed very fast when the occasion arrives. Do we keep them close by, filling up high-performance storage, or farther away, risking latency? These are tough design choices, with factors involved that can change dynamically, and the tools we have today really aren't as helpful as they should be.
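A toy tiering policy makes the tension concrete. The tier names and thresholds here are hypothetical, chosen only to illustrate the trade-off; a real policy would weigh cost and would have to cope with the dynamically changing factors just mentioned:

```python
def choose_tier(accesses_per_month, max_latency_ms):
    """Toy placement policy: hypothetical thresholds, not a product recommendation.

    Latency-sensitive data stays on fast (expensive) storage regardless of how
    rarely it is touched; everything else moves outward by access frequency.
    """
    if max_latency_ms < 10:           # must be served fast when needed
        return "flash"
    if accesses_per_month >= 100:     # hot data
        return "flash"
    if accesses_per_month >= 1:       # warm data
        return "disk"
    return "archive"                  # cold data; retrieval may take minutes or hours

# The hard case from the text: rarely used, but needed fast when used.
print(choose_tier(0, max_latency_ms=5))        # occupies flash despite zero traffic
print(choose_tier(0.1, max_latency_ms=60000))  # genuinely cold → archive
```

The first call is the painful one: data with near-zero traffic still pins down expensive capacity purely because of its latency requirement.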
I have been modeling these trends for some time now in the very specific business context for which I manage strategy, and the future looks grim.
Sometime in the next decade, storage will eat all of the capital investment capacity we are likely to have available. All businesses are going to have to face tough choices about how much they store and how long they can keep it.
We are all going to have to learn how to share more data in common--both inside the business and between businesses--because we won't be able to afford the multiple copies we keep today. The sheer cost of storage and scarcity of capacity is going to create a new shared storage economy, unlike anything we have ever seen and way beyond the concepts of cloud storage that are beginning to emerge. That's going to get really interesting.
And I'll still probably be looking for somewhere convenient to put things.
John Parkinson is CIO of TransUnion.