Life after the digital Big Bang (July 2008)
Some vital statistics: The digital universe is the total number of “bits” (1s or 0s) created, captured, and replicated throughout the world by all the servers, computer hard drives, data storage units, e-mails, and yes, radio frequency identification (RFID) tags, plus those greatest of all digital gluttons: images from digital cameras, cell phones, surveillance cameras, peer-to-peer video sharing, DVDs, medical imaging, and on and on. To all those bits, add what IDC calls digital “shadows,” including web search histories, financial transaction journals, and mailing lists. Then add in back-office digitizing.
IDC computed that in 2007 the global digital universe contained some 281 billion billion bytes (281 exabytes, each byte with eight bits) of digital information. That’s more than all the stars in the known universe. It’s 10% more than IDC calculated for 2006. In 2011 the digital universe will be ten times the size it was in 2006.
At this point, an already-overworked IT manager might rise and object that she’s not responsible for managing all the bits stored in PC hard drives and digital cameras. True. Each enterprise is only responsible for managing its own digital environment. However, IDC found that, while individuals create some 70% of the digital universe, enterprises are responsible for the security, privacy, reliability, and compliance of 85%.
What to make of these dazzling stats? Are they another “inconvenient truth”? A precious, limited resource like oil? Might the fallout from the digital big bang choke the internet just as Web 2.0 is moving everything onto it? The IDC report cites Web 2.0 as part of the solution, not the problem. So, could the big bang be a good thing, considering the alternative? Consider: until recently we’ve been living in an information society whose transactions were recorded in ink on paper and shipped hither and yon by trucks, trains, and airplanes.
In the digital society, those transactions are created and managed as variations in the structures of atoms and sent around the world instantly at about the cost of a local phone call. The new way is surely more efficient. IDC doesn’t discuss the possibility of clogging the internet, but it accepts the benign view that more digitizing is a good thing. The report concludes that each enterprise will have to manage and govern the explosion of digital information in its own rapidly changing environment.
On the question of social efficiency, IDC calculates the amount that various industries spend on digital information and compares those numbers to each industry’s contribution to global economic output.
In these terms, the financial services industry looks pretty good. It handles secure, sensitive transactions involving trillions of dollars a day—equal to the world’s annual gross economic output. Financial services use 6% of the digital universe to produce 6% of the global economic product. The industry’s share of the digital universe will fall to 3% in 2011, IDC predicts. The reason? Not much digital imaging going on (i.e. video imaging), IDC says.
At the other end of the economic-productivity spectrum, the broadcast, media, and entertainment industries produce only 4% of the world’s economic output but generate 50% of the digital universe. IDC predicts that those percentages will be even more lopsided in the next ten years, when most countries will be broadcasting digital TV and most movies will be digital.
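The comparison between the two industries comes down to dividing each industry’s share of the digital universe by its share of global output. What follows is a minimal sketch of that arithmetic, using only the percentages quoted above. The ratio and the function name `digital_share_per_output_share` are illustrative assumptions, not a metric that IDC defines.

```python
# Percentages quoted in the article: share of the digital universe and
# share of global economic output, per industry.
SHARES = {
    "financial services": {"digital_pct": 6.0, "output_pct": 6.0},
    "broadcast, media and entertainment": {"digital_pct": 50.0, "output_pct": 4.0},
}


def digital_share_per_output_share(digital_pct: float, output_pct: float) -> float:
    """Digital-universe share divided by economic-output share.

    Hypothetical illustrative ratio, not an IDC-defined measure.
    A value of 1.0 means an industry's data footprint matches its
    economic footprint; larger values mean more bits per unit of output.
    """
    if output_pct <= 0:
        raise ValueError("output_pct must be positive")
    return digital_pct / output_pct


# Ratio for every industry listed above.
RATIOS = {
    name: digital_share_per_output_share(**shares)
    for name, shares in SHARES.items()
}

if __name__ == "__main__":
    for name, ratio in RATIOS.items():
        print(f"{name}: {ratio:.1f}")
```

On these figures, financial services sit at parity while media and entertainment generate many times more data per unit of economic output, which is the lopsidedness the text describes.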
Storage, storage, storage
The IDC report cites elements in an enterprise’s digital environment for which executives can be held—sometimes legally—responsible: information security, privacy protection, copyright protection, screening for obscenity, detecting fraud, and reporting on, archiving, searching, retrieving, and disposing of content. All of it involves storage.
Wells Fargo—following the lead of Microsoft, Google, Yahoo and others—recently announced vSafe, a cyberspace storage facility. The service will upload and give customers online access to such data as tax returns, marriage and death certificates, passports, wills, and digital pictures. The cost: $4.95 a month for a gigabyte of storage and access. That’s surely a nice way to clean out an attic and obviously could sometimes be more than just a convenience. But it also means the 90% or more of stored data that will never be needed will forever be part of the digital universe.
The IDC study found that in 2007 the amount of stored information was about equal to available storage. By 2011, almost half of the digital universe will not have a permanent home. To cope with the exploding mass of information in every part of an enterprise, IDC says that management must “transform their existing relationships with the business units. These are the groups that will classify information, set retention policies, deal with customers whose data the company holds, and face the public if data is lost, breached, compromised, or simply handled badly.” Then, based on continuous sensing of the changing information environment, an enterprise can devise flexible policies that will be managed by IT and applied to every part of the organization, its customers, and its suppliers.
The biggest challenge may be to “rush new tools and standards into the organization: storage optimization, unstructured data search, database analytics, and resource pooling (virtualization).”
Needed: new warehouse architecture
Obviously, every enterprise will need to spectacularly expand its data warehouse capacity, either in-house or outsourced. That’s already happening, and fast, according to Richard Winter, president and founder of WinterCorp, consultants in large-scale data management. Every few years the firm identifies the world’s ten largest and most heavily used databases. From 1998 through 2005, the year of the latest survey, the size of the world’s largest data warehouse tripled approximately every two years, so that each year’s figure was about 173% of the previous year’s, a compound annual growth rate of roughly 73%. In 2005 the largest warehouse contained 100 terabytes (trillions of bytes) actually used. If that rate continued, the number should now be in the range of 900 terabytes (see chart). Winter points out that his firm measures data actually used, which he says is typically about one-fifth of the needed capacity. Thus, 100 terabytes actually used implies a 500 terabyte capacity.
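The growth arithmetic in this paragraph can be checked with a short calculation. The sketch below assumes smooth compound growth from the 2005 figure. The function names are hypothetical, and the one-fifth usage ratio is Winter’s rule of thumb as quoted, not a measured value. Tripling every two years works out to an annual multiplier of about 1.73, meaning growth of roughly 73% a year.

```python
def annual_growth_rate(multiple: float, years: float) -> float:
    """Compound annual growth rate implied by growing `multiple`-fold in `years`."""
    return multiple ** (1.0 / years) - 1.0


def project_size(start: float, rate: float, years: float) -> float:
    """Size after `years` of compound growth at annual `rate`."""
    return start * (1.0 + rate) ** years


def implied_capacity(used: float, used_fraction: float = 0.2) -> float:
    """Capacity needed if only `used_fraction` of it holds data actually used.

    The default of one-fifth is the rule of thumb quoted in the article.
    """
    return used / used_fraction


# Tripling every two years, as reported for 1998 through 2005.
RATE = annual_growth_rate(3.0, 2.0)
USED_2005_TB = 100.0  # largest warehouse in the 2005 survey, data actually used

if __name__ == "__main__":
    print(f"annual growth rate: {RATE:.1%}")
    for years in (2, 3, 4):
        size = project_size(USED_2005_TB, RATE, years)
        print(f"{2005 + years}: about {size:.0f} TB used")
    print(f"capacity implied by 100 TB used: {implied_capacity(USED_2005_TB):.0f} TB")
```

At this rate the 100 terabytes of 2005 reach 300 terabytes after two years and 900 after four, so the projected figure is quite sensitive to how many years are assumed to have elapsed.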
Although no banks have yet made it into the world’s top ten databases in the WinterCorp series, Winter says that banks are now planning data warehouses in the top-ten range. “...The next five to ten years will bring dramatically new and different architectures and products for the management of data. With every dimension of database scale racing upward at exponential rates, I believe we will need products that are substantially simpler to use and administer. . . in a more fully automatic manner.”
Winter noted with approval the May announcement by Aster Data Systems of an innovative solution that transforms off-the-shelf, commodity hardware into a self-managing analytic database. The new solution is being used by MySpace, the popular social network, to analyze customer preferences on a cluster of 100 server nodes that together take in more than one terabyte of new data every day. Aster CEO and cofounder Mayank Bawa says he wants to open a market for the kind of clustered database successfully used by Google and Yahoo.
The electronic version of this article is available at: http://lb.ec2.nxtbook.com/nxtbooks/sb/ababj0708/index.php?startid=44