Sunday, November 01, 2009

Reflections on IOD 2009 and the Babylonian Sheep

IBM's conference Information on Demand 2009 (IOD 2009) took the IOD story a step further, with considerable attention paid to predictive analytics. No surprise, as IBM had just completed the acquisition of SPSS, a Chicago-based predictive analytics company.
Indeed, the IOD sequence has gone from gathering and grooming data (a single, hopefully truthful view of the company/customer base), to making it all accessible (de-siloing), to finding hidden gems (data mining and analysis), and now to using the data to look into future trends and developments (predictive analytics, or wild guessing with lots of numbers?).
Common to all of this is that we have a large global base of electronic data that various smart spiders can zip through to give us the answers we need. But that presumes that this vast electronic database, growing by exabytes every year, will still be fully searchable and open to analysis in 50 or 100 years.
Which brings up the Babylonian Sheep Question, which I put to a few IBM folks, including Ambuj Goyal, without getting a really satisfactory answer. That is fine, because there probably is none. It is not so much a question as a kind of meta-framework setter.
So what is the Babylonian Sheep Question?
Simply this: if I want to get a rough idea of what the market for sheep was like in Babylon 4000 years ago, much of the data warehouse (clay tablets) is still there. One can drill down to the level of the single transaction record -- Uruk sold 20 sheep to Gilgamesh for 50 bushels of wheat, or whatever. Since the data warehouse (scattered among various museums) has survived 4000 or more years, I can safely assume it will be around in 200 or 300 years, never mind 50 years down the road (barring nuclear holocausts or the like, although those would only bake the clay harder...).
Now, as for all those exabytes -- I really don't know what will be accessible in 2030. I certainly can't access some of the floppy disks I still have with stuff I wrote in the 1980s. With the global datasphere growing by petabytes per week (month?), it is clear that we will need new data compression and storage technologies. Otherwise we will be replacing the world's forests with forests of blade servers. I can imagine that 10 years from now most data will be stored holographically in the cloud, or in some other extreme-volume storage technology. That is a long way from the floppy disk.
Ambuj Goyal, by the way, talked about data decommissioning, which addresses the issue of keeping relevant data around and instantly accessible while storing historically interesting but operationally irrelevant data elsewhere. Fair enough. But that is no answer to the Babylonian Sheep Question. However irrelevant Uruk's sale of 20 sheep may be 4000 years later, the record is still there and available in printed translation from the cuneiform inscriptions. What will happen to my first American Express purchase records (from 1977, on some mainframe), which may be of some relevance when I am an 80-year-old geezer some 20 years from now, is another question. It may be solved, and probably will be. But I am not all that confident that our entire electronic record of the world and of daily life will be as durable as those fragmentary records of the Babylonian sheep market that we can still examine and analyze today.

