Date: February 15, 2012
Sponsor: Mass Technology Leadership Council
Moderator: Mike Stonebraker, Professor, MIT
Speakers:
- Christopher Ahlberg, CEO & Co-Founder, Recorded Future
- Fritz Knabe, Distinguished Engineer, Netezza
- Mark Watkins, GM of Entertainment Content, Telenav
- Deepak Advani, Vice President, Business Analytics Products and Solutions, IBM
- Andy Palmer, Startup Specialist
- Puneet Batra, Chief Data Scientist, Kyruus
- Alan Hoffman, Founder & President, Cloudant
- George Radford, Field CTO, EMC Greenplum
- Bill Simmons, CTO, DataXu
The age of Big Data has just begun. One speaker at this all-morning conference, held at Kendall Square in Cambridge, likened the current situation to where we were 100 years ago as electricity was emerging. People were just beginning at that time to see the possibilities.
I recall my own mother, who was born in 1905, describing her great amazement over hearing a radio broadcast of music for the very first time. Before that she had to go to a live concert to hear music. My guess is the remote dairy farm that she lived on in South Dakota now plays music in the barn while the cows are being milked.
The Big Data conference focused on the challenges of making Big Data useful along with its possibilities. The challenges appear to be huge, but there was an emphasis on the opportunities that come with any challenge.
The challenges fall into three categories:
- Storing and handling the ocean of data for quick retrieval
- Extracting a cup, quart, or gallon of useful data, as need be
- Presenting the data quickly in a meaningful context
There is always a bottleneck in computer performance. The old storage bottleneck has gone away with the low price of flash memory. The new bottleneck is the number of CPUs (central processing units) it takes to handle the data.
The electricity costs for the CPUs required to crunch Big Data are prohibitive.
One of the challenges of Big Data storage management is its location. Space provided in the cloud by companies such as Amazon and Google consists of black boxes that some engineers find puzzling. They would prefer to manage the space themselves so they can make it work right for their situation.
Finding the data that will be useful can be complicated. And when marketers, in particular, get a whiff of what’s possible, their demands are likely to increase exponentially. An example was given about predictive tracking of stocks. Early requests were for an instant report on stock prices immediately after the market opens each day. Then requests came for hourly reports during the day, followed by reports every minute and visual charts of trends and correlations. Oh, my!
I heard a lot of talk about arrays being thought of as a solution to the problem of scalability. In my own 25 years of experience, having developed around 50 relational databases, I have found the need to use an array only once or twice. Arrays tend to get complicated rapidly. A background as a math major will be helpful for those seeking a career in Big Data. In-depth knowledge of physics was also mentioned as desirable.
It certainly appears that the Big Data effort is going to have to go far beyond the world of structured data, which relies mainly on spreadsheet-like tables with one-to-many relationships. For example: one client (parent table) with many invoices (child table), or one invoice (parent table) with many items (child table), linked together by unique identifiers.
But that is structured data, set up to be structured. Arrays are another way of handling child data—but it remains to be seen just how they will be used with text data, which is unstructured.
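To make the parent and child tables concrete, here is a minimal sketch using Python's built-in sqlite3 module. Nothing like it was shown at the conference. The table names, column names, and sample rows are my own assumptions, chosen only to illustrate one client linked to many invoices by a unique identifier.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a parent table and a child table.

    All names and rows here are hypothetical, for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Parent table: one row per client.
        CREATE TABLE client (
            client_id INTEGER PRIMARY KEY,
            name      TEXT NOT NULL
        );
        -- Child table: many invoices can point at the same client_id.
        CREATE TABLE invoice (
            invoice_id INTEGER PRIMARY KEY,
            client_id  INTEGER NOT NULL REFERENCES client(client_id),
            total      REAL NOT NULL
        );
        INSERT INTO client VALUES (1, 'Acme Dairy'), (2, 'Prairie Feed');
        INSERT INTO invoice VALUES (10, 1, 250.0), (11, 1, 75.5), (12, 2, 40.0);
        """
    )
    return conn


def invoices_per_client(conn):
    """Join child rows to their parent and summarize them per client."""
    return conn.execute(
        """
        SELECT c.name, COUNT(i.invoice_id), SUM(i.total)
        FROM client AS c
        JOIN invoice AS i ON i.client_id = c.client_id
        GROUP BY c.client_id
        ORDER BY c.client_id
        """
    ).fetchall()


if __name__ == "__main__":
    for name, count, total in invoices_per_client(build_demo_db()):
        print(name, count, total)
```

The join on client_id is the "unique identifier" link in action: each invoice row carries the identifier of the one client it belongs to. Free-form text has no such column to join on, which is part of why it is harder to handle.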
No one knows what the job description will be for this new area. It sounds to me as if the Big Data discipline is at about the same place that computer science was back around 1960, when COBOL was invented. One company announced it was hiring, and looking for people with strong math, physics, and database knowledge.
Good solutions have yet to be developed. One speaker described today’s choices for a solution as being between “bad and bad.”
The Web was discussed as a good place for gathering data about trends and outbreaks. The challenge is in analyzing unstructured data. Much of it is text based. It needs to be looked at in its context. An example given was the use of the word “pretty.” Pretty good is very different from pretty bad.
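The "pretty" example can be sketched in a few lines of Python. This is a toy illustration under my own assumptions, not a method anyone presented: the word lists are made up and tiny, and the function name is hypothetical. It looks only at the word that follows "pretty," which is about the simplest form of context there is.

```python
import re

# Hypothetical, deliberately tiny word lists for illustration only.
POSITIVE = {"good", "great", "nice"}
NEGATIVE = {"bad", "awful", "poor"}


def pretty_polarity(text):
    """Label each occurrence of 'pretty' by the word that follows it."""
    words = re.findall(r"[a-z']+", text.lower())
    labels = []
    # Pair every word with its successor; the last word is paired with "".
    for current, following in zip(words, words[1:] + [""]):
        if current != "pretty":
            continue
        if following in POSITIVE:
            labels.append("positive")
        elif following in NEGATIVE:
            labels.append("negative")
        else:
            labels.append("unknown")
    return labels


if __name__ == "__main__":
    print(pretty_polarity("The food was pretty good but the service was pretty bad."))
```

Counting the word "pretty" on its own says nothing about sentiment. Even this one-word lookahead separates the two cases, though real text analysis needs far more context than a neighboring word.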
The need for translation from and into Arabic, Chinese, and other non–Roman alphabet languages complicates text analysis. Knowing the cultural context is another challenge. Again, I say—oh, my!
In my early days as a database developer I set up quite a few businesses with databases tracking names, addresses, and phone numbers. Now out-of-the-box applications like Outlook are easy to come by. Web-based databases like Salesforce.com, designed to meet the needs of salespeople, have also evolved.
It is expected that similarly focused applications, like Outlook and Salesforce in their own fields, will emerge in the Big Data area. Some possibilities that were mentioned include applications for tracking outbreaks of disease, identifying attempts at money laundering, and identifying students at risk of dropping out of school.
It was also mentioned that identification of problems will be the easier part. Figuring out how to solve them is much harder. Having spent 15 years as a junior/senior high school teacher, I would agree that figuring out which students are dropout risks should be easy. Figuring out how to keep them in school is a much harder problem.
Collection of device data on such items as lights turning on and off is now easy to do. People are now able to turn their home heat up or down remotely from their cellphones. The interconnection of devices has great possibilities.
A lot is already being done with GPS tracking. Big Data on traffic density is starting to be used to help people plan their route from home to work or work to their next appointment.
Because of different needs and uses of Big Data, there is not going to be one single approach. The problems yet to be solved are how long it takes to process the data, what data should be extracted, and how it should be presented. Right now no one knows how to make all this happen. Oh, my.
