Big Data Meeting Summary

by Jackie on February 29, 2012

By Jackie Grubb, Plum Suite Solutions

Date: February 15, 2012
Sponsor: Mass Technology Leadership Council
Moderator: Mike Stonebraker, Professor, MIT

Speakers:

  • Christopher Ahlberg, CEO & Co-Founder, Recorded Future
  • Fritz Knabe, Distinguished Engineer, Netezza
  • Mark Watkins, GM of Entertainment Content, Telenav
  • Deepak Advani, Vice President, Business Analytics Products and Solutions, IBM
  • Andy Palmer, Startup Specialist
  • Puneet Batra, Chief Data Scientist, Kyruus
  • Alan Hoffman, Founder & President, Cloudant
  • George Radford, Field CTO, EMC Greenplum
  • Bill Simmons, CTO, DataXu

The age of Big Data has just begun. One speaker at this all-morning conference, held at Kendall Square in Cambridge, likened the current situation to where we were 100 years ago as electricity was emerging. People were just beginning at that time to see the possibilities.

I recall my own mother, who was born in 1905, describing her great amazement over hearing a radio broadcast of music for the very first time. Before that she had to go to a live concert to hear music. My guess is the remote dairy farm that she lived on in South Dakota now plays music in the barn while the cows are being milked.

The Big Data conference focused on the challenges of making Big Data useful along with its possibilities. The challenges appear to be huge, but there was an emphasis on the opportunities that come with any challenge.

The challenges fall into three categories:

  • Storing and handling the ocean of data for quick retrieval
  • Extracting a cup, quart, or gallon of useful data, as need be
  • Presenting the data quickly in a meaningful context

There is always a bottleneck in computer performance. The old storage bottleneck has gone away with the low price of flash memory. The new bottleneck is the number of CPUs (central processing units) it takes to handle the data.

The electricity costs for the CPUs required to crunch Big Data are prohibitive.

One of the challenges of Big Data storage management is its location. Space provided in the cloud by companies such as Amazon and Google consists of black boxes that some engineers find puzzling. They would prefer to manage the space themselves so they can make it work right for their situation.

Finding the data that will be useful can be complicated. And when marketers, in particular, get a whiff of what’s possible, their demands are likely to increase exponentially. An example was given about predictive tracking of stocks. Early requests were for an instant report on stock prices immediately after the market opens each day. Then requests came for hourly reports during the day, followed by reports every minute and visual charts of trends and correlations. Oh, my!

I heard a lot of talk about arrays being thought of as a solution to the problem of scalability. In my own 25 years of experience, having developed around 50 relational databases, I have found the need to use an array only once or twice. Arrays tend to get complicated rapidly. A background as a math major will be helpful for those seeking a career in Big Data. In-depth knowledge of physics was also mentioned as desirable.

It certainly appears that the Big Data effort is going to have to go far beyond the world of structured data, which relies mainly on spreadsheet-like tables with one-to-many relationships. For example: one client (parent table) with many invoices (child table), or one invoice (parent table) with many items (child table), linked together by unique identifiers.

But that is structured data, set up to be structured. Arrays are another way of handling child data—but it remains to be seen just how they will be used with text data, which is unstructured.
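The parent and child tables described above can be sketched in a few lines of Python using the standard sqlite3 module. This is only an illustration: the table names, column names, and sample rows are made up for the example and did not come from the conference.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with one parent and one child table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE client (
            client_id INTEGER PRIMARY KEY,   -- unique identifier (parent)
            name      TEXT
        );
        CREATE TABLE invoice (
            invoice_id INTEGER PRIMARY KEY,  -- unique identifier (child)
            client_id  INTEGER REFERENCES client(client_id),
            total      REAL
        );
        """
    )
    # Hypothetical sample data: one client with two invoices.
    conn.execute("INSERT INTO client VALUES (1, 'Sample Client')")
    conn.executemany(
        "INSERT INTO invoice VALUES (?, ?, ?)",
        [(101, 1, 250.0), (102, 1, 75.5)],
    )
    conn.commit()
    return conn


def invoices_for_client(conn, client_id):
    """Return every child invoice row linked to one parent client."""
    cursor = conn.execute(
        "SELECT invoice_id, total FROM invoice "
        "WHERE client_id = ? ORDER BY invoice_id",
        (client_id,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(invoices_for_client(db, 1))
```

The client_id column in the invoice table is the link back to the parent. Text pulled from the Web has no such column to join on, which is part of what makes unstructured data harder to handle.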

No one knows what the job description will be for this new area. It sounds to me as if the Big Data discipline is at about the same place that computer science was back in the 1960s, when COBOL was invented. One company announced it was hiring, and looking for people with strong math, physics, and database knowledge.

Good solutions have yet to be developed. One speaker described today’s choices for a solution as being between “bad and bad.”

The Web was discussed as a good place for gathering data about trends and outbreaks. The challenge is in analyzing unstructured data. Much of it is text based. It needs to be looked at in its context. An example given was the use of the word “pretty.” Pretty good is very different from pretty bad.
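To make the “pretty” example concrete, here is a deliberately simple Python sketch that looks at the word following “pretty” to decide how it is being used. The word lists and the function name are my own assumptions for illustration; real text analysis needs far more context than one neighboring word.

```python
# Tiny hand-made word lists, assumed for this example only.
POSITIVE = {"good", "great", "nice"}
NEGATIVE = {"bad", "awful", "poor"}


def classify_pretty(text):
    """Label each use of 'pretty' by the word that follows it."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    labels = []
    for i, word in enumerate(words):
        if word != "pretty":
            continue
        # Look one word ahead; an empty string means 'pretty' ended the text.
        next_word = words[i + 1] if i + 1 < len(words) else ""
        if next_word in POSITIVE:
            labels.append("positive")
        elif next_word in NEGATIVE:
            labels.append("negative")
        else:
            labels.append("unknown")
    return labels


if __name__ == "__main__":
    print(classify_pretty("The service was pretty good."))
    print(classify_pretty("The traffic was pretty bad."))
```

Even this toy version shows why the word alone tells you nothing: the same “pretty” comes out positive, negative, or unknown depending on what sits next to it.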

The need for translation from and into Arabic, Chinese, and other non–Roman alphabet languages complicates text analysis. Knowing the cultural context is another challenge. Again, I say—oh, my!

In my early days as a database developer I set up quite a few businesses with databases tracking names, addresses, and phone numbers. Now out-of-the-box applications like Outlook are easy to come by. Web-based databases like Salesforce.com, designed to meet the needs of salespeople, have also evolved.

It is expected that focused applications like Outlook and Salesforce will emerge in the Big Data area. Some possibilities that were mentioned include developing business outcomes for tracking outbreaks of disease, identifying attempts at money laundering, and identifying students at risk of dropping out of school.

It was also mentioned that identification of problems will be the easier part. Figuring out how to solve them is much harder. Having spent 15 years as a junior/senior high school teacher, I would agree that figuring out which students are dropout risks should be easy. Figuring out how to keep them in school is a much harder problem.

Collection of device data on such items as lights turning on and off is now easy to do. People are now able to turn their home heat up or down remotely from their cellphones. The interconnection of devices has great possibilities.

A lot is already being done with GPS tracking. Big Data on traffic density is starting to be used to help people plan their route from home to work or work to their next appointment.

Because of different needs and uses of Big Data, there is not going to be one single approach. The problems yet to be solved are how long it takes to process the data, what data should be extracted, and how it should be presented. Right now no one knows how to make all this happen. Oh, my.


The Escape Key

by Jackie on May 23, 2011

Many computer users have come to know and love the “Undo” feature, which allows one to reverse changes in a word processing document or a spreadsheet.

A lesser-known fix for computer users is the Esc key – it is in the upper left corner of the keyboard. While working with an experienced computer user I was surprised to discover that she did not know about the Esc key.

She had requested an “undo” feature when filling in a field on an Access database I had developed. Pressing the Esc key while in a field in Access will restore the field’s contents. That feature works in other places as well – including some fill-out forms on the web.

Another place the Esc key is useful is for getting rid of an unwanted pop-up dialog box. A single press of the Esc key and the dialog box is gone.

The Esc key – next time you are in computer trouble give it a try.


Keeping an old database up-to-date

by Jackie on May 23, 2011

Why would you want to make changes to a database program that had been customized for you 10 or more years ago? How about this?

  1. You moved your office and reports going to clients needed to have your new address on them.
  2. You want to keep up with today’s means of communication.
  3. Your systems department demanded the software be upgraded to the latest version of Access (2010).

These are changes I have made on three different client programs during the past few months. The program itself was working just fine, but these new situations emerged.

In the second instance, the program was producing a report showing a person’s name, their street address, home phone and work phone. The new report, meeting today’s needs, eliminates the home and work phone numbers and substitutes an email address, a cell phone number, and a list of other phone numbers, which could include home and work phone numbers.

Times change and software gets upgraded. Microsoft did a good job of keeping the newer versions of Access (2007 and 2010) working consistently with older versions of the program. In years past, I have found that if a program contains code, it is a good idea to make sure the code is compiled before converting it so that the converted program works right.

With the new ribbon interface, it may be necessary to tweak the interface by minimizing the ribbon and the navigation pane to make the screen appear as it did in earlier versions.

Change – it keeps happening.

 
