Interview with Kim Gregers Petersen, Big Data & Analytics – IBM, ATEA.

Big Data: Redefining the role of the consultant

Big Data is on everyone's lips nowadays as a collection of technologies that can change the use of data the world over and in almost all types of businesses. Challenges, say some people. OPPORTUNITIES, says a Danish expert, and identifies the IT consultant as a key figure in a Big Data environment.

Recently there was a job ad at SKAT (the Danish tax authority). SKAT was looking for an assistant director to spearhead a brand-new department dedicated to 'Business Intelligence and Analysis'. The new department would, among other things, contribute to SKAT’s overall efficiency and further develop data models "where the use of Big Data will be a natural part of activities".

“I read the advertisement as a sign that not only companies, but also the government, have begun to work seriously with Big Data," says Kim Gregers Petersen, Big Data & Analytics expert at Atea. As a consultant within Big Data solutions, he notices interest increasing almost by the day, because companies and organisations like SKAT are constantly seeing new opportunities to analyse their growing piles of data. "This is an area of explosive growth, and it's all about getting on board right now," says Kim Gregers Petersen, referring to the role of the IT consultant in the new Big Data ecosystem.

By way of introduction, and as a starting point for a discussion of Big Data, Kim Gregers Petersen sums up developments in this area over the last 10 years with four facts:

  Fact 1. The world’s data volumes are increasing at a pace that far exceeds our wildest dreams.
  Fact 2. The world’s growing data volume is not just a quantitative challenge, since the data originates from new sources, such as video, photography, audio, navigation systems and instant messaging.
  Fact 3. The new types of data are often unstructured and therefore require very different handling technologies to those we are accustomed to.
  Fact 4. These new technologies are still so new that IT consultants find themselves standing at a crossroads. On the one hand, the consultant knows that it is this range of new technologies that will shape his or her professional future. On the other hand, the consultant also knows that he or she does not yet know enough about the technologies to keep up with the expertise required in a specific area.

"In rough terms, this is how things look right now for many consultants," says Kim Gregers Petersen. "Of course, the question is: What is to be done?", he adds.

Highly interesting for the business

We will get back to the answer to that question. First, Kim Gregers Petersen explains what he defines as Big Data.

"If we take a hypothetical example, a business has data corresponding to 100 %. If you ask the vast majority of companies how much of the data they use in their daily business, they will answer 15-20 %. The remaining 80-85 % of the data is not used, for various reasons. They just store the data, because they have to, or because they do not know how to use it. The whole point of Big Data is to activate as much as possible of the 80-85 % inactive data, so it can contribute to the business," says Kim Gregers Petersen, giving an example:

“Let's take a business that sells computers. The sales department keeps good track of which computers they sell to which types of customers, their profit margins on the various computers and the price development in the various product categories, etc. In the marketing department, they are good at contacting new and existing customers with offers of promotions, seminars, etc. And in customer service they are good at helping angry customers who call in and complain about a particular product. The point is that the data gathered by the various departments is never combined. It might be interesting for marketing and sales to know that customer service has handled 78 complaints about the same computer within a week. Today, that information is lost, because businesses don't have the systems to coordinate this data."
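The computer retailer's situation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the complaint records, the sales figures, the function names and the threshold of 50 complaints per week are all made up for the example, and a real business would read this data from its separate departmental systems.

```python
from collections import Counter
from datetime import date

# Hypothetical customer-service log: one (date, product) entry per complaint.
complaints = (
    [(date(2014, 3, 3), "Laptop A")] * 40
    + [(date(2014, 3, 6), "Laptop A")] * 38
    + [(date(2014, 3, 4), "Desktop B")] * 3
)

# Hypothetical sales-department figures: units sold per product.
units_sold = {"Laptop A": 500, "Desktop B": 600}


def weekly_complaints(records):
    """Count complaints per (ISO year, ISO week, product)."""
    counts = Counter()
    for day, product in records:
        iso = day.isocalendar()
        counts[(iso[0], iso[1], product)] += 1
    return counts


def flag_products(records, sales, threshold=50):
    """Combine both data sets: products whose complaints in one week reach the threshold."""
    flagged = {}
    for (year, week, product), n in weekly_complaints(records).items():
        if n >= threshold:
            flagged[product] = {
                "week": (year, week),
                "complaints": n,
                "complaints_per_unit_sold": n / sales[product],
            }
    return flagged


report = flag_products(complaints, units_sold)
print(report)
```

The point of the sketch is the combination step: neither data set alone tells sales and marketing that one product drew 78 complaints in a single week relative to the number of units sold.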

Big Data is screaming for manpower

Back to the changing role of the consultant in a Big Data environment. Even if you take a positive view of the explosive development, it is primarily companies that have reason to be concerned about the many new opportunities and technologies, because they do not employ people with the right skills. From a consultant's point of view, Big Data is an exciting world just waiting to be conquered.

"This field is screaming for manpower," says Kim Gregers Petersen. "If I were 20 again, I would hurry up and run in that direction. For many years, being a programmer hasn't been very popular, one reason being that ERP solutions and Exchange solutions have been given an elegant administration layer that makes them relatively easy for ordinary IT people to handle. In other words, it's become a bit boring to 'just' be a programmer. But with all the new Big Data technologies – most of which come from the open source community – it's suddenly cool to be a programmer again. We do not see the super-hot interfaces in the new products that we know from mature technologies. Big Data is a bit more hardcore.”

As Kim Gregers Petersen explains, it is not yet possible to take the formal route if you wish to train in the field of Big Data, since this is not offered at Danish colleges and universities. "This is actually the biggest hurdle preventing the expansion of Big Data right now," says Kim Gregers Petersen. "But I suppose it's related to the fact that the technology is so new that the educational system has not been able to keep up."

The Big Data environment

With a generic model of a Big Data environment in front of him, Kim Gregers Petersen outlines the long journey that the data takes, from the first knock on the company’s door, such as Twitter, video or telecommunications data, to its final appearance as e.g. BI reports. During the journey, the name Hadoop pops up. According to Wikipedia's definition, Hadoop is 'an open-source software framework for the storage and large-scale processing of data in large clusters that run on commodity hardware'. Kim Gregers Petersen describes Hadoop as a key component of many of the largest Big Data environments in the world.

"The great thing about Hadoop is that it acts as an infinite number of buckets into which you can pour both structured and unstructured data. You may wish to analyse some of the data immediately, while other data may not be analysed until after three years, when this is more relevant. Hadoop was created to meet these and many other requirements," states Kim Gregers Petersen.

"If you are a consultant facing the choice of which path to take, I recommend that you take a closer look at Hadoop and the range of technologies that follow in Hadoop’s wake. I say this for several reasons, including that never before have large commercial enterprises had so much at stake in an open source environment. For example, Hadoop represents the backbone of the IT systems of Yahoo, Twitter, Netflix and Facebook, and they will do everything to ensure that Hadoop gets better and better."

He can barely bring himself to mention the case, because it has received so much media attention, but to demonstrate the potential of Big Data and Hadoop, Kim Gregers Petersen mentions in passing Vestas' large Hadoop installation and how the company is able to run almost real-time simulations for the location of new wind turbines. In another, lesser-known example, Sweden’s Royal Institute of Technology (KTH) is using IBM's streaming technology STREAMS for traffic monitoring in Stockholm. A variety of data sources, such as vehicles’ GPS signals, alarm messages from traffic control, sensors on the roads and weather data, help direct traffic to flow as smoothly as possible.

The logic is that, no matter which industry, any business of a certain size could benefit from Big Data?

“Exactly. But this requires creative thinking, and that you know the technologies. That's where the lack of consultants comes into the picture. We simply don't have enough consultants who know enough about these technologies," concludes Kim Gregers Petersen.

The consultant's five sure technology choices

Five Big Data technologies and tools that you can bet on as a consultant, according to Kim Gregers Petersen.

  1. Hadoop. Represents the core of many Big Data ecosystems.
  2. Java/Python/R. The three programming languages that make sense in Big Data environments. Choosing one over the others depends on the task.
  3. Pig/Pig Latin. The tool you typically want to use for tasks such as ETL processes, exploration of raw data and iterative data processing.
  4. Hive. Can be seen as the Hadoop system's data warehouse. Data in Hive is accessed with HiveQL, an SQL-like query language.
  5. A NoSQL database of your own choice, depending on the task. For example, HBase, Cassandra or MongoDB. Find out what has worked in other places that have faced similar issues. This is where sparring will prove its worth.
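For readers who have not worked with SQL-like queries of the kind mentioned under Hive, the following sketch gives a rough impression. It is not Hive: Python's built-in sqlite3 module is used here purely as a stand-in so the example can run anywhere, and the table name, columns and figures are hypothetical.

```python
import sqlite3

# In-memory database used as a stand-in; a Hive setup would query data stored in Hadoop.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE complaints (product TEXT, week INTEGER)")

# Hypothetical complaint records: one row per complaint.
rows = [("Laptop A", 10)] * 78 + [("Desktop B", 10)] * 3
conn.executemany("INSERT INTO complaints VALUES (?, ?)", rows)

# Products that received at least 50 complaints within a single week.
query = """
    SELECT product, week, COUNT(*) AS n
    FROM complaints
    GROUP BY product, week
    HAVING COUNT(*) >= 50
    ORDER BY n DESC
"""
result = conn.execute(query).fetchall()
print(result)
conn.close()
```

The query itself is ordinary declarative SQL: the consultant states which answer is wanted and leaves the execution to the underlying system.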

And finally, do not forget the practical Big Data packages offered by commercial vendors. IBM in particular, as the market’s largest Big Data supplier, offers an extensive portfolio of Big Data products that bundle a large part of Big Data technology with very user-friendly front-end products.