The MER Experience
The 25th Year of Delivering Premier Electronic Records Management Education

Dealing with Big Data: the ‘thin slice’ concept

By Justin Lovell (@JustinLovell)

It has been said that “the only way to eat an elephant is one bite at a time”, and this rings true for Big Data. Just as applications like Google have become synonymous with the concept of searching for information on the internet through the use of a search engine, it’s not surprising that application frameworks such as Hadoop are becoming synonymous with the concept of Big Data (it may be no coincidence that the logo for Hadoop is a yellow elephant). Breaking down Big Data into thinly sliced concepts should help us understand its nature.

There is also a unique mental process that cannot be ignored with regard to how humans consume data. Malcolm Gladwell (@Malcgladwell), in his 2005 book Blink: The Power of Thinking Without Thinking, defines the “theory of thin slices” as “How a little bit of knowledge goes a long way” in allowing the individual to decide what is truly important. Gladwell poses the question “How is it possible to gather the necessary information for a sophisticated judgement in a short space of time?” The principle of Business Intelligence is much the same: consolidate disparate data, big or small, and distil it to a simple truth that a business user can consume to make a decision, whether operational or strategic in nature.

There are three common misconceptions about Big Data that are worth clearing up:

1. Big Data is a new concept
Back in 2008, articles were already being published about the challenge of growing data volumes and our ability to manage, consume and visualise them. According to the Mike 2.0 definition, “Big data can be very small and not all large datasets are big[…] Big then refers to big complexity rather than big volume”, and this complexity has been challenging humanity from the day it started consuming and interpreting data.

2. Big Data decommissions the concept of a data warehouse
Just as vendors claimed that in-memory analytics applications would negate the need for data warehouses, so too are the same sales executives now suggesting the same of Big Data. Dr Ralph Kimball, an author on the subjects of data warehousing and Business Intelligence, put Big Data in context in his 2011 white paper The Evolving Role of the EDW in the Era of Big Data Analytics, saying that “big data is a paradigm shift in how we think about data assets”, and that now “Data is an asset on the balance sheet”.

He points out that “With the benefit of hindsight gained from the traditional data warehouse experience, the big data analytics version of data warehousing is likely to consolidate quite quickly. Only the bravest organisations with very strong software development skills should consider rolling their own big data analytics applications directly on raw MapReduce/Hadoop.”

3. Enter the Data Scientist
Interestingly, the terms “Data Science” and “Data Scientist” have been emerging with the hype around Big Data. These terms imply that there are now new ways to understand and gain insight into data and that it is best left to the experts.

It’s worth noting that the underlying principles overlap with the collection of data and its visualisation within the realm of the well-established Business Intelligence competency. The terms themselves are not new either: they have been around since 2001, as in Dr. William S. Cleveland’s article “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics”.

In his article, “The future belongs to the companies and people who turn data into products”, Mike Loukides (@mikeloukides) explains that “merely using data isn’t really what we mean by data science. Data science enables the creation of data products. Data scientists combine entrepreneurship with patience, the willingness to build data products incrementally, the ability to explore, and the ability to iterate over a solution”.

In the past, companies sought out the help of Business Intelligence consultants, who would endeavour to gather the requirements of the business and the mappings of metrics to source data elements.

To a large extent, this will continue to be the case. The area where a data scientist can now operate, however, is where a company introduces a Big Data competency and starts collecting data but does not know the value it contains, or at least has no defined requirement for it, and so entrusts the extraction of value to the Data Scientist.

Delving into Big Data and its related concepts, we can extract value and insight for business and industry alike, prompting us to start asking the questions we never knew needed answers.

This article originally appeared in memeburn, 08/22/2012

Justin Lovell

Justin is a Principal Lead at Karabina Business Technology Solutions. He specialises in Big Data and is known as “the integrator”. He has been described as a great team player, very skilled at all levels of design and implementation of scalable and reliable Business Intelligence solutions.

