V’s of Big Data
Taken literally, big data means data that is big, but how big counts as big? For a small business, big data might be a few gigabytes, while Facebook processes hundreds of terabytes of data every day [6]. The definition of big data therefore differs somewhat from organization to organization. A few years ago, big data meant data so large that ordinary software could not process or handle it. It was originally defined using three V’s: Volume, Velocity and Variety. Since then, four more V’s have been added [1]:
Variability: The meaning of the data is not the same from day to day and is constantly changing
Veracity: Making sure the data we have is accurate
Visualization: Visualizing the data so it can be easily interpreted by readers
Value: The final goal is to get some value out of data
The University of California, Berkeley asked forty thought leaders in the data science field how they define Big Data. John Akred, the founder and CTO of Silicon Valley Data Science, described it this way: “Big Data refers to a combination of an approach to informing decision making with analytical insight derived from data, and a set of enabling technologies that enable that insight to be economically derived from at times very large, diverse sources of data” [5]. David Leonhardt, an editor for The New York Times, defines it differently: “Big Data is nothing more than a tool for capturing reality – just as newspaper reporting, photography and long-form journalism are” [5]. As we can see, people from different industries define Big Data differently. What their definitions have in common is that they build on the basic three V’s described above; what differs is the end product each person has in mind.
Microsoft defines big data as patterns which can be used with proven practices for predictable results. It further describes big data as “Data often produced at fire hose rate, that you do not know how to analyze at the moment but may provide valuable information in future” [4]. Microsoft then asks readers whether they know what the visitors to their website are really thinking or, if they are business owners, what their customers think of their products. Having asked these questions, it tells readers that the answer is buried in a pile of stored data, and that even if they can find it, it may still be difficult to process and to turn into meaningful insights. Microsoft then gives a further definition of big data:
“Big data typically refers to collections of datasets that, due to size and complexity, are difficult to store, query, and manage using existing data management tools or data processing applications.”
Notice that this definition focuses on the difficulty of storing, querying and managing data, so we can expect a solution or product that addresses it. In essence, Microsoft tries to connect with its readers by asking whether they have a big data problem, then gives an example of the techniques data analysts and business managers are using and explains how Microsoft can help. It then describes Big Data with the textbook definition based on the three V’s and links readers to its product, HDInsight [4]. HDInsight is a Hadoop-based solution offered by Microsoft. It is an all-in-one solution that can store data, process data and run data analysis.
Tableau is one of the biggest providers of visualization software, which helps companies visualize data from any source. It describes big data as follows: “structured or unstructured, petabytes or terabytes, millions or billions of rows, you can turn data into big ideas” [2]. Tableau focuses on the visualization part because its product is used to discover and understand data. Compared with Microsoft’s definition, Tableau’s leaves out storing the data, retrieving it and the problems behind both. While explaining the meaning of big data, Tableau keeps pointing out the importance of visualizing data regardless of its size and how organizations can leverage this to their benefit. It also stresses that storing, preparing and iterating on data is costly, and that Tableau’s vision is to help companies apply best practices to get the most out of their data.
SAS defines big data using the three V’s (Volume, Velocity and Variety) and says that what matters is not the amount of data but what you do with it and how you use it to make better decisions. SAS adds two more dimensions to its definition of big data: Variability and Complexity. It defines variability as inconsistency in the amount of data flowing in, which cannot be controlled [3]. Complexity refers to the fact that data of different types comes from different sources and has to be connected and combined to get meaningful insights out of it.
When we compare the definitions provided by these companies, it is clear that the core definition of big data remains the same, but each company tailors it to the products it offers. Microsoft, which provides an end-to-end solution for big data, talks about storing data. Tableau skips storage because its product is for visualizing data, not storing it. We can conclude that there is no single authority on the exact meaning of “Big Data” or on its solutions.
Works Cited
[1] “Understanding Big Data: The Seven V’s.” Dataconomy. July 23, 2014. Accessed January 30, 2017.
[2] “Big Data.” Tableau Software. Accessed January 30, 2017.
[3] “What is Big Data and why it matters.” What Is Big Data? | SAS US. Accessed January 30, 2017.
[4] “What is big data?” Accessed January 30, 2017.
[5] “What Is Big Data?” What Is Big Data? – Blog. September 03, 2014. Accessed January 30, 2017.
[6] “Data size estimates.” Follow the Data. June 24, 2014. Accessed January 30, 2017. https://followthedata.wordpress.com/2014/06/24/data-size-estimates/.