As the name implies, ‘Big Data’ is a huge amount of data with a complex structure that grows exponentially over time. Such data is so large and complex that traditional database management tools cannot store or process it efficiently. This data can be structured, semi-structured or unstructured.
To describe the phenomenon that is big data, people have been using the 5 Vs: Volume, Velocity, Variety, Veracity and Value.
1) Volume: Volume refers to the vast amounts of data generated every second. The data generated by humans, machines and their interactions on social media alone is massive. Just think of all the emails, Twitter messages, photos, video clips, sensor data etc. we produce and share every second. We are no longer talking about terabytes but about zettabytes, or even the informally named brontobytes. On Facebook alone we send 10 billion messages per day. With big data technology, such as Hadoop, we can now store and use these data sets with the help of distributed systems, where parts of the data are stored in different locations and brought together by software.
2) Velocity: Velocity refers to the speed at which new data is being generated. This flow of data is massive and continuous. Just think of social media messages going viral in seconds, or the speed at which credit card transactions are checked for fraudulent activity. Big data technology now allows us to perform real-time analysis of the data while it is being generated, without first putting it into a database.
3) Variety: Variety refers to the different types of data we can now use. It is no longer restricted to structured data that fits neatly into tables or relational databases, such as financial data. In fact, by commonly cited estimates around 80% of the world’s data is now unstructured, and therefore can’t easily be put into tables (think of photos, video sequences or social media updates). With big data technology, such as the Hadoop Distributed File System (HDFS), we can now harness different types of data (structured and unstructured) and bring them together.
4) Veracity: Veracity refers to the messiness or trustworthiness of the data. With many forms of big data, quality and accuracy are less controllable (just think of Twitter posts with hashtags, abbreviations, typos and colloquial speech, as well as the reliability and accuracy of the content of each post), but big data and analytics technology now allows us to work with this type of data as well.
5) Value: The last V to take into account when looking at Big Data is Value! It is all well and good having access to big data, but unless we can turn it into value it is useless. So you can safely argue that ‘value’ is the most important V of Big Data.
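The Volume point above mentions data being stored in parts across different locations and brought back together by software. The following is a minimal Python sketch of that idea only, not of how Hadoop or HDFS actually works: the tiny block size, the node names and the round-robin placement are all assumptions made for illustration, and a real system would also replicate blocks and handle node failures.

```python
from collections import defaultdict

BLOCK_SIZE = 8  # bytes per block; hypothetical and tiny on purpose


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut the data into consecutive fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def distribute(blocks, node_names):
    """Place blocks on nodes in round-robin order.

    Returns (storage, placement):
      storage   maps node name -> {block index: block bytes}
      placement lists, for each block index, the node that holds it
    """
    storage = defaultdict(dict)
    placement = []
    for index, block in enumerate(blocks):
        node = node_names[index % len(node_names)]
        storage[node][index] = block
        placement.append(node)
    return storage, placement


def reassemble(storage, placement):
    """Fetch every block from the node that holds it and join them in order."""
    return b"".join(storage[node][index] for index, node in enumerate(placement))


data = b"emails, photos, video clips and sensor data"
nodes = ["node-a", "node-b", "node-c"]  # hypothetical node names

blocks = split_into_blocks(data)
storage, placement = distribute(blocks, nodes)
restored = reassemble(storage, placement)

print(restored == data)  # the pieces come back together unchanged
```

No single node holds the whole data set here; the placement list plays the role of the software that knows where each part lives.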
This century is often called the era of data. Everywhere we turn today we see tools generating or consuming huge amounts of data each second: mobile phones, the internet, the banking sector, hospitals and so on. This created a need to store and process data so that we can generate meaningful insights and spot patterns and trends. Hence the advent of big data technologies.
Before we look at what Hadoop is and what it is used for, we first have to take a deep dive into the issues that Big Data raises for existing traditional systems.
The first issue is managing and storing data that grows exponentially. Secondly, there is storing heterogeneous data. The incoming data can be structured, semi-structured or unstructured, so one has to make sure that these varieties of data, generated from various sources, are stored accordingly.
Thirdly, there is access and processing speed. Hard disk capacity has been increasing and physical size decreasing ever since the hard disk was introduced, but disk transfer speed and access speed have not improved at the same pace.
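To make the transfer-speed problem concrete, here is a rough back-of-the-envelope calculation in Python. The 1 TB data size and the 100 MB/s sustained read speed are assumed figures chosen for illustration, not measurements, and the parallel case is idealised: it ignores seek time and any coordination overhead.

```python
def read_time_seconds(data_bytes: float, throughput_bytes_per_s: float, disks: int = 1) -> float:
    """Time to read data split evenly across `disks` drives that are read in parallel."""
    return data_bytes / (throughput_bytes_per_s * disks)


TERABYTE = 10**12                 # bytes
ASSUMED_THROUGHPUT = 100 * 10**6  # 100 MB/s per disk, an assumed figure

single = read_time_seconds(TERABYTE, ASSUMED_THROUGHPUT)
parallel = read_time_seconds(TERABYTE, ASSUMED_THROUGHPUT, disks=100)

print(f"1 disk:    {single / 3600:.1f} hours")   # 1 disk:    2.8 hours
print(f"100 disks: {parallel:.0f} seconds")      # 100 disks: 100 seconds
```

Under these assumptions, capacity is not the bottleneck: one disk can hold the data but takes hours to read it, while the same data spread over many disks can be read far faster.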
All these issues created the need for a tool that could store and process big data efficiently. This is where Hadoop comes in.