What is BIG DATA? What are the various challenges we face?

Aditya N
3 min read · Sep 17, 2020

Let’s see what BIG DATA is 🙄

The term “big data” refers to data that is so large, fast or complex that it’s difficult or impossible to process using traditional methods. The act of accessing and storing large amounts of information for analytics has been around a long time. Big data takes many forms, including huge amounts of text, video, photos and more.

The key V’s in the world of Big Data are:

Volume: Organizations collect data from a variety of sources, including business transactions, smart (IoT) devices, industrial equipment, videos, social media and more. In the past, storing it would have been a problem, but cheaper storage on platforms like data lakes and Hadoop has eased the burden.

Variety: Data comes in all types of formats, from structured, numeric data in traditional databases to unstructured text documents, emails, videos, audio, stock ticker data and financial transactions.

Velocity: With the growth in the Internet of Things, data streams into businesses at an unprecedented speed and must be handled in a timely manner. RFID tags, sensors and smart meters are driving the need to deal with these torrents of data in near-real time.

Veracity: Veracity refers to the quality of data. Because data comes from so many different sources, it’s difficult to link, match, cleanse and transform data across systems. Businesses need to connect and correlate relationships, hierarchies and multiple data linkages. Otherwise, their data can quickly spiral out of control.

Value: A large volume of data is of no use to a company unless it is turned into something useful. Raw data has little importance in itself; it has to be processed to extract valuable information.

Variability: In addition to the increasing velocities and varieties of data, data flows are unpredictable — changing often and varying greatly. It’s challenging, but businesses need to know when something is trending in social media, and how to manage daily, seasonal and event-triggered peak data loads.
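To make the idea of spotting peak data loads concrete, here is a minimal sketch. It assumes we already have event counts per hour (the numbers below are made up) and it flags any hour whose count is far above the average. The function name `find_peaks` and the threshold factor `k` are hypothetical choices for illustration, not a standard method.

```python
from statistics import mean, pstdev


def find_peaks(counts, k=2.0):
    """Return indices of periods whose count exceeds mean + k * standard deviation."""
    threshold = mean(counts) + k * pstdev(counts)
    return [i for i, count in enumerate(counts) if count > threshold]


# Hypothetical hourly event counts for one day (made-up numbers).
hourly_events = [120, 110, 100, 95, 90, 100, 130, 160, 180, 170, 165, 175,
                 190, 185, 180, 170, 175, 200, 950, 210, 190, 160, 140, 125]

peaks = find_peaks(hourly_events)
print("Peak hours:", peaks)
```

A real system would likely compare against daily and seasonal baselines instead of a single average, but the basic idea of comparing the current load with an expected level is the same.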

Types of Big Data

1️⃣ Structured Data: It refers to data that has a well-defined structure associated with it. For example, the data present in databases, CSV files and Excel spreadsheets can be referred to as structured data.

2️⃣ Unstructured Data: It refers to data that has no predefined structure associated with it at all. For example, image files, audio files and video files can be referred to as unstructured data.

3️⃣ Semi-structured Data: It refers to data that does not follow a rigid structure but still carries some organizing elements, such as tags or labelled fields. For example, the data present in emails, log files and Word documents can be referred to as semi-structured data.
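The difference between the three types shows up quickly when you try to read them in code. Below is a small sketch using only Python's standard library. All sample data is made up: a CSV snippet stands in for structured data, JSON log lines for semi-structured data, and a few raw bytes for an unstructured file.

```python
import csv
import io
import json

# Structured: every row follows the same fixed columns (hypothetical sample).
csv_text = "id,name,amount\n1,Asha,250\n2,Ravi,400\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
total = sum(int(row["amount"]) for row in rows)

# Semi-structured: fields are labelled, but records need not share the same fields.
log_lines = [
    '{"user": "asha", "action": "login"}',
    '{"user": "ravi", "action": "purchase", "items": 3}',
]
events = [json.loads(line) for line in log_lines]
items = [event.get("items", 0) for event in events]  # "items" may be missing

# Unstructured: just bytes, with no fields to query (stand-in for an image file).
blob = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0x01])
size = len(blob)

print(total, items, size)
```

With structured data we can query a column directly, with semi-structured data we have to handle missing fields, and with unstructured data all we know up front is its size until some extra processing (for example image recognition) is applied.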

Tools used in harvesting Big Data are:

  1. Tableau
  2. Hadoop
  3. Splunk
  4. SAS Visual Analytics
  5. Talend
  6. Cassandra
  7. Sisense
  8. Spark
  9. KNIME
  10. MongoDB

And many more …

What are the real-world business use cases? How are tech giants like Facebook, Google, Netflix and Amazon dealing with it?

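Some widely quoted usage stats (these figures match popular “what happens in an internet minute” estimates, so they are best read as per-minute rather than daily numbers):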

  • 700,000 logins on Facebook
  • Around 530,000 photos are shared on Snapchat
  • Around 350,000 tweets are sent on Twitter
  • 30,000 photos are shared on Instagram
  • 21 million messages on WhatsApp
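To get a feel for the velocity behind counts like these, it helps to convert them into events per second. This is a rough sketch that assumes each figure is a count over one minute; set `PERIOD_SECONDS` to 86,400 if you read them as daily totals. The names used here are hypothetical.

```python
# Assumption: each count covers one minute (use 86_400 for a full day).
PERIOD_SECONDS = 60

counts = {
    "Facebook logins": 700_000,
    "Snapchat photos": 530_000,
    "Tweets": 350_000,
    "Instagram photos": 30_000,
    "WhatsApp messages": 21_000_000,
}


def per_second(count, period_seconds=PERIOD_SECONDS):
    """Convert a count over a period into an average rate per second."""
    return count / period_seconds


rates = {name: per_second(count) for name, count in counts.items()}

for name, rate in rates.items():
    print(f"{name}: about {rate:,.0f} per second")
```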

The collected data is mainly used for training machine learning models, face recognition, text analysis and targeted advertising.

In the end, DATA IS THE NEW OIL 😬

Thank you for reading!!
