What is Big Data? Why has it become a hot topic in the IT industry?

The term big data refers to data sets that are very large in volume. The data may be audio files, video files, documents, spreadsheets, PDFs, or any other kind of data from which we can extract information.

How do we identify whether data is big or small? Is there a size limit that defines it?

The answer is no. There is no minimum size at which data starts to count as big, and there is no upper limit either.

Consider the following real-life scenario. Person X runs a payout business and processes 20 GB of data files for bank B1 every day. Bank B1 pays person X a commission after each successful loan recovery from defaulting customers. Suppose that after 2 years a customer of bank B1 applies for a loan from another bank, B2. In this case bank B2 will verify the customer's records for the past 2 years with B1.

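With the existing database system, bank B1 has to fetch this customer's data from a table that is more than 15 terabytes in size. Suppose the existing system moves data over the network at 100 MB/s. Reading the whole table then takes (15 * 1024 * 1024) / 100 sec = 157286.4 sec = 2621.44 min, which is about 43.7 hours, or roughly 1.8 days. If the link speed is 100 megabits per second instead, the time is eight times longer, about 14.6 days.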
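The transfer-time estimate above can be checked with a few lines of Python. This is a minimal sketch that assumes the quoted rate means 100 megabytes per second and uses binary units (1 TB = 1024 * 1024 MB); the function name `transfer_time_seconds` is made up for illustration.

```python
def transfer_time_seconds(size_tb, rate_mb_per_s):
    """Time to move size_tb terabytes at rate_mb_per_s megabytes per second."""
    size_mb = size_tb * 1024 * 1024  # assumption: 1 TB = 1024 * 1024 MB
    return size_mb / rate_mb_per_s


seconds = transfer_time_seconds(15, 100)
hours = seconds / 3600
days = hours / 24

print(f"{seconds:.1f} s = {hours:.2f} h = {days:.2f} days")
# prints: 157286.4 s = 43.69 h = 1.82 days
```

Even at this rate, close to two days of scanning for a single customer's history is far too slow for a loan verification request.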

It is not practical to process and extract data this way for a single customer lookup. A traditional RDBMS struggles in this situation. These are the use cases where big data frameworks come into the picture.

Oracle, by contrast, generally performs well when table sizes are in the gigabyte range.

Any business that has hit the limits of vertical scaling, i.e. its data is growing drastically and its processing needs exceed the capacity of a single system, is a candidate for big data systems.

What are the different sources of big data in day-to-day life?

Facebook, WhatsApp, Instagram, and stock exchanges are common examples of systems that process huge amounts of data. The NYSE, for example, is said to generate about 7 TB of data daily.

What is structured, unstructured, and semi-structured data?

When data follows a specific structure defined by its table, it is structured data. Data that is inserted only after its constraints and data types have been validated is always structured.

When data is written without any validation or constraints, it is unstructured data.

When data follows an overall structure but its individual components are unstructured, it is semi-structured data. For example, an email has the following structure: To, Cc, Bcc, Subject, Attachment. The subject and the attachment do not have a specific structure; an attachment can be an audio clip, a video clip, or a document. So an email is semi-structured data. We therefore cannot say that big data is always unstructured data.
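The email example can be illustrated with Python's standard `email` library. This is only a sketch: the addresses, subject text, and attachment bytes are invented placeholders. It shows that the header fields have fixed names, while the attachment is an opaque payload of any type.

```python
from email.message import EmailMessage

msg = EmailMessage()

# Structured part: well-known header fields with fixed names.
msg["To"] = "customer@example.com"        # placeholder address
msg["Cc"] = "manager@example.com"         # placeholder address
msg["Subject"] = "Loan statement for the last 2 years"  # free-form text

# Unstructured parts: a free-text body and an attachment whose bytes
# could be audio, video, or a document.
msg.set_content("Please find the statement attached.")
msg.add_attachment(
    b"\x00\x01\x02",                      # placeholder binary payload
    maintype="application",
    subtype="octet-stream",
    filename="statement.bin",
)

attachments = list(msg.iter_attachments())
print(msg["To"], len(attachments), attachments[0].get_filename())
```

A program can always look up `To` or `Subject` by name, but it cannot know what the attachment contains without inspecting it.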

An RDBMS validates data at the time of writing it (schema on write), while big data frameworks validate data at the time of reading it from files (schema on read).
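The difference between validating on write and validating on read can be sketched in Python. This is an illustration under assumptions, not how any particular big data framework works: SQLite stands in for the RDBMS, a list of raw text lines stands in for files in a data store, and the table name `loans` and the helpers `try_insert` and `read_with_schema` are hypothetical.

```python
import sqlite3

# Validation on write: the database checks every row at insert time.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE loans ("
    " customer_id INTEGER NOT NULL,"
    " amount REAL NOT NULL CHECK (amount > 0))"
)


def try_insert(row):
    """Return True if the row passes the table's constraints."""
    try:
        conn.execute("INSERT INTO loans VALUES (?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


good_write = try_insert((1, 5000.0))   # accepted
bad_write = try_insert((2, -10.0))     # rejected by the CHECK constraint

# Validation on read: raw lines are stored as-is and parsed when read.
RAW_LINES = ["1,5000.0", "2,not-a-number", "3,750.5"]


def read_with_schema(lines):
    """Apply the (customer_id: int, amount: float) schema while reading."""
    valid, rejected = [], []
    for line in lines:
        parts = line.split(",")
        try:
            valid.append((int(parts[0]), float(parts[1])))
        except (ValueError, IndexError):
            rejected.append(line)
    return valid, rejected


valid, rejected = read_with_schema(RAW_LINES)
print(good_write, bad_write, valid, rejected)
```

In the first half the bad row never reaches storage. In the second half every line is stored, and the malformed one is only discovered when a reader applies a schema to it.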

Author Profile

Passionate traveller, reviewer of restaurants and bars, tech lover, everything about data processing, analyzing, SQL, PL/SQL, Pig, Hive, ZooKeeper, Mahout, Kafka, Neo4j
