How to analyze a Hadoop cluster?

This article explains how to find the storage capacity figures of a Hadoop cluster and what each one means.

Run the command below to view the HDFS report:

[cloudera@quickstart bin]$ hdfs dfsadmin -report


Below are the values reported for my Hadoop cluster:

Configured Capacity: 58479091712 (54.46 GB)

It is the total capacity available to HDFS for storage.

Present Capacity: 45443014656 (42.32 GB)

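It is the amount of space actually available to HDFS for storing files. The difference between Configured Capacity and Present Capacity (about 12.14 GB here) is space on the DataNode disks that HDFS cannot use for blocks, typically non-HDFS files on the same disks and reserved space. The per-DataNode part of the report shows this kind of usage as Non DFS Used.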
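As a quick check of the arithmetic, here is a minimal Python sketch that subtracts Present Capacity from Configured Capacity, using the byte values from the report above. The variable names are only illustrative, and the sketch assumes the report's "GB" figures are computed with 1024-based units, which matches the numbers shown.

```python
# Byte values copied from the `hdfs dfsadmin -report` output in this article.
configured_capacity = 58479091712  # Configured Capacity
present_capacity = 45443014656     # Present Capacity

# Space counted in Configured Capacity that HDFS cannot use for storing blocks.
unavailable = configured_capacity - present_capacity

GIB = 1024 ** 3  # assumption: the report's "GB" means 1024^3 bytes
print(f"Not available to HDFS: {unavailable} bytes ({unavailable / GIB:.2f} GB)")
```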

DFS Remaining: 44569501696 (41.51 GB)

It is the amount of storage space still available to HDFS for storing more files. With a replication factor of 3, every block is stored three times, so if you have 90 GB of remaining space you can still store up to 90/3 = 30 GB of files without exceeding your Configured Capacity. Taking DFS Used and DFS Remaining together, we can say that:

Present Capacity = DFS Used + DFS Remaining

In my case the replication factor is only 1, so I can store the full 41.51 GB of data.
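The division by the replication factor can be sketched in a few lines of Python. The function name `usable_capacity` is hypothetical and exists only for this illustration. It also assumes every file uses the same replication factor, which is a simplification because HDFS allows the factor to be set per file.

```python
def usable_capacity(dfs_remaining: float, replication_factor: int) -> float:
    """Return how much new file data fits, given remaining raw space.

    Each block is stored `replication_factor` times, so the raw space
    is divided by that factor. Units of the result match the input.
    """
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return dfs_remaining / replication_factor


# Example from the text: 90 GB remaining with replication factor 3.
print(usable_capacity(90, 3))

# This article's cluster: 44569501696 bytes remaining, replication factor 1.
GIB = 1024 ** 3  # assumption: the report's "GB" means 1024^3 bytes
print(f"{usable_capacity(44569501696, 1) / GIB:.2f} GB")
```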


DFS Used: 873512960 (833.05 MB)

It is the storage space that has been used up by HDFS. In order to get the actual size of the files stored in HDFS, divide the ‘DFS Used’ by the replication factor. 
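A small Python sketch of this division, using the byte values from the report in this article. The function name `logical_size` is hypothetical, and the sketch assumes a single replication factor for all files. It also confirms that, for these figures, DFS Used and DFS Remaining add up exactly to Present Capacity.

```python
def logical_size(dfs_used: int, replication_factor: int) -> float:
    """Approximate size of the stored files before replication."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return dfs_used / replication_factor


# Byte values copied from the report in this article.
dfs_used = 873512960
dfs_remaining = 44569501696
present_capacity = 45443014656

MIB = 1024 ** 2  # assumption: the report's "MB" means 1024^2 bytes

# Replication factor 1 on this cluster, so file size equals DFS Used.
print(f"Files stored: {logical_size(dfs_used, 1) / MIB:.2f} MB")

# Consistency check on the reported figures.
print(dfs_used + dfs_remaining == present_capacity)
```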

Author Profile

Passionate traveller, reviewer of restaurants and bars, tech lover, everything about data processing, analyzing, SQL, PLSQL, pig, hive, zookeeper, mahout, kafka, neo4j
