What is federation in Hadoop?

Before explaining federation, we need to understand what NameNodes and DataNodes are and what they do.

HDFS follows a master-slave architecture. In a basic (non-federated) setup, an HDFS cluster has a single master, called the NameNode, which manages the file system namespace and holds the metadata. This metadata includes the directory tree, the list of blocks that make up each file, and the mapping of those blocks to the DataNodes (on whichever rack) that store them.
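To make the idea of NameNode metadata concrete, here is a minimal Python sketch, assuming a toy in-memory model. The class NameNodeMetadata and its methods are hypothetical illustrations, not Hadoop APIs. It keeps two maps: file path to block IDs (the namespace) and block ID to the (rack, DataNode) pairs that hold a replica.

```python
from dataclasses import dataclass, field


@dataclass
class NameNodeMetadata:
    """Toy model of NameNode metadata (hypothetical, not a Hadoop API)."""

    # Namespace: file path -> ordered list of block IDs making up the file.
    namespace: dict[str, list[int]] = field(default_factory=dict)
    # Block map: block ID -> set of (rack, DataNode) pairs holding a replica.
    # In real HDFS this map is rebuilt from DataNode block reports, not read from disk.
    block_locations: dict[int, set[tuple[str, str]]] = field(default_factory=dict)

    def add_file(self, path: str, block_ids: list[int]) -> None:
        self.namespace[path] = list(block_ids)

    def add_replica(self, block_id: int, rack: str, datanode: str) -> None:
        self.block_locations.setdefault(block_id, set()).add((rack, datanode))

    def locate(self, path: str) -> list[set[tuple[str, str]]]:
        """For each block of the file, in order, return where its replicas live."""
        return [self.block_locations.get(b, set()) for b in self.namespace[path]]


meta = NameNodeMetadata()
meta.add_file("/logs/day1.txt", [1, 2])
meta.add_replica(1, "/rack1", "dn1")
meta.add_replica(1, "/rack2", "dn3")
meta.add_replica(2, "/rack1", "dn2")
print(meta.locate("/logs/day1.txt"))
```

The file path, rack names and DataNode names in the usage lines are made up for illustration.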

In addition, there are a number of slaves, the DataNodes, usually one per node in the cluster, which manage the storage attached to the nodes they run on. When a file is stored in HDFS, it is split into one or more blocks, and these blocks are stored on DataNodes. The default HDFS block size (Hadoop 2 and later) is 128 MB, so a file is divided into 128 MB chunks (the last chunk may be smaller), and the chunks are generally spread across different DataNodes.
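The splitting rule can be sketched in a few lines of Python. This is a minimal illustration, assuming the 128 MB default block size; the function split_into_blocks is hypothetical and only computes (offset, length) pairs, it does not talk to HDFS.

```python
MB = 1024 * 1024
BLOCK_SIZE = 128 * MB  # assumed default block size


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[tuple[int, int]]:
    """Return (offset, length) pairs for each block of a file of file_size bytes.

    Every block is block_size bytes except possibly the last one, which holds
    whatever remains. An empty file gets no blocks.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size > 0")
    return [
        (offset, min(block_size, file_size - offset))
        for offset in range(0, file_size, block_size)
    ]


# A 300 MB file becomes two full 128 MB blocks plus one 44 MB block.
blocks = split_into_blocks(300 * MB)
print([(off // MB, length // MB) for off, length in blocks])  # [(0, 128), (128, 128), (256, 44)]
```

Only the final block is smaller than 128 MB, and HDFS does not pad it: a 44 MB final block uses 44 MB of storage, not 128 MB.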

So if I store a 1 TB file, how many blocks will it be divided into?

Answer: (1024 * 1024) MB / 128 MB = 8192 blocks. With the default replication factor of 3, HDFS stores 3 * 8192 = 24576 block replicas (the file itself still has 8192 logical blocks).
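The same arithmetic as a small Python sketch, assuming a 128 MB block size and a replication factor of 3 (the HDFS defaults). The helper block_and_replica_count is hypothetical; it uses ceiling division so that a file whose size is not an exact multiple of the block size still gets a block for its remainder.

```python
MB = 1024 * 1024
TB = 1024 * 1024 * MB


def block_and_replica_count(file_size: int, block_size: int = 128 * MB,
                            replication: int = 3) -> tuple[int, int]:
    """Return (logical blocks, stored block replicas) for a single file."""
    blocks = (file_size + block_size - 1) // block_size  # integer ceiling division
    return blocks, blocks * replication


print(block_and_replica_count(1 * TB))      # (8192, 24576)
print(block_and_replica_count(1 * TB + 1))  # (8193, 24579)
```

For exactly 1 TB the division is exact, so the ceiling makes no difference; one extra byte would produce an 8193rd block.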

The NameNode executes file system namespace operations like opening, closing, and renaming files and directories. It also determines the mapping of blocks to DataNodes. The DataNodes are responsible for serving read and write requests from the file system’s clients. The DataNodes also perform block creation, deletion, and replication upon instruction from the NameNode.
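The division of labour between the NameNode and the DataNodes can be illustrated with a toy read path. This is a simplified sketch, assuming in-process Python objects in place of real daemons; ToyNameNode, ToyDataNode and read_file are hypothetical names, and real HDFS clients talk to these services over RPC. The NameNode only answers metadata questions (which blocks, on which DataNodes), a rename touches only metadata, and the bytes themselves come from the DataNodes.

```python
class ToyDataNode:
    """Stores block contents and serves reads (hypothetical, not a Hadoop API)."""

    def __init__(self, name: str):
        self.name = name
        self.blocks: dict[int, bytes] = {}  # block ID -> block contents

    def read_block(self, block_id: int) -> bytes:
        return self.blocks[block_id]


class ToyNameNode:
    """Holds only metadata; never serves file bytes (hypothetical)."""

    def __init__(self):
        self.namespace: dict[str, list[int]] = {}  # path -> block IDs
        self.locations: dict[int, list[str]] = {}  # block ID -> DataNode names

    def get_block_locations(self, path: str) -> list[tuple[int, list[str]]]:
        return [(b, self.locations[b]) for b in self.namespace[path]]

    def rename(self, src: str, dst: str) -> None:
        # Namespace operation: only metadata changes, no block data moves.
        self.namespace[dst] = self.namespace.pop(src)


def read_file(nn: ToyNameNode, datanodes: dict[str, ToyDataNode], path: str) -> bytes:
    """Ask the NameNode where the blocks are, then fetch them from DataNodes."""
    data = bytearray()
    for block_id, holders in nn.get_block_locations(path):
        # Any replica would do; this toy simply takes the first one listed.
        data += datanodes[holders[0]].read_block(block_id)
    return bytes(data)


datanodes = {"dn1": ToyDataNode("dn1"), "dn2": ToyDataNode("dn2")}
datanodes["dn1"].blocks[1] = b"hello "
datanodes["dn2"].blocks[2] = b"world"

nn = ToyNameNode()
nn.namespace["/a.txt"] = [1, 2]
nn.locations = {1: ["dn1"], 2: ["dn2"]}
nn.rename("/a.txt", "/b.txt")
print(read_file(nn, datanodes, "/b.txt"))  # b'hello world'
```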

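The NameNode keeps the entire namespace and block map in memory, so a single NameNode limits how many files and blocks a cluster can hold, and adding more memory to that one machine (vertical scaling) only goes so far. On the storage side, a cluster grows by adding more DataNodes (horizontal scaling), and the same solution can be applied to the NameNode: run multiple independent NameNodes, each managing its own part of the namespace and its own block pool, while every DataNode registers with all of them and stores blocks for all of them. This horizontal scaling of the NameNode is called Federation.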
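The federated layout can be sketched as a toy model in Python, assuming a client-side mount table that routes each path prefix to the NameNode that owns it (similar in spirit to Hadoop's ViewFs mount tables). All class names and block pool IDs below are hypothetical. Each NameNode has its own namespace and block pool, the NameNodes never coordinate with each other, and the shared DataNode keeps blocks grouped by block pool ID.

```python
class FedNameNode:
    """One independent NameNode: its own namespace and its own block pool (hypothetical)."""

    def __init__(self, block_pool_id: str):
        self.block_pool_id = block_pool_id
        self.files: dict[str, list[int]] = {}  # path -> block IDs in this namespace


class SharedDataNode:
    """Storage shared by all NameNodes; blocks are grouped by block pool ID."""

    def __init__(self):
        self.pools: dict[str, dict[int, bytes]] = {}

    def store(self, pool_id: str, block_id: int, data: bytes) -> None:
        self.pools.setdefault(pool_id, {})[block_id] = data


class FederatedCluster:
    def __init__(self, mounts: dict[str, FedNameNode], datanodes: list[SharedDataNode]):
        self.mounts = mounts        # path prefix -> NameNode owning that subtree
        self.datanodes = datanodes  # used by every NameNode

    def namenode_for(self, path: str) -> FedNameNode:
        # Longest matching prefix wins, as in a mount table.
        matches = [p for p in self.mounts
                   if path == p or path.startswith(p.rstrip("/") + "/")]
        if not matches:
            raise KeyError(f"no namespace mounted for {path}")
        return self.mounts[max(matches, key=len)]

    def write_block(self, path: str, block_id: int, data: bytes, dn_index: int = 0) -> None:
        nn = self.namenode_for(path)  # only this NameNode's metadata changes
        nn.files.setdefault(path, []).append(block_id)
        self.datanodes[dn_index].store(nn.block_pool_id, block_id, data)


user_nn, logs_nn = FedNameNode("BP-user"), FedNameNode("BP-logs")
shared_dn = SharedDataNode()
cluster = FederatedCluster({"/user": user_nn, "/logs": logs_nn}, [shared_dn])

cluster.write_block("/user/tejas/a.txt", 1, b"aaa")
cluster.write_block("/logs/app.log", 1, b"bbb")  # same block ID, different pool
print(sorted(shared_dn.pools))  # ['BP-logs', 'BP-user']
```

Because blocks are grouped by pool, the two NameNodes can each use block ID 1 without clashing, which is what lets them work without coordinating.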

For more information about federation, refer to the link below:

https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/Federation.html

For more details about rack awareness, refer to the link below:

https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/RackAwareness.html

Author Profile

Tejas

Passionate traveller, reviewer of restaurants and bars, tech lover; interested in everything about data processing and analysis: SQL, PL/SQL, Pig, Hive, ZooKeeper, Mahout, Kafka, Neo4j