Before explaining the concept of federation, we need to know what NameNodes and DataNodes are and what they do.
HDFS follows a master-slave architecture. A classic HDFS cluster has a single master, called the NameNode. This master server manages the file system namespace and holds the metadata. Metadata here means the mapping of data blocks to the DataNodes that store them, in whichever rack those DataNodes sit.
In addition, there are a number of slaves, the DataNodes. There is usually one DataNode per node in the cluster, and each one manages the storage attached to its node. When a file is stored in HDFS, it is split into one or more blocks, and these blocks are stored on different DataNodes. A typical block size used by HDFS is 128 MB (the default in Hadoop 2.x and later). Thus a file in HDFS is divided into 128 MB chunks, and each chunk generally resides on a different DataNode.
So if I store a 1 TB file, how many data blocks will it be divided into?
Answer: (1024 * 1024) MB / 128 MB = 8192 blocks. With a replication factor of 3, the cluster stores 3 * 8192 = 24576 block replicas in total.
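The same arithmetic can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the 128 MB block size and replication factor of 3 are taken from the example above rather than read from a real cluster configuration.

```python
MB = 1024 * 1024
TB = 1024 * 1024 * MB


def block_count(file_size_bytes, block_size_bytes=128 * MB):
    """Number of blocks a file occupies; the last block may be partial."""
    # Integer ceiling division, so a file that is 1 byte over a block
    # boundary still needs one extra block.
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes


def replica_count(file_size_bytes, replication=3, block_size_bytes=128 * MB):
    """Total block replicas stored in the cluster for one file."""
    return replication * block_count(file_size_bytes, block_size_bytes)


print(block_count(TB))    # 8192
print(replica_count(TB))  # 24576
```

Note that a file slightly larger than 1 TB needs 8193 blocks, because the leftover bytes go into one final, smaller block.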
The NameNode executes file system namespace operations like opening, closing, and renaming files and directories. It also determines the mapping of blocks to DataNodes. The DataNodes are responsible for serving read and write requests from the file system’s clients. The DataNodes also perform block creation, deletion, and replication upon instruction from the NameNode.
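To make this division of work concrete, here is a toy model of the metadata a NameNode keeps. It is an illustration only, not how Hadoop implements it: the class `ToyNameNode`, its method names, and the round-robin placement are all assumptions made for this sketch (real HDFS placement is rack-aware).

```python
class ToyNameNode:
    """Toy in-memory metadata: file path -> block ids -> DataNode names."""

    def __init__(self, datanodes, replication=3):
        if replication > len(datanodes):
            raise ValueError("need at least as many DataNodes as replicas")
        self.datanodes = list(datanodes)
        self.replication = replication
        self.files = {}            # path -> list of block ids
        self.block_locations = {}  # block id -> list of DataNode names
        self._next_block_id = 0
        self._next_node = 0

    def create(self, path, num_blocks):
        """Register a file and pick DataNodes for each of its blocks."""
        block_ids = []
        for _ in range(num_blocks):
            block_id = self._next_block_id
            self._next_block_id += 1
            # Simplified round-robin placement (an assumption of this sketch).
            targets = [
                self.datanodes[(self._next_node + i) % len(self.datanodes)]
                for i in range(self.replication)
            ]
            self._next_node = (self._next_node + 1) % len(self.datanodes)
            self.block_locations[block_id] = targets
            block_ids.append(block_id)
        self.files[path] = block_ids

    def rename(self, src, dst):
        """A rename touches only the namespace; no block data moves."""
        self.files[dst] = self.files.pop(src)

    def locate(self, path):
        """Tell a client which DataNodes hold each block of a file."""
        return [(b, self.block_locations[b]) for b in self.files[path]]


nn = ToyNameNode(["dn1", "dn2", "dn3", "dn4"])
nn.create("/logs/a.log", num_blocks=2)
nn.rename("/logs/a.log", "/logs/b.log")
print(nn.locate("/logs/b.log"))
# [(0, ['dn1', 'dn2', 'dn3']), (1, ['dn2', 'dn3', 'dn4'])]
```

The point of the sketch is that the NameNode only answers "where are the blocks?"; the actual bytes are read from and written to the DataNodes.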
The NameNode itself has limited capacity, because it keeps the namespace and block metadata in memory. A single NameNode can only be scaled vertically, up to the limits of one machine. Just as more nodes can be added to a cluster to scale storage horizontally, more NameNodes can be added to scale the namespace. This horizontal scaling of the NameNode is called federation.
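One way to picture federation is that each NameNode owns a separate part of the namespace, and a client-side mount table (in Hadoop this role is commonly played by ViewFs) decides which NameNode to ask for a given path. The sketch below is a toy version of that lookup; the mount points, the NameNode names, and the function `resolve_namenode` are hypothetical and do not correspond to any Hadoop API.

```python
# Hypothetical mount table: namespace prefix -> NameNode that owns it.
MOUNT_TABLE = {
    "/user": "namenode-1",
    "/data": "namenode-2",
    "/data/archive": "namenode-3",
}


def resolve_namenode(path, mount_table=MOUNT_TABLE):
    """Return the NameNode owning the longest mount prefix that matches path."""
    best = None
    for prefix in mount_table:
        # Match whole path components only, so "/data" does not
        # accidentally match "/database".
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        raise KeyError(f"no NameNode mounted for {path}")
    return mount_table[best]


print(resolve_namenode("/user/alice/report.csv"))    # namenode-1
print(resolve_namenode("/data/2019/events.log"))     # namenode-2
print(resolve_namenode("/data/archive/2018/old.gz")) # namenode-3
```

Each NameNode then only has to hold the metadata for its own slice of the namespace, which is what relieves the memory pressure on a single machine.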
For more information about federation, see the link below:
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/Federation.html
For more details about the rack concept, see the link below:
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/RackAwareness.html
Author Profile
- Passionate traveller, reviewer of restaurants and bars, tech lover, everything about data processing, analyzing, SQL, PLSQL, pig, hive, zookeeper, mahout, kafka, neo4j