Reading and Writing Files in Hadoop
Failover and fencing are two important mechanisms in HDFS high availability: together they keep the filesystem available and consistent when a NameNode fails.
The transition from the active NameNode to the standby is managed by a component known as the failover controller. There are various failover controllers, but the default implementation uses ZooKeeper to ensure that only one NameNode is active. Failover may also be initiated manually, for example for routine maintenance. This is known as a graceful failover, since the failover controller arranges an orderly transition in which the active and standby NameNodes switch roles.
Only one NameNode can be active in the cluster at any given time. If more than one NameNode were active, it could lead to inconsistencies or corruption of the filesystem, so the high availability implementation goes to great lengths to ensure that the previously active NameNode is prevented from doing any damage. This is where fencing comes in. The Quorum Journal Manager (QJM) allows only one NameNode to write to the edit log at a time, so a NameNode that has been demoted to standby cannot keep making changes to the filesystem metadata.
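The fencing idea can be illustrated with a toy model. The sketch below is not Hadoop code: the `Journal` class, its method names, and the use of an increasing epoch number to reject stale writers are assumptions made for illustration, loosely following the "one writer at a time" rule described above.

```python
class FencedError(Exception):
    """Raised when a stale writer tries to append to the journal."""


class Journal:
    """Toy shared edit log that accepts writes from one epoch at a time."""

    def __init__(self):
        self.promised_epoch = 0  # highest epoch handed out so far
        self.edits = []

    def new_epoch(self):
        # Called on failover: the newly active node receives a higher epoch,
        # which implicitly fences every earlier writer.
        self.promised_epoch += 1
        return self.promised_epoch

    def append(self, epoch, edit):
        # Reject any writer whose epoch is older than the current one.
        if epoch < self.promised_epoch:
            raise FencedError(
                f"epoch {epoch} is fenced; current is {self.promised_epoch}"
            )
        self.edits.append(edit)


journal = Journal()

nn1_epoch = journal.new_epoch()            # nn1 becomes active
journal.append(nn1_epoch, "mkdir /data")

nn2_epoch = journal.new_epoch()            # failover: nn2 takes over
journal.append(nn2_epoch, "create /data/a.txt")

try:
    journal.append(nn1_epoch, "delete /data")  # stale write from the old active
    stale_write_accepted = True
except FencedError:
    stale_write_accepted = False
```

In this model the old active node does not need to know it has been demoted: the journal itself refuses its writes, so the edit log contains only entries from the writer that was active at the time.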
Writing a file block to HDFS proceeds as follows:
- The HDFS Client requests to write a file block.
- The NameNode responds to the Client with the DataNode on which the write operation can be performed.
- The Client requests to write the block to the specified DataNode.
- The DataNode opens a block replication pipeline with another DataNode in the cluster, and this process continues until all configured replicas are written.
- A write acknowledgment is sent back through the replication pipeline.
- The Client is informed that the write operation was successful.
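The write sequence above can be sketched as a small in-memory simulation. This is a toy model, not the Hadoop client API: the class names, the `allocate_block` and `client_write` helpers, and the "take the first N DataNodes" placement rule are all assumptions for illustration.

```python
class DataNode:
    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes

    def write(self, block_id, data, downstream):
        # Store the block locally, then forward it along the pipeline.
        self.blocks[block_id] = data
        if downstream:
            acks = downstream[0].write(block_id, data, downstream[1:])
        else:
            acks = []
        # Acknowledgments travel back up the pipeline, last node first.
        return acks + [self.name]


class NameNode:
    def __init__(self, datanodes, replication=3):
        self.datanodes = datanodes
        self.replication = replication
        self.block_locations = {}  # block_id -> list of DataNode names

    def allocate_block(self, block_id):
        # Assumption: pick the first N nodes; real placement is more involved.
        targets = self.datanodes[: self.replication]
        self.block_locations[block_id] = [dn.name for dn in targets]
        return targets


def client_write(namenode, block_id, data):
    pipeline = namenode.allocate_block(block_id)             # ask the NameNode
    acks = pipeline[0].write(block_id, data, pipeline[1:])   # write via pipeline
    return len(acks) == len(pipeline)                        # all replicas acked


nodes = [DataNode(f"dn{i}") for i in range(1, 5)]
nn = NameNode(nodes, replication=3)
ok = client_write(nn, "blk_1", b"hello hdfs")
```

Note that the Client sends the data only to the first DataNode; each DataNode is responsible for forwarding the block to the next one, which is why the acknowledgment has to travel back through the same chain.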
Reading a file from HDFS follows a shorter sequence:
Step 1. The HDFS Client requests to read a file.
Step 2. The NameNode responds to the request with a list of DataNodes containing the blocks that comprise the file.
Step 3. The Client communicates directly with the DataNodes to retrieve blocks for the file.
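The read path can be sketched in the same spirit. Again this is a toy in-memory model rather than the Hadoop API: the class names, `get_block_locations`, `client_read`, and the fallback to the next replica when a DataNode lacks a block are assumptions for illustration.

```python
class DataNode:
    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes

    def read(self, block_id):
        return self.blocks[block_id]


class NameNode:
    """Holds metadata only: file -> ordered blocks, block -> DataNodes."""

    def __init__(self):
        self.files = {}      # path -> list of block ids, in file order
        self.locations = {}  # block_id -> list of DataNode objects

    def get_block_locations(self, path):
        return [(b, self.locations[b]) for b in self.files[path]]


def client_read(namenode, path):
    chunks = []
    # Steps 1-2: ask the NameNode which DataNodes hold each block.
    for block_id, replicas in namenode.get_block_locations(path):
        # Step 3: fetch the block directly from a DataNode.
        for dn in replicas:
            try:
                chunks.append(dn.read(block_id))
                break
            except KeyError:
                continue  # this replica is missing; try the next one
        else:
            raise IOError(f"no replica available for {block_id}")
    return b"".join(chunks)


dn1, dn2 = DataNode("dn1"), DataNode("dn2")
dn1.blocks["blk_1"] = b"hello "
dn2.blocks["blk_1"] = b"hello "
dn2.blocks["blk_2"] = b"hdfs"  # dn1 is deliberately missing this block

nn = NameNode()
nn.files["/demo.txt"] = ["blk_1", "blk_2"]
nn.locations = {"blk_1": [dn1, dn2], "blk_2": [dn1, dn2]}

data = client_read(nn, "/demo.txt")
```

The NameNode in this model never touches file contents; it only answers the location query, and the block data flows between the Client and the DataNodes.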