Sunday, 5 May 2019

HDFS High Availability


Why High Availability:
Without HA implemented on the HDFS NameNode, the NameNode is a single point of failure for the entire filesystem. A SecondaryNameNode, while helpful in reducing recovery times and providing an alternate storage location for the NameNode’s metadata, is not a high availability or hot standby solution.

When can a NameNode become unavailable:
- Unplanned Reasons: hardware or software failure.
- Planned Reasons: Restart for a software upgrade or configuration change.

How is HA implemented at a high level:
  • Using NameNode high availability (HA), two NameNodes are deployed managing the same HDFS namespace: one active and one standby NameNode. The Standby NameNode takes control of the filesystem, managing client read-and-write requests and metadata updates, if the active NameNode fails.
  • Clients only ever connect to the active NameNode.
  • The active NameNode writes its metadata to a quorum of JournalNodes, managed by the QuorumJournalManager (QJM) process, which is built into the NameNode. Quorums are deployed in odd numbers greater than one (3 or 5, for example). Metadata updates (edits) written to a majority of JournalNodes (a quorum) are considered to be consistent. The Standby NameNode reads consistent updates from the JournalNodes to apply changes to its in-memory metadata representation (a configuration sketch follows this list).
  • DataNodes in the cluster send their heartbeats and block reports to both NameNodes using an abstraction called a nameservice. The nameservice is the same method used by clients to connect to the “active” NameNode.
  • Using HA, the Standby NameNode is transactionally consistent with the active NameNode, so a failure and subsequent failover will not result in a loss of data or significant disruption.
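
As a minimal sketch, the quorum of JournalNodes that the active NameNode writes to is referenced through the dfs.namenode.shared.edits.dir property in hdfs-site.xml. The hostnames and the nameservice ID (mycluster) below are hypothetical placeholders; the journal ID at the end of the URI conventionally matches the nameservice ID:

<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://journalnode1:8485;journalnode2:8485;journalnode3:8485/mycluster</value>
</property>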



Impact of HA on Secondary NN:
The Standby NameNode performs the checkpointing functions normally provided by the SecondaryNameNode, so in a HA configuration the SecondaryNameNode is no longer required.

Fencing:

 
  • Clients connect to an active NameNode in a high availability HDFS cluster. Only one NameNode can be active in the cluster at any given time. If more than one NameNode were active it could lead to inconsistencies or corruption of the filesystem.

  • Fencing is a technique employed during a failover to ensure that the previously active NameNode can no longer serve client requests or write metadata updates, which would otherwise risk a split-brain scenario. Fencing isolates that NameNode, preventing further requests from being accepted.

  • The following fencing methods are available:
    • sshfence: Connects to the NameNode being fenced using SSH and uses fuser to kill the process listening on the service's TCP port.
    • shell: Runs an arbitrary shell command to fence the active NameNode.

  • Fencing methods are configured using the dfs.ha.fencing.methods property in the hdfs-site.xml configuration file:

<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
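
As an illustrative sketch, the sshfence method also needs to know which SSH private key to use for the connection, supplied via the dfs.ha.fencing.ssh.private-key-files property (the key path below is hypothetical):

<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hdfs/.ssh/id_rsa</value>
</property>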

  • The QJM does not allow a newly active NameNode to take further metadata update requests until the previously active NameNode has been fenced.
     

Types of Failover:
 
Failover of a NameNode, or changing the state of a NameNode from standby to active, can be either (1) automatic (system-detected and initiated) or (2) manual (user-initiated).
 
  • Manual Failover: Manual failover can be accomplished using the hdfs haadmin command, for example:
    hdfs haadmin -failover nn1 nn2

  • Automatic Failover: Automatic failover is implemented using a separate process called the ZKFailoverController (ZKFC). A ZKFC process runs on each NameNode host and uses a quorum of ZooKeeper nodes to maintain the state of the active node and to automatically initiate a failover if the active NameNode becomes unavailable.
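
As a hedged sketch of what enabling automatic failover typically looks like (the ZooKeeper hostnames below are hypothetical), dfs.ha.automatic-failover.enabled is set in hdfs-site.xml and the ZooKeeper ensemble is listed in ha.zookeeper.quorum in core-site.xml:

<!-- hdfs-site.xml -->
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>

<!-- core-site.xml -->
<property>
  <name>ha.zookeeper.quorum</name>
  <value>zk1:2181,zk2:2181,zk3:2181</value>
</property>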


Deploying HA:
 
  • Deploying NN HA is a disruptive exercise as you will need to stop and reconfigure HDFS services.

  • First and foremost, we need a second host to act as part of the NameNode HA pair. This host should ideally have the same configuration and specifications as the existing NameNode, as it will need to perform all of the functions of the primary NameNode.

  • NameService: A nameservice provides an abstraction that allows cluster clients and services to connect to whichever of the two NameNodes is active. This is done through configuration, whereby fs.defaultFS refers to a nameservice ID rather than a NameNode host. The available nameservices are defined in the dfs.nameservices property in the hdfs-site.xml file.
     
  • References to the hosts involved in the particular nameservice are configured using the dfs.ha.namenodes.<nameservice_id> property in the hdfs-site.xml configuration file. 

  • The hosts themselves are then defined in the dfs.namenode.rpc-address.<nameservice_id>.<namenode_id> and dfs.namenode.http-address.<nameservice_id>.<namenode_id> properties, which provide the specific RPC and web (HTTP) connection details for each NameNode.

For example, a nameservice named prodcluster could be configured with the following property values:

fs.defaultFS = hdfs://prodcluster
dfs.nameservices = prodcluster
dfs.ha.namenodes.prodcluster = nn1,nn2
dfs.namenode.rpc-address.prodcluster.nn1 = namenode1:8020
dfs.namenode.rpc-address.prodcluster.nn2 = namenode2:8020
dfs.namenode.http-address.prodcluster.nn1 = namenode1:50070
dfs.namenode.http-address.prodcluster.nn2 = namenode2:50070


The following shows concrete examples of the above-mentioned properties for a nameservice named testhacluster:

In core-site.xml, we set the fs.defaultFS property to hdfs://testhacluster.
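
For illustration, the corresponding core-site.xml entry would look something like this:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://testhacluster</value>
</property>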

The following property values are set in hdfs-site.xml:
<property>
  <name>dfs.nameservices</name>
  <value>testhacluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.testhacluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.testhacluster.nn1</name>
  <value>namenode1:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.testhacluster.nn2</name>
  <value>namenode2:8020</value>
</property>
<property>
  <name>dfs.namenode.http-address.testhacluster.nn1</name>
  <value>namenode1:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.testhacluster.nn2</name>
  <value>namenode2:50070</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://journalnode1:8485/testhacluster</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/tmp/dfs/jn</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.testhacluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>shell(/bin/true)</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>false</value>
</property>

 
