Sunday, 5 May 2019

HDFS High Availability


Why High Availability:
Without HA implemented on the HDFS NameNode, the NameNode is a single point of failure for the entire filesystem. A SecondaryNameNode, while helpful in reducing recovery times and providing an alternate storage location for the NameNode’s metadata, is not a high availability or hot standby solution.

When can a NameNode become unavailable:
- Unplanned Reasons: hardware or software failure.
- Planned Reasons: Restart for a software upgrade or configuration change.

How is HA implemented at a high level:
  • Using NameNode high availability (HA), two NameNodes are deployed managing the same HDFS namespace: one active and one standby NameNode. The Standby NameNode takes control of the filesystem, managing client read-and-write requests and metadata updates, if the active NameNode fails.
  • Clients only ever connect to the active NameNode.
  • The active NameNode writes its metadata to a quorum of JournalNodes, managed by the QuorumJournalManager (QJM) process, which is built into the NameNode. Quorums are deployed in odd numbers greater than one (3 or 5, for example). Metadata updates (edits) written to a majority of JournalNodes (a quorum) are considered to be consistent. The Standby NameNode reads consistent updates from the JournalNodes to apply changes to its in-memory metadata representation (a configuration sketch follows this list).
  • DataNodes in the cluster send their heartbeats and block reports to both NameNodes using an abstraction called a nameservice. The nameservice is the same method used by clients to connect to the “active” NameNode.
  • Using HA, the Standby NameNode is transactionally consistent with the active NameNode, so a failure and subsequent failover will not result in a loss of data or significant disruption.
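
As a minimal sketch, the quorum of JournalNodes that the active NameNode writes to is referenced through the dfs.namenode.shared.edits.dir property in hdfs-site.xml. The hostnames and the nameservice ID (mycluster) below are hypothetical placeholders; the journal ID at the end of the URI conventionally matches the nameservice ID:

<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://journalnode1:8485;journalnode2:8485;journalnode3:8485/mycluster</value>
</property>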



Impact of HA on Secondary NN:
The Standby NameNode performs the checkpointing functions normally provided by the SecondaryNameNode, so in a HA configuration the SecondaryNameNode is no longer required.

Fencing:

 
  • Clients connect to an active NameNode in a high availability HDFS cluster. Only one NameNode can be active in the cluster at any given time. If more than one NameNode were active it could lead to inconsistencies or corruption of the filesystem.

  • Fencing is a technique employed during a failover to ensure that the previously active NameNode can no longer serve client requests or write metadata updates, which would otherwise risk a split-brain scenario. Fencing isolates that NameNode, preventing further requests from being accepted.

  • The following fencing methods are available:
    • sshfence: Connects to the NameNode being fenced using SSH and uses fuser to kill the process listening on the service's TCP port.
    • shell: Runs an arbitrary shell command to fence the active NameNode.

  • Fencing methods are configured using the dfs.ha.fencing.methods property in the hdfs-site.xml configuration file:

<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
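
As an illustrative sketch, the sshfence method also needs to know which SSH private key to use for the connection, supplied via the dfs.ha.fencing.ssh.private-key-files property (the key path below is hypothetical):

<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hdfs/.ssh/id_rsa</value>
</property>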

  • The QJM does not allow a newly active NameNode to take further metadata update requests until the previously active NameNode has been fenced.
     

Types of Failover:
 
Failover of a NameNode, or changing the state of a NameNode from standby to active, can be either (1) automatic (system-detected and initiated) or (2) manual (user-initiated).
 
  • Manual Failover: Manual failover can be accomplished using the hdfs haadmin command, for example:
    hdfs haadmin -failover nn1 nn2

  • Automatic Failover: Automatic failover is implemented using a separate process called the ZKFailoverController (ZKFC). A ZKFC process runs on each NameNode host and uses a quorum of ZooKeeper nodes to maintain the state of the active node and to automatically initiate a failover if the active NameNode becomes unavailable.
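
As a hedged sketch of what enabling automatic failover typically looks like (the ZooKeeper hostnames below are hypothetical), dfs.ha.automatic-failover.enabled is set in hdfs-site.xml and the ZooKeeper ensemble is listed in ha.zookeeper.quorum in core-site.xml:

<!-- hdfs-site.xml -->
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>

<!-- core-site.xml -->
<property>
  <name>ha.zookeeper.quorum</name>
  <value>zk1:2181,zk2:2181,zk3:2181</value>
</property>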


Deploying HA:
 
  • Deploying NN HA is a disruptive exercise as you will need to stop and reconfigure HDFS services.

  • First and foremost, we need a second host to act as part of the NameNode HA pair. This host should ideally have the same configuration and specifications as the existing NameNode, as it will need to perform all of the functions of the primary NameNode.

  • NameService: A nameservice provides an abstraction that allows cluster clients and services to connect to whichever of the two NameNodes is active. This is done through configuration, whereby fs.defaultFS refers to a nameservice ID rather than a NameNode host. The available nameservices are defined in the dfs.nameservices property in the hdfs-site.xml file.
     
  • References to the hosts involved in the particular nameservice are configured using the dfs.ha.namenodes.<nameservice_id> property in the hdfs-site.xml configuration file. 

  • The hosts themselves are then defined in the dfs.namenode.rpc-address.<nameservice_id>.<namenode_id> and dfs.namenode.http-address.<nameservice_id>.<namenode_id> properties, which provide the specific RPC and web (HTTP) connection details for each NameNode.

For example, a nameservice named prodcluster could be configured with the following property values:

fs.defaultFS = hdfs://prodcluster
dfs.nameservices = prodcluster
dfs.ha.namenodes.prodcluster = nn1,nn2
dfs.namenode.rpc-address.prodcluster.nn1 = namenode1:8020
dfs.namenode.rpc-address.prodcluster.nn2 = namenode2:8020
dfs.namenode.http-address.prodcluster.nn1 = namenode1:50070
dfs.namenode.http-address.prodcluster.nn2 = namenode2:50070


The following shows concrete examples of the above-mentioned properties for a nameservice named testhacluster:

In core-site.xml, we set the fs.defaultFS property to hdfs://testhacluster.
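
For illustration, the corresponding core-site.xml entry would look something like this:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://testhacluster</value>
</property>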

The following property values are set in hdfs-site.xml:
<property>
  <name>dfs.nameservices</name>
  <value>testhacluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.testhacluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.testhacluster.nn1</name>
  <value>namenode1:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.testhacluster.nn2</name>
  <value>namenode2:8020</value>
</property>
<property>
  <name>dfs.namenode.http-address.testhacluster.nn1</name>
  <value>namenode1:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.testhacluster.nn2</name>
  <value>namenode2:50070</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://journalnode1:8485/testhacluster</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/tmp/dfs/jn</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.testhacluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>shell(/bin/true)</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>false</value>
</property>

 
