HDFS High Availability
Why High Availability:
Without HA implemented for the HDFS NameNode, the NameNode is a single point of failure for the entire filesystem. A SecondaryNameNode, while helpful in reducing recovery times and providing an alternate storage location for the NameNode's metadata, is not a high availability or hot standby solution.
When can a NameNode become unavailable:
- Unplanned reasons: hardware or software failure.
- Planned reasons: restart for a software upgrade or configuration change.
How is HA implemented at a high level:
Two NameNodes run as an active/standby pair. The active NameNode writes its edit log to a shared location (a quorum of JournalNodes in the configuration shown later), the standby continuously applies those edits to keep its namespace in sync, and DataNodes send block reports and heartbeats to both NameNodes so that either one can take over quickly.
Impact of HA on the Secondary NameNode:
The Standby NameNode performs the checkpointing function normally provided by the SecondaryNameNode, so in an HA configuration the SecondaryNameNode is no longer required.
Fencing:
During a failover, the previously active NameNode must be prevented (fenced) from continuing to serve clients or write to the shared edit log; otherwise two active NameNodes could corrupt the namespace. HDFS lets you configure one or more fencing methods via dfs.ha.fencing.methods, such as sshfence (kill the old NameNode process over SSH) or an arbitrary shell command.
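As a hedged illustration (the private-key path is an assumption, not taken from this cluster), sshfence is typically configured in hdfs-site.xml like this:
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hdfs/.ssh/id_rsa</value>
</property>
Note that the example configuration later in this post uses shell(/bin/true), which always reports success and so does not actually fence anything; that is commonly accepted with QJM-based shared edits, since the JournalNodes allow only a single writer.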
Types of Failover:
Failover of a NameNode, or changing the state of a NameNode from standby to active, can be either:
1) automatic (system-detected and initiated), or
2) manual (user-initiated).
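For the manual case, a quick sketch using the hdfs haadmin utility (the NameNode IDs nn1 and nn2 match the example configuration shown later):
hdfs haadmin -getServiceState nn1
hdfs haadmin -failover nn1 nn2
The first command reports whether nn1 is currently active or standby; the second initiates a graceful failover from nn1 to nn2.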
Deploying HA:
Deploying HA mainly involves setting the nameservice ID, the NameNode IDs and their RPC/HTTP addresses, the shared edits directory, the client failover proxy provider, and the fencing and automatic-failover properties in core-site.xml and hdfs-site.xml.
The following shows examples of the above-mentioned properties.
In core-site.xml we set the fs.defaultFS property to hdfs://testhacluster.
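A minimal core-site.xml entry reflecting this (the nameservice ID testhacluster is resolved through the hdfs-site.xml settings below):
<property>
<name>fs.defaultFS</name>
<value>hdfs://testhacluster</value>
</property>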
The following property values are set in hdfs-site.xml:
<property>
<name>dfs.nameservices</name>
<value>testhacluster</value>
</property>
<property>
<name>dfs.ha.namenodes.testhacluster</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.testhacluster.nn1</name>
<value>namenode1:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.testhacluster.nn2</name>
<value>namenode2:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.testhacluster.nn1</name>
<value>namenode1:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.testhacluster.nn2</name>
<value>namenode2:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://journalnode1:8485/testhacluster</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/tmp/dfs/jn</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.testhacluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>shell(/bin/true)</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>false</value>
</property>
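Assuming the layout above (NameNodes nn1/nn2 on namenode1/namenode2 with automatic failover disabled), a rough sketch of the commands typically used when first enabling HA; exact steps vary by distribution and by whether the cluster already holds data:
# On the existing, already formatted NameNode: push the current edit log to the JournalNode(s)
hdfs namenode -initializeSharedEdits
# On the second NameNode host: copy the namespace metadata from the first NameNode
hdfs namenode -bootstrapStandby
# After starting both NameNodes, make nn1 the active one (manual failover mode)
hdfs haadmin -transitionToActive nn1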
Configuration files for hadoop eco-system components
core-site.xml and hdfs-site.xml Important Properties
HDFS Configuration Parameters (Detailed):
Common Properties (core-site.xml)
The key common property is fs.defaultFS, which specifies the default filesystem URI used by clients and daemons.
hdfs-site.xml
dfs.namenode.name.dir and dfs.namenode.edits.dir
These specify where the NameNode stores its filesystem image and edit log; each accepts a comma-delimited list of directories so the metadata can be written redundantly. Note that there are no spaces between the comma-delimited values. An example value is /opt/app/data01/hdfs/nn.
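For instance, a hypothetical entry mirroring the NameNode metadata to a second directory (both paths are illustrative):
<property>
<name>dfs.namenode.name.dir</name>
<value>/opt/app/data01/hdfs/nn,/opt/app/data02/hdfs/nn</value>
</property>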
dfs.namenode.checkpoint.dir/period/txns
There are three significant configuration properties that relate to the checkpointing function: dfs.namenode.checkpoint.dir is where the checkpointed image is stored, dfs.namenode.checkpoint.period is the interval in seconds between checkpoints, and dfs.namenode.checkpoint.txns is the number of uncheckpointed transactions that forces a checkpoint regardless of the period; a checkpoint is triggered by whichever condition is reached first.
dfs.datanode.data.dir
A comma-delimited list of the local directories (typically one per physical disk) where a DataNode stores its block data.
dfs.datanode.du.reserved
The amount of disk space, in bytes, reserved on each DataNode volume for non-HDFS use.
dfs.blocksize
The default HDFS block size for new files, in bytes; the example below uses 134217728 (128 MB).
dfs.replication
The default replication factor for new files.
Examples of the above properties:
<property>
<name>dfs.namenode.name.dir</name>
<value>/opt/app/data01/hdfs/nn</value>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>/opt/app/data01/hdfs/snn</value>
</property>
<property>
<name>dfs.namenode.checkpoint.edits.dir</name>
<value>${dfs.namenode.checkpoint.dir}</value>
</property>
<property>
<name>dfs.namenode.checkpoint.period</name>
<value>21600</value>
</property>
<property>
<name>dfs.namenode.checkpoint.txns</name>
<value>10000000</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/opt/data/data01/hdfs/dn,/opt/data/data02/hdfs/dn,/opt/data/data03/hdfs/dn,/opt/data/data04/hdfs/dn,/opt/data/data05/hdfs/dn,/opt/data/data06/hdfs/dn,/opt/data/data07/hdfs/dn,/opt/data/data08/hdfs/dn,/opt/data/data09/hdfs/dn,/opt/data/data10/hdfs/dn,/opt/data/data11/hdfs/dn,/opt/data/data12/hdfs/dn</value>
</property>
<property>
<name>dfs.datanode.du.reserved</name>
<value>1073741824</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>134217728</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.replication.max</name>
<value>50</value>
</property>
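To double-check which value a running cluster actually uses, the hdfs getconf utility is handy; a small sketch (the outputs assume the settings above):
[sukul@server1 conf]$ hdfs getconf -confKey dfs.blocksize
134217728
[sukul@server1 conf]$ hdfs getconf -confKey dfs.replication
3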
Other Hadoop Environment Scripts and Configuration Files
We are familiar with many configuration files like core-site.xml, yarn-site.xml, mapred-site.xml, and hdfs-site.xml. Apart from these, there are some more configuration files in the configuration directory.
[sukul@server1 ~]$ cd $HADOOP_HOME/conf
[sukul@server1 conf]$ ls *xml
capacity-scheduler.xml  hadoop-policy.xml  mapred-site.xml  ssl-server.xml
core-site.xml  hdfs-site.xml  ssl-client.xml  yarn-site.xml
[sukul@server1 conf]$ ls *sh
hadoop-env.sh  mapred-env.sh  yarn-env.sh
A] hadoop-env.sh / yarn-env.sh / mapred-env.sh
- The hadoop-env.sh file is used to source environment variables for Hadoop daemons and processes.
- This can include daemon JVM settings such as heap size or Java options, as well as basic variables required by many processes, such as HADOOP_LOG_DIR or JAVA_HOME. (The following shows just a few lines of the hadoop-env.sh script.)
[sukul@server1 conf]$ cat hadoop-env.sh | grep -v '^#' | sed '/^$/d'
export JAVA_HOME=/opt/app/java/jdk/jdk180/
export HADOOP_HOME_WARN_SUPPRESS=1
export HADOOP_HOME=${HADOOP_HOME:-/usr/hdp/2.6.5.4-1/hadoop}
export JSVC_HOME=/usr/lib/bigtop-utils
export HADOOP_HEAPSIZE="4096"
export HADOOP_NAMENODE_INIT_HEAPSIZE="-Xms233472m"
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true ${HADOOP_OPTS}"
export HADOOP_NAMENODE_OPTS="-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70 -XX:ErrorFile=/opt/log/hadoop/$USER/hs_err_pid%p.log -XX:NewSize=25600m -XX:MaxNewSize=25600m -XX:MetaspaceSize=128m -XX:MaxMetaspaceSize=256m -Xloggc:/opt/log/hadoop/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Xms233472m -Xmx233472m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,RFAAUDIT ${HADOOP_NAMENODE_OPTS}"
- Basically, if you need to pass any environment variables to any Hadoop process, the hadoop-env.sh file is the place to do it, as it is sourced by all Hadoop control scripts (a small illustration follows the yarn-env.sh listing below).
- Similarly, there may be other environment shell scripts such as yarn-env.sh and mapred-env.sh that are used by these specific processes to source necessary environment variables. (The following shows just a few lines of the mapred-env.sh and yarn-env.sh scripts.)
[sukul@server1 conf]$ cat mapred-env.sh | grep -v '^#' | sed '/^$/d'
export HADOOP_JOB_HISTORYSERVER_HEAPSIZE=16384
export HADOOP_MAPRED_ROOT_LOGGER=INFO,RFA
export HADOOP_JOB_HISTORYSERVER_OPTS="-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/opt/log/hadoop-mapreduce/mapred/gc_trace.log -XX:ErrorFile=/opt/log/hadoop-mapreduce/mapred/java_error.log -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/log/hadoop-mapreduce/mapred/heap_dump.hprof"
export HADOOP_OPTS="-Dhdp.version=$HDP_VERSION $HADOOP_OPTS"
[sukul@server1 conf]$ cat yarn-env.sh | grep -v '^#' | sed '/^$/d'
export HADOOP_YARN_HOME=/usr/hdp/2.6.5.4-1/hadoop-yarn
export YARN_LOG_DIR=/opt/log/hadoop-yarn/$USER
export YARN_PID_DIR=/var/run/hadoop-yarn/$USER
export HADOOP_LIBEXEC_DIR=/usr/hdp/2.6.5.4-1/hadoop/libexec
export JAVA_HOME=/opt/app/java/jdk/jdk180/
export HADOOP_YARN_USER=${HADOOP_YARN_USER:-yarn}
export YARN_CONF_DIR="${YARN_CONF_DIR:-$HADOOP_YARN_HOME/conf}"
if [ "$JAVA_HOME" != "" ]; then
  #echo "run java in $JAVA_HOME"
  JAVA_HOME=$JAVA_HOME
fi
if [ "$JAVA_HOME" = "" ]; then
  echo "Error: JAVA_HOME is not set."
  exit 1
fi
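Tying back to the point above about hadoop-env.sh being the place to pass environment variables: a hedged sketch of the kind of lines you might append there, for example to relocate daemon logs and PID files (the paths and heap size are assumptions, not values from the cluster shown here):
# Illustrative additions to hadoop-env.sh (assumed values)
export HADOOP_LOG_DIR=/var/log/hadoop/$USER
export HADOOP_PID_DIR=/var/run/hadoop/$USER
export HADOOP_HEAPSIZE="2048"
Since every Hadoop control script sources hadoop-env.sh, such settings take effect for all daemons started afterwards.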
B] log4j.properties
- Hadoop uses Log4J (the Java logging framework) to store and manage its log files. Log files are produced by nearly every process in Hadoop, including daemons, applications, and tasks.
- The log4j.properties file provides configuration for log file management, including how to write log records, where to write them, and how to manage rotation and retention of log files.
- The following shows a sample log4j.properties file:
[sukul@server1 ~]$ cd $HADOOP_HOME/conf
[sukul@server1 conf]$ ls log4j.properties
log4j.properties
[sukul@server1 conf]$ cat log4j.properties | grep -v '^#' | sed '/^$/d'
hadoop.root.logger=INFO,console
hadoop.log.dir=.
hadoop.log.file=hadoop.log
log4j.rootLogger=${hadoop.root.logger}, EventCounter
log4j.threshhold=ALL
log4j.appender.RFA=org.apache.log4j.DailyRollingFileAppender
log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.RFA.DatePattern=.yyyy-MM-dd
log4j.appender.RFA.MaxBackupIndex=45
log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
hadoop.tasklog.taskid=null
hadoop.tasklog.iscleanup=false
hadoop.tasklog.noKeepSplits=4
hadoop.tasklog.totalLogFileSize=100
hadoop.tasklog.purgeLogSplits=true
hadoop.tasklog.logsRetainHours=12
log4j.appender.TLA=org.apache.hadoop.mapred.TaskLogAppender
log4j.appender.TLA.taskId=${hadoop.tasklog.taskid}
log4j.appender.TLA.isCleanup=${hadoop.tasklog.iscleanup}
log4j.appender.TLA.totalLogFileSize=${hadoop.tasklog.totalLogFileSize}
log4j.appender.TLA.layout=org.apache.log4j.PatternLayout
log4j.appender.TLA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
hadoop.security.logger=INFO,console
hadoop.security.log.maxfilesize=256MB
hadoop.security.log.maxbackupindex=20
log4j.category.SecurityLogger=WARN,console
hadoop.security.log.file=SecurityAuth.audit
log4j.appender.DRFAS=org.apache.log4j.DailyRollingFileAppender
log4j.appender.DRFAS.File=${hadoop.log.dir}/${hadoop.security.log.file}
log4j.appender.DRFAS.layout=org.apache.log4j.PatternLayout
log4j.appender.DRFAS.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
log4j.appender.DRFAS.DatePattern=.yyyy-MM-dd
log4j.appender.RFAS=org.apache.log4j.RollingFileAppender
log4j.appender.RFAS.File=${hadoop.log.dir}/${hadoop.security.log.file}
log4j.appender.RFAS.layout=org.apache.log4j.PatternLayout
log4j.appender.RFAS.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
log4j.appender.RFAS.MaxFileSize=${hadoop.security.log.maxfilesize}
log4j.appender.RFAS.MaxBackupIndex=${hadoop.security.log.maxbackupindex}
hdfs.audit.logger=INFO,console
log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=${hdfs.audit.logger}
log4j.additivity.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=false
log4j.appender.RFAAUDIT=org.apache.log4j.RollingFileAppender
log4j.appender.RFAAUDIT.File=${hadoop.log.dir}/hdfs-audit.log
log4j.appender.RFAAUDIT.layout=org.apache.log4j.PatternLayout
log4j.appender.RFAAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
log4j.appender.RFAAUDIT.MaxBackupIndex=180
log4j.appender.RFAAUDIT.MaxFileSize=16106127360
mapred.audit.logger=INFO,console
log4j.logger.org.apache.hadoop.mapred.AuditLogger=${mapred.audit.logger}
log4j.additivity.org.apache.hadoop.mapred.AuditLogger=false
log4j.appender.MRAUDIT=org.apache.log4j.DailyRollingFileAppender
log4j.appender.MRAUDIT.File=${hadoop.log.dir}/mapred-audit.log
log4j.appender.MRAUDIT.layout=org.apache.log4j.PatternLayout
log4j.appender.MRAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
log4j.appender.MRAUDIT.DatePattern=.yyyy-MM-dd
hadoop.metrics.log.level=INFO
log4j.logger.org.apache.hadoop.metrics2=${hadoop.metrics.log.level}
log4j.logger.org.jets3t.service.impl.rest.httpclient.RestS3Service=ERROR
log4j.appender.NullAppender=org.apache.log4j.varia.NullAppender
log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter
log4j.logger.org.apache.hadoop.conf.Configuration.deprecation=WARN
log4j.logger.BlockStateChange=ERROR
log4j.logger.org.apache.hadoop.hdfs.StateChange=WARN
yarn.log.dir=.
hadoop.mapreduce.jobsummary.logger=${hadoop.root.logger}
hadoop.mapreduce.jobsummary.log.file=hadoop-mapreduce.jobsummary.log
log4j.appender.JSA=org.apache.log4j.DailyRollingFileAppender
yarn.server.resourcemanager.appsummary.log.file=hadoop-mapreduce.jobsummary.log
yarn.server.resourcemanager.appsummary.logger=${hadoop.root.logger}
log4j.appender.RMSUMMARY=org.apache.log4j.RollingFileAppender
log4j.appender.RMSUMMARY.File=${yarn.log.dir}/${yarn.server.resourcemanager.appsummary.log.file}
log4j.appender.RMSUMMARY.MaxFileSize=256MB
log4j.appender.RMSUMMARY.MaxBackupIndex=20
log4j.appender.RMSUMMARY.layout=org.apache.log4j.PatternLayout
log4j.appender.RMSUMMARY.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
log4j.appender.JSA.layout=org.apache.log4j.PatternLayout
log4j.appender.JSA.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
log4j.appender.JSA.DatePattern=.yyyy-MM-dd
log4j.appender.JSA.layout=org.apache.log4j.PatternLayout
log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=${yarn.server.resourcemanager.appsummary.logger}
log4j.additivity.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=false
- In some cases different components may have their own specific log4j.properties file, which may be located in the Hadoop configuration directory, such as kms-log4j.properties and httpfs-log4j.properties.
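As an illustration of the kind of change typically made in this file (the logger name and level below are an assumed example, not part of the sample above), a single line can raise or lower the verbosity of one component, for instance the DataNode:
log4j.logger.org.apache.hadoop.hdfs.server.datanode=DEBUG
A similar change can usually be applied to a running daemon without a restart via hadoop daemonlog -setlevel <host:http-port> <logger-name> <level>.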
C] hadoop-metrics.properties
- We may also have hadoop-metrics.properties and/or hadoop-metrics2.properties files in the Hadoop configuration directory. These are used to define which application and platform metrics to collect and where to send them (a small file-sink sketch follows the sample). The following is a sample hadoop-metrics2.properties file:
[sukul@server1 conf]$ cat hadoop-metrics2.properties | grep -v '^#' | sed '/^$/d'
*.period=10
*.sink.timeline.plugin.urls=file:///usr/lib/ambari-metrics-hadoop-sink/ambari-metrics-hadoop-sink.jar
*.sink.timeline.class=org.apache.hadoop.metrics2.sink.timeline.HadoopTimelineMetricsSink
*.sink.timeline.period=10
*.sink.timeline.sendInterval=60000
*.sink.timeline.slave.host.name=serv084.zbc.xyz.com
*.sink.timeline.zookeeper.quorum=serv269.zbc.xyz.com:2181,serv271.zbc.xyz.com:2181,serv267.zbc.xyz.com:2181
*.sink.timeline.protocol=http
*.sink.timeline.port=6188
*.sink.timeline.truststore.path = /etc/security/clientKeys/all.jks
*.sink.timeline.truststore.type = jks
*.sink.timeline.truststore.password = bigdata
datanode.sink.timeline.collector.hosts=serv287.zbc.xyz.com
namenode.sink.timeline.collector.hosts=serv287.zbc.xyz.com
resourcemanager.sink.timeline.collector.hosts=serv287.zbc.xyz.com
nodemanager.sink.timeline.collector.hosts=serv287.zbc.xyz.com
jobhistoryserver.sink.timeline.collector.hosts=serv287.zbc.xyz.com
journalnode.sink.timeline.collector.hosts=serv287.zbc.xyz.com
maptask.sink.timeline.collector.hosts=serv287.zbc.xyz.com
reducetask.sink.timeline.collector.hosts=serv287.zbc.xyz.com
applicationhistoryserver.sink.timeline.collector.hosts=serv287.zbc.xyz.com
resourcemanager.sink.timeline.tagsForPrefix.yarn=Queue
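The sample above ships metrics to the Ambari Metrics timeline sink. For comparison, a minimal hedged sketch of a sink definition using Hadoop's built-in file sink instead (the file names are illustrative):
*.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
namenode.sink.file.filename=namenode-metrics.out
datanode.sink.file.filename=datanode-metrics.out
Each daemon then periodically dumps its metrics to the named local file, which can be useful for quick troubleshooting without a metrics collector.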
D] Other Configuration Files:
- slaves file: used by the cluster start-up scripts in the Hadoop sbin directory; it contains the list of slave (worker) nodes (a tiny example follows this list).
- hadoop-policy.xml, kms-site.xml, or ssl-server.xml: configuration files related to security or access-control policies, SSL configuration, or key management.
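The slaves file itself is just a plain list of worker hostnames, one per line; a hypothetical example (the hostnames are made up):
[sukul@server1 conf]$ cat slaves
workernode1
workernode2
workernode3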