Other Hadoop Environment Scripts and Configuration Files
We are familiar with configuration files like core-site.xml, yarn-site.xml, mapred-site.xml, and hdfs-site.xml. Apart from these, there are a few more configuration files in the Hadoop configuration directory.
[sukul@server1 ~]$ cd $HADOOP_HOME/conf
[sukul@server1 conf]$ ls *xml
capacity-scheduler.xml hadoop-policy.xml mapred-site.xml ssl-server.xml
core-site.xml hdfs-site.xml ssl-client.xml yarn-site.xml
[sukul@server1 conf]$ ls *sh
hadoop-env.sh mapred-env.sh yarn-env.sh
A] hadoop-env.sh / yarn-env.sh / mapred-env.sh
- The hadoop-env.sh script is used to source environment variables for Hadoop daemons and processes.
- This can include daemon JVM settings such as heap size or Java options, as well as basic variables required by many processes, such as HADOOP_LOG_DIR or JAVA_HOME. (The following shows just a few lines of the hadoop-env.sh script.)
[sukul@server1 conf]$ cat hadoop-env.sh | grep -v '^#' | sed '/^$/d'
export JAVA_HOME=/opt/app/java/jdk/jdk180/
export HADOOP_HOME_WARN_SUPPRESS=1
export HADOOP_HOME=${HADOOP_HOME:-/usr/hdp/2.6.5.4-1/hadoop}
export JSVC_HOME=/usr/lib/bigtop-utils
export HADOOP_HEAPSIZE="4096"
export HADOOP_NAMENODE_INIT_HEAPSIZE="-Xms233472m"
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true ${HADOOP_OPTS}"
export HADOOP_NAMENODE_OPTS="-server -XX:ParallelGCThreads=8
-XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=70
-XX:ErrorFile=/opt/log/hadoop/$USER/hs_err_pid%p.log -XX:NewSize=25600m
-XX:MaxNewSize=25600m -XX:MetaspaceSize=128m -XX:MaxMetaspaceSize=256m
-Xloggc:/opt/log/hadoop/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps
-XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly
-Xms233472m -Xmx233472m -Dhadoop.security.logger=INFO,DRFAS
-Dhdfs.audit.logger=INFO,RFAAUDIT ${HADOOP_NAMENODE_OPTS}"
- Basically, if you need to pass an environment variable to any Hadoop process, the hadoop-env.sh file is the place to do it, as it is sourced by all Hadoop control scripts. (A sketch of such an addition follows the script excerpts below.)
- Similarly, there are other environment shell scripts, such as yarn-env.sh and mapred-env.sh, that are used by those specific processes to source the environment variables they need. (The following shows just a few lines of the mapred-env.sh and yarn-env.sh scripts.)
[sukul@server1 conf]$ cat mapred-env.sh | grep -v '^#' | sed '/^$/d'
export HADOOP_JOB_HISTORYSERVER_HEAPSIZE=16384
export HADOOP_MAPRED_ROOT_LOGGER=INFO,RFA
export HADOOP_JOB_HISTORYSERVER_OPTS="-XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-Xloggc:/opt/log/hadoop-mapreduce/mapred/gc_trace.log
-XX:ErrorFile=/opt/log/hadoop-mapreduce/mapred/java_error.log
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/log/hadoop-mapreduce/mapred/heap_dump.hprof"
export HADOOP_OPTS="-Dhdp.version=$HDP_VERSION $HADOOP_OPTS"
[sukul@server1 conf]$ cat yarn-env.sh | grep -v '^#' | sed '/^$/d'
export HADOOP_YARN_HOME=/usr/hdp/2.6.5.4-1/hadoop-yarn
export YARN_LOG_DIR=/opt/log/hadoop-yarn/$USER
export YARN_PID_DIR=/var/run/hadoop-yarn/$USER
export HADOOP_LIBEXEC_DIR=/usr/hdp/2.6.5.4-1/hadoop/libexec
export JAVA_HOME=/opt/app/java/jdk/jdk180/
export HADOOP_YARN_USER=${HADOOP_YARN_USER:-yarn}
export YARN_CONF_DIR="${YARN_CONF_DIR:-$HADOOP_YARN_HOME/conf}"
if [ "$JAVA_HOME" != "" ]; then
  #echo "run java in $JAVA_HOME"
  JAVA_HOME=$JAVA_HOME
fi
if [ "$JAVA_HOME" = "" ]; then
  echo "Error: JAVA_HOME is not set."
  exit 1
fi
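- As noted above, hadoop-env.sh is the place to add environment variables for Hadoop processes. A minimal sketch of such an addition follows; the directory paths and the 8 GB heap are placeholder values for illustration, not settings taken from the cluster shown above.
# appended to hadoop-env.sh (placeholder values for illustration)
export HADOOP_LOG_DIR=/data/logs/hadoop            # directory for daemon log files
export HADOOP_PID_DIR=/var/run/hadoop              # directory for daemon pid files
export HADOOP_DATANODE_OPTS="-Xmx8192m ${HADOOP_DATANODE_OPTS}"   # extra JVM options for the DataNode only
Because hadoop-env.sh is sourced by the control scripts, such changes take effect the next time the daemons are restarted.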
B] log4j.properties
- Hadoop uses Log4J (the Java logging framework) to store and manage its log files. Log files are produced by nearly every process in Hadoop, including daemons, applications, and tasks.
- The log4j.properties file provides the configuration for log file management, including how log records are written, where they are written, and how rotation and retention of log files are managed. (A short tuning example follows at the end of this section.)
- The following shows a sample log4j.properties file:
[sukul@server1 ~]$ cd $HADOOP_HOME/conf
[sukul@server1 conf]$ ls log4j.properties
log4j.properties
[sukul@server1 conf]$ cat log4j.properties | grep -v '^#' | sed '/^$/d'
hadoop.root.logger=INFO,console
hadoop.log.dir=.
hadoop.log.file=hadoop.log
log4j.rootLogger=${hadoop.root.logger}, EventCounter
log4j.threshhold=ALL
log4j.appender.RFA=org.apache.log4j.DailyRollingFileAppender
log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.RFA.DatePattern=.yyyy-MM-dd
log4j.appender.RFA.MaxBackupIndex=45
log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
hadoop.tasklog.taskid=null
hadoop.tasklog.iscleanup=false
hadoop.tasklog.noKeepSplits=4
hadoop.tasklog.totalLogFileSize=100
hadoop.tasklog.purgeLogSplits=true
hadoop.tasklog.logsRetainHours=12
log4j.appender.TLA=org.apache.hadoop.mapred.TaskLogAppender
log4j.appender.TLA.taskId=${hadoop.tasklog.taskid}
log4j.appender.TLA.isCleanup=${hadoop.tasklog.iscleanup}
log4j.appender.TLA.totalLogFileSize=${hadoop.tasklog.totalLogFileSize}
log4j.appender.TLA.layout=org.apache.log4j.PatternLayout
log4j.appender.TLA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
hadoop.security.logger=INFO,console
hadoop.security.log.maxfilesize=256MB
hadoop.security.log.maxbackupindex=20
log4j.category.SecurityLogger=WARN,console
hadoop.security.log.file=SecurityAuth.audit
log4j.appender.DRFAS=org.apache.log4j.DailyRollingFileAppender
log4j.appender.DRFAS.File=${hadoop.log.dir}/${hadoop.security.log.file}
log4j.appender.DRFAS.layout=org.apache.log4j.PatternLayout
log4j.appender.DRFAS.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
log4j.appender.DRFAS.DatePattern=.yyyy-MM-dd
log4j.appender.RFAS=org.apache.log4j.RollingFileAppender
log4j.appender.RFAS.File=${hadoop.log.dir}/${hadoop.security.log.file}
log4j.appender.RFAS.layout=org.apache.log4j.PatternLayout
log4j.appender.RFAS.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
log4j.appender.RFAS.MaxFileSize=${hadoop.security.log.maxfilesize}
log4j.appender.RFAS.MaxBackupIndex=${hadoop.security.log.maxbackupindex}
hdfs.audit.logger=INFO,console
log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=${hdfs.audit.logger}
log4j.additivity.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=false
log4j.appender.RFAAUDIT=org.apache.log4j.RollingFileAppender
log4j.appender.RFAAUDIT.File=${hadoop.log.dir}/hdfs-audit.log
log4j.appender.RFAAUDIT.layout=org.apache.log4j.PatternLayout
log4j.appender.RFAAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
log4j.appender.RFAAUDIT.MaxBackupIndex=180
log4j.appender.RFAAUDIT.MaxFileSize=16106127360
mapred.audit.logger=INFO,console
log4j.logger.org.apache.hadoop.mapred.AuditLogger=${mapred.audit.logger}
log4j.additivity.org.apache.hadoop.mapred.AuditLogger=false
log4j.appender.MRAUDIT=org.apache.log4j.DailyRollingFileAppender
log4j.appender.MRAUDIT.File=${hadoop.log.dir}/mapred-audit.log
log4j.appender.MRAUDIT.layout=org.apache.log4j.PatternLayout
log4j.appender.MRAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
log4j.appender.MRAUDIT.DatePattern=.yyyy-MM-dd
hadoop.metrics.log.level=INFO
log4j.logger.org.apache.hadoop.metrics2=${hadoop.metrics.log.level}
log4j.logger.org.jets3t.service.impl.rest.httpclient.RestS3Service=ERROR
log4j.appender.NullAppender=org.apache.log4j.varia.NullAppender
log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter
log4j.logger.org.apache.hadoop.conf.Configuration.deprecation=WARN
log4j.logger.BlockStateChange=ERROR
log4j.logger.org.apache.hadoop.hdfs.StateChange=WARN
yarn.log.dir=.
hadoop.mapreduce.jobsummary.logger=${hadoop.root.logger}
hadoop.mapreduce.jobsummary.log.file=hadoop-mapreduce.jobsummary.log
log4j.appender.JSA=org.apache.log4j.DailyRollingFileAppender
yarn.server.resourcemanager.appsummary.log.file=hadoop-mapreduce.jobsummary.log
yarn.server.resourcemanager.appsummary.logger=${hadoop.root.logger}
log4j.appender.RMSUMMARY=org.apache.log4j.RollingFileAppender
log4j.appender.RMSUMMARY.File=${yarn.log.dir}/${yarn.server.resourcemanager.appsummary.log.file}
log4j.appender.RMSUMMARY.MaxFileSize=256MB
log4j.appender.RMSUMMARY.MaxBackupIndex=20
log4j.appender.RMSUMMARY.layout=org.apache.log4j.PatternLayout
log4j.appender.RMSUMMARY.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
log4j.appender.JSA.layout=org.apache.log4j.PatternLayout
log4j.appender.JSA.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
log4j.appender.JSA.DatePattern=.yyyy-MM-dd
log4j.appender.JSA.layout=org.apache.log4j.PatternLayout
log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=${yarn.server.resourcemanager.appsummary.logger}
log4j.additivity.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=false
- In some cases different components have their own specific log4j.properties file in the Hadoop configuration directory, such as kms-log4j.properties and httpfs-log4j.properties.
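- As an example of tuning the settings above, an administrator might keep fewer rolled copies of the HDFS audit log and quiet the configuration-deprecation logger. A minimal sketch, reusing properties already present in the sample (the values 90 and ERROR are purely illustrative):
# keep 90 rolled copies of the HDFS audit log instead of 180 (illustrative value)
log4j.appender.RFAAUDIT.MaxBackupIndex=90
# report only errors from the configuration-deprecation logger
log4j.logger.org.apache.hadoop.conf.Configuration.deprecation=ERROR
Note how hdfs.audit.logger defaults to INFO,console in the file, while hadoop-env.sh (section A) passes -Dhdfs.audit.logger=INFO,RFAAUDIT in HADOOP_NAMENODE_OPTS; that system property overrides the file's default and is what routes the NameNode audit records to the RFAAUDIT appender.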
C] hadoop-metrics.properties
- We may also have hadoop-metrics.properties and/or hadoop-metrics2.properties files in the Hadoop configuration directory. These are used to define which application and platform metrics to collect and where to send them. The following shows a sample hadoop-metrics2.properties (a minimal alternative sink sketch follows the sample):
[sukul@server1 conf]$ cat hadoop-metrics2.properties | grep -v '^#' | sed '/^$/d'
*.period=10
*.sink.timeline.plugin.urls=file:///usr/lib/ambari-metrics-hadoop-sink/ambari-metrics-hadoop-sink.jar
*.sink.timeline.class=org.apache.hadoop.metrics2.sink.timeline.HadoopTimelineMetricsSink
*.sink.timeline.period=10
*.sink.timeline.sendInterval=60000
*.sink.timeline.slave.host.name=serv084.zbc.xyz.com
*.sink.timeline.zookeeper.quorum=serv269.zbc.xyz.com:2181,serv271.zbc.xyz.com:2181,serv267.zbc.xyz.com:2181
*.sink.timeline.protocol=http
*.sink.timeline.port=6188
*.sink.timeline.truststore.path = /etc/security/clientKeys/all.jks
*.sink.timeline.truststore.type = jks
*.sink.timeline.truststore.password = bigdata
datanode.sink.timeline.collector.hosts=serv287.zbc.xyz.com
namenode.sink.timeline.collector.hosts=serv287.zbc.xyz.com
resourcemanager.sink.timeline.collector.hosts=serv287.zbc.xyz.com
nodemanager.sink.timeline.collector.hosts=serv287.zbc.xyz.com
jobhistoryserver.sink.timeline.collector.hosts=serv287.zbc.xyz.com
journalnode.sink.timeline.collector.hosts=serv287.zbc.xyz.com
maptask.sink.timeline.collector.hosts=serv287.zbc.xyz.com
reducetask.sink.timeline.collector.hosts=serv287.zbc.xyz.com
applicationhistoryserver.sink.timeline.collector.hosts=serv287.zbc.xyz.com
resourcemanager.sink.timeline.tagsForPrefix.yarn=Queue
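- The sample above ships metrics to the Ambari Metrics timeline sink, but other sinks can be configured in the same file. As a minimal sketch using the FileSink class bundled with Hadoop (the output file names below are placeholders), per-daemon metrics could instead be dumped to local files:
# illustrative hadoop-metrics2.properties entries using the built-in FileSink
*.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
# placeholder output files for the NameNode and DataNode daemons
namenode.sink.file.filename=namenode-metrics.out
datanode.sink.file.filename=datanode-metrics.out
The prefix before .sink (for example namenode or datanode, or * for all daemons) selects which process the entry applies to, following the same pattern as the timeline sink lines above.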
D] Other Configuration Files:
- slaves file: used by the cluster startup scripts in the Hadoop sbin directory; it contains the list of slave (worker) nodes, one hostname per line (a short illustration follows this list).
- hadoop-policy.xml, kms-site.xml, or ssl-server.xml: configuration files related to security or access-control policies, SSL configuration, or key management.
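- For example, a slaves file is nothing more than a plain list of worker hostnames; the names below are placeholders in the style of the cluster above:
server2.zbc.xyz.com
server3.zbc.xyz.com
server4.zbc.xyz.com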