Wednesday, 24 April 2019

HDFS Rack Awareness


    • What is Rack Awareness? Rack awareness enables HDFS to understand a cluster topology that may include multiple racks of servers or multiple data centers, and to orchestrate its block placement accordingly. This allows us to achieve the goals of data locality, fault tolerance, and resiliency.
       
    • Advantage of Rack Awareness: By implementing a rack topology and using rack awareness in HDFS, data can be dispersed across racks or data centers to provide further fault tolerance in the event of a rack failure, a switch or network failure, or even a data center outage.
       
    • With a rack topology defined, Hadoop will place blocks on nodes according to the following strategy:
      1. The first replica of a block is placed on a node in the cluster; this is the same node as the Hadoop client if the client is running within the cluster (otherwise a node is chosen at random).
      2. The second replica of a block is placed on a node residing on a different rack from the first replica.
      3. The third replica, assuming a default replication factor of 3, is placed on a different node on the same rack as the second replica.
    • How is Rack Awareness implemented? Rack awareness is implemented using a user-supplied script, which can be written in any scripting language available on the cluster; common choices include Python, Ruby, or Bash. The script, called a rack topology script, provides Hadoop with an identifier for any given node, telling Hadoop to which rack that particular node belongs. A rack could be a physical rack (e.g., a 19” server rack), a particular subnet, or even an abstraction representing a data center.
    • The script needs to return a hierarchical location ID for a host passed in as a script argument. The hierarchical location ID is in the form /datacenter/rack. This can be pseudo-coded as follows: topology_script([datanode_hosts]) -> [rackids]
    • What happens when there is no topology script? If a rack topology script is not supplied, rack awareness still functions, but all nodes are assigned the default location ID of /default-rack, effectively treating the cluster as a single rack.
    • The script can be implemented in several different ways with no strict definition of how to accomplish this. For instance, you could embed the location ID in the hostname itself and use simple string manipulation to determine the location, as in the following example: dc01rack01node02. You could also create a lookup table or map and store this in a file or in a database.
    • Where is the topology script located: Rack awareness is implemented on the HDFS client or client application, so the script, its chosen interpreter, and any supporting data such as lookup files need to be available on the client.
    • A sample rack topology script is provided below:
    #!/usr/bin/env python
    # input: hostname or IP address of a node or nodes, passed as script arguments
    # output: one rack ID per node
    # example: an input of dc01rack01node02 outputs /dc01/rack01
    import sys

    DEFAULT_RACK = "/dc01/default-rack"

    for host in sys.argv[1:]:
        if len(host) == 16:
            dcid = host[:4]       # e.g. "dc01"
            rackid = host[4:10]   # e.g. "rack01"
            print("/" + dcid + "/" + rackid)
        else:
            print(DEFAULT_RACK)
    • This script is enabled using the net.topology.script.file.name configuration property in the core-site.xml configuration file:
    <property>
      <name>net.topology.script.file.name</name>
      <value>/etc/hadoop/conf/rack-topology-script.py</value>
    </property>

Configuration files for Hadoop ecosystem components


    • Hive:
      • Hive will usually inherit its HDFS and YARN configuration from the Hadoop configuration files discussed above (core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml). However, many other properties specific to Hive are maintained in the hive-site.xml configuration file, typically located in /etc/hive/conf.
      • The most significant settings in hive-site.xml are the connection details for a shared Hive metastore, as sketched below.
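      A minimal, illustrative hive-site.xml fragment is shown below; the metastore host and warehouse path are hypothetical and will vary by environment.
      <!-- "metastorehost" is a placeholder; 9083 is the conventional metastore Thrift port -->
      <property>
        <name>hive.metastore.uris</name>
        <value>thrift://metastorehost:9083</value>
      </property>
      <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/apps/hive/warehouse</value>
      </property>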

    • Pig:
      • As with Hive, most HDFS- and YARN-specific configuration details are sourced from the Hadoop configuration on the host, typically a client host that includes the Pig and Hadoop client libraries. Additional Pig-specific properties are located in the pig.properties file, which is in /etc/pig/conf on most distributions.
      • Note that the pig.properties file, unlike many of the other Hadoop configuration files, is not an XML document.

    • Spark:
      • Spark configuration properties are set through the spark-defaults.conf file located in $SPARK_HOME/conf. This configuration file is read by Spark applications and daemons upon startup.
      • The spark-defaults.conf file, like pig.properties, is also not in the standard Hadoop XML configuration format. Spark configuration properties can also be set programmatically in your driver code using the SparkConf object.

    • HBase:
      • HBase configuration is typically stored in /etc/hbase/conf, the primary configuration file being the hbase-site.xml file. This will govern the behavior of HBase and will be used by the HMaster and RegionServers alike.
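      As an illustrative sketch (paths and hostnames are hypothetical), two of the most commonly set properties in hbase-site.xml are the HBase root directory in HDFS and the ZooKeeper quorum:
      <!-- hbase.rootdir points at a directory in HDFS; the hostnames below are placeholders -->
      <property>
        <name>hbase.rootdir</name>
        <value>hdfs://mynamenode:8020/hbase</value>
      </property>
      <property>
        <name>hbase.zookeeper.quorum</name>
        <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
      </property>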


core-site.xml and hdfs-site.xml: Important Properties


HDFS Configuration Parameters (Detailed):
Common Properties (core-site.xml)
    • Used to store many properties that are required by different processes in a cluster, including client processes, master node processes, and slave node processes.

    fs.defaultFS

    • fs.defaultFS is the property that specifies the filesystem to be used by the cluster. Most of the time this is HDFS. However, the filesystem used by the cluster could be an object store, such as S3 or Ceph, or a local or network filesystem. Note that the value specifies the filesystem scheme as well as hostname and port information for the filesystem.

    • When using filesystems such as S3, you will need to provide additional AWS authentication parameters, including fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey, to supply your credentials to the remote platform. This may also be the case with other remote filesystems or object stores.
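    For example, a core-site.xml fragment supplying S3 credentials might look like the following sketch (the key values are placeholders; newer S3A-based deployments use different property names such as fs.s3a.access.key):
    <!-- placeholder credentials; never store real keys in configuration kept under version control -->
    <property>
      <name>fs.s3.awsAccessKeyId</name>
      <value>YOUR_ACCESS_KEY_ID</value>
    </property>
    <property>
      <name>fs.s3.awsSecretAccessKey</name>
      <value>YOUR_SECRET_ACCESS_KEY</value>
    </property>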

    • When high availability or federation is used, fs.defaultFS refers to a construct called a nameservice instead of specifying a host explicitly (see the second example below).

    Example:
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://mynamenode:8020</value>
    </property>
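    With high availability enabled, the value is a logical nameservice rather than a single host and port. An illustrative sketch, assuming a nameservice named mycluster has been defined via dfs.nameservices in hdfs-site.xml:
    <property>
      <name>fs.defaultFS</name>
      <!-- "mycluster" is a hypothetical nameservice ID, not a hostname -->
      <value>hdfs://mycluster</value>
    </property>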



hdfs-site.xml
    dfs.namenode.name.dir and dfs.namenode.edits.dir
     
    • The dfs.namenode.name.dir property specifies the location on the filesystem where the NameNode stores its on-disk metadata, specifically the fsimage file(s). This value is also used as the default directory for the edits files used by the NameNode's journaling function. However, the edits location can be set to a different directory using the dfs.namenode.edits.dir property (see the second example below).

    <property>
      <name>dfs.namenode.name.dir</name>
      <value>file:///disk1/dfs/nn,file:///disk2/dfs/nn</value>
    </property>
    Note that there are no spaces between the comma-delimited values. An example value is /opt/app/data01/hdfs/nn.
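    If the edits files should reside on a separate volume from the fsimage files, a sketch such as the following could be added (the path is hypothetical):
    <property>
      <name>dfs.namenode.edits.dir</name>
      <!-- hypothetical dedicated volume for the NameNode edits (journal) files -->
      <value>file:///disk3/dfs/edits</value>
    </property>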

    • A loss of the NameNode’s metadata would result in the loss of all of the data stored in the cluster’s distributed filesystem—potentially petabytes of data. The on-disk metadata structures are there to provide durability and crash consistency for the NameNode’s metadata, which is otherwise stored in volatile memory on the NameNode host.
       
    • The value for the dfs.namenode.name.dir property is a comma-separated list of directories (on a local or network filesystem, not on HDFS) where the fsimage files and, by default, the edits files will be stored.


    • The NameNode's metadata is written synchronously to each directory in the comma-separated list specified by dfs.namenode.name.dir. If any directory or volume in the list becomes unavailable, it is removed from the cached list of directories and no further attempts are made to write to it until the NameNode is restarted.

    • The parallel write operations to multiple directories provide additional fault tolerance for the NameNode's critical metadata functions. For this reason, you should always provide more than one directory, each residing on a different physical disk, volume, or disk controller, to minimize the risk of an outage if one volume or channel fails.

    • In some cases, such as non-HA deployments, you should specify an NFS mount point in the list of directories in dfs.namenode.name.dir, which then stores a copy of the metadata on another host, providing further fault tolerance and recovery options. Note that if you do this, you should soft mount and configure retries for the NFS mount point; otherwise the NameNode process may hang if the mount is temporarily unavailable for whatever reason.

    • The dfs.namenode.name.dir and dfs.namenode.edits.dir properties are read by the NameNode daemon upon startup. Any changes to these properties will require a NameNode restart to take effect.



    dfs.namenode.checkpoint.dir/period/txns
     
    There are three significant configuration properties that relate to the checkpointing function.
     
    • The dfs.namenode.checkpoint.dir property is a comma-separated list of directories, analogous to the dfs.namenode.name.dir property discussed earlier, used by the checkpointing node (for example, the SecondaryNameNode) to store the temporary fsimage and edits files merged during the checkpointing process.

    • The dfs.namenode.checkpoint.period property specifies the maximum delay between two consecutive checkpoints with a default value of one hour.

    • The dfs.namenode.checkpoint.txns property specifies the number of transactions at which the NameNode will force a checkpoint. Checkpointing will occur when either threshold is met (time interval or transactions).


    dfs.datanode.data.dir

    • The dfs.datanode.data.dir property specifies where the DataNode will store the physical HDFS blocks. Like the dfs.namenode.name.dir property, this too is a comma-separated list of directories with no spaces between entries.

    • Unlike the dfs.namenode.name.dir setting, writes to the directories specified in the dfs.datanode.data.dir property are performed in a round-robin fashion (i.e., the first block on one directory, the next block on the next, and so on).

    • The configuration for each DataNode may differ: on different slave nodes, the volumes and directories may differ. However, when planning your cluster, it is best to homogenize as many configuration settings as possible, making the cluster easier to manage.

    dfs.datanode.du.reserved

    • HDFS is a “greedy” file system. Volumes associated with directories on the DataNodes specified by dfs.datanode.data.dir will be 100% filled with HDFS block data if left unmanaged. This is problematic because slave nodes require working space for intermediate data storage. If available disk storage is completely consumed by HDFS block storage, data locality suffers as processing activities may not be possible on the node.

    • The dfs.datanode.du.reserved configuration property in hdfs-site.xml specifies the amount of space, in bytes, on each volume that must be reserved and thus cannot be used for HDFS block storage. It is generally recommended to set this value to around 25% of the available space, or at least 1 GB, depending upon the local storage on each DataNode.
       
dfs.blocksize

  • The property dfs.blocksize specifies the block size in bytes for new files written by clients, including files produced as the result of an application or job run by the client. The default is 134217728, or 128MB.

  • Although commonly thought of as a cluster- or server-based setting, dfs.blocksize is actually a client setting.
     
  • This property can be enforced by administrators by marking it as <final> in the server-side configuration, as discussed earlier; a sketch is shown below.
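  A minimal sketch of enforcing the server-side block size by marking the property as final in hdfs-site.xml, so client-supplied values are ignored:
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
    <!-- final prevents client configurations from overriding this value -->
    <final>true</final>
  </property>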
     
dfs.replication
 
  • The dfs.replication property located in the hdfs-site.xml file determines the number of block replicas created when a file is written to HDFS.
     
  • The default is 3, which is also the recommended value.
     
  • As with the dfs.blocksize property, the dfs.replication property is a client-side setting; a client can request a different replication factor for the files it writes, as sketched below.
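  For example, a client that wants only two replicas for the files it writes could set the following in its local hdfs-site.xml (an illustrative value; the cluster-wide default remains 3):
  <property>
    <name>dfs.replication</name>
    <!-- client-side override; applies to files written by this client -->
    <value>2</value>
  </property>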



Examples of the above properties:
    <property>
      <name>dfs.namenode.name.dir</name>
      <value>/opt/app/data01/hdfs/nn</value>
    </property>

   <property>
      <name>dfs.namenode.checkpoint.dir</name>
      <value>/opt/app/data01/hdfs/snn</value>
    </property>

    <property>
      <name>dfs.namenode.checkpoint.edits.dir</name>
      <value>${dfs.namenode.checkpoint.dir}</value>
    </property>

    <property>
      <name>dfs.namenode.checkpoint.period</name>
      <value>21600</value>
    </property>

    <property>
      <name>dfs.namenode.checkpoint.txns</name>
      <value>10000000</value>
    </property>

   <property>
      <name>dfs.datanode.data.dir</name>
      <value>/opt/data/data01/hdfs/dn,/opt/data/data02/hdfs/dn,/opt/data/data03/hdfs/dn,/opt/data/data04/hdfs/dn,/opt/data/data05/hdfs/dn,/opt/data/data06/hdfs/dn,/opt/data/data07/hdfs/dn,/opt/data/data08/hdfs/dn,/opt/data/data09/hdfs/dn,/opt/data/data10/hdfs/dn,/opt/data/data11/hdfs/dn,/opt/data/data12/hdfs/dn</value>
   </property>

    <property>
      <name>dfs.datanode.du.reserved</name>
      <value>1073741824</value>
    </property>

    <property>
      <name>dfs.blocksize</name>
      <value>134217728</value>
    </property>

    <property>
      <name>dfs.replication</name>
      <value>3</value>
    </property>

    <property>
      <name>dfs.replication.max</name>
      <value>50</value>
    </property>

Other Hadoop Environment Scripts and Configuration Files


We are familiar with many configuration files such as core-site.xml, yarn-site.xml, mapred-site.xml, and hdfs-site.xml. Apart from these, there are some additional configuration files in the configuration directory.
[sukul@server1 ~]$ cd $HADOOP_HOME/conf
[sukul@server1 conf]$ ls *xml
capacity-scheduler.xml  hadoop-policy.xml  mapred-site.xml  ssl-server.xml
core-site.xml           hdfs-site.xml      ssl-client.xml   yarn-site.xml
[sukul@server1 conf]$ ls *sh
hadoop-env.sh  mapred-env.sh  yarn-env.sh

A] hadoop-env.sh/yarn-env.sh/mapred-env.sh
  • The hadoop-env.sh script is used to source environment variables for Hadoop daemons and processes.
  • This can include daemon JVM settings such as heap size or Java options, as well as basic variables required by many processes, such as HADOOP_LOG_DIR or JAVA_HOME. (The following shows just a few lines of the hadoop-env.sh script.)
[sukul@server1 conf]$ cat hadoop-env.sh  | grep -v '^#' | sed '/^$/d'
export JAVA_HOME=/opt/app/java/jdk/jdk180/
export HADOOP_HOME_WARN_SUPPRESS=1
export HADOOP_HOME=${HADOOP_HOME:-/usr/hdp/2.6.5.4-1/hadoop}
export JSVC_HOME=/usr/lib/bigtop-utils
export HADOOP_HEAPSIZE="4096"
export HADOOP_NAMENODE_INIT_HEAPSIZE="-Xms233472m"
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true ${HADOOP_OPTS}"
export HADOOP_NAMENODE_OPTS="-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70 -XX:ErrorFile=/opt/log/hadoop/$USER/hs_err_pid%p.log -XX:NewSize=25600m -XX:MaxNewSize=25600m -XX:MetaspaceSize=128m -XX:MaxMetaspaceSize=256m -Xloggc:/opt/log/hadoop/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Xms233472m -Xmx233472m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,RFAAUDIT ${HADOOP_NAMENODE_OPTS}"

  • Basically, if you need to pass any environment variables to any Hadoop process, the hadoop-env.sh file is the place to do this, as it is sourced by all Hadoop control scripts.
  • Similarly, there may be other environment shell scripts such as yarn-env.sh and mapred-env.sh that are used by those specific processes to source necessary environment variables. (The following shows just a few lines of the mapred-env.sh and yarn-env.sh scripts.)

[sukul@server1 conf]$ cat mapred-env.sh  | grep -v '^#' | sed '/^$/d'
export HADOOP_JOB_HISTORYSERVER_HEAPSIZE=16384
export HADOOP_MAPRED_ROOT_LOGGER=INFO,RFA
export HADOOP_JOB_HISTORYSERVER_OPTS=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/opt/log/hadoop-mapreduce/mapred/gc_trace.log -XX:ErrorFile=/opt/log/hadoop-mapreduce/mapred/java_error.log -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/log/hadoop-mapreduce/mapred/heap_dump.hprof
export HADOOP_OPTS="-Dhdp.version=$HDP_VERSION $HADOOP_OPTS"
[sukul@server1 conf]$ cat yarn-env.sh  | grep -v '^#' | sed '/^$/d'
export HADOOP_YARN_HOME=/usr/hdp/2.6.5.4-1/hadoop-yarn
export YARN_LOG_DIR=/opt/log/hadoop-yarn/$USER
export YARN_PID_DIR=/var/run/hadoop-yarn/$USER
export HADOOP_LIBEXEC_DIR=/usr/hdp/2.6.5.4-1/hadoop/libexec
export JAVA_HOME=/opt/app/java/jdk/jdk180/
export HADOOP_YARN_USER=${HADOOP_YARN_USER:-yarn}
export YARN_CONF_DIR="${YARN_CONF_DIR:-$HADOOP_YARN_HOME/conf}"
if [ "$JAVA_HOME" != "" ]; then
  #echo "run java in $JAVA_HOME"
  JAVA_HOME=$JAVA_HOME
fi
if [ "$JAVA_HOME" = "" ]; then
  echo "Error: JAVA_HOME is not set."
  exit 1
fi

B] log4j.properties
  • Hadoop uses Log4j (the Java logging framework) to store and manage its log files. Log files are produced by nearly every process in Hadoop, including daemons, applications, and tasks.
  • The log4j.properties file provides configuration for log file management, including how to write log records, where to write them, and how to manage rotation and retention of log files.
  • The following shows a sample log4j.properties file:
[sukul@server1 ~]$ cd $HADOOP_HOME/conf
[sukul@server1 conf]$ ls log4j.properties
log4j.properties
[sukul@server1 conf]$ cat log4j.properties | grep -v '^#' | sed '/^$/d'
hadoop.root.logger=INFO,console
hadoop.log.dir=.
hadoop.log.file=hadoop.log
log4j.rootLogger=${hadoop.root.logger}, EventCounter
log4j.threshhold=ALL
log4j.appender.RFA=org.apache.log4j.DailyRollingFileAppender
log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.RFA.DatePattern=.yyyy-MM-dd
log4j.appender.RFA.MaxBackupIndex=45
log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
hadoop.tasklog.taskid=null
hadoop.tasklog.iscleanup=false
hadoop.tasklog.noKeepSplits=4
hadoop.tasklog.totalLogFileSize=100
hadoop.tasklog.purgeLogSplits=true
hadoop.tasklog.logsRetainHours=12
log4j.appender.TLA=org.apache.hadoop.mapred.TaskLogAppender
log4j.appender.TLA.taskId=${hadoop.tasklog.taskid}
log4j.appender.TLA.isCleanup=${hadoop.tasklog.iscleanup}
log4j.appender.TLA.totalLogFileSize=${hadoop.tasklog.totalLogFileSize}
log4j.appender.TLA.layout=org.apache.log4j.PatternLayout
log4j.appender.TLA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
hadoop.security.logger=INFO,console
hadoop.security.log.maxfilesize=256MB
hadoop.security.log.maxbackupindex=20
log4j.category.SecurityLogger=WARN,console
hadoop.security.log.file=SecurityAuth.audit
log4j.appender.DRFAS=org.apache.log4j.DailyRollingFileAppender
log4j.appender.DRFAS.File=${hadoop.log.dir}/${hadoop.security.log.file}
log4j.appender.DRFAS.layout=org.apache.log4j.PatternLayout
log4j.appender.DRFAS.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
log4j.appender.DRFAS.DatePattern=.yyyy-MM-dd
log4j.appender.RFAS=org.apache.log4j.RollingFileAppender
log4j.appender.RFAS.File=${hadoop.log.dir}/${hadoop.security.log.file}
log4j.appender.RFAS.layout=org.apache.log4j.PatternLayout
log4j.appender.RFAS.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
log4j.appender.RFAS.MaxFileSize=${hadoop.security.log.maxfilesize}
log4j.appender.RFAS.MaxBackupIndex=${hadoop.security.log.maxbackupindex}
hdfs.audit.logger=INFO,console
log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=${hdfs.audit.logger}
log4j.additivity.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=false
log4j.appender.RFAAUDIT=org.apache.log4j.RollingFileAppender
log4j.appender.RFAAUDIT.File=${hadoop.log.dir}/hdfs-audit.log
log4j.appender.RFAAUDIT.layout=org.apache.log4j.PatternLayout
log4j.appender.RFAAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
log4j.appender.RFAAUDIT.MaxBackupIndex=180
log4j.appender.RFAAUDIT.MaxFileSize=16106127360
mapred.audit.logger=INFO,console
log4j.logger.org.apache.hadoop.mapred.AuditLogger=${mapred.audit.logger}
log4j.additivity.org.apache.hadoop.mapred.AuditLogger=false
log4j.appender.MRAUDIT=org.apache.log4j.DailyRollingFileAppender
log4j.appender.MRAUDIT.File=${hadoop.log.dir}/mapred-audit.log
log4j.appender.MRAUDIT.layout=org.apache.log4j.PatternLayout
log4j.appender.MRAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
log4j.appender.MRAUDIT.DatePattern=.yyyy-MM-dd
hadoop.metrics.log.level=INFO
log4j.logger.org.apache.hadoop.metrics2=${hadoop.metrics.log.level}
log4j.logger.org.jets3t.service.impl.rest.httpclient.RestS3Service=ERROR
log4j.appender.NullAppender=org.apache.log4j.varia.NullAppender
log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter
log4j.logger.org.apache.hadoop.conf.Configuration.deprecation=WARN
log4j.logger.BlockStateChange=ERROR
log4j.logger.org.apache.hadoop.hdfs.StateChange=WARN
yarn.log.dir=.
hadoop.mapreduce.jobsummary.logger=${hadoop.root.logger}
hadoop.mapreduce.jobsummary.log.file=hadoop-mapreduce.jobsummary.log
log4j.appender.JSA=org.apache.log4j.DailyRollingFileAppender
yarn.server.resourcemanager.appsummary.log.file=hadoop-mapreduce.jobsummary.log
yarn.server.resourcemanager.appsummary.logger=${hadoop.root.logger}
log4j.appender.RMSUMMARY=org.apache.log4j.RollingFileAppender
log4j.appender.RMSUMMARY.File=${yarn.log.dir}/${yarn.server.resourcemanager.appsummary.log.file}
log4j.appender.RMSUMMARY.MaxFileSize=256MB
log4j.appender.RMSUMMARY.MaxBackupIndex=20
log4j.appender.RMSUMMARY.layout=org.apache.log4j.PatternLayout
log4j.appender.RMSUMMARY.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
log4j.appender.JSA.layout=org.apache.log4j.PatternLayout
log4j.appender.JSA.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
log4j.appender.JSA.DatePattern=.yyyy-MM-dd
log4j.appender.JSA.layout=org.apache.log4j.PatternLayout
log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=${yarn.server.resourcemanager.appsummary.logger}
log4j.additivity.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=false

  • In some cases, different components may have their own specific log4j.properties file located in the Hadoop configuration directory, such as kms-log4j.properties and httpfs-log4j.properties.

C] hadoop-metrics.properties
  • You may also have hadoop-metrics.properties and/or hadoop-metrics2.properties files in your Hadoop configuration directory. These are used to define which application and platform metrics to collect and where to send them. The following shows a sample hadoop-metrics2.properties file:

[sukul@server1 conf]$ cat hadoop-metrics2.properties | grep -v '^#' | sed '/^$/d'
*.period=10
*.sink.timeline.plugin.urls=file:///usr/lib/ambari-metrics-hadoop-sink/ambari-metrics-hadoop-sink.jar
*.sink.timeline.class=org.apache.hadoop.metrics2.sink.timeline.HadoopTimelineMetricsSink
*.sink.timeline.period=10
*.sink.timeline.sendInterval=60000
*.sink.timeline.slave.host.name=serv084.zbc.xyz.com
*.sink.timeline.zookeeper.quorum=serv269.zbc.xyz.com:2181,serv271.zbc.xyz.com:2181,serv267.zbc.xyz.com:2181
*.sink.timeline.protocol=http
*.sink.timeline.port=6188
*.sink.timeline.truststore.path = /etc/security/clientKeys/all.jks
*.sink.timeline.truststore.type = jks
*.sink.timeline.truststore.password = bigdata
datanode.sink.timeline.collector.hosts=serv287.zbc.xyz.com
namenode.sink.timeline.collector.hosts=serv287.zbc.xyz.com
resourcemanager.sink.timeline.collector.hosts=serv287.zbc.xyz.com
nodemanager.sink.timeline.collector.hosts=serv287.zbc.xyz.com
jobhistoryserver.sink.timeline.collector.hosts=serv287.zbc.xyz.com
journalnode.sink.timeline.collector.hosts=serv287.zbc.xyz.com
maptask.sink.timeline.collector.hosts=serv287.zbc.xyz.com
reducetask.sink.timeline.collector.hosts=serv287.zbc.xyz.com
applicationhistoryserver.sink.timeline.collector.hosts=serv287.zbc.xyz.com
resourcemanager.sink.timeline.tagsForPrefix.yarn=Queue

D] Other Configuration Files:   
  • slaves file: used by the cluster startup scripts in the Hadoop sbin directory. This contains a list of the cluster's slave (worker) nodes, one hostname per line, as shown below.
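  A minimal, illustrative slaves file (the hostnames are hypothetical):
  slave01.example.com
  slave02.example.com
  slave03.example.com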

  • hadoop-policy.xml, kms-site.xml, or ssl-server.xml: configuration files related to security or access control policies, SSL configuration, or key management.