Starting Hadoop datanode: Error: JAVA_HOME is not set and could not be found. - Issues with CDH

Intro


CDH (Cloudera's Distribution Including Apache Hadoop) is the most popular and best-documented distribution of Apache Hadoop. I recently ran into a gap in its documentation while following the CDH4 Quick Start Guide. I installed the Oracle Java Development Kit and set the JAVA_HOME environment variable according to the instructions, but when attempting to start the HDFS daemons I received an error message stating that JAVA_HOME is not set and could not be found. After a quick investigation I found that the solution is simply to export JAVA_HOME in the hadoop-env.sh configuration file in addition to the .bash_profile file. This fix comes quickly to an experienced Hadoop administrator but can be tricky for a beginner, so in my opinion it should be well documented by Cloudera. The following covers detailed troubleshooting steps together with the solution.

Symptoms


1) You have the Oracle Java Development Kit installed and the JAVA_HOME environment variable exported in your shell profile (e.g. .bash_profile):

[root@hadoop-standalone-mr1 ~]# env | grep JAVA_HOME
JAVA_HOME=/opt/jdk1.6.0_45

2) When attempting to start the HDFS daemons, you receive the following error messages:

[root@hadoop-standalone-mr1 ~]# for x in `cd /etc/init.d ; ls hadoop-hdfs-*` ; do service $x start ; done
Starting Hadoop datanode:                                  [  OK  ]
Error: JAVA_HOME is not set and could not be found.
Starting Hadoop namenode:                                  [  OK  ]
Error: JAVA_HOME is not set and could not be found.
Starting Hadoop secondarynamenode:                         [  OK  ]
Error: JAVA_HOME is not set and could not be found.
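The cause is that the init scripts start the Hadoop daemons without sourcing your login profile, so a JAVA_HOME exported only in .bash_profile never reaches them. The following sketch roughly illustrates the effect by launching a subshell with a scrubbed environment (the JDK path is the one from the output above):

```shell
# JAVA_HOME is visible in the current interactive shell...
export JAVA_HOME=/opt/jdk1.6.0_45
echo "interactive shell: JAVA_HOME=${JAVA_HOME}"

# ...but a process started with a scrubbed environment (which is
# effectively how the init scripts run) does not inherit it:
env -i sh -c 'echo "clean environment: JAVA_HOME=${JAVA_HOME:-<unset>}"'
```

This is why the fix has to live in hadoop-env.sh, which the daemon start-up scripts source themselves.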

How to fix the issue


1) Export the JAVA_HOME environment variable in the hadoop-env.sh configuration file:

echo export `env | grep ^JAVA_HOME` >> /etc/alternatives/hadoop-conf/hadoop-env.sh
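If you want to see what that one-liner actually appends before touching the real configuration, you can run it against a temporary file as a stand-in for hadoop-env.sh (a sketch; the JDK path matches the one used above):

```shell
# Temporary stand-in for hadoop-env.sh
conf=$(mktemp)
export JAVA_HOME=/opt/jdk1.6.0_45

# Same command as above, redirected to the stand-in file:
echo export `env | grep ^JAVA_HOME` >> "$conf"

# The appended line is a plain shell export that the Hadoop
# start-up scripts will pick up when they source the file:
cat "$conf"
# → export JAVA_HOME=/opt/jdk1.6.0_45
```

Note that /etc/alternatives/hadoop-conf is a symlink managed by the alternatives system, so the line ends up in whichever configuration directory is currently active.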

2) You should be fine. All HDFS daemons now start up properly:

[root@hadoop-standalone-mr1 ~]# for x in `cd /etc/init.d ; ls hadoop-hdfs-*` ; do service $x start ; done
Starting Hadoop datanode:                                  [  OK  ]
starting datanode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-datanode-hadoop-standalone-mr1.out
Starting Hadoop namenode:                                  [  OK  ]
starting namenode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-namenode-hadoop-standalone-mr1.out
Starting Hadoop secondarynamenode:                         [  OK  ]
starting secondarynamenode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-secondarynamenode-hadoop-standalone-mr1.out

Disclaimer


  • The above has been tested with the CDH4 packages, on a CentOS 6.4 x86_64 system, in the Google Compute Engine environment.
  • The above solution works for both MRv1 and YARN.