1. Install JDK
1-1. Download and install a JDK (Java Development Kit)
You can download OpenJDK 11 for Windows from Oracle. After the download, unzip the file and put the JDK folder in a suitable directory, for example C:\JDK. Note that downloading OpenJDK 11 from Oracle requires an Oracle account.
Also, you must install the JDK into a path with no spaces (for example, not under C:\Program Files).
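Before moving on, you can check both requirements with a short Python script. This is a minimal sketch that assumes Python is already installed (the guide uses it later for PYSPARK_PYTHON); the function name jdk_dir_problems is made up for this example. It only checks that the path has no spaces and that the folder contains a java launcher under bin.

```python
import os


def jdk_dir_problems(jdk_dir: str) -> list:
    """Return problems found with a candidate JDK folder (an empty list means it looks usable)."""
    problems = []
    if " " in jdk_dir:
        problems.append("path contains spaces")
    # A JDK keeps its launcher in bin: java.exe on Windows, java elsewhere.
    launcher = "java.exe" if os.name == "nt" else "java"
    if not os.path.isfile(os.path.join(jdk_dir, "bin", launcher)):
        problems.append(f"{os.path.join('bin', launcher)} not found")
    return problems


if __name__ == "__main__":
    # Example location from this guide; replace it with your own JDK folder.
    print(jdk_dir_problems(r"C:\JDK") or "C:\\JDK looks usable")
```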
2. Install Apache Spark
2-1. Download Apache Spark pre-built for Apache Hadoop
Download a Spark 3 package pre-built for Apache Hadoop from the Apache Spark website. The package is a .tgz file, so install an archiver such as WinRAR if necessary.
2-2. Extract and move Spark
After extracting the Spark archive, move the extracted folder into a suitable directory, for example C:\Spark. The extracted folder (such as spark-3.5.0-bin-hadoop3) must sit directly under that directory, not inside an extra nested folder.
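If you would rather not install WinRAR, Python's standard tarfile module can also unpack the .tgz. The sketch below is one way to do it, not part of the original steps; the function name extract_spark is hypothetical. It extracts the archive into the target directory and returns the single top-level folder, which is the value you would later use for SPARK_HOME.

```python
import os
import tarfile


def extract_spark(archive: str, target_dir: str) -> str:
    """Extract a Spark .tgz into target_dir and return the folder to use as SPARK_HOME."""
    os.makedirs(target_dir, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        # Top-level names in the archive, e.g. {"spark-3.5.0-bin-hadoop3"}.
        tops = {name.split("/")[0] for name in tar.getnames()}
        if hasattr(tarfile, "data_filter"):
            # Pythons with extraction filters: "data" rejects unsafe members such as absolute paths.
            tar.extractall(target_dir, filter="data")
        else:
            tar.extractall(target_dir)
    # A Spark release unpacks into a single folder; fall back to target_dir otherwise.
    return os.path.join(target_dir, tops.pop()) if len(tops) == 1 else target_dir
```

For example, extract_spark(r"C:\Users\me\Downloads\spark-3.5.0-bin-hadoop3.tgz", r"C:\Spark") would unpack into C:\Spark and return C:\Spark\spark-3.5.0-bin-hadoop3 (the download path here is hypothetical).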
2-3. Install winutils.exe in C:\Hadoop
Download the winutils.exe build that matches your Hadoop version from here and move its bin folder into C:\Hadoop, so that the file ends up at C:\Hadoop\bin\winutils.exe.
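Hadoop's Windows support looks for winutils.exe in the bin folder under HADOOP_HOME, so a misplaced file is a common cause of errors. This small Python sketch (function names are hypothetical) checks that location for a given Hadoop folder.

```python
import os


def winutils_path(hadoop_home: str) -> str:
    """Expected location of winutils.exe: the bin folder under HADOOP_HOME."""
    return os.path.join(hadoop_home, "bin", "winutils.exe")


def has_winutils(hadoop_home: str) -> bool:
    """True if winutils.exe exists in the expected place."""
    return os.path.isfile(winutils_path(hadoop_home))


if __name__ == "__main__":
    # Example location from this guide; adjust if you used another folder.
    hadoop_home = r"C:\Hadoop"
    print(winutils_path(hadoop_home), "found" if has_winutils(hadoop_home) else "missing")
```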
3. Set Environment Variables
To use Java and Spark on Windows, you must set several new environment variables. Open Environment Variables from the Windows menu and add the following System variables.
JAVA_HOME: {YOUR-JAVA-DIRECTORY} (ex. C:\JDK). Also add %JAVA_HOME%\bin to the Path variable.
SPARK_HOME: {YOUR-SPARK-DIRECTORY} (ex. C:\Spark\spark-3.5.0-bin-hadoop3). Also add %SPARK_HOME%\bin to the Path variable.
HADOOP_HOME: {YOUR-HADOOP-DIRECTORY} (ex. C:\Hadoop). Also add %HADOOP_HOME%\bin to the Path variable.
PYSPARK_PYTHON: {YOUR-PYTHON-PATH} (ex. C:\anaconda3\python.exe)
Note that the directories above are only examples. Replace each one with the directory where the corresponding component is actually located.
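After setting the variables, open a new Command Prompt (windows that were already open keep the old environment) and run a quick check. The Python sketch below is an illustration, not an official Spark tool; the function name check_spark_env is hypothetical. It verifies that JAVA_HOME, SPARK_HOME and HADOOP_HOME are set and that each bin folder is on Path, and that PYSPARK_PYTHON points to an existing file. It also prints sys.executable, which is a convenient way to find the interpreter path to use for PYSPARK_PYTHON.

```python
import ntpath
import os
import sys

HOME_VARS = ("JAVA_HOME", "SPARK_HOME", "HADOOP_HOME")


def _norm(path: str) -> str:
    # Compare Windows paths case-insensitively and ignore trailing backslashes.
    return ntpath.normcase(ntpath.normpath(path.strip()))


def check_spark_env(env) -> list:
    """Return a list of problems with the Spark-related variables (empty if none)."""
    problems = []
    # The Windows dialog calls it "Path"; os.environ on Windows exposes it as "PATH".
    raw_path = env.get("Path") or env.get("PATH") or ""
    path_entries = {_norm(p) for p in raw_path.split(";") if p.strip()}
    for var in HOME_VARS:
        home = env.get(var)
        if not home:
            problems.append(f"{var} is not set")
        elif _norm(ntpath.join(home, "bin")) not in path_entries:
            problems.append(f"%{var}%\\bin is not on Path")
    python = env.get("PYSPARK_PYTHON")
    if not python:
        problems.append("PYSPARK_PYTHON is not set")
    elif not os.path.isfile(python):
        problems.append(f"PYSPARK_PYTHON points to a missing file: {python}")
    return problems


if __name__ == "__main__":
    # sys.executable is the interpreter running this script, a candidate PYSPARK_PYTHON value.
    print("This interpreter:", sys.executable)
    for line in check_spark_env(dict(os.environ)) or ["No problems found."]:
        print(line)
```

Passing the environment in as a dictionary keeps the check easy to try with sample values before running it against os.environ.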