How To Install Apache Spark On Windows 10


Step 1:
Apache Spark 2.4.5 requires Java 8, so make sure it is installed before you start Step 2:

C:\>java -version
java version "1.8.0_231"
Java(TM) SE Runtime Environment (build 1.8.0_231-b32)
Java HotSpot(TM) Client VM (build 25.231-b32, mixed mode)
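
If the java command is not recognized, the JDK's bin folder is most likely missing from your PATH. The built-in where command prints the full path of every java.exe that Windows can resolve:

C:\>where java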

Step 2:
Download Spark from the official downloads page (https://spark.apache.org/downloads.html). I have downloaded ‘spark-2.4.5-bin-hadoop2.7.tgz’.

Now unzip and untar the file; I have extracted it to the folder shown below (a one-line tar command follows the listing):

C:\install\spark-2.4.5-bin-hadoop2.7>dir
 Volume in drive C is Windows
 Volume Serial Number is DAC9-F90D

 Directory of C:\install\spark-2.4.5-bin-hadoop2.7

02/03/2020  03:47 AM    <DIR>          .
02/03/2020  03:47 AM    <DIR>          ..
02/03/2020  03:47 AM    <DIR>          bin
02/03/2020  03:47 AM    <DIR>          conf
02/03/2020  03:47 AM    <DIR>          data
02/03/2020  03:47 AM    <DIR>          examples
02/03/2020  03:47 AM    <DIR>          jars
02/03/2020  03:47 AM    <DIR>          kubernetes
02/03/2020  03:47 AM            21,371 LICENSE
02/03/2020  03:47 AM    <DIR>          licenses
02/03/2020  03:47 AM            42,919 NOTICE
02/03/2020  03:47 AM    <DIR>          python
02/03/2020  03:47 AM    <DIR>          R
02/03/2020  03:47 AM             3,756 README.md
02/03/2020  03:47 AM               187 RELEASE
02/03/2020  03:47 AM    <DIR>          sbin
02/03/2020  03:47 AM    <DIR>          yarn
               4 File(s)         68,233 bytes
              13 Dir(s)  209,520,427,008 bytes free
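
If you don't have 7-Zip or another archiver handy, recent Windows 10 builds (version 1803 and later) include a built-in tar command that can extract the .tgz in one step. A sketch, assuming the archive was downloaded to C:\install:

C:\install>tar -xvzf spark-2.4.5-bin-hadoop2.7.tgz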

Step 3:
Download Hadoop 2.7's winutils.exe and place it in the folder C:\install\hadoop\bin (Hadoop expects winutils.exe under %HADOOP_HOME%\bin):

C:\install\hadoop\bin>dir
 Volume in drive C is Windows
 Volume Serial Number is DAC9-F90D

 Directory of C:\install\hadoop\bin

06/05/2020  02:05 AM    <DIR>          .
06/05/2020  02:05 AM    <DIR>          ..
06/05/2020  02:03 AM           109,568 winutils.exe
               1 File(s)        109,568 bytes
               2 Dir(s)  209,519,935,488 bytes free
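
winutils.exe provides the native file-system calls Hadoop needs on Windows. If spark-shell later complains about write permissions on \tmp\hive, a commonly used fix is to create that folder and open its permissions with winutils (a sketch; the C: drive is an assumption):

C:\install\hadoop\bin>mkdir C:\tmp\hive
C:\install\hadoop\bin>winutils.exe chmod 777 C:\tmp\hive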

Step 4: Set two environment variables
HADOOP_HOME = C:\install\hadoop
SPARK_HOME = C:\install\spark-2.4.5-bin-hadoop2.7
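
You can set both variables from a Command Prompt with the built-in setx command (the System Properties dialog works just as well). Note that setx only affects new console windows, so open a fresh prompt afterwards:

C:\>setx HADOOP_HOME "C:\install\hadoop"
C:\>setx SPARK_HOME "C:\install\spark-2.4.5-bin-hadoop2.7"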

Now go to the Spark bin folder and run spark-shell:

C:\install\spark-2.4.5-bin-hadoop2.7\bin>spark-shell
20/06/05 02:23:13 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://WPJX8DTM5M2.asc.ppg.com:4040
Spark context available as 'sc' (master = local[*], app id = local-1591295001382).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.5
      /_/

Using Scala version 2.11.12 (Java HotSpot(TM) Client VM, Java 1.8.0_231)
Type in expressions to have them evaluated.
Type :help for more information.

scala>

All set: note that the Spark context is available as 'sc' (master = local[*]).
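
As a quick smoke test, you can read the README.md that ships with Spark directly from the shell. A minimal sketch, assuming the install folder used above (forward slashes work fine in the path):

scala> // Load the bundled README.md as an RDD of lines
scala> val readme = sc.textFile("C:/install/spark-2.4.5-bin-hadoop2.7/README.md")

scala> // Count all lines, then just the lines that mention Spark
scala> readme.count()
scala> readme.filter(_.contains("Spark")).count()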

Follow the article here for sample Apache Spark/Scala code examples to learn from.
