Introduction to Apache Spark

5 min read · Feb 11, 2023

In this tutorial, we will cover Spark and its framework, architecture, and working, along with Resilient Distributed Datasets (RDDs), RDD operations, Spark's programming language support, and a comparison of Spark with MapReduce.

Spark is a fast cluster computing system that is compatible with Hadoop. It can work with any Hadoop-supported storage system, such as HDFS or S3. Spark uses in-memory computing to improve efficiency: intermediate results are kept in memory rather than written to disk. Spark also uses caching to handle repetitive queries. Spark is up to 100 times…
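To make the caching idea concrete, here is a minimal sketch in plain Python. It does not use Spark's API: `LazyDataset`, `expensive_source`, and the `compute_calls` counter are hypothetical names invented for this illustration. It only models the general idea that a cached dataset is computed once and then served from memory for repeated queries, while an uncached one is recomputed every time.

```python
class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, not Spark's API)."""

    def __init__(self, compute):
        self._compute = compute    # function that produces the records
        self._use_cache = False    # whether results should be kept in memory
        self._cached = None        # in-memory copy, filled on first collect()
        self.compute_calls = 0     # how many times the records were recomputed

    def cache(self):
        # Mark the dataset so its records are kept in memory after first use.
        self._use_cache = True
        return self

    def collect(self):
        # Serve from memory when a cached copy exists.
        if self._use_cache and self._cached is not None:
            return self._cached
        # Otherwise recompute from the source.
        self.compute_calls += 1
        records = list(self._compute())
        if self._use_cache:
            self._cached = records
        return records


def expensive_source():
    # Stand-in for an expensive computation or a read from storage.
    return (n * n for n in range(5))


# Two queries without caching: the source is computed twice.
uncached = LazyDataset(expensive_source)
uncached_total = sum(uncached.collect())
uncached_max = max(uncached.collect())

# Two queries with caching: the source is computed once, then reused.
cached = LazyDataset(expensive_source).cache()
cached_total = sum(cached.collect())
cached_max = max(cached.collect())

print(uncached.compute_calls, cached.compute_calls)
```

Both versions return the same answers; the only difference is how often the underlying data is recomputed, which is the saving that caching aims for with repetitive queries.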

Written by Avinash Navlani