Introduction to Apache Spark
In this tutorial, we will cover the Spark framework, its architecture and how it works, Resilient Distributed Datasets (RDDs), RDD operations, the programming languages Spark supports, and a comparison of Spark with MapReduce.
Spark is a fast cluster computing system that is compatible with Hadoop: it can work with any Hadoop-supported storage system, such as HDFS or Amazon S3. Spark improves efficiency through in-memory computing, which avoids writing intermediate results to disk, and it caches datasets to speed up repetitive queries. Spark is up to 100x…