
Apache Hive Hands-on

Avinash Navlani
Nov 22, 2022

In this tutorial, we will focus on Hadoop Hive for processing big data.

What is Hive?

Hive is a component of the Hadoop stack. It is an open-source data warehouse tool that runs on top of Hadoop. It was developed by Facebook and later donated to the Apache Software Foundation. It reads, writes, and manages large tables stored in HDFS or other data sources.

Hive is not designed for frequent row-level insert, update, and delete operations (newer versions support updates and deletes only on transactional tables). Instead, it is used for analytics, data mining, and report generation on large data warehouses. Hive uses the Hive Query Language (HiveQL), which is similar to SQL, and most of its syntax resembles that of MySQL. It is intended for OLAP (Online Analytical Processing) workloads.
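To give a feel for the kind of analytical query this describes, here is a minimal sketch. It runs against Python's built-in sqlite3 module purely as a local stand-in, not against Hive, and the table and column names (page_views, country, views) are assumptions made up for illustration. A simple aggregate like this should also be accepted by HiveQL.

```python
import sqlite3

# sqlite3 is only a local stand-in here; it is NOT Hive.
# The table and its columns (page_views, country, views) are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (country TEXT, views INTEGER)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [("IN", 120), ("US", 300), ("IN", 80), ("DE", 50)],
)

# An OLAP-style aggregate: total views per country, largest first.
query = """
SELECT country, SUM(views) AS total_views
FROM page_views
GROUP BY country
ORDER BY total_views DESC
"""
totals = conn.execute(query).fetchall()
print(totals)
```

The point of the sketch is the shape of the query: a read-heavy scan with grouping and aggregation rather than many small row-level writes.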

Why do we need Hive?

In 2006, Facebook was generating 10 GB of data per day, and in 2007 this grew to 1 TB per day. Later, it was generating 15 TB of data per day. Initially, Facebook used a Scribe server, an Oracle database, and Python scripts to process large data sets. As the volume of data grew, Facebook shifted to Hadoop as its key tool for data analysis and processing.

Facebook used Hadoop to manage its big data but ran into problems with ETL operations, because even a small operation required writing a Java program. This demanded many Java developers, who were hard to find, and Java is not easy to learn. So Facebook developed Hive, which uses a SQL-like syntax that is easy to learn and write. Hive makes this work accessible to people who already know SQL from RDBMS tools.
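The gap between hand-written MapReduce code and a SQL-like query can be sketched as follows. This is a simplified, single-machine illustration in Python rather than a real Hadoop job in Java, and the records and the table name logs are assumptions invented for the example.

```python
from collections import defaultdict

# Hypothetical log records: (user, action). Not from the original article.
records = [("amy", "click"), ("bob", "view"), ("amy", "view"), ("amy", "click")]

def map_phase(rows):
    # Emit an (action, 1) pair per record, as a mapper would.
    for _user, action in rows:
        yield action, 1

def shuffle(pairs):
    # Group emitted values by key, as the framework does between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Sum the values collected for each key.
    return {key: sum(values) for key, values in grouped.items()}

action_counts = reduce_phase(shuffle(map_phase(records)))
print(action_counts)

# The same result expressed declaratively in a SQL-like query
# (the table name `logs` is hypothetical):
#   SELECT action, COUNT(*) FROM logs GROUP BY action;
```

Even in this toy form, the procedural version needs three functions for what the query states in one line, which is the kind of saving the paragraph above describes.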

Hive Features

The following are the key features of Hive.

  • It is a data warehousing tool.
  • It is used for enterprise data wrangling.
  • It uses a SQL-like language called HiveQL or HQL. HQL is a non-procedural, declarative language.
  • It is used for OLAP operations.
  • It can increase productivity, for example by replacing roughly 100 lines of Java code with about 4 lines of HQL.
  • It supports Table, Partition, and Bucket data structures.
  • It is built on top of the Hadoop Distributed File System (HDFS).
  • It supports Tez, Spark, and MapReduce as execution engines.
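The Table, Partition, and Bucket structures mentioned above can be pictured with a small sketch. It assumes a hypothetical events table partitioned by a dt column and bucketed by user_id into 4 buckets. The warehouse path is made up, and plain integer modulo stands in for Hive's own hash function, which depends on the column type.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count

def storage_location(table_dir, row):
    # Partition: one directory per distinct value of the partition column.
    partition_dir = f"{table_dir}/dt={row['dt']}"
    # Bucket: hash of the bucketing column modulo the bucket count.
    # Plain integer modulo is a stand-in for Hive's own hash function.
    bucket = row["user_id"] % NUM_BUCKETS
    return partition_dir, bucket

# Hypothetical rows of an `events` table.
rows = [
    {"user_id": 7, "dt": "2022-11-21"},
    {"user_id": 12, "dt": "2022-11-21"},
    {"user_id": 7, "dt": "2022-11-22"},
]

# Group user ids by the (partition directory, bucket) they would land in.
layout = defaultdict(list)
for row in rows:
    layout[storage_location("/warehouse/events", row)].append(row["user_id"])

for (partition_dir, bucket), user_ids in sorted(layout.items()):
    print(partition_dir, "bucket", bucket, "->", user_ids)
```

Partitioning lets a query that filters on dt skip whole directories, while bucketing spreads rows within a partition into a fixed number of files based on the bucketing column.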
