Hive Tutorial – Hive Architecture and NASA Case Study

Apache Hive Tutorial: What is Hive?

Apache Hive is a data warehouse system built on top of Hadoop and used to analyse structured and semi-structured data. Hive abstracts the complexity of Hadoop MapReduce. It provides a mechanism to project structure onto the data and to run queries written in HQL (Hive Query Language), which are similar to SQL statements. Internally, the Hive compiler converts these HQL queries into MapReduce jobs. Hence, you do not need to write complex MapReduce programs to process your data with Hadoop. Hive is targeted at users who are comfortable with SQL. Apache Hive supports Data Definition Language (DDL), Data Manipulation Language (DML), and User Defined Functions (UDF).
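To make the translation from a query to MapReduce more concrete, here is a minimal sketch in plain Python of what a query such as SELECT country, COUNT(*) FROM page_views GROUP BY country conceptually turns into. The table name, the sample rows, and the function names are all hypothetical, and this is only an illustration of the map, shuffle, and reduce steps, not the plan that Hive actually generates.

```python
from collections import defaultdict

# Hypothetical rows of a "page_views" table: (user_id, country)
rows = [
    (1, "US"),
    (2, "IN"),
    (3, "US"),
    (4, "DE"),
    (5, "IN"),
    (6, "US"),
]


def map_phase(records):
    """Emit one (country, 1) pair per row, as a mapper would for COUNT(*) ... GROUP BY country."""
    for _user_id, country in records:
        yield country, 1


def shuffle_phase(pairs):
    """Collect the emitted values under their key, mimicking the shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key, as a reducer would."""
    return {key: sum(values) for key, values in grouped.items()}


counts = reduce_phase(shuffle_phase(map_phase(rows)))
print(counts)  # {'US': 3, 'IN': 2, 'DE': 1}
```

With Hive, the user writes only the one-line query; the three phases above are what the engine takes care of.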

Challenges at Facebook: Exponential Growth of Data

Before 2008, all the data processing infrastructure at Facebook was built around a data warehouse based on a commercial RDBMS. This infrastructure was capable enough to meet Facebook's needs at that time. As the data began growing very fast, managing and processing this vast dataset became a considerable challenge. According to a Facebook article, the data grew from a 15 TB data set in 2007 to 2 PB in 2009. Also, many Facebook products, such as Audience Insights, Facebook Lexicon, and Facebook Ads, involve analysing data. Facebook needed a scalable and efficient solution to cope with this problem and started using the Hadoop framework.

Democratizing Hadoop – MapReduce

But as the data grew, the complexity of MapReduce code grew proportionally. Training people with a non-programming background to write MapReduce programs therefore became difficult. Also, even a simple analysis could require a hundred lines of MapReduce code. Since SQL was widely used by engineers and analysts, including those at Facebook, putting SQL on top of Hadoop seemed a logical way to make Hadoop accessible to users with an SQL background.

Hence, the ability of SQL to cover most analytic requirements, combined with the scalability of Hadoop, gave birth to Apache Hive, which allows SQL-like queries to be run on data stored in HDFS. Facebook open-sourced the Hive project in August 2008, and it is freely available as Apache Hive today.

Now, let us look at the features or advantages of Hive that make it so popular.

Advantages of Hive

Useful for people who do not come from a programming background, since it removes the need to write complex MapReduce programs.

Extensible and scalable to cope with the growing volume and variety of data, without affecting the system's performance.

It is an efficient ETL (Extract, Transform, Load) tool.

Hive supports client applications written in Java, PHP, Python, C++, or Ruby by exposing its Thrift server.

Hive's metadata is stored in an RDBMS, which significantly reduces the time needed to perform semantic checks during query execution.
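The last point can be illustrated with a small sketch. It assumes a toy metadata table held in SQLite, which stands in for the metastore RDBMS; the schema, the table name columns_meta, and the helper unknown_columns are hypothetical simplifications and not Hive's real metastore layout. The idea is only that a check such as "does this column exist?" becomes a quick lookup in a relational store rather than a scan of the data itself.

```python
import sqlite3

# SQLite stands in for the metastore RDBMS; this schema is a hypothetical simplification.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE columns_meta (table_name TEXT, column_name TEXT, column_type TEXT)"
)
conn.executemany(
    "INSERT INTO columns_meta VALUES (?, ?, ?)",
    [
        ("page_views", "user_id", "int"),
        ("page_views", "country", "string"),
    ],
)


def unknown_columns(table, referenced):
    """Return the referenced columns that the metadata store does not list for `table`."""
    known = {
        row[0]
        for row in conn.execute(
            "SELECT column_name FROM columns_meta WHERE table_name = ?", (table,)
        )
    }
    return [col for col in referenced if col not in known]


# A query that references "browser" would fail this check before any job is launched.
missing = unknown_columns("page_views", ["country", "browser"])
print(missing)  # ['browser']
```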

NASA Case Study

A climate model is a mathematical representation of climate systems based on various factors that affect Earth's climate. It describes the interaction of the drivers of climate, such as the ocean, sun, and atmosphere, to provide insight into the dynamics of the climate system. It is used to project climate conditions by simulating climate changes based on the factors that affect climate. NASA's Jet Propulsion Laboratory has developed the Regional Climate Model Evaluation System (RCMES) to analyse and evaluate climate model output against remote sensing data present in various external repositories.

The RCMES (Regional Climate Model Evaluation System) has two parts:

RCMED (Regional Climate Model Evaluation Database):

It is a scalable cloud database that loads remote sensing data and reanalysis data related to climate using extractors such as Apache OODT extractors and Apache Tika. It then transforms the data into the data point model, which has the form (latitude, longitude, time, value, height), and stores it in a MySQL database. The client can retrieve the data present in RCMED by performing space/time queries. The details of such queries are not relevant here.
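As a rough illustration of the data point model and of a space/time query, the sketch below stores a few points and selects those that fall inside a latitude/longitude box and a time window. It rests on several assumptions: SQLite stands in for the MySQL database, the table name data_point and the helper space_time_query are hypothetical, and the sample values are invented for the example. It does not reflect the actual RCMED schema or query interface.

```python
import sqlite3

# SQLite stands in for MySQL here; the table layout follows the
# (latitude, longitude, time, value, height) data point model.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE data_point (
           latitude REAL, longitude REAL, time TEXT, value REAL, height REAL
       )"""
)

# Invented sample points, used only for illustration.
points = [
    (34.2, -118.2, "2009-01-01", 287.1, 0.0),
    (34.2, -118.2, "2009-02-01", 288.4, 0.0),
    (51.5, -0.1, "2009-01-01", 278.9, 0.0),
]
conn.executemany("INSERT INTO data_point VALUES (?, ?, ?, ?, ?)", points)


def space_time_query(lat_min, lat_max, lon_min, lon_max, t_start, t_end):
    """Return the points inside a bounding box and a time range (ISO dates compare as text)."""
    return conn.execute(
        """SELECT latitude, longitude, time, value, height
           FROM data_point
           WHERE latitude BETWEEN ? AND ?
             AND longitude BETWEEN ? AND ?
             AND time BETWEEN ? AND ?
           ORDER BY time""",
        (lat_min, lat_max, lon_min, lon_max, t_start, t_end),
    ).fetchall()


result = space_time_query(30, 40, -120, -110, "2009-01-01", "2009-01-31")
print(result)
```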

RCMET (Regional Climate Model Evaluation Toolkit):

It enables the user to compare the reference data present in RCMED with the climate model output data fetched from other sources in order to perform different kinds of analysis and evaluation. You can refer to the image given below to understand the architecture of RCMES.

[Image: architecture of RCMES]
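To give a feel for the kind of comparison RCMET performs, the sketch below compares a short series of reference values with model output for the same points. The choice of mean bias and root-mean-square error as metrics is an assumption made for illustration, the numbers are invented, and the function names are hypothetical; the article does not state which evaluations the toolkit actually computes.

```python
import math

# Invented values for the same three data points (for example, temperatures in kelvin).
reference = [287.1, 288.4, 290.0]
model_output = [286.6, 289.0, 291.5]


def mean_bias(model, ref):
    """Average of (model - reference); positive means the model runs high."""
    return sum(m - r for m, r in zip(model, ref)) / len(ref)


def rmse(model, ref):
    """Root-mean-square error between model output and reference data."""
    return math.sqrt(sum((m - r) ** 2 for m, r in zip(model, ref)) / len(ref))


bias_value = mean_bias(model_output, reference)
rmse_value = rmse(model_output, reference)
print(f"mean bias: {bias_value:.3f}, RMSE: {rmse_value:.3f}")
```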

Deployment of Hive

The image below shows the deployment of Apache Hive in RCMES. The NASA team took the following steps while deploying Apache Hive:

They installed Hive using Cloudera and Apache Hadoop, as shown in the image below.
They used Apache Sqoop to ingest data into Hive from the MySQL database.
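The Sqoop step can be sketched as follows. The snippet only assembles and prints a sqoop import command line; it does not run Sqoop. The JDBC URL, user name, and table names are hypothetical placeholders, and the article does not say which options the NASA team actually used.

```python
import shlex


def build_sqoop_import(jdbc_url, username, source_table, hive_table):
    """Assemble a `sqoop import` command that loads a MySQL table into Hive."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,       # JDBC connection string of the source database
        "--username", username,
        "-P",                        # prompt for the password instead of passing it inline
        "--table", source_table,     # source table in MySQL
        "--hive-import",             # load the imported data into Hive
        "--hive-table", hive_table,  # target table name in Hive
    ]


# All values below are placeholders, not details taken from RCMES.
cmd = build_sqoop_import(
    "jdbc:mysql://db.example.org/rcmed",
    "rcmes_user",
    "data_point",
    "data_point",
)
print(shlex.join(cmd))
```

Building the command as a list keeps each argument separate, so it could later be passed to subprocess.run without shell quoting issues.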

Steven Roger

Steven Roger is a technology blogger for the H2K Infosys blog, where he brings complex tech concepts to life with clear, engaging insights. With a passion for IT education and over a decade of industry experience, Steven specializes in demystifying the latest in software development, business analysis, and quality assurance training. His articles provide readers with practical knowledge and tips on upskilling for successful careers in tech.
