Hive

The term Big Data refers to very large collections of data stored and processed in a distributed environment, commonly characterized by the three Vs: Volume, Velocity, and Variety. With digitalization, the volume of data has grown, and so has its variety: structured, semi-structured, and unstructured.

Structured data was easy to handle with SQL, but handling semi-structured and unstructured data with SQL was difficult. This led to the introduction of Hadoop by the Apache Software Foundation.

Before understanding Hive, let’s first understand Hadoop.

Hadoop

Hadoop, from the Apache Software Foundation, became important when traditional RDBMSs could no longer handle such a huge and growing amount of data. It is an open-source framework that helps us store data and perform computation on it across a cluster of machines. Its file system, HDFS, splits each file into large pieces called blocks, and the blocks are spread (and replicated) across different machines.

It has two main modules:

(i) MapReduce: Helps us perform computation over big data.

(ii) HDFS (Hadoop Distributed File System): Helps in storing big data.
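
To make the MapReduce idea concrete, here is a minimal word-count sketch written in HQL, the query language covered later in this article. The table docs and its column line are hypothetical names chosen for illustration. When Hive runs on the MapReduce engine, a query like this is typically executed as a map step (split lines into words) followed by a reduce step (count each word), though the exact plan depends on the Hive version and execution engine.

```sql
-- Hypothetical table: one row per line of text.
CREATE TABLE docs (line STRING);

-- Inner query (map-like work): split each line on spaces and emit one row per word.
-- Outer query (reduce-like work): GROUP BY brings equal words together and counts them.
SELECT word, count(*) AS occurrences
FROM (
  SELECT explode(split(line, ' ')) AS word
  FROM docs
) words
GROUP BY word;
```

The same logic written directly against the MapReduce API would need a mapper class, a reducer class, and a job driver, which is why SQL-style tools on top of Hadoop became popular.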

The Hadoop ecosystem contains various tools:

(i) Pig/Pig Latin: A procedural scripting platform used to write scripts for MapReduce operations. Developed by Yahoo.

(ii) Hive: A platform used to write SQL-like scripts that run as MapReduce operations. Developed by Facebook.

(iii) Sqoop: Used to import and export data between HDFS and an RDBMS.

(iv) YARN (Yet Another Resource Negotiator): The cluster resource manager. It allocates cluster resources and lets multiple applications run on the same cluster.

This article mainly focuses on Hive and HQL.

Hive

Hive is a data warehouse infrastructure tool that helps process structured and semi-structured data in Hadoop. It resides on top of HDFS.
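
One way to see what "on top of HDFS" means is an external table, which simply points Hive at files that already sit in an HDFS directory. The sketch below assumes comma-separated text files under a hypothetical path /data/emp_raw; the table name emp_raw and the path are illustrative only.

```sql
-- Hypothetical example: expose existing comma-separated files in HDFS as a table.
-- Hive stores only the table definition; the data stays in the HDFS directory.
CREATE EXTERNAL TABLE emp_raw (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/emp_raw';

-- Query the files as if they were rows in a table.
SELECT * FROM emp_raw LIMIT 10;
```

Dropping an external table removes only its definition from Hive; the files in HDFS are left in place.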

Hive Commands (HQL)

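-- Create a database, list databases, and switch to the new one
create database hive_learning;
show databases;
use hive_learning;
-- Create the table first so that it can be listed and described
create table emp(id int, name string);
show tables;
describe emp;
-- Load a few rows and read them back
insert into emp values(1,'raj'),(2,'anuj'),(3,'nisha');
select * from emp;
-- Add a column, then replace the column list (city is no longer in the schema)
alter table emp add columns(city string);
alter table emp replace columns(id int, name string);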

These are the basic commands in Hive. Hive doesn’t support all RDBMS commands, because it is designed for OLAP (analytical queries over large datasets) rather than OLTP (frequent row-level transactions).
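
As a rough illustration of the OLAP style of work Hive is suited to, the sketch below aggregates over a whole table instead of looking up or updating individual rows. The table emp_city and its sample rows are hypothetical and separate from the emp table above.

```sql
-- Hypothetical table with a city column, used only for this example.
CREATE TABLE emp_city (id INT, name STRING, city STRING);

INSERT INTO emp_city VALUES
  (1, 'raj', 'delhi'),
  (2, 'anuj', 'pune'),
  (3, 'nisha', 'delhi');

-- Analytical query: scan the whole table and count employees per city.
SELECT city, count(*) AS employees
FROM emp_city
GROUP BY city;
```

Queries like this scan and summarize large amounts of data in one pass, which is the workload Hive targets. Single-row lookups and frequent updates are better left to an OLTP database.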

Thanks for reading 🙂. Do clap 👏 if you find it useful.

“Keep learning and keep sharing knowledge”

Nidhi Gupta
