The term Big Data refers to huge collections of data in a distributed environment, characterized by the three Vs: Velocity, Variety, and Volume. Since the coming of digitalization, the volume of data has increased, along with its variety: structured, semi-structured, and unstructured.
Structured data was easily handled with SQL, but handling semi-structured and unstructured data with SQL was difficult. This led to the introduction of Hadoop by the Apache Software Foundation.
Before understanding Hive, let’s first understand Hadoop.
Hadoop, from the Apache Software Foundation, played an important role when traditional RDBMSs could not handle such a huge and growing amount of data. It is an open-source framework that helps us store data and perform computation on it in a distributed environment. Its file system, HDFS, splits each file into fixed-size pieces called blocks, and the blocks of a file are spread across different machines in the cluster.
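To make the block storage idea concrete, here is a minimal Python sketch of how a file might be cut into blocks and spread over machines. The 128 MB block size and replication factor of 3 are the HDFS defaults in Hadoop 2.x and later; the function names, the node names, and the round-robin placement are assumptions made for illustration only, since real HDFS uses a rack-aware placement policy.

```python
BLOCK_SIZE_MB = 128   # HDFS default block size in Hadoop 2.x and later
REPLICATION = 3       # HDFS default replication factor


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the sizes (in MB) of the blocks a file would be split into."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block only holds what is left
    return sizes


def place_blocks(num_blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` consecutive nodes, round-robin.

    Hypothetical placement for illustration; it yields distinct nodes
    only when replication <= len(nodes).
    """
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + r) % len(nodes)] for r in range(replication)
        ]
    return placement


blocks = split_into_blocks(300)  # a 300 MB file
layout = place_blocks(len(blocks), ["node1", "node2", "node3", "node4"])
print(blocks)
for block_id, holders in layout.items():
    print(block_id, holders)
```

Because every block lives on more than one machine, losing a single node does not lose the file.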
Hadoop has two main modules:
(i) MapReduce: Helps us perform computation over big data.
(ii) HDFS (Hadoop Distributed File System): Helps in storing big data.
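The MapReduce model can be sketched with the classic word count example. The snippet below is a single-process Python illustration of the map, shuffle, and reduce steps, not Hadoop's actual API; all function names here are hypothetical, and a real job would run the map and reduce steps in parallel on many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


counts = word_count(["big data needs hadoop", "hive runs on hadoop"])
print(counts)
```

Tools such as Pig and Hive exist so that we can describe this kind of computation in a script or a query instead of writing the map and reduce functions by hand.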
The Hadoop ecosystem contains various tools:
(i) Pig/Pig Latin: A procedural language platform used to write scripts that run as MapReduce jobs. Developed by Yahoo.
(ii) Hive: A platform used to write SQL-like scripts that run as MapReduce jobs. Developed by Facebook.
(iii) Sqoop: Used to import and export data between HDFS and an RDBMS.
(iv) YARN (Yet Another Resource Negotiator): The cluster resource manager. It schedules jobs and lets multiple applications share one cluster.
This article mainly focuses on Hive and HQL (Hive Query Language).
Hive is a data warehouse infrastructure tool that helps process structured and semi-structured data in Hadoop. It resides on top of HDFS.
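-- Create a database and switch to it
create database hive_learning;
use hive_learning;
-- Create a table and insert three rows
create table emp(id int, name string);
insert into emp values(1,'raj'),(2,'anuj'),(3,'nisha');
-- Read all rows
select * from emp;
-- Add a city column, then redefine the column list to drop it again
alter table emp add columns(city string);
alter table emp replace columns(id int, name string);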
These are the basic commands in Hive. Hive doesn’t support all RDBMS commands, because it is designed for OLAP (analytical) workloads rather than OLTP (transactional) ones.
Thanks for reading 🙂. Do clap 👏 if you find it useful.
“Keep learning and keep sharing knowledge”