Hive is a query language, which was developed by Facebook. Hadoop can give support to any kind of data.

  • Structured data like database tables.
  • Unstructured data like videos, audios, PDF, TXT files etc.
  • Semi structured data like XML.

Hadoop supports HiveQL.

Difference between SQL & HiveQL

  • In SQL, we can insert the data values row by row but not in HQL.
  • In SQL we can update any row or column but not in HQL because the data is stored in HDFS, after putting data into HDFS, you shouldn’t change the contents of the data.
  • In SQL, we can use delete but not in HQL.

In Hive, every table is created as a directory.

HQL datatypes

Like other RDBMS (Oracle, MySQL, SQL Server), it also has databases.

TinyIntFloatMap
ShortIntDoubleArray
BigIntStringStruct

Here map, array, struct are called collection datatypes.

Creating hive tables

Hive tables can be created in two ways.

  1. Managed tables or Internal tables.
  2. External tables.

Managed tables or Internal tables

  • user@machine:~$ hive
  • hive> create table employee(id int, name string,salary float)
  • >row format delimited
  • >fields terminated by ‘\t’;


Important points

  • String can contain any kind of data.
  • In SQL, if you want to insert the data, you have to first create schema or table but in HQL, you can either create the table and insert the data or you can insert the data and then create the table.
  • If you apply it after table column in create statement in HQL, it will give you null but not actual data and it will not give you any error, so you need to write delimiter & terminated line.

Loading data into Hive tables

The data can be loaded in two ways.

Either from local file system or from HDFS

Loading data from local file system

hive>load data local inpath <filepath> into table <tablename>

Loading data from HDFS

hive>load data inpath <filepath> into table <tablename>

  • If it is a local file system, the default path is home/user.
  • If it is HDFS, it is a user.

Here, we will hear few words like metadata, which means the data about the data & metastore, which means keeping metadata to store.

External Tables

hive> create external table employeeE(id int, name string,salary float),

  • >row format delimited
  • >fields terminated by ‘\t’
  • >location “/vimal/newfolder”;

Concept

  • If we are creating internal tables, the table name is created as a directory on the warehouse. If we are creating the external tables, the table name will never be created as a directory name but is just trying to refresh some location /vimal/newfolder.

  • For global usage, you can refer to an external table but not an internal table.

Intenal Table

/user / hive / warehouse
employee (directory)
employee (file)
employee1 (file)

External Table

/vimal / newfolder /
employee (file)

Next Recommended Readings