UNION vs UNION DISTINCT vs UNION ALL
--
INTRODUCTION
A good way to think about JOINs is extending our dataset horizontally. A UNION, then, is a way of combining data vertically. Unions combine data sets with the same structure; they simply create a table with rows from both tables.
UNION operators can combine the result-set of two or more SELECT statements as long as: Each SELECT statement within UNION must have the same number of columns.
The columns must also have similar data types. The columns in each SELECT statement must also be in the same order.
JOIN vs UNION
JOIN
JOIN combines data from many tables based on a matched condition between them.
Number of columns selected from each table may not be the same.
It combines data into new columns.
Datatypes of corresponding columns selected from each table can be different.
It may not return distinct columns.
UNION
SQL combines the result-set of two or more SELECT statements.
It combines data into new rows
Number of columns selected from each table should be the same.
Datatypes of corresponding columns selected from each table should be the same.
It returns distinct rows.
UNION
UNION operator is used for combining result sets from more than one SELECT statement into one result set.
Any duplicate rows from the results of the SELECT statements are eliminated.
The UNION operator works under two conditions:
1) SELECT queries MUST return a similar number of queries.
2) Data types of all corresponding columns must be compatible.
The UNION operator is normally used to combine data from related tables that have not been normalized perfectly.
Example:-
Syntax
SELECT expression_1, expression_2, … expression_n
FROM tables
[WHERE condition(s)]
UNION
SELECT expression_1, expression_2, … expression_n
FROM tables
[WHERE condition(s)];
UNION DISTINCT
UNION DISTINCT operator is used for combining DISTINCT result sets from more than one SELECT statement into one result set.
Any duplicate rows from the results of the SELECT statements are eliminated so, here all the entries in the result set are distinct.
The UNION DISTINCT operator works under two conditions:
1) SELECT queries MUST return a similar number of queries.
2) Data types of all corresponding columns must be compatible.
The UNION DISTINCT operator is normally used to combine data from related tables that have not been normalized perfectly.
Example:-
Syntax
SELECT expression_1, expression_2, … expression_n
FROM tables
[WHERE condition(s)]
UNION DISTINCT
SELECT expression_1, expression_2, … expression_n
FROM tables
[WHERE condition(s)];
UNION ALL
UNION ALL operator is used for combining result sets from more than one SELECT statement into one result set.
Any duplicate rows from the results of the SELECT statements are not eliminated.
The UNION ALL operator works under two conditions:
1) SELECT queries MUST return a similar number of queries.
2) Data types of all corresponding columns must be compatible.
The UNION ALL operator is normally used to combine data from related tables that have not been normalized perfectly.
Example:-
Syntax
SELECT expression_1, expression_2, … expression_n
FROM tables
[WHERE condition(s)]
UNION ALL
SELECT expression_1, expression_2, … expression_n
FROM tables
[WHERE condition(s)];
PERFORMANCE BASED COMPARISON
An interesting distinction is the presence of UNION versus UNION ALL. Of the two, UNION is the more “intelligent” operation:
if identical rows exist in both SELECT queries, a UNION will know to only give us one row to avoid duplicates. On the other hand,
UNION ALL does return duplicates: this results in a faster query and could be useful for those who want to know what is in both SELECT statements.
UNION DISTINCT is used in scenarios when we need unique based on the where conditions in the query.
IMPLEMENTATION
Let’s consider a scenario for using union, union distinct and union all.We have a table called airports as added the table structure:
CREATE TABLE public.airports
(
icao_code character(4) COLLATE pg_catalog.”default”,
iata_code character(3) COLLATE pg_catalog.”default”,
name character varying(50) COLLATE pg_catalog.”default”,
city character varying(50) COLLATE pg_catalog.”default”,
country character varying(50) COLLATE pg_catalog.”default”,
lat_deg integer,
lat_min integer,
lat_sec integer,
lat_dir character(1) COLLATE pg_catalog.”default”,
lon_deg integer,
lon_min integer,
lon_sec integer,
lon_dir character(1) COLLATE pg_catalog.”default”,
altitude integer,
lat_decimal double precision,
lon_decimal double precision,
id integer NOT NULL DEFAULT nextval(‘airports_id_seq’::regclass),
air_json jsonb,
CONSTRAINT airports_pkey PRIMARY KEY (id)
)
UNION
select name ,city,country,air_json from airports where name=’GOROKA’
union
select name ,city,country,air_json from airports where name=’N/A’ and city=’N/A’
union
select name ,city,country,air_json from airports where country=’GREENLAND’ order by name desc;
UNION ALL
select name ,city,country,air_json from airports where name=’GOROKA’
union all
select name ,city, country, air_json from airports where name=’N/A’ and city=’N/A’
union all
select name ,city, country, air_json from airports where country=’GREENLAND’ order by name desc;
UNION DISTINCT
select name ,city, country, air_json from airports where name=’GOROKA’
union distinct
select name ,city, country, air_json from airports where name=’N/A’ and city=’N/A’
union distinct
select name ,city, country, air_json from airports where country=’GREENLAND’ order by name desc;