UNION vs UNION DISTINCT vs UNION ALL

Nidhi Gupta
4 min readMar 11, 2021

--

INTRODUCTION

A good way to think about JOINs is extending our dataset horizontally. A UNION, then, is a way of combining data vertically. Unions combine data sets with the same structure; they simply create a table with rows from both tables.

UNION operators can combine the result-set of two or more SELECT statements as long as: Each SELECT statement within UNION must have the same number of columns.

The columns must also have similar data types. The columns in each SELECT statement must also be in the same order.

JOIN vs UNION

JOIN
JOIN combines data from many tables based on a matched condition between them.
Number of columns selected from each table may not be the same.
It combines data into new columns.
Datatypes of corresponding columns selected from each table can be different.
It may not return distinct columns.

UNION
SQL combines the result-set of two or more SELECT statements.
It combines data into new rows
Number of columns selected from each table should be the same.
Datatypes of corresponding columns selected from each table should be the same.
It returns distinct rows.

UNION

UNION operator is used for combining result sets from more than one SELECT statement into one result set.

Any duplicate rows from the results of the SELECT statements are eliminated.

The UNION operator works under two conditions:

1) SELECT queries MUST return a similar number of queries.

2) Data types of all corresponding columns must be compatible.

The UNION operator is normally used to combine data from related tables that have not been normalized perfectly.

Example:-

Syntax

SELECT expression_1, expression_2, … expression_n

FROM tables

[WHERE condition(s)]

UNION

SELECT expression_1, expression_2, … expression_n

FROM tables

[WHERE condition(s)];

UNION DISTINCT

UNION DISTINCT operator is used for combining DISTINCT result sets from more than one SELECT statement into one result set.

Any duplicate rows from the results of the SELECT statements are eliminated so, here all the entries in the result set are distinct.

The UNION DISTINCT operator works under two conditions:

1) SELECT queries MUST return a similar number of queries.

2) Data types of all corresponding columns must be compatible.

The UNION DISTINCT operator is normally used to combine data from related tables that have not been normalized perfectly.

Example:-

Syntax

SELECT expression_1, expression_2, … expression_n

FROM tables

[WHERE condition(s)]

UNION DISTINCT

SELECT expression_1, expression_2, … expression_n

FROM tables

[WHERE condition(s)];

UNION ALL

UNION ALL operator is used for combining result sets from more than one SELECT statement into one result set.

Any duplicate rows from the results of the SELECT statements are not eliminated.

The UNION ALL operator works under two conditions:

1) SELECT queries MUST return a similar number of queries.

2) Data types of all corresponding columns must be compatible.

The UNION ALL operator is normally used to combine data from related tables that have not been normalized perfectly.

Example:-

Syntax

SELECT expression_1, expression_2, … expression_n

FROM tables

[WHERE condition(s)]

UNION ALL

SELECT expression_1, expression_2, … expression_n

FROM tables

[WHERE condition(s)];

PERFORMANCE BASED COMPARISON

An interesting distinction is the presence of UNION versus UNION ALL. Of the two, UNION is the more “intelligent” operation:

if identical rows exist in both SELECT queries, a UNION will know to only give us one row to avoid duplicates. On the other hand,

UNION ALL does return duplicates: this results in a faster query and could be useful for those who want to know what is in both SELECT statements.

UNION DISTINCT is used in scenarios when we need unique based on the where conditions in the query.

IMPLEMENTATION

Let’s consider a scenario for using union, union distinct and union all.We have a table called airports as added the table structure:

CREATE TABLE public.airports

(

icao_code character(4) COLLATE pg_catalog.”default”,

iata_code character(3) COLLATE pg_catalog.”default”,

name character varying(50) COLLATE pg_catalog.”default”,

city character varying(50) COLLATE pg_catalog.”default”,

country character varying(50) COLLATE pg_catalog.”default”,

lat_deg integer,

lat_min integer,

lat_sec integer,

lat_dir character(1) COLLATE pg_catalog.”default”,

lon_deg integer,

lon_min integer,

lon_sec integer,

lon_dir character(1) COLLATE pg_catalog.”default”,

altitude integer,

lat_decimal double precision,

lon_decimal double precision,

id integer NOT NULL DEFAULT nextval(‘airports_id_seq’::regclass),

air_json jsonb,

CONSTRAINT airports_pkey PRIMARY KEY (id)

)

UNION

select name ,city,country,air_json from airports where name=’GOROKA’

union

select name ,city,country,air_json from airports where name=’N/A’ and city=’N/A’

union

select name ,city,country,air_json from airports where country=’GREENLAND’ order by name desc;

UNION ALL

select name ,city,country,air_json from airports where name=’GOROKA’

union all

select name ,city, country, air_json from airports where name=’N/A’ and city=’N/A’

union all

select name ,city, country, air_json from airports where country=’GREENLAND’ order by name desc;

UNION DISTINCT

select name ,city, country, air_json from airports where name=’GOROKA’

union distinct

select name ,city, country, air_json from airports where name=’N/A’ and city=’N/A’

union distinct

select name ,city, country, air_json from airports where country=’GREENLAND’ order by name desc;

--

--

Nidhi Gupta

Azure Data Engineer 👨‍💻.Heading towards cloud technologies expertise✌️.