SQL Join is one of the most important and expensive SQL operations and requires deep understanding from database engineers to write efficient SQL queries. From database engineers’ perspective, understanding how the JOIN operation works helps them to optimize them for efficient execution. This article, explains the JOIN operations supported in the open-source distributed computing engine: Presto SQL.
Almost all well-known databases support the following five types of joins: Cross Join, Inner Join, Left Join, Right Join, and Full Join. This article walks through these five types of joins from the end user’s perspective and finally explains two more internal joins which are not obvious to the end user.
Types of Joins in Presto
Cross Join
Cross Join returns the Cartesian product of both tables without any conditions. In other words, every single row from the left table will be joined with every single row from the right table. Cross join is more expensive than other joins as it does Cartesian product between two tables.
Inner Join
Inner join joins two tables under a condition. Every row from a table will be joined with every row from the other table only when the given condition is satisfied. In Presto SQL, INNER JOIN
, JOIN
and ,
separated tables with a WHERE
clause all treated as an inner join. If the query contains comma-separated tables with where clause as given below, it will be read as a Cross Join with a filter applied after the join operation and internally rewritten by a Presto optimizer to the inner join query.
Left Outer Join
Like the inner join, the Left Outer join joins every row from the left table with matching rows from the right table. If no match is found for a row from the left table, it will be completed with null values filled in columns from the right table. In Presto SQL the keyword OUTER
is optional in the LEFT OUTER JOIN
operation. In other words LEFT JOIN
and LEFT OUTER JOIN
mean the same.
Right Outer Join
The right outer join joins every row from the right table with the left table similar to the Left Outer Join. If no match is found for a row from the right table, it will be completed with null values for columns from the left table. In Presto SQL the keyword OUTER
is optional in the RIGHT OUTER JOIN
operation. In other words RIGHT JOIN
and RIGHT OUTER JOIN
mean the same.
Full Outer Join
Full Outer Join can be seen as the combination of Left Outer Join and Right Outer Join. It joins every row from each side with matching rows from the other side. If no match is found for a row from the left table, it will be completed with null values for columns from the right table. Similarly, if there is no match for a row from the right side, it will be completed with null values for the columns from the left table. In Presto SQL the keyword OUTER
is optional in the FULL OUTER JOIN
operation. In other words FULL JOIN
and FULL OUTER JOIN
mean the same.
In addition to the above-mentioned join types, there are two join types named: Semi Join and Anti Join. These two joins are used to support some other SQL operations which an end user might not consider as a join.
Semi Join
Semi Join evaluates the join criteria as any other join criteria but only returns the records from one table if there is a match. For example, the IN/EXISTS operations are performed using a semi-join internally. There are two variants of semi-join: Left Semi Join and Right Semi Join. As the name suggests, Left Semi Join evaluates the condition and returns rows from the left table if there is a match for those rows. Similarly, the Right Semi Join returns only the matching rows from the right table. The following query using the IN predicate is mapped to a Semi Join operation internally in the Presto database engine.
As you can see, there are two tables involved in the SQL query but we want the rows only from the left table.
Anti Join
Anti Join is another implicit join to return rows from a table when the join criteria do not match. It is similar to Semi Join in returning only rows from one table but the Join criteria are opposite to the Semi Join. For example, the following NOT IN query can be evaluated using an Anti Join.
Though Anti Join can be seen as a different join type, internally Presto treats it as a Semi Join by negating the join criteria. Not only the Semi Join but any join operation can also be composed using other joins. For example, a left outer join can be rewritten as a right outer join by reordering the tables. The flexibility of rewriting a join into another join allows database engineers to do several cost-based optimizations on the logical plan built from the submitted SQL query. Even for end users, if the database cannot perform cost-based optimizations and join reordering automatically due to statistics limitations and configurations, the knowledge of how to rewrite a naive join operation into an optimal form can save a few minutes to hours of execution time.
More technical details about how Presto executes these joins internally will be covered in another article. If you have any questions regarding join types or if you like to share your feedback, please comment below.