Saturday, 10 January 2015

Finding nth highest salary example and explanation

Let’s step through an actual example to see how the query above will actually execute step by step. Suppose we are looking for the 2nd highest Salary value in our table above, so our N is 2. This means that the query will look like this

SELECT *
FROM Employee Emp1
WHERE (1) = (
SELECT COUNT(DISTINCT(Emp2.Salary))
FROM Employee Emp2
WHERE Emp2.Salary > Emp1.Salary)

You can probably see that Emp1 and Emp2 are just aliases for the same Employee table – it’s like we just created 2 separate clones of the Employee table and gave them different names.

Understanding and visualizing how the query above works

Let’s assume that we are using this data:
Employee
Employee IDSalary
3200
4800
7450
For the sake of our explanation, let’s assume that N is 2 – so the query is trying to find the 2nd highest salary in the Employee table. The first thing that the query above does is process the very first row of the Employee table, which has an alias of Emp1.
The salary in the first row of the Employee table is 200. Because the subquery is correlated to the outer query through the alias Emp1, it means that when the first row is processed, the query will essentially look like this – note that all we did is replace Emp1.Salary with the value of 200:
SELECT *
FROM Employee Emp1
WHERE (1) = (
SELECT COUNT(DISTINCT(Emp2.Salary))
FROM Employee Emp2
WHERE Emp2.Salary > 200)
So, what exactly is happening when that first row is processed? Well, if you pay special attention to the subquery you will notice that it’s basically searching for the count of salary entries in the Employee table that are greater than 200. Basically, the subquery is trying to find how many salary entries are greater than 200. Then, that count of salary entries is checked to see if it equals 1 in the outer query, and if so then everything from that particular row in Emp1 will be returned.

No comments:

Post a Comment