By The Way … DISTINCT is not a function …
Have you ever seen (or written) code like this:
select distinct(employeeID), salary
from salaryhist
from salaryhist
That compiles and executes without returning any errors. I've seen that attempted many times over the years, and of course people think DISTINCT is "broken" and "not working" because they see multiple rows for each employeeID. "But I asked for only distinct employeeIDs!" they say.
Well, the DISTINCT has nothing to do with the EmployeeID column; it is not a function that accepts arguments! It is just a tag that you can put after the word SELECT to indicate that you want only distinct combinations of all columns in the result set returned.
That syntax is accepted because (employeeID) is just an expression, a reference to a column, which happens to be surrounded by parenthesis. For example, you could write:
select distinct (employeeID), (salary)
from salaryhist
from salaryhist
or:
select (employeeID), (salary)
from salaryhist
from salaryhist
or even:
select distinct ((employeeID)), ((salary))
from salaryhist
from salaryhist
Nothing is indicating that DISTINCT should be "operating" on the employeeID column; it is just a column reference in the SELECT clause that happens to be surrounded by parenthesis.
So, remember:
- DISTINCT always operates on all columns in the final result
- DISTINCT is not a function that accepts a column as an argument
select employeeID, max(salary) as MaxSalary
from salaryhist
group by employeeID
Notice that now we are getting distinct EmployeeID values, and the max salary per EmployeeID. This will truly return exactly one row per EmployeeID, unlike the initial DISTINCT example. And, in this case, MAX() is indeed a function that accepts and acts upon an argument -- unlike DISTINCT!from salaryhist
group by employeeID
see also:
- Some SELECTs will never return 0 rows -- regardless of the criteria
- A handy but little-known SQL function: NULLIF()
- Is it a String Literal or an Alias?
- Passing an Array or Table Parameter to a Stored Procedure
- SELECT * FROM TABLE -- except for these columns
- In SQL, it's a Case Expression, *not* a Case Statement
- Returning Random Numbers in a SELECT statement
- SELECT DISTINCT and ORDER BY
Legacy Comments
Adam Machanic
2007-10-12 |
re: By The Way ... DISTINCT is not a function ... Ugh, yes--seen this several times. Even worse, something like: SELECT DISTINCT(EmployeeId), MAX(Salary) FROM SalaryHist GROUP BY EmployeeId |
Rick O
2007-10-12 |
re: By The Way ... DISTINCT is not a function ... ... except when DISTINCT is inside an aggregate function: SELECT COUNT(DISTINCT n), COUNT(n), SUM(DISTINCT n), SUM(n) FROM ( SELECT 1 AS n UNION ALL SELECT 3 AS n UNION ALL SELECT 3 AS n ) AS a Returns (2, 3, 4, 7). |
Jeff
2007-10-12 |
re: By The Way ... DISTINCT is not a function ... Rick -- DISTINCT in that case is still not a function; it is a tag or a flag on the aggregate function indicating to only count or sum distinct values. |