Median and weighted median
Yesterday Joe Celko posted on microsoft.public.sqlserver.programming about how to write an elegant query for Weighted Median. He managed to get the correct results but always ended up with ugly code.
Joe had a feeling an elegant query existed but was not seeing it. Anyway, since Celko's books have helped me in the past, I thought I should help him now.
Consider this sample data
DECLARE @Foo TABLE
(
x INT NOT NULL
)
INSERT @Foo
VALUES (1),
(2),
(2),
(3),
(3),
(3)
The most common approach to calculate the median value I have seen is
SELECT AVG(1.0E * x)
FROM (
SELECT x,
ROW_NUMBER() OVER (ORDER BY x DESC) AS a,
ROW_NUMBER() OVER (ORDER BY x) AS b
FROM @Foo
) AS d
WHERE b - a BETWEEN -1 AND 1
x a b a-b
1 6 1 5
2 4 2 2
2 5 3 2
3 1 4 -3
3 2 5 -3
3 3 6 -3
As you can see, the difference calculated by a-b suddenly doesn't match!
How can we overcome this behaviour? It's not that hard. See this query.
-- Peso Median
Yes it works! And now how to do weighted median? Well, we follow the same approach and write this piece of code.
-- Peso Weighted Median
SELECT SUM(1.0E * y) / SUM(1.0E * t)
FROM (
SELECT SUM(x) OVER (PARTITION BY x) AS y,
2 * ROW_NUMBER() OVER (ORDER BY x) - COUNT(*) OVER () AS z,
COUNT(*) OVER (PARTITION BY x) AS t
FROM @Foo
) AS d
WHERE z BETWEEN 0 AND 2
There you go!
Legacy Comments
Adam Machanic
2009-09-16 |
re: Median and weighted median The problem with the first query is you need to break the ties. Why doesn't your table have a key? |
Peso
2009-09-16 |
re: Median and weighted median Great question. I answered a question Celko wrote on the newsgroup located at microsoft.public.sqlserver.programming "Elegant query for weighted median?" And there is only 6 sample records and not thousands as in this connect issue https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=483540 |
Dyfhid
2009-09-21 |
re: Median and weighted median Keeping in mind I am a completely accidental DBA/Developer trying to learn, I do something similar, finding weighted averages, for determining First Pass Yield in our production facility. I basically took the Excel Way, using SumProduct, and applied it to my data, and it seems to work. Breaking my procedure down to this example data set, just to show the methodology, DECLARE @Foo TABLE ( x INT NOT NULL ) declare @y int INSERT into @Foo Select (1) union all Select (2) union all Select (2) union all Select (3) union all Select (3) union all Select (3) set @y = (select count(x) from @Foo) Select SUM(x * @y) / SUM(@y) from @Foo --Result is 2 |
Peso
2009-09-21 |
re: Median and weighted median You have to be aware of INTEGER math. In your example, the average is 2 1/3 (2.33333) set @y = (select count(x) from @Foo) Select 1.0E * SUM(x * @y) / SUM(@y) from @Foo |
Peso
2009-09-21 |
re: Median and weighted median The normal median is 2.5 and the weighted median is 2.6 |
Adidas Originals
2011-09-15 |
re: Median and weighted median This new and continued the Jeremy Scott usage of "tongue reconstruction method", all of the previous "Tai di bear" pattern must not a stranger to this selection of Chinese people more familiar with the pandas as a design based on without white and delicate colors easily to create a cute style, it is thanks to our national treasures born beautiful, simple black and white has enough. |
Samuel
2012-09-26 |
re: Median and weighted median Thanks .. It works perfect! |