Peter Larsson Blog

Patron Saint of Lost Yaks

Median and weighted median

Yesterday Joe Celko posted on microsoft.public.sqlserver.programming about how to write an elegant query for Weighted Median. He managed to get the correct results but always ended up with ugly code.
Joe had a feeling an elegant query existed but was not seeing it. Anyway, since Celko's books have helped me in the past, I thought I should help him now.

Consider this sample data


DECLARE @Foo TABLE
        (
            x INT NOT NULL
        )

INSERT  @Foo
VALUES  (1),
        (2),
        (2),
        (3),
        (3),
        (3)

The most common approach to calculate the median value I have seen is

SELECT  AVG(1.0E * x)
FROM    (
            SELECT  x,
                    ROW_NUMBER() OVER (ORDER BY x DESC) AS a,
                    ROW_NUMBER() OVER (ORDER BY x) AS b
            FROM    @Foo
        ) AS d
WHERE   b - a BETWEEN -1 AND 1

It's a great method and it works in almost all cases. Yes, in almost all cases. There are circumstances where SQL Server will not give the correct result. The reason for me writing this blog post is that I suddenly got this result in return from inner query.

x a b  a-b
1 6 1    5
2 4 2    2
2 5 3    2
3 1 4   -3
3 2 5   -3
3 3 6   -3

As you can see, the difference calculated by a-b suddenly doesn't match!
How can we overcome this behaviour? It's not that hard. See this query.

-- Peso Median
SELECT  AVG(1.0E * x)
FROM    (
            SELECT  x,
                    2 * ROW_NUMBER() OVER (ORDER BY x) - COUNT(*) OVER () AS y
            FROM    @Foo
        ) AS d
WHERE   y BETWEEN 0 AND 2

Yes it works! And now how to do weighted median? Well, we follow the same approach and write this piece of code.

-- Peso Weighted Median
SELECT  SUM(1.0E * y) / SUM(1.0E * t)
FROM    (
            SELECT  SUM(x) OVER (PARTITION BY x) AS y,
                    2 * ROW_NUMBER() OVER (ORDER BY x) - COUNT(*) OVER () AS z,
                    COUNT(*) OVER (PARTITION BY x) AS t
            FROM    @Foo
        ) AS d
WHERE   z BETWEEN 0 AND 2

There you go!

Legacy Comments


Adam Machanic
2009-09-16
re: Median and weighted median
The problem with the first query is you need to break the ties. Why doesn't your table have a key?

Peso
2009-09-16
re: Median and weighted median
Great question. I answered a question Celko wrote on the newsgroup located at microsoft.public.sqlserver.programming

"Elegant query for weighted median?"
And there is only 6 sample records and not thousands as in this connect issue

https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=483540

Dyfhid
2009-09-21
re: Median and weighted median
Keeping in mind I am a completely accidental DBA/Developer trying to learn, I do something similar, finding weighted averages, for determining First Pass Yield in our production facility. I basically took the Excel Way, using SumProduct, and applied it to my data, and it seems to work.

Breaking my procedure down to this example data set, just to show the methodology,

DECLARE @Foo TABLE
(
x INT NOT NULL
)
declare @y int
INSERT into @Foo
Select (1)
union all
Select (2)
union all
Select (2)
union all
Select (3)
union all
Select (3)
union all
Select (3)

set @y = (select count(x) from @Foo)

Select SUM(x * @y) / SUM(@y)
from @Foo

--Result is 2

Peso
2009-09-21
re: Median and weighted median
You have to be aware of INTEGER math. In your example, the average is 2 1/3 (2.33333)


set @y = (select count(x) from @Foo)

Select 1.0E * SUM(x * @y) / SUM(@y)
from @Foo

Peso
2009-09-21
re: Median and weighted median
The normal median is 2.5 and the weighted median is 2.6

Adidas Originals
2011-09-15
re: Median and weighted median
This new and continued the Jeremy Scott usage of "tongue reconstruction method", all of the previous "Tai di bear" pattern must not a stranger to this selection of Chinese people more familiar with the pandas as a design based on without white and delicate colors easily to create a cute style, it is thanks to our national treasures born beautiful, simple black and white has enough.

Samuel
2012-09-26
re: Median and weighted median
Thanks .. It works perfect!