Thinking outside the box

Patron Saint of Lost Yaks
posts - 203, comments - 734, trackbacks - 4

My Links

Advertisement

News

Archives

Post Categories

Median and weighted median

Yesterday Joe Celko posted on microsoft.public.sqlserver.programming about how to write an elegant query for Weighted Median. He managed to get the correct results but always ended up with ugly code.
Joe had a feeling an elegant query existed but was not seeing it. Anyway, since Celko's books have helped me in the past, I thought I should help him now.

Consider this sample data


DECLARE @Foo TABLE
        (
            x INT NOT NULL
        )

INSERT  @Foo
VALUES  (1),
        (2),
        (2),
        (3),
        (3),
        (3)

The most common approach to calculate the median value I have seen is

SELECT  AVG(1.0E * x)
FROM    (
            SELECT  x,
                    ROW_NUMBER() OVER (ORDER BY x DESC) AS a,
                    ROW_NUMBER() OVER (ORDER BY x) AS b
            FROM    @Foo
        ) AS d
WHERE   b - a BETWEEN -1 AND 1

It's a great method and it works in almost all cases. Yes, in almost all cases. There are circumstances where SQL Server will not give the correct result. The reason for me writing this blog post is that I suddenly got this result in return from inner query.

x a b  a-b
1 6 1    5
2 4 2    2
2 5 3    2
3 1 4   -3
3 2 5   -3
3 3 6   -3

As you can see, the difference calculated by a-b suddenly doesn't match!
How can we overcome this behaviour? It's not that hard. See this query.

-- Peso Median
SELECT  AVG(1.0E * x)
FROM    (
            SELECT  x,
                    2 * ROW_NUMBER() OVER (ORDER BY x) - COUNT(*) OVER () AS y
            FROM    @Foo
        ) AS d
WHERE   y BETWEEN 0 AND 2

Yes it works! And now how to do weighted median? Well, we follow the same approach and write this piece of code.

-- Peso Weighted Median
SELECT  SUM(1.0E * y) / SUM(1.0E * t)
FROM    (
            SELECT  SUM(x) OVER (PARTITION BY x) AS y,
                    2 * ROW_NUMBER() OVER (ORDER BY x) - COUNT(*) OVER () AS z,
                    COUNT(*) OVER (PARTITION BY x) AS t
            FROM    @Foo
        ) AS d
WHERE   z BETWEEN 0 AND 2

There you go!

Print | posted on Wednesday, September 16, 2009 9:56 PM | Filed Under [ Optimization SQL Server 2008 Algorithms SQL Server 2005 ]

Feedback

Gravatar

# re: Median and weighted median

The problem with the first query is you need to break the ties. Why doesn't your table have a key?
9/16/2009 10:34 PM | Adam Machanic
Gravatar

# re: Median and weighted median

Great question. I answered a question Celko wrote on the newsgroup located at microsoft.public.sqlserver.programming

"Elegant query for weighted median?"
And there is only 6 sample records and not thousands as in this connect issue

https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=483540
9/16/2009 10:56 PM | Peso
Gravatar

# re: Median and weighted median

Keeping in mind I am a completely accidental DBA/Developer trying to learn, I do something similar, finding weighted averages, for determining First Pass Yield in our production facility. I basically took the Excel Way, using SumProduct, and applied it to my data, and it seems to work.

Breaking my procedure down to this example data set, just to show the methodology,

DECLARE @Foo TABLE
(
x INT NOT NULL
)
declare @y int
INSERT into @Foo
Select (1)
union all
Select (2)
union all
Select (2)
union all
Select (3)
union all
Select (3)
union all
Select (3)

set @y = (select count(x) from @Foo)

Select SUM(x * @y) / SUM(@y)
from @Foo

--Result is 2
9/21/2009 3:48 AM | Dyfhid
Gravatar

# re: Median and weighted median

You have to be aware of INTEGER math. In your example, the average is 2 1/3 (2.33333)


set @y = (select count(x) from @Foo)

Select 1.0E * SUM(x * @y) / SUM(@y)
from @Foo
9/21/2009 8:47 AM | Peso
Gravatar

# re: Median and weighted median

The normal median is 2.5 and the weighted median is 2.6
9/21/2009 8:49 AM | Peso
Gravatar

# re: Median and weighted median

This new and continued the Jeremy Scott usage of "tongue reconstruction method", all of the previous "Tai di bear" pattern must not a stranger to this selection of Chinese people more familiar with the pandas as a design based on without white and delicate colors easily to create a cute style, it is thanks to our national treasures born beautiful, simple black and white has enough.
9/15/2011 9:15 AM | Adidas Originals
Gravatar

# re: Median and weighted median

Thanks .. It works perfect!
9/26/2012 12:55 PM | Samuel
Comments have been closed on this topic.

Powered by:
Powered By Subtext Powered By ASP.NET