## Median and weighted median

Yesterday Joe Celko posted on microsoft.public.sqlserver.programming about how to write an elegant query for Weighted Median. He managed to get the correct results but always ended up with ugly code.

Joe had a feeling an elegant query existed but was not seeing it. Anyway, since Celko's books have helped me in the past, I thought I should help him now.

Consider this sample data

DECLARE @Foo TABLE

(

x INT NOT NULL

)

INSERT @Foo

VALUES (1),

(2),

(2),

(3),

(3),

(3)

The most common approach to calculate the median value I have seen is

SELECT AVG(1.0E * x)

FROM (

SELECT x,

ROW_NUMBER() OVER (ORDER BY x DESC) AS a,

ROW_NUMBER() OVER (ORDER BY x) AS b

FROM @Foo

) AS d

WHERE b - a BETWEEN -1 AND 1

x a b a-b

1 6 1 5

2 4 2 2

2 5 3 2

3 1 4 -3

3 2 5 -3

3 3 6 -3

As you can see, the difference calculated by a-b suddenly doesn't match!

How can we overcome this behaviour? It's not that hard. See this query.

-- Peso Median

Yes it works! And now how to do weighted median? Well, we follow the same approach and write this piece of code.

-- Peso Weighted Median

SELECT SUM(1.0E * y) / SUM(1.0E * t)

FROM (

SELECT SUM(x) OVER (PARTITION BY x) AS y,

2 * ROW_NUMBER() OVER (ORDER BY x) - COUNT(*) OVER () AS z,

COUNT(*) OVER (PARTITION BY x) AS t

FROM @Foo

) AS d

WHERE z BETWEEN 0 AND 2

There you go!

Print | posted on Wednesday, September 16, 2009 9:56 PM | Filed Under [ Optimization SQL Server 2008 Algorithms SQL Server 2005 ]

## Feedback

## # re: Median and weighted median

The problem with the first query is you need to break the ties. Why doesn't your table have a key?## # re: Median and weighted median

Great question. I answered a question Celko wrote on the newsgroup located at microsoft.public.sqlserver.programming"Elegant query for weighted median?"

And there is only 6 sample records and not thousands as in this connect issue

https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=483540

## # re: Median and weighted median

Keeping in mind I am a completely accidental DBA/Developer trying to learn, I do something similar, finding weighted averages, for determining First Pass Yield in our production facility. I basically took the Excel Way, using SumProduct, and applied it to my data, and it seems to work.Breaking my procedure down to this example data set, just to show the methodology,

DECLARE @Foo TABLE

(

x INT NOT NULL

)

declare @y int

INSERT into @Foo

Select (1)

union all

Select (2)

union all

Select (2)

union all

Select (3)

union all

Select (3)

union all

Select (3)

set @y = (select count(x) from @Foo)

Select SUM(x * @y) / SUM(@y)

from @Foo

--Result is 2

## # re: Median and weighted median

You have to be aware of INTEGER math. In your example, the average is 2 1/3 (2.33333)set @y = (select count(x) from @Foo)

Select 1.0E * SUM(x * @y) / SUM(@y)

from @Foo

## # re: Median and weighted median

The normal median is 2.5 and the weighted median is 2.6## # re: Median and weighted median

This new and continued the Jeremy Scott usage of "tongue reconstruction method", all of the previous "Tai di bear" pattern must not a stranger to this selection of Chinese people more familiar with the pandas as a design based on without white and delicate colors easily to create a cute style, it is thanks to our national treasures born beautiful, simple black and white has enough.## # re: Median and weighted median

Thanks .. It works perfect!