That's right boys and girls, it's what you've been waiting for all weekend: Another edition of the
mailbag!
Damian writes:
Hi
I have a tricky SQL question that I have been trawling the net and
workmates to find an answer. We are accessing a real time proprietary
database that approximates classic SQL in syntax. The GUI we are using does
not allow anything to be done effectively on the front end. It all has
to happen in the query.
Trades flow in and we would like to sum them on the fly with only a
query. We need to sum based on the change in price.
eg
for this series of trades (ID is ordered but non-contiguous)
Trade ID Price Volume
1 4 10
3 4 20
6 5 15
7 4 20
should produce
price volume
4 30
5 15
4 20
so it sums the volume on a change in price. but note if the same price
appears again this is not included in the total for the first instance
of that price but in the total for a new instance of that price.
It is so simple but the concensus is it is impossible in pure SQL. At
first I did not think it was but now I tend to agree. we cannot change
the tables in the database as it is all locked away and front end
processing is unworkable.
would you agree?
thanks
Damian
It is a little tricky, and it is not terribly efficient, but it can be done in pure set-based SQL if I am understanding your requirements correctly.
All you need is the technique shown
here.
(As far as I know, until I hear otherwise, I'll take credit for coming up with this method! )
Here the code:
-- set up sample table:
create table Trades (TradeID int primary key, Price int, Volume int)
insert into Trades
select 1,4,10 union all
select 3,4,20 union all
select 6,5,15 union all
select 7,4,20
-- your solution:
select
min(tradeID) as StartID, max(TradeID) as EndID, Price, sum(Volume) as TotalVolume
from
(
select t1.TradeID, t1.Price, t1.Volume,
(select count(*) from Trades t2
where t2.TradeID < t1.TradeID and t2.Price <> t1.Price) as RunGroup
from Trades t1
) x
group by
Price, RunGroup
order by
min(tradeID)
-- results:
StartID EndID Price TotalVolume
----------- ----------- ----------- -----------
1 3 4 30
6 6 5 15
7 7 4 20
(3 row(s) affected)
As I mentioned, this works, but it is not very efficient; depending on how much data you have, it might run very slowly. The culprit is that correlated sub-query that calculates the "RunGroup". You can also calculate the RunGroup using either of these techniques if they are more efficient for your data:
-- option 2
select *, (select min(tradeID) from Trades t2
where t2.TradeID > t1.TradeID and t2.Price != t1.Price) as RunGroup
from Trades t1
-- option 3
select *, (select max(TradeID) from Trades t2
where t2.TradeID < t1.TradeID and t2.Price <> t1.Price) as RunGroup
from Trades t1
For this small sample table, the execution plan is the same for all 3.
Also, note that I am tracking the "RunGroups" for all of the data in your table, but if it can be partitioned -- for example, by Customer or Client -- then that will make it much more efficient. To add a partition on, say, Client, you would alter the code like this:
select
Client, min(tradeID) as StartID, max(TradeID) as EndID, Price, sum(Volume) as TotalVolume
from
(
select t1.Client, t1.TradeID, t1.Price, t1.Volume,
(select count(*) from Trades t2
where t2.TradeID < t1.TradeID and t2.Price <> t1.Price
and t1.Client = t2.Client) as RunGroup
from Trades t1
) x
group by
Client, Price, RunGroup
order by
Client, min(tradeID)
I hope this helps! if not, let me know.