Non-Deterministic UDF's – Custom Aggregate and Cumulative Functions
User Defined Functions (UDF’s) are just about my favorite thing in SQL 2000. I thought the possibilities to be endless, until I realized that they were designed to be completely deterministic: each call to a UDF with the same arguments should always return the same value, and during the evaluation of a UDF it cannot make any changes to the state of the database. This implementation decision makes perfect sense, but I do wish it were possible to maintain a state between UDF calls.
The other day, I came across this thread, which has reminded me of something that I had forgotten about: A UDF can call extended stored procedures. And the sp_OACreate stored procedure can create an object, and return a reference to it. And UDF’s can also call sp_OAGetProperty and sp_OASetProperty to get and set properties to an object. And that object’s state is maintained between UDF calls, as long as you keep passing a reference to the same object. The reference is simply stored in an integer value. Thus, it appears we might just have a mechanism in place that will allow us to retain values between UDF calls. Why is this important? Keep reading ….
Let’s make use of the same RegExp object used in the thread and do some testing just to see what we can come up with.
First, let’s create a very basic UDF:
CREATE FUNCTION dbo.MakeObj()
RETURNS integer AS
BEGIN
DECLARE @obj integer
EXECUTE sp_OACreate 'VBScript.RegExp', @obj OUTPUT
EXECUTE sp_OASetProperty @obj, 'Pattern', ''
RETURN @obj
END
This function takes no arguments, and returns a reference through an integer to the RegExp object created. The “Pattern” property is defaulted to an empty string.
A couple of notes: First off, I am not presenting any error checking in this article, to keep things short. Secondly, I am using a RegExp object just because it is fairly lightweight, almost everyone should have it installed, and has a generic unvalidated text property (“Pattern”). A full implemention of the ideas presented here would be best served by writing a very small ActiveX object that is easily createable and has a property or two you can set and then read. But, to keep things simple, let’s just use the RegExp object for now.
Next, let’s create another UDF:
CREATE FUNCTION dbo.Concatenate(@obj integer,@Value varchar(1000))
RETURNS varchar(1000) AS
BEGIN
DECLARE @Result varchar(1000);
EXECUTE sp_OAGetProperty @obj, 'Pattern', @Result OUTPUT
SET @Result = @Result + @Value
EXECUTE sp_OASetProperty @obj, 'Pattern', @Result
RETURN @Result
END
This function accepts a regExp object and a varchar() as parameter. It then gets the “Pattern” property of the object, appends the varchar() to it, changes the “Pattern” property to the new value, and then returns the result.
After creating this function, run the following in query analyzer:
declare @a int;
set @a = dbo.MakeObj()
select dbo.Concatenate(@a,'1')
select dbo.Concatenate(@a,'2')
select dbo.Concatenate(@a,'3')
select dbo.Concatenate(@a,'4')
Look at the results, and notice that each subsequent call to the function appends another character to the previous result. Hmmmm…. So, out of curiousity, what happens if we execute this (using the Northwind database):
declare @a int;
set @a = dbo.MakeObj()
select EmployeeID, dbo.Concatenate(@a, LastName + ', ')
from northwind..Employees
Very interesting. Each row returned from the Employees table has a growing string of all of the names in the table – something normally not easily done with a single SELECT! We are close to implementing a custom CONCATENATE aggregate function.
But, it’s not quite right – we want to return a concatenated list of values for a particular group, such as a list of customers, per city. The data returned doesn’t quite give us that result – is there a way we can get there? Can we essentially write a custom, generic concatenation aggregate function?
Our goal is to set up our function so that it can take on this basic format, like any other aggregate function:
Select value, CustomAggFunction(x)
From table
Group by Value
What happens if we try out our CONCATENATE function in this way? Because the Employee table is so small, let’s try it with the Customers table, and attempt to create a list of customers per city:
declare @a int;
set @a = dbo.MakeObj()
select city, dbo.Concatenate(@a, CompanyName + ', ')
from northwind..customers
group by city
We get:
Column 'northwind..customers.CompanyName' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Doesn’t work … which is to be expected when you think about it. We can wrap an existing function like MIN() or MAX() around our function to avoid this error message, but try it out and you will see it doesn’t return the desired results.
A little more thought tells us why this won’t work, and also that we never wrote any mechanism to ever reset the concatenated list anyway … it will keep growing and growing. Have we hit a dead end?
Well, let’s create another UDF:
CREATE FUNCTION dbo.LastResult(@obj integer)
RETURNS sql_variant AS
BEGIN
DECLARE @value sql_variant
EXECUTE sp_OAGetProperty @obj, 'Pattern', @Value OUTPUT
EXECUTE sp_OASetProperty @obj, 'Pattern', ''
RETURN @value
END
This function does two things: it returns the current value stored in the “Pattern” property of the object, and then sets that property to an empty string, effectively resetting the concatenated value.
It doesn’t seem too useful – since each call to it resets the object back to a default value, it appears to defeat the purpose of maintaining a “state” between calls. But, it will allow us to achieve the result we are looking for in our aggregate function! We need to call our CONCATENATE() function within a MIN() or a MAX() – it doesn’t matter which – and then include this new function in the SELECT list of our query. The CONCAT function will be called for each row in the resultset as the grouping is done, and the LASTRESULT function is called only when a new group occurs in the final resultset. Thus, the latest value is returned for that group, and that UDF also resets the total for the next group. This works because of the manner in which a GROUP’ed query is evaluated: For each group, gather all the rows needed to evaluate the aggregate functions, and then once for each group evaluate the other (non-aggregate) expressions in the SELECT list.
Fire this one up in QA:
declare @a int;
set @a = dbo.MakeObj()
select city, dbo.LastResult(@a), MIN(dbo.Concatenate(@a, CompanyName + ', '))
from northwind..customers
group by city
Holy smokes, it works!!!! This query actually performs an aggregate calculation and returns the summarized results only! Again, I cannot comment on performance at this point or even recommend this approach, but it surely is an interesting concept.
You can actually have multiple, generic aggregate functions in any query – as long as you have 1 object created per function. It is a little odd-looking in that you must also calculate the MIN() or MAX() of the function alongside the true result you wish to return, but it returns the desired results.
This leads to other ideas:
create function dbo.RunningTotal(@ValueObj integer, @Value money,
@ResetObj integer, @ResetValue sql_variant )
returns money
as
begin
declare @Result money;
declare @tmp varchar(1000);
declare @LastReset sql_variant;
execute sp_OAGetProperty @ValueObj, 'Pattern', @tmp output
if @ResetObj is not null
begin
execute sp_OAGetProperty @ResetObj, 'Pattern', @LastReset output
if convert(varchar(1000), @LastReset) !=
convert(varchar(1000), @ResetValue)
set @result = @value
else
set @result = convert(numeric, '0' + @tmp) + @value
execute sp_OASetProperty @ResetObj, 'Pattern', @ResetValue
end
else
set @result = convert(numeric, '0' + @tmp) + @value
set @tmp = convert(varchar(1000), @result)
execute sp_OASetProperty @ValueObj, 'Pattern', @tmp
return @result
end
The above function is not an aggregate function, but rather a cumulative function. It takes a numeric value and accumulates a running total from call to call. It has two forms – a simple form and a grouped form. Let’s look at the simple form first.
The simple form requires an object to be passed, and a value. The value is then added (numerically this time) to the value stored in that object, and then the resulting value is returned by the function. The running total is accumulated for the entire resultset. To achieve this result, simply leave the last two arguments as NULL for now. This is essentially the same, then, as our CONCATENATE function -- there's no reset mechanism in place and the values just keep accumulating. As mentioned, we are using the RegExp object’s “Pattern” property to store our values, so we need to do some conversions from text to numeric values along the way. A more polished implementation of this technique would use a custom object with a numeric property.
Take a look at the following results:
declare @a int;
set @a = dbo.MakeObj();
select customerID, companyName, dbo.RunningTotal(@a,1,null,null)
from northwind..customers
And also:
declare @a int;
set @a = dbo.MakeObj()
select orderID, quantity, dbo.RunningTotal(@a,quantity,null,null)
from northwind..[order details]
Very interesting – we have established row numbers for our resultset (this will make the Oracle users happy !) and we have created a running total as well.
More useful might be if we can reset these values periodically – i.e., a running total over groups of records. To do so, we’d have to store the current group’s value in the object from call to call of the UDF, and in each call pass in the current row’s group value. Well, our current RegExp object that we are playing with only has one property that we can make use of, so as a work-around I have allowed for this function to accept two objects – one to store the value, and one to store the last group’s value. Ideally, these would both be stored in 1 custom written object, but this works fine for now.
So, the other two arguments in our running total UDF are the “group reset” object’s reference, and the current “group reset” value from the query. So, you would create two RegExp objects, and for each call you pass in the arguments in this order:
1. the first RegExp object – used to store the running total
2. the value you wish to add to the running total
3. the second RegExp object – used to keep track of group info
4. the value in your query that determines when groups are reset
So, to close things up, let’s assume we wish to perform a running total of the quantity field from the Order Details tables in Northwind, and reset the running total for each OrderID:
declare @val int, @reset int;
select @val = dbo.MakeObj(), @reset = dbo.MakeObj()
select orderId, quantity, dbo.RunningTotal(@val, quantity, @reset, orderID)
from northwind..[order details]
order by orderID, quantity
@val is the object holding the running total, quantity is the value we are totaling on, @reset is the object holding the “reset” group’s last value, and OrderID is the value we wish to reset our groups on. Note the ORDER BY clause, which has now become very important to ensure our values are calculated correctly.
A few more disclaimers: you should explicitly destroy your object when you are done with them, and of course this method is at this point purely academic. Many DBA’s do not allow users to execute the sp_OA extended stored procedures on production servers.
But hopefully, you found this interesting! I am not sure about performance at this point -- I suspect that none of this will be especially efficient -- and as mentioned this really should be done with a custom lightweight object as opposed to commandeering RegExp, but if nothing else this should present some fun ideas to play with.
I welcome any and all feedback !!
Legacy Comments
DavidM
2004-06-03 |
re: Non-Deterministic UDF's -- Custom Aggregate and Cumulative Functions That is gold Jeff! Well thought out and nicely written... It is a pity about the permissions though.... |
Damian
2004-06-04 |
re: Non-Deterministic UDF's -- Custom Aggregate and Cumulative Functions Nice one Jeff Would you like to publish this on the front page of SQLTeam ? |
JeffS
2004-06-04 |
re: Non-Deterministic UDF's -- Custom Aggregate and Cumulative Functions Sure, Damian -- that would be awesome. Please post it up there. I realized after I posted this I should send it on to you guys as well. Thanks! |
Lee Dise
2004-06-07 |
re: Non-Deterministic UDF's -- Custom Aggregate and Cumulative Functions Excellent! I'm seeing some of the holes in my knowledge here because my background as a developer did not include developing in the Windows environment. This article gives me a feel for the kinds of things that can be accomplished when poking around outside the DBMS. Thanks, Jeff! For simple concatenations, I like to perform them as follows (e.g., collect the columns in 'sysobject', in colid order): DECLARE @string VARCHAR (8000) , @delim VARCHAR (2) SELECT @string = '' , @delim = '' SELECT @string = @string + @delim + name , @delim = ', ' FROM syscolumns WHERE id = OBJECT_ID ('sysobjects') ORDER BY colid PRINT @string You could play a similar game with a running total. I've been told that this method is "highly unreliable", but I've never seen it fail. I do agree it would be nice to be able to do this in a function call that would be usable by any authenticated user (not just the ones with Nietschean Superman permissions). |
JeffS
2004-06-07 |
re: Non-Deterministic UDF's -- Custom Aggregate and Cumulative Functions Lee -- thanks for the feedback. The methods you have posted do indeed work reliably and are quite efficient, but they do not allow for returning sets of rows. For example, you will not be able to return a running total or row numbers since only 1 single value can be returned. |
Lee Dise
2004-06-08 |
re: Non-Deterministic UDF's -- Custom Aggregate and Cumulative Functions > For example, you will not be able to return a running total or row numbers since only 1 single value can be returned. Hence the beauty of the elusive pervasive function. Once again, your article was extremely enlightening for me. Thanks, man! |
Rob
2004-06-21 |
re: Non-Deterministic UDF's -- Custom Aggregate and Cumulative Functions Interesting. I'm looking for an OR() Aggregate |
JeffS
2004-06-22 |
re: Non-Deterministic UDF's -- Custom Aggregate and Cumulative Functions you would use the exact technique presented to create an OR aggregate, but there's also easier ways depending on what you are trying to do. do you need a bitwise OR, or a boolean OR? a boolean OR/AND is very easy ... you just use MIN or MAX and a CASE statement. |
Gary
2004-07-11 |
re: Non-Deterministic UDF's -- Custom Aggregate and Cumulative Functions I need a bitwise or. I have an int column that holds bit flags and I wish to return rows with a bitwise or agregating the selection. Is there a simple way to do this? |
Gary
2004-07-11 |
re: Non-Deterministic UDF's -- Custom Aggregate and Cumulative Functions I would prefer to not have to create and install a custom ActiveX control with the solution. Is there an elegant way to do an aggregate logical or with just T-Sql? Thanks! |
JeffS
2004-07-11 |
re: Non-Deterministic UDF's -- Custom Aggregate and Cumulative Functions see my latest post for some answers to your questions about an aggregate OR function. |
seventhnight
2004-11-04 |
re: Non-Deterministic UDF's -- Custom Aggregate and Cumulative Functions I was working on kselvia's puzzle with minimum minute interval between records, and my final solution pointed me in an interesting direction. I have readapted that solution into a concatenating function for strings. It does require that you use it in an update, but with temp tables that is usually acceptable. Using a similar, but slightly different function, the running sums and other types of results could also be done. This is the string concatenation example: CREATE FUNCTION dbo.concatStr ( @curStr varchar(4000), @newStr varchar(100) ) RETURNS varchar(4000) AS BEGIN Return (isnull(@curStr+',','')+@newStr) END Go Declare @myTable table (id int, name varchar(100), list varchar(4000)) Insert Into @myTable Select 1, 'Bill', null Union Select 1, 'Bob', null Union Select 2, 'Ken', null Union Select 2, 'Ryan', null Declare @curStr varchar(4000), @key varchar(4000) Update @myTable Set @curStr = list = case when @key = convert(varchar,id) then dbo.concatStr(@curStr,name) else name end, @key = id From @myTable Select id, list From @myTable A where len(list) = (select max(len(list)) from @myTable where id = A.id) go Drop function dbo.concatStr |
chirurgia estetica
2005-12-12 |
re: Non-Deterministic UDF's -- Custom Aggregate and Cumulative Functions . |
Concatonate with Grouping
2006-02-15 |
re: Non-Deterministic UDF's -- Custom Aggregate and Cumulative Functions Here is the concatonate modified for grouping... CREATE FUNCTION [dbo].[Concatenate](@ValueObj integer, @Value varchar(1000),@ResetObj integer, @ResetValue sql_variant) RETURNS varchar(1000) AS BEGIN DECLARE @Result varchar(1000); DECLARE @tmp varchar(1000); DECLARE @LastReset sql_variant; EXECUTE sp_OAGetProperty @ValueObj, 'Pattern', @tmp OUTPUT IF @ResetObj IS NOT NULL BEGIN EXECUTE sp_OAGetProperty @ResetObj, 'Pattern', @LastReset output IF convert(varchar(1000), @LastReset) != convert(varchar(1000), @ResetValue) SET @Result = @Value ELSE SET @Result = @tmp + @Value EXECUTE sp_OASetProperty @ResetObj, 'Pattern', @ResetValue END ELSE SET @Result = @tmp + @Value SET @tmp = convert(varchar(1000), @result) EXECUTE sp_OASetProperty @ValueObj, 'Pattern', @tmp RETURN @Result END |
chris
2006-03-15 |
re: Non-Deterministic UDF's -- Custom Aggregate and Cumulative Functions Best solution I ever have seen!!! Is it pssoible to perform a dense_rank with ms-sqlserver 2000 with this solution?? Many thanks |
Sergey
2006-08-12 |
re: Non-Deterministic UDF's -- Custom Aggregate and Cumulative Functions Update to Jeff S post... Use SET @Result = '' and not SET @Result = ', ' CREATE FUNCTIONS EmpList(@DeptID int) RETURNS VARCHAR(8000) AS BEGIN DECLARE @result VARCHAR(8000) SET @Result = '' SELECT @Result = @Result + ', ' + EmpID FROM Emps WHERE DeptID = @DeptID RETURN (SUBSTRING(@Result,3,8000)) END |
Peter Savage
2007-07-30 |
re: Non-Deterministic UDF's -- Custom Aggregate and Cumulative Functions Unfortunately the sp_OA mechanisms seem to fail with larger result sets. Probably some optimization or parallelism... With the ResetProperty/Value version I see only one value in thec oncatenated string when there are more than about 350 rows in the result set. It works fine up to that point. |
Dustin
2007-07-31 |
re: Non-Deterministic UDF's -- Custom Aggregate and Cumulative Functions Just glancing through the article, I think the basic behavior you see is a result of "aggregate concatenation queries", and it is not necessary to employ an extended stored procedure or a UDF to see this behavior. Such queries are also unreliable, because is some cases, the query optimizer only executes the select clause against the last row returned from the query, as determined by the order by clause. See http://support.microsoft.com/default.aspx?scid=kb;EN-US;q287515 . |