RocketScientist's Miscellaneous Ramblings

## A database design question

OK, I have this design problem I've not been able to find a good answer for.

Let's say you're tracking customer pricing for a widget. Now, widget prices are volatile, changing weekly or even daily. And you need to be able to go back in time and say "on this day, we would have charged you 5.32 yaks for a widget". So, you set up a table that has a price, a "price start date" and a "price end date". When you quote a price, you need a date, and you can retrieve the price.

So, a create table would look like this:

Price_Start_Date datetime not null,
Price_End_Date datetime not null,
Price decimal(9,2)

For a non-expired price put in an arbitrary future date, and when the price expires update the expire date. That way you get to use between (and avoid funky "isnull" comparisons everywhere).

Here's the design problem: Put a real primary key (not a surrogate key) on that table.

I can think of some really ugly solutions (involving tables of every conceivable date and an interesect table). Or refining that going all OLAP on it and creating tables of years, months, days, and using an intersect table. Anybody?

Print | posted on Wednesday, May 26, 2004 10:47 AM |

## #re: A database design question

Well you could put a primary key on it, but all that would do is make sure no one put duplicate entries in.

What you really need is a check constraint or more likely an insert/update trigger that validates that the date range you are trying to put in does not overlap an existing date range for that same widget.

5/26/2004 11:33 AM | Jason Catlett

## #re: A database design question

That's kind of the whole point of the exercise. Your comment "make sure no one put duplicate entries in" isn't a trivial problem. A primary key on a date range that actually works to prevent overlapping ranges is not easy.

By the way, I don't think a CHECK constraint would work because CHECK constraints can't validate data that way.

A trigger on insert/update would work, but there's also not a good way to index that table for date range lookups, so the dupe check before any inserts would happen in N time, instead of log N time (table scan vs index seek).

The primary key problem and the indexing problem are probably the same problem, so solving the slow trigger will probably result in indexing the table in a special way, and will probably solve the PK problem.

5/26/2004 12:33 PM | rocketscientist

## #re: A database design question

Lose the end date. Then to find the date for any point in time:

<pre>
select top 1 Price
from Batman
where Price_Start_Date <= @AsOf
order by Price_Start_Date desc
</pre>

Then your primary key is on the start date only, and you've eliminated the chance of overlapping ranges.

Think of the table as a continuous time line, with each record establishing the point on the line where the new price takes effect. You don't need to track where it ends because it ends where the next one starts.

5/26/2004 12:56 PM | Dewayne Christensen

## #re: A database design question

Sorry. The first line should say "to find the PRICE for any"...
5/26/2004 1:03 PM | Dewayne Christensen

## #re: A database design question

Here's the counter example. If you have one record that starts today and ends at some arbitrarily out there future point, and another that starts tomorrow and ends at the same future point, you've got overlapping ranges. Stick different prices in them, and you'll get back two results. Granted that your "Top 1" in the query will return one result, but will it be the "correct" result? Murphy says no.

I'm becoming more comfortable that the only way to prevent overlapping date ranges is to use a domain table of all possible dates and a price-date intersect table. Which is neither elegant nor ideal, but does solve the problem.
5/26/2004 1:06 PM | rocketscientist

## #re: A database design question

I disagree....

Key is Product and startdate

Put a unique Index on Product and Enddate

Keep the most products most current edn date null...can now only allow one...

THEN use the TOP 1 query With WHERE EndDate IS NULL

No question there....

5/26/2004 1:21 PM | Brett

## #re: A database design question

How do you define "correct" in the face of "arbitrary?" Ignoring databases, how would you explain to the sales clerks that something's \$1 from Monday to forever, but \$2 from Tuesday to forever? Which price are they to use on Wednesday?
5/26/2004 1:26 PM | Dewayne Christensen

## #re: A database design question

OK, Brett, you've almost got it. In your case, key is product and start date, and a unique key on Product and End Date. OK, so I have a contract to buy product id 1 with a start date today and end date 1 year from today and a cost of 10 yaks. Tomorrow, the contract was discovered to be in error and I get a product id 1 with a start date tomorrow and the end date a year from tomorrow, and a cost of <pinky gesture> 1 BILLION yaks </pinky gesture>. I can't just update the row, because I may have sold some widgets at 1, and I need to be able to re-construct that price. Obviously, I should change the end date of the first record...but I don't HAVE to, there is no constraint forcing me to fix the problem, or even telling me there is a problem. If I pull the price given a date of next Wednesday, which one do I get? The database supports putting that bad data in. And TOP 1 is not deterministic, I can change the result by messing with ascending/descinding sort orders on indexes.

Your solution actually gets closer than anything else, but it isn't BETWEEN friendly, which would be nice.

As a side note, I was playing with some stuff for this response...and NULL = NULL isn't true...unless you're inserting into a table with a unique index. That's non-orthoganal (sp?) behavior. I'm not saying it's not useful..but it's morally wrong anyway. I need to write up that rant later.
5/26/2004 3:00 PM | rocketscientist

## #re: A database design question

Well I guess I would say any process involved in managing the price must be involved im manging the dates...

And what I'm saying, that as a business rule...(or is it a data rule?) that the productId that has an end date of null is THE most current price...there can only be 1 for that product.

And of Course...null is not equal to anything, not even itself...even the discussion of null having the ability to be compared could be considered erroneous...how can you compare something the doesn't even exist...ahh existentialism..

And as far as between is concerned...if your setting a price today...you want the end date is null bit.

I'd probably denormalize here and store the price in the sales table, but historically you could then say between...but as you point out you could have your overlap[ problem (which is why I would denormalize)...but you still use BETWEEN...and if you like use a GROUP BY with a MAX on price...people have short memories...

Make sure they don't have a receipt..

USE Northwind
GO

CREATE TABLE PricingGuide (
ProductId int IDENTITY(1,1) NOT NULL
, PruductName varchar(255) NOT NULL
, PriceStartDate datetime NOT NULL DEFAULT (Getdate())
, PriceEndDate datetime
, Price money
, PRIMARY KEY (ProductId, PriceStartDate)
)

CREATE UNIQUE INDEX PricingGuide_EndDateIX ON PricingGuide (ProductId, PriceEndDate)
GO
5/26/2004 3:20 PM | Brett

## #re: A database design question

Hey can't you edit these responses?

[but you still use BETWEEN...]

I meant to say...

You still could use your between for history

For the current price you need to use the Null concept

5/26/2004 3:26 PM | Brett

## #re: A database design question

Ahh.. the temporal data problem.. can you say 6th Normal Form!!!! Go out and buy Date, Darwen and Lorentzos book on TEMPORAL DATA AND THE RELATIONAL MODEL..

We do this type of design all the time.. We have a Start and End Date, EndDate is defaulted to the maximum date value 99991231.. and add non-overlap constraint http://weblogs.sqlteam.com/davidm/archive/2004/04/28/1296.aspx

You hava a choice of candidate keys.. Price/ StartDate or Price/EndDate, plus the overlap constraint itself provides uniqueness.. Mark one of them as the primary key and add a unique constraint to the other...

A bit of procedural logic is required to update the "old" datetime prior to the new price but you guarantee integrity, history queries are easy and the "current" value is always the row where EndDate is equal to the max date..We avoid NULLs at (just about) all cost..

Or you could build an extra table that just contains the history of prices and the "main" table has the current value.. Create a view that joins the two and the front end won't know the difference...

5/26/2004 6:20 PM | DavidM

## #re: A database design question

In case anyone is curious the exchange is exactly 1 yak = 4 yak hooves for your standard 4 footed yak. Kind of a cruel way to get currency but what's a DBA to do.
5/26/2004 11:02 PM | graz

## #re: A database design question

>> EndDate is equal to the max date..We avoid NULLs at (just about) all cost.. <<

Why?

You still then have to code soemthing

What's the diff between

Col1 IS NULL

Or Col1 = '9999/12/31'

?

5/27/2004 11:53 AM | Brett

## #re: A database design question

You sure do Brett..

Not "much" difference, I sleep better at night though..

NULLS cause too much damage outside of the database. If you have an alternative to NULL I cannot see the point in using it...
5/27/2004 9:45 PM | DavidM

## #re: A database design question

HR systems and accounting systems also need to track historical data; I have found this design to be best:

Item
EffectiveDate
Value

with Item/EffectiveDate as the primary key. then a VIEW can handle the ending date quite well:

SELECT Item, Value, EffectiveDate as StartDate,
(select Isnull(min(effectiveDate)-1,'12/31/2050')
from Stuff B
where Stuff.Item = B.Item and
Stuff.EffectiveDate < B.EffectiveDate) as EndDate
FROM Stuff

or something like that ... I find that works quite well and you can have a normal constraint and it automatically adjusts to when your data changes since it's a dynamic view.
5/28/2004 9:05 AM | JeffS

## #re: A database design question

I like DavidM's approach, but I worry about the performance cost of having to that lookup on inserts/updates. I think, however, that it's probably the best solution if either you don't want to make a date-range table, or if it's not practical to do so.

On the other hand, I've been thinking more about the date-range table appraoch and it has some advantages. It's less complicated, it doesn't need UDF's and CHECK constraints, just nice simple foreign and primary/unique keys. In the case of a date being truly a date and not a date/time, a date-range table would have at most 3653 rows over 10 years (3 leap years in any given 10 year span is the maximum). The downside is that if you're tracking one hundred products, that would be as much as 3.652 million rows in the table.

Given all three pieces of the PK (product id, start date id, end date id) I could do lookups into that table at ludicrous speed. Actually, if you can limit your date range to about 2 years instead of 10, you could do a great job of tracking down to the quarter-hour without the data getting too far out of control, especially if you restrict the hours range down to 12 hours a day, so if you were scheduling meetings or something, this probably wouldn't suck at all.

5/28/2004 10:40 AM | rocketscientist

## #re: A database design question

swesded
12/31/2005 4:57 AM | deded
Comments have been closed on this topic.