A database design question
OK, I have this design problem I've not been able to find a good answer for.
Let's say you're tracking customer pricing for a widget. Now, widget prices are volatile, changing weekly or even daily. And you need to be able to go back in time and say "on this day, we would have charged you 5.32 yaks for a widget". So, you set up a table that has a price, a "price start date" and a "price end date". When you quote a price, you need a date, and you can retrieve the price.
So, a create table would look like this:
Price_Start_Date datetime not null,
Price_End_Date datetime not null,
Price decimal(9,2)
For a non-expired price put in an arbitrary future date, and when the price expires update the expire date. That way you get to use between (and avoid funky "isnull" comparisons everywhere).
Here's the design problem: Put a real primary key (not a surrogate key) on that table.
I can think of some really ugly solutions (involving tables of every conceivable date and an interesect table). Or refining that going all OLAP on it and creating tables of years, months, days, and using an intersect table.
Anybody?
Legacy Comments
Jason Catlett
2004-05-26 |
re: A database design question Well you could put a primary key on it, but all that would do is make sure no one put duplicate entries in. What you really need is a check constraint or more likely an insert/update trigger that validates that the date range you are trying to put in does not overlap an existing date range for that same widget. |
rocketscientist
2004-05-26 |
re: A database design question That's kind of the whole point of the exercise. Your comment "make sure no one put duplicate entries in" isn't a trivial problem. A primary key on a date range that actually works to prevent overlapping ranges is not easy. By the way, I don't think a CHECK constraint would work because CHECK constraints can't validate data that way. A trigger on insert/update would work, but there's also not a good way to index that table for date range lookups, so the dupe check before any inserts would happen in N time, instead of log N time (table scan vs index seek). The primary key problem and the indexing problem are probably the same problem, so solving the slow trigger will probably result in indexing the table in a special way, and will probably solve the PK problem. |
Dewayne Christensen
2004-05-26 |
re: A database design question Lose the end date. Then to find the date for any point in time: <pre> select top 1 Price from Batman where Price_Start_Date <= @AsOf order by Price_Start_Date desc </pre> Then your primary key is on the start date only, and you've eliminated the chance of overlapping ranges. Think of the table as a continuous time line, with each record establishing the point on the line where the new price takes effect. You don't need to track where it ends because it ends where the next one starts. |
Dewayne Christensen
2004-05-26 |
re: A database design question Sorry. The first line should say "to find the PRICE for any"... |
rocketscientist
2004-05-26 |
re: A database design question Here's the counter example. If you have one record that starts today and ends at some arbitrarily out there future point, and another that starts tomorrow and ends at the same future point, you've got overlapping ranges. Stick different prices in them, and you'll get back two results. Granted that your "Top 1" in the query will return one result, but will it be the "correct" result? Murphy says no. I'm becoming more comfortable that the only way to prevent overlapping date ranges is to use a domain table of all possible dates and a price-date intersect table. Which is neither elegant nor ideal, but does solve the problem. |
Brett
2004-05-26 |
re: A database design question I disagree.... Key is Product and startdate Put a unique Index on Product and Enddate Keep the most products most current edn date null...can now only allow one... THEN use the TOP 1 query With WHERE EndDate IS NULL No question there.... |
Dewayne Christensen
2004-05-26 |
re: A database design question How do you define "correct" in the face of "arbitrary?" Ignoring databases, how would you explain to the sales clerks that something's $1 from Monday to forever, but $2 from Tuesday to forever? Which price are they to use on Wednesday? |
rocketscientist
2004-05-26 |
re: A database design question OK, Brett, you've almost got it. In your case, key is product and start date, and a unique key on Product and End Date. OK, so I have a contract to buy product id 1 with a start date today and end date 1 year from today and a cost of 10 yaks. Tomorrow, the contract was discovered to be in error and I get a product id 1 with a start date tomorrow and the end date a year from tomorrow, and a cost of <pinky gesture> 1 BILLION yaks </pinky gesture>. I can't just update the row, because I may have sold some widgets at 1, and I need to be able to re-construct that price. Obviously, I should change the end date of the first record...but I don't HAVE to, there is no constraint forcing me to fix the problem, or even telling me there is a problem. If I pull the price given a date of next Wednesday, which one do I get? The database supports putting that bad data in. And TOP 1 is not deterministic, I can change the result by messing with ascending/descinding sort orders on indexes. Your solution actually gets closer than anything else, but it isn't BETWEEN friendly, which would be nice. As a side note, I was playing with some stuff for this response...and NULL = NULL isn't true...unless you're inserting into a table with a unique index. That's non-orthoganal (sp?) behavior. I'm not saying it's not useful..but it's morally wrong anyway. I need to write up that rant later. |
Brett
2004-05-26 |
re: A database design question Well I guess I would say any process involved in managing the price must be involved im manging the dates... And what I'm saying, that as a business rule...(or is it a data rule?) that the productId that has an end date of null is THE most current price...there can only be 1 for that product. And of Course...null is not equal to anything, not even itself...even the discussion of null having the ability to be compared could be considered erroneous...how can you compare something the doesn't even exist...ahh existentialism.. And as far as between is concerned...if your setting a price today...you want the end date is null bit. I'd probably denormalize here and store the price in the sales table, but historically you could then say between...but as you point out you could have your overlap[ problem (which is why I would denormalize)...but you still use BETWEEN...and if you like use a GROUP BY with a MAX on price...people have short memories... Make sure they don't have a receipt.. USE Northwind GO CREATE TABLE PricingGuide ( ProductId int IDENTITY(1,1) NOT NULL , PruductName varchar(255) NOT NULL , PriceStartDate datetime NOT NULL DEFAULT (Getdate()) , PriceEndDate datetime , Price money , PRIMARY KEY (ProductId, PriceStartDate) ) CREATE UNIQUE INDEX PricingGuide_EndDateIX ON PricingGuide (ProductId, PriceEndDate) GO |
Brett
2004-05-26 |
re: A database design question Hey can't you edit these responses? [but you still use BETWEEN...] I meant to say... You still could use your between for history For the current price you need to use the Null concept |
DavidM
2004-05-26 |
re: A database design question Ahh.. the temporal data problem.. can you say 6th Normal Form!!!! Go out and buy Date, Darwen and Lorentzos book on TEMPORAL DATA AND THE RELATIONAL MODEL.. We do this type of design all the time.. We have a Start and End Date, EndDate is defaulted to the maximum date value 99991231.. and add non-overlap constraint http://weblogs.sqlteam.com/davidm/archive/2004/04/28/1296.aspx You hava a choice of candidate keys.. Price/ StartDate or Price/EndDate, plus the overlap constraint itself provides uniqueness.. Mark one of them as the primary key and add a unique constraint to the other... A bit of procedural logic is required to update the "old" datetime prior to the new price but you guarantee integrity, history queries are easy and the "current" value is always the row where EndDate is equal to the max date..We avoid NULLs at (just about) all cost.. Or you could build an extra table that just contains the history of prices and the "main" table has the current value.. Create a view that joins the two and the front end won't know the difference... |
graz
2004-05-26 |
re: A database design question In case anyone is curious the exchange is exactly 1 yak = 4 yak hooves for your standard 4 footed yak. Kind of a cruel way to get currency but what's a DBA to do. |
Brett
2004-05-27 |
re: A database design question >> EndDate is equal to the max date..We avoid NULLs at (just about) all cost.. << Why? You still then have to code soemthing What's the diff between Col1 IS NULL Or Col1 = '9999/12/31' ? |
DavidM
2004-05-27 |
re: A database design question You sure do Brett.. Not "much" difference, I sleep better at night though.. NULLS cause too much damage outside of the database. If you have an alternative to NULL I cannot see the point in using it... |
JeffS
2004-05-28 |
re: A database design question HR systems and accounting systems also need to track historical data; I have found this design to be best: Item EffectiveDate Value with Item/EffectiveDate as the primary key. then a VIEW can handle the ending date quite well: SELECT Item, Value, EffectiveDate as StartDate, (select Isnull(min(effectiveDate)-1,'12/31/2050') from Stuff B where Stuff.Item = B.Item and Stuff.EffectiveDate < B.EffectiveDate) as EndDate FROM Stuff or something like that ... I find that works quite well and you can have a normal constraint and it automatically adjusts to when your data changes since it's a dynamic view. |
rocketscientist
2004-05-28 |
re: A database design question I like DavidM's approach, but I worry about the performance cost of having to that lookup on inserts/updates. I think, however, that it's probably the best solution if either you don't want to make a date-range table, or if it's not practical to do so. On the other hand, I've been thinking more about the date-range table appraoch and it has some advantages. It's less complicated, it doesn't need UDF's and CHECK constraints, just nice simple foreign and primary/unique keys. In the case of a date being truly a date and not a date/time, a date-range table would have at most 3653 rows over 10 years (3 leap years in any given 10 year span is the maximum). The downside is that if you're tracking one hundred products, that would be as much as 3.652 million rows in the table. Given all three pieces of the PK (product id, start date id, end date id) I could do lookups into that table at ludicrous speed. Actually, if you can limit your date range to about 2 years instead of 10, you could do a great job of tracking down to the quarter-hour without the data getting too far out of control, especially if you restrict the hours range down to 12 hours a day, so if you were scheduling meetings or something, this probably wouldn't suck at all. |
deded
2005-12-31 |
re: A database design question swesded |