First of the bat I'd like to apologies to Ryan Rinaldi
for my assumption about his intelligence based on one comment of his in my blog 4 Developers and a DBA
As always, Lavos, has provided the karma check that I needed so I decided to visit his blog.
He seems like an intelligent guy, and to assume otherwise based on one comment is rude. So once again sorry Ryan.
The relational model is surprisingly simple. In fact, so easy, I taught my wife the basics in about 30 minutes.
It went something like this...
A type is set of values, and can be just about anything.. string, number, fingerprint, image, DNA. The term domain is its synonym.
In RM, we define the type (and in proper implementation of RM, the operators as well), and hence the set of values, that define the type.
A type also comes with a name (Attribute). So a "column" in RM is a attribute, domain pair.
The only other requirement is that an attribute, domain pair must take one and only one
value of that type (1NF)
"Columns" are "grouped" into sets (No 2 columns have the same name) and this is what we call the "heading".(The DDL we create in database speak)
One or a combination of the columns in the "group" must
uniquely identify the "row".
When we give this set of columns a name we have made a relation (table).
Values that enter the table must conform to the heading and must be unique according to the key defined.
That's it! I obviously left out the algebra, calculus and functional dependency because quite frankly she would fall a sleep.
To see a more thorough definition from the master RM educator (Date) see here Link
From my experience, people completely miss the point of domains (types).
Which brings us to the topic of "Look Up" tables....
The primary use of a "look up" table is used to restrict values that may be entered in a column via a Foreign Key constraint.
This is exactly what a domain does. It only allows values that we defined in our type.
If we take the "Look Up" table approach to its logical conclusion, why don't you have a "Look Up" table for your integer columns or for that matter every column?
An example is Gender. Without getting too pedantic (chromosomal minorities) but including "Unknown" values (anything but NULL) there are really only 3 values. Male, Female and Unknown.
So in SQL speak it would be declared thus Gender VARCHAR(7) NOT NULL DEFAULT('Unknown') CHECK(Gender in ('Male', 'Female', 'Unknown'))
Obviously a trivial example, but the thought process is the same for any domain.
A non trivial example is street types. We have a type with over 25 values in a CHECK constraint.
Last month we added 1 more. (Governments.. do they have anything better to do?)
Now a lot of you will be saying that it is far easier to add a row to a table than to alter a check constraint. And you would probably be right but I suspect you would let a user do that too. I don't trust users and when the shit hits the fan I am responsible for the integrity and accuracy of that data.
How many typos (which effectively give duplicates) do you see in some of your "Look Up" tables? So I don't give them a chance to mess with domains values, they can suggest them and after review they can be entered.
The decision is obviously on a model by model basis but here is a quick and dirty guide as when to use a CHECK constraint instead of a look up table: when the set of values are not changing frequently (I leave it up to you to define "frequent") or the word "type" is used to describe the attribute name.
Lastly, what about the hard coded values in your application? This shows a lack of imagination. For a start, all domains should be documented and in our case we create "virtual" views for each of them in a database named (what else?) Domains as well as its definition in SQL in a table named DomainDefinitions that we use for rapidly creating new tables.
CREATE VIEW Genders
SELECT CAST('Male' AS VARCHAR(7)) AS Gender
SELECT CAST('Female' AS VARCHAR(7)) AS Gender
SELECT CAST('Unknown' AS VARCHAR(7)) AS Gender
The .NET Dataset has been a real boon for us in this area as we simply load all our domains into it once and then flip a switch to repopulate when any changes are made to these domains...
It is obviously a little bit more work, but the payoffs include better performance (CPU versus IO), better data accuracy (mainly bought about by processes) and a feel good factor that you are staying as close to theory as your DBMS will allow.
EDIT: Fixed a dodgy "a" tag.. Why does Opera handle it and not IE?
| posted on Wednesday, February 18, 2004 11:23 AM