subtypes

This is as much a question as a post, so please, leave me some comments if you have the time.

In the current issue of SQL Sever Magazine, Michelle Poolet has an article about "Overlapping Subtypes."  I'm wondering what you all think about this.

The example she uses is very similar to my own case, so I'll describe it in the best way that I can:

  • I have a people table which of course contains basic information about people that are used in my application.

  • Each person might perform different "roles" within the application.  Such as "broker" or "client" or "sales rep."

  • Each person might perform 0, 1, or more roles at the same time.

The problem then, is how to store that data.  My solution has been to have a roles table and a peopleroles table.  The peopleroles table simply relates people to roles.  Any given "person" may have 0 or more entries in the peopleroles table.  I think this is pretty much the standard way to store this type of data.

However, Michelle mentions some drawbacks (such as performance, and query complexity) that lead her to suggest something very different: Adding columns to the people table, one for each role that a person might perform.

She also mentions another way, to add one column which would contain multi-values, so that you could parse that field to find people who are in certain roles.  I dislike this one a LOT, so lets talk about the other suggestion :)

Now, I'm not guru enough to state exactly WHY I don't feel comfortable with the "add a few columns to the people table" concept.  But, I thought that I had learned somewhere along the way that "it's generally better to build a solution that allows you to "add a row" when something changes (like a new role to track) than to "add a column" to my table.  Did I word that correctly?

Anyway, if I were to do it the way she suggests, certain parts of my app would be SO much better.  I'm seriously considering it.  I'd like to hear other people's comments.

What do YOU think?

 

Print | posted on Wednesday, October 29, 2003 8:32 AM

Feedback

# re: subtypes

left by Chad Humphries at 10/29/2003 9:59 AM Gravatar
I've used the multivalue field approach before and never again with that. To much of a mess.

I'm not found of adding columns for something such as that. I find the Seperate table and relationship table the best approach.

It lets you approach it as if the primary table is what it is, and if you want to add features to it that aren't in it's base info you can create 'addon' tables.

It does bring a little bit of the oop design ideas into the relational world, which depending on who you talk to is either great or horrible. Just my two cents.

# re: subtypes

left by Dave Verwer at 10/29/2003 10:10 AM Gravatar
For me, the answer to this question from case to case depending on what data you are planning to store and how static it is.

If you know that the number of roles your application will be dealing with is static or if adding a role would be a significant development task for other reasons then go for Michelle's method. There are a number of places in the application I am working on that store data in this way and while it may not be fully normalised, it is much easier to work with and I know I am not going to have major headaches with it because I have only used it in places where the number of fields/roles will only change on a version upgrade of the software.

However if you have a very large number of roles (I would say about 10-15 is a sensible maximum limit for flattening into one table) or if your application allows roles to be added dynamically then I would never recommend that method. Adding fields through code at runtime in a multi user system makes me cringe and it would never be something I could recommend. Also if you have (or are likely to have in the near future) more than 15 roles then it is going to make the flattened table difficult to read and use effectively.

So... in conclusion... It depends :)

# re: subtypes

left by Nicole Calinoiu at 10/29/2003 10:57 AM Gravatar
Ouch. This would fall under my personal "never, ever" list. <g>

Unless someone has come up with something brilliant that I've never encountered, there are only three basic approaches to handling subtypes:

1. Keep all the data in the same table, adding columns for attributes that are specific to each subtype (Michelle's suggestion). This is the simplest approach unless you actually care about data integrity, in which case you need to add check constraints to enforce the rules for population/non-population of each and every one of the subtype-specific columns, which can be a very nasty proposition. It also means that some pretty substantial design changes are required as new subtypes are added.

2. Give each subtype its own table. The problems here are:
a. Duplication of "shared" columns (and all their rules) between tables, which can pose a rather serious maintenance head-ache.
b. There is no common table, so unions will be necessary when treating the subtypes as a group. If the "shared" column definitions drift over time, this will cause problems beyond the obvious performance issues.

3. Keep the shared attributes in one table that defines the base type and give each subtype its own table for the attributes defined by the subtype. While this solution requires a bit more understanding of how to maintain the data integrity, it offers the greatest ease of maintenance of the three. It also maps quite nicely to the OOP inheritance model that is probably being used for handling the base/subtypes in the client application(s).

I've posted diagrams for approach #3 at http://groups.msn.com/BordeCal/documents.msnw?fc_p=%2FDbDesign%2FInheritance. The one important bit that might not be obvious from the diagrams is that each SubtypeN table has a check constraint to limit the TypeID column to the value for the represented subtype.

I've used this approach quite often and, while it's definitely far from perfect, I think it offers the best maintainability and extensibility of the three options.

# re: subtypes

left by AjarnMark at 10/29/2003 11:37 AM Gravatar
Ooh! Ooh! OOH! (Mr. Kotter!) I have an opinion on multi-values. Let me see... I put it somewhere... Oh yeah, here it is: Multi-Values are Evil Incarnate. See http://weblogs.sqlteam.com/markc/posts/288.aspx for more details.

I recoil at the thought of adding role fields to your table. This is, in part, because I have had to clean up a LOT of Access databases that had that sort of structure, and then they wanted to add "just one more" and also couldn't figure out why they were having so much trouble getting meaningful reports. My personal favorite of these was where all these extra fields were just Yes/No fields. Argh!

I don't know that this applies to your scenario, but a lot of people overlook it, so I'll throw it in for good measure: Remember that you can add attributes, in addition to the keys, to the junction table (peopleroles). Of course these are attributes that only apply to that person in that role.

I think the long term flexibility and maintainability of your current design outweighs the concern over query complexity and performance, until you can absolutely prove to me with testing results that we can't overcome these barriers to reach an acceptable level of performance.

# re: subtypes

left by ken ambrose at 10/29/2003 4:08 PM Gravatar
It is almost unbelievable to me that a "professional" magazine would run an article that flaunts the basic theories that have been the cornerstone of most if not all of the improvements in computerized database effectiveness for the last few decades. Anyone who back-pedals like that will likely pay a huge price in increased maintenance expense and lowering of data quality in their results..

# re: subtypes

left by robvolk at 10/29/2003 7:18 PM Gravatar
Well, Ken and Mark absolutely read my mind and I agree 110% with them. It's really sad that SQL Server Mag actually published that, they're gonna get all kinds of fallout.

I think it helps to look at how SQL Server manages roles, users, and role memberships itself. The sysmembers table is structured in the same way as the peopleroles table mentioned earlier. If there really was a drawback to performance, as Michelle claims (she certainly didn't test it) I would think the SQL Server team would've determined it by now and altered the structure.

# re: subtypes

left by Tim at 10/30/2003 12:47 AM Gravatar
I would go with people -> peopleroles <- roles unless peopleroles contains attributes that are only relevant to specific combinations of people and roles.

If this were the case, I would go for the super type / sub type design.

# re: subtypes

left by Richard at 10/30/2003 11:03 AM Gravatar
(posted this to my weblog, linked to my name above)

Travis is right. That nagging feeling in his head about his approach is this little thing called FIRST NORMAL FORM, one of the primary foundational design basis for relational databases. If she doesn't want to have to deal with that oh-so-complicated join between Users and UserRoles, she should create an indexed view joining the two.

A comment here mentioned that there would be "fallout" if they actually printed this. There should be. Do these "professional" magazines have any sort of peer review process? I think anyone who took database theory 101 in college would have been able to debunk this in a jiffy.

Beyond theory, though, there are two very practical reasons for handling this data the "right" way from the beginning:

1. My experience is that databases and applications are organic in nature. Today you only have users vs. Administrators. Next week someone asks for a "Visitor/Read-only" role. Two months from now you need an "Uber-Administrator." Over the lifetime of the database, you simply cannot predict the number of roles that will surface, so once you determine that more than one is needed, you need a joining table.

2. Having a join table between Users and Roles allows you to store attributes of the relation, just as the date the user was given that role, whether the role is still active, notes about why they are in that role, etc. While you may not see the need for this sort of metadata during the initial design, don't discount a future need by cheating in the design.

# re: subtypes

left by sitka at 11/8/2003 3:41 PM Gravatar
I once used an int column titled roles on a people tables and then a roles tables had
1, role1
2, role2
4, role3
8, role4
16,role5
....

I don't know why I did this.
Comments have been closed on this topic.