Mladen Prajdić Blog

Blog about stuff and things and stuff. Mostly about SQL server and .Net

Entity-Attribute-Value (EAV) model with SQL Server 2005 xml datatype

I've seen this technique a few months ago on a local MS event called NT Conference
presented by Slovenian MVP Dejan Sarka.

However i haven't seen him blog about it yet (a shame) so after seeing this post
by co blogger David Moloney here on SQLTeam blogs in this post about EAV i decided to give this little
thing a go by myself.
I've also chosen the same DB model as david for comparison:

A Client can have many “end user” defined data elements.
For the purpose of this example we will choose 5 data elements
to focus on the modelling aspect
Age – Integer greater than -1
Weight – Numeric greater than zero
Handed – string set “Left” or “Right”
Quit Date – NULL marker indicating that the Client hasn't quit or a valid Date
Has Insurance – Bit type

What i haven't seen in the presentation at NT conference and what suprised me is the
datetime datatype (could it acctually be anything else for a change? :)))

As stated in this MSDN document:

In SQL Server 2005, all types derived from xs:date, xs:time, and xs:dateTime are required to have time zones. Sqltypes:datetime and sqltypes:smalldatetime are two of these types. However, the SQL datetime and smalldatetime types do not have time zones. This is because the pattern facets for sqltypes:datetime and sqltypes:smalldatetime do not allow for time zones. As a result, SQL Server does not accept sqltypes:datetime or sql:smalldatetime values. Although you can reference sqltypes:datetime and sqltypes:smalldatetime in user-defined schemas, you cannot validate XML documents that contain values of these types. This makes them unusable.

So this means that we have to work around this limitation.
So i faked dates as varchars regex'd to accept:
yyyymmdd hh:mi:ss:mmm or yyyy-mm-dd hh:mi:ss:mmm
This is because of the sorting. Also miliseconds (:mmm) part is optional in both cases.
This is a very simple regex that just accepts numbers and doesn't do any datetime validation.
Validation should be done before this point in UI or DB access layer.

With some indexes on the xml column this becomes a preety powerfull and usefull model.

Code:

IF OBJECT_ID('ClientAttributes') IS NOT NULL
    drop table ClientAttributes
go
-- problem with datetime datatype look above URL
-- create a table with desired attributes
create table ClientAttributes
(
    Age int not NULL check( Age > -1) ,
    Weight numeric(10,2) not NULL check( Weight > 0),
    Handed varchar(5) not null check( Handed in ('left', 'right')),
    QuitDate datetime null, 
    HasInsurance bit not null default(1)
)

go – create an XML schema from the table with client attributes DECLARE @mySchema xml

– this is the default schema that gets created from the ClientAttributes table – however the check constraints aren't converted SET @mySchema = (SELECT * FROM ClientAttributes FOR XML AUTO, ELEMENTS, XMLSCHEMA('ClientAttributes')) – see the schema that gets auto-created select @mySchema

– add the check constraints (by hand) with <xsd:restriction> attribute – Quitdate datetime see explanation above SET @mySchema = '<xsd:schema xmlns:schema="ClientAttributes" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sqltypes="http://schemas.microsoft.com/sqlserver/2004/sqltypes" targetNamespace="ClientAttributes" elementFormDefault="qualified"> <xsd:import namespace="http://schemas.microsoft.com/sqlserver/2004/sqltypes" schemaLocation="http://schemas.microsoft.com/sqlserver/2004/sqltypes/sqltypes.xsd" /> <xsd:element name="ClientAttributes"> <xsd:complexType> <xsd:sequence> <xsd:element name="Age" > <xsd:simpleType> <xsd:restriction base="sqltypes:int"> <xsd:minInclusive value="0" /> </xsd:restriction> </xsd:simpleType> </xsd:element> <xsd:element name="Weight"> <xsd:simpleType> <xsd:restriction base="sqltypes:numeric"> <xsd:totalDigits value="10" /> <xsd:fractionDigits value="2" /> <xsd:minExclusive value="0" /> </xsd:restriction>
</xsd:simpleType> </xsd:element> <xsd:element name="Handed"> <xsd:simpleType> <xsd:restriction base="sqltypes:varchar" sqltypes:localeId="1060" sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth"> <xsd:maxLength value="5" /> <xsd:pattern value="left|right"/> </xsd:restriction> </xsd:simpleType> </xsd:element> <xsd:element name="QuitDate" minOccurs="0"> <xsd:simpleType> <xsd:restriction base="sqltypes:varchar" sqltypes:localeId="1060" sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth"> <xsd:maxLength value="24" />
<xsd:pattern value="\d\d\d\d-?\d\d-?\d\d \d\d:\d\d:\d\d(:\d{3})?" /> </xsd:restriction> </xsd:simpleType> </xsd:element> <xsd:element name="HasInsurance" type="sqltypes:bit" /> </xsd:sequence> </xsd:complexType> </xsd:element> </xsd:schema>'

CREATE XML SCHEMA COLLECTION ClientAttributesSchema AS @mySchema go

– we don't need the table anymore drop table ClientAttributes go IF OBJECT_ID('Client') IS NOT NULL drop table Client

CREATE TABLE Client ( ClientID INT NOT NULL PRIMARY KEY , FirstName VARCHAR(50) NOT NULL , LastName VARCHAR(50) NOT NULL , AttribXML xml(ClientAttributesSchema) – xml column with schema ) go

IF OBJECT_ID('dbo.ClientAttributeValuesValidation') IS NOT NULL drop function dbo.ClientAttributeValuesValidation go – Add the constraint in the form of the function that – uses the ClientAttributes namespace to validate input, – because we can't put XML methods in check constraint CREATE FUNCTION dbo.ClientAttributeValuesValidation(@chkcol xml) RETURNS nvarchar(4000) AS BEGIN RETURN @chkcol.value('namespace-uri((/*)[1])','nvarchar(4000)') END GO – Add the constraint to the Client table ALTER TABLE Client WITH NOCHECK ADD CONSTRAINT chk_ClientAttributeValueValidation CHECK (dbo.ClientAttributeValuesValidation(AttribXML) = 'ClientAttributes'); GO

– Insert OK INSERT INTO Client (ClientID, FirstName, LastName, AttribXML) SELECT 1, 'First Name 1', 'Last Name 1', N'<ClientAttributes xmlns="ClientAttributes"> <Age>22</Age> <Weight>10.2</Weight> <Handed>left</Handed> <QuitDate>2006-10-01 00:00:00</QuitDate>
<HasInsurance>1</HasInsurance> </ClientAttributes>'
– different date format - Insert OK INSERT INTO Client (ClientID, FirstName, LastName, AttribXML) SELECT 2, 'First Name 2', 'Last Name 3', N'<ClientAttributes xmlns="ClientAttributes"> <Age>22</Age> <Weight>10.2</Weight> <Handed>left</Handed> <QuitDate>20061001 00:00:00</QuitDate>
<HasInsurance>1</HasInsurance> </ClientAttributes>'

– no date meaning it's null - Insert OK INSERT INTO Client (ClientID, FirstName, LastName, AttribXML) SELECT 3, 'First Name 3', 'Last Name 3', N'<ClientAttributes xmlns="ClientAttributes"> <Age>23</Age> <Weight>10.2</Weight> <Handed>right</Handed> <HasInsurance>1</HasInsurance> </ClientAttributes>'

– Handed doesn't satisfy the constraint - Insert fails INSERT INTO Client (ClientID, FirstName, LastName, AttribXML) SELECT 4, 'First Name 4', 'Last Name 4', N'<ClientAttributes xmlns="ClientAttributes"> <Age>1</Age> <Weight>1</Weight> <Handed>ffdg</Handed> <QuitDate>20061001 00:00:00</QuitDate> <HasInsurance>1</HasInsurance> </ClientAttributes>'

select * from Client go drop table Client drop function dbo.ClientAttributeValuesValidation drop XML SCHEMA COLLECTION ClientAttributesSchema

– Info about XML schemas can be retrived with these statments: – Schema collection info SELECT XSC.* FROM sys.xml_schema_collections XSC WHERE XSC.name = 'ClientAttributesSchema' – Schema collection namespaces info SELECT * FROM sys.xml_schema_collections XSC JOIN sys.xml_schema_namespaces XSN on XSC.xml_collection_id = XSN.xml_collection_id WHERE XSC.name = 'ClientAttributesSchema' – Schema collection Components info SELECT * FROM sys.xml_schema_collections XSC JOIN sys.xml_schema_components XSM on XSC.xml_collection_id = XSM.xml_collection_id WHERE XSC.name = 'ClientAttributesSchema'

 

In the end the xml datatype in sql server starts to make sense :)

Legacy Comments


rockmoose
2006-10-16
re: Entity-Attribute-Value (EAV) model with SQL Server 2005 xml datatype
Left-hand path of data design?

Mladen
2006-10-17
re: Entity-Attribute-Value (EAV) model with SQL Server 2005 xml datatype
emm... huh??

PP
2009-01-22
re: Entity-Attribute-Value (EAV) model with SQL Server 2005 xml datatype
This is great alternative. What would happen if the schema was to change. Let's suppose you start off with a number of attributes (a1, a2, a3, a4). a1 was dropped and a2 had its type changed. Would there be any loss of data.

Can you also explain why you had to create the client attributes table prior to creating the sql server schema?

Thanks a bunch!

Pedro

Mladen
2009-01-23
re: Entity-Attribute-Value (EAV) model with SQL Server 2005 xml datatype
if you change the schema your data would not conform to it anymore, therefore when you change it you wouldn't be able to apply it since exiting data would violate it.

i created the client attribute table just to simplify schema creation. if you know how to create exacty schema then you don't need the table. after you create the schema you can drop the table.

j
2009-11-28
re: Entity-Attribute-Value (EAV) model with SQL Server 2005 xml datatype
If you can't change the schema, how is this an eav alternative? You might as well stick to relational?

robertoteles
2010-02-05
re: Entity-Attribute-Value (EAV) model with SQL Server 2005 xml datatype
I think, setting the following XML may be an alternative to EAV:
<ClientAttributes>
<Attribute>
<Name>Age</Name>
<Type>Integer</Type>
<Value>1</Value>
</Attribute>
<Attribute>
<Name>Weight</Name>
<Type>Float</Type>
<Value>1.5</Value>
</Attribute>
<Attribute>
<Name>HasInsurance</Name>
<Type>Boolean</Type>
<Value>true</Value>
</Attribute>
</ClientAttributes>


And not the rigid XML presented Here, so you will never need to change the Schema

iAstute
2010-06-05
re: Entity-Attribute-Value (EAV) model with SQL Server 2005 xml datatype
Let say I have categories and I want custom fields specific for each category. I would create a fields table shown as below

CREATE TABLE [dbo].[Fields](
[Id] [int] IDENTITY(1,1) NOT NULL,
[CategoryId] [int] NULL,
[DateTypeId] [int] NULL,
[FieldTypeId] [int] NULL,
[Name] [nvarchar](50) NULL,
[Label] [nvarchar](50) NULL,
[Options] [nvarchar](max) NULL,
[Used] [bit] NOT NULL,
[Active] [bit] NOT NULL,
[ValidationMessage] [nvarchar](255) NULL,
[Format] [nvarchar](50) NULL,
[MaxLength] [smallint] NULL,
[DefaultValue] [nvarchar](50) NULL,
CONSTRAINT [PK_Fields] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO


CREATE TABLE [dbo].[Categories](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](50) NULL,
[Description] [nvarchar](500) NULL,

CONSTRAINT [PK_Categories] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

GO


CREATE TABLE [dbo].[Details](
[Id] [int] IDENTITY(1,1) NOT NULL,
[CategoryId] [int] NULL,
[Name] [nvarchar](255) NULL,
[AdditionalData] [xml] NULL,
CONSTRAINT [PK_Listings] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO


In AdditionalData column I would have
<fieldId>1</fieldId>
<value>test</value>
<fieldId>2</fieldId>
<value>06/05/2010</value>
<fieldId>3</fieldId>
<value>True</value>
...