Peter Larsson Blog

Patron Saint of Lost Yaks

Curiosity found

For a few days I have been wondering about some annoying INSERT statement. I am aware of heaps and clustered indexes but I still can't get the last pieces together.

This is the original code

CREATE TABLE  #Sample
              (
                     RowID TINYINT IDENTITY(1, 1),
                     z TINYINT,
                     A TINYINT,
                     B TINYINT
              )

CREATE UNIQUE NONCLUSTERED INDEX IX_A ON #Sample (A) WITH (IGNORE_DUP_KEY = ON)
CREATE UNIQUE NONCLUSTERED INDEX IX_B ON #Sample (B) WITH (IGNORE_DUP_KEY = ON)

INSERT        #Sample
              (
                     z,
                     A,
                     B
              )

SELECT        TOP 100 PERCENT
              c.z,
              c.A
            c.B
FROM
         (
                     SELECT       TOP 100 PERCENT
                                  r.z,
                                  r.A,
                                  r.B
                     FROM         (
                                         SELECT 1 AS z, 3 AS A, 1 AS B UNION ALL
                                         SELECT 2 AS z, 2 AS A, 1 AS B UNION ALL
                                         SELECT 3 AS z, 2 AS A, 3 AS B UNION ALL
                                         SELECT 4 AS z, 1 AS A, 2 AS B
                                  ) AS r
                     ORDER BY     r.z
              ) AS c
ORDER
BY      c.z

SELECT
        RowID,
              z,
              A,
              B
FROM
          #Sample
ORDER BY      z

With this code, I got the following output. Please notice the correlation between RowID and z.

RowID  z      A      B
2      2      2      1
4      4      1      2

Where did RowID 1 go?

I then were suggested by Nathan and TG to reverse the order of index creation, and so I did.
This is the revised code with reverse index creation.

CREATE TABLE  #Sample
              (
                     RowID TINYINT IDENTITY(1, 1),
                     z TINYINT,
                     A TINYINT,
                     B TINYINT
              )

CREATE UNIQUE NONCLUSTERED INDEX IX_B ON #Sample (B) WITH (IGNORE_DUP_KEY = ON)
CREATE UNIQUE NONCLUSTERED INDEX IX_A ON #Sample (A) WITH (IGNORE_DUP_KEY = ON)

INSERT        #Sample
              (
                     z,
                     A,
                     B
              )
SELECT        TOP 100 PERCENT
              c.z,
              c.A,
              c.B
FROM
          (
                     SELECT       TOP 100 PERCENT
                                  r.z,
                                  r.A,
                                  r.B
                     FROM         (
                                         SELECT 1 AS z, 3 AS A, 1 AS B UNION ALL
                                         SELECT 2 AS z, 2 AS A, 1 AS B UNION ALL
                                         SELECT 3 AS z, 2 AS A, 3 AS B UNION ALL
                                         SELECT 4 AS z, 1 AS A, 2 AS B
                                  ) AS r
                     ORDER BY      r.z
              ) AS c
ORDER
BY      c.z

SELECT
        RowID,
              z,
              A,
              B
FROM
          #Sample
ORDER BY      z

And now I get the expected result!

RowID z     A     B
1     1     3     1
3     3     2     3
4     4     1     2

Also here you can notice the correlation between RowID and z.

The mystery for me is why the original attemp doesn't work. I read Craig Freeman's excellent blog here but I didn't find the answer.

I did get an answer from Craig and here it is

The result depends on the order in which the rows are considered for insertion.  Since this example has two unique indexes, the server independently checks for and eliminates duplicates from both unique columns.  Since the target table starts out empty, the only duplicates it finds are those in the initial input set.  Recall that to find duplicates in the input set, the server sorts the rows and selects only the first row from each group of rows with the same key.

Suppose that after the sort on the first column, the input set has the following order:

(1 2) (1 3) (2 3) (2 1) (3 1) (3 2)

The server discards every other row leaving:

(1 2) (2 3) (3 1)

Now the server sorts on the second column:

(3 1) (1 2) (2 3)

And since there are no duplicates on the second column, it inserts all three rows.

Now suppose that the after the sort on the first column, the last two rows are reversed:

(1 2) (1 3) (2 3) (2 1) (3 2) (3 1)

This is still a valid sort order, but the server discards a different row leaving this set instead:

(1 2) (2 3) (3 2)

Again, the server sorts on the second column:

(1 2) (3 2) (2 3)

This time there is a duplicate on the second column, so the server discards it and inserts only these two rows:

(1 2) (2 3)

This is a very good explanation and it holds it promise when explaining both test cases.
And it also explains why the ORDER BY has no relevance with inserts.
It is the present indexes that defines order of insert, even if they are not clustered.

But it doesn't explain why RowID 1 is missing in original test case.

And is there a way to force the query engine to which index to check first?
Or do we have to remember which index that is added last?
Is there some kind of LIFO stack system for indexes?

 

Legacy Comments


Harsh Athalye
2008-02-13
re: Curiosity found
I think this might be caused by IGNORE_DUP_KEY hint.

I tried to run both examples on SQL Server 2000 SP4 and both gave me totally identical but completely different results from yours.

I get:

RowID z A B
4 1 3 1
1 4 1 2

along with "Duplicate Key Ignored" warning.

Harsh Athalye
2008-02-13
re: Curiosity found
I ran it on SQL 2005 with the same result as yours and messages tab showing the earlier warning which now I am sure is answer to the mysterious disppearance of 2 records from the last select.