I want some Moore

Blog about stuff and things and stuff. Mostly about SQL server and .Net
posts - 219, comments - 2287, trackbacks - 33

My Links

Advertisement

News

Hi! My name is 
Mladen Prajdić  I'm from Slovenia and I'm currently working as a .Net (C#) and SQL Server developer.

I also speak at local user group meetings and conferences like SQLBits and NT Conference
Welcome to my blog.
SQL Server MVP

My Books

SQL Server MVP Deep Dives 2
The Red Gate Guide to SQL Server Team based Development Free e-book

My Blog Feed via Email
Follow MladenPrajdic on Twitter


Users Online: who's online

Article Categories

Archives

Post Categories

Cool software

Other Blogs

Other stuff

SQL stuff

Comparing SQL Server HASHBYTES function and .Net hashing

A while back we had an interesting problem at work. We were calculating MD5 hashes for some values in both .Net and SQL Server and although the input values were all the same our MD5 hashes were different. After some time spent looking dumbfounded at the code I’ve realized what the bug was and started laughing. I saw that HashBytes function was the “culprit”. HashBytes function was introduced in SQL server 2005 to simplify creating hashes in the database. It can convert values to MD2, MD4, MD5, SHA, or SHA1 formats. HashBytes' result depends on the input text’s data type and can at the first glance give a bit different results that we might think, especially when combined with the .Net framework. Of course hashing itself works correctly in both. Let’s see what going on with some examples. From here on when i say hash i mean MD5 hash.

Example in SQL Server

-- note the size 15 of all datatypes
declare @val1 varchar(15),
        @val2 nvarchar(15),
        @val3 char(15),
        @val4 nchar(15)

-- example of 1 byte/char text
-- all variables are of same length
select  @val1 =  '1234567890',
        @val2 = N'1234567890',
        @val3 =  '1234567890',
        @val4 = N'1234567890'

-- all 4 return different results
select  HASHBYTES('md5', @val1) as MD5_varchar,     -- result = 0xE807F1FCF82D132F9BB018CA6738A19F
        -- just to show that collation doesn't change the hash
        HASHBYTES('md5', @val1 collate Cyrillic_General_BIN2) as MD5_varchar_collation, -- result = 0xE807F1FCF82D132F9BB018CA6738A19F
        HASHBYTES('md5', @val2) as MD5_Nvarchar,    -- result = 0xE15E31C3D8898C92AB172A4311BE9E84
        HASHBYTES('md5', @val3) as MD5_char,        -- result = 0x2120C3F3423F89BA8A65ABD933321884
        HASHBYTES('md5', @val4) as MD5_Nchar        -- result = 0x90DEF5840F3A31174CA44E2022F743B6

-- example of 2 bytes/char text
-- 中文 means Chinese in written text
-- converting 2 bytes/char text to varchar cuts the text in half
select  @val1 =  '中文',   -- this cuts the text in half
        @val2 = N'中文',   -- this stores the whole text
        @val3 =  '中文',   -- this cuts the text in half
        @val4 = N'中文'    -- this stores the whole text

-- all 4 return different results
select  HASHBYTES('md5', @val1) as MD5_varchar,     -- result = 0xEA03FCB8C47822BCE772CF6C07D0EBBB
        HASHBYTES('md5', @val2) as MD5_Nvarchar,    -- result = 0x73C6C8CD2F94355EF015E5265D5E65B1
        HASHBYTES('md5', @val3) as MD5_char,        -- result = 0xA13C45A38853677887B4839071537634
        HASHBYTES('md5', @val4) as MD5_Nchar        -- result = 0xEADEBD3BD72A481C43C828E0C550145C
 
The catch here is the data type difference. Nvarchar and nchar take twice more bytes to store data than varchar and char. Since char and nchar pad the right side of the string with spaces to fill the gap up to defined data type length the spaces are also used in MD5 calculation. A collation plays no part in calculating hash values since collations are only applied to sorts and comparisons.

However care must be taken with texts that need 2 bytes per char of storage space like Chinese text. Storing it in a varchar variable cuts it in half.

Example in .Net - C#

For generating MD5 hashes I’ve used the code from this site which turned up as the first result on Google for “MD5 in C#”:

public string GetMD5Hash(string input)
{
    System.Security.Cryptography.MD5CryptoServiceProvider x = new System.Security.Cryptography.MD5CryptoServiceProvider();
    //byte[] bs = System.Text.Encoding.ASCII.GetBytes(input);
    //byte[] bs = System.Text.Encoding.UTF7.GetBytes(input);
    //byte[] bs = System.Text.Encoding.UTF8.GetBytes(input);
    byte[] bs = System.Text.Encoding.Unicode.GetBytes(input);
    //byte[] bs = System.Text.Encoding.UTF32.GetBytes(input);
    bs = x.ComputeHash(bs);
    System.Text.StringBuilder s = new System.Text.StringBuilder();
    foreach (byte b in bs)
    {
        s.Append(b.ToString("x2").ToLower());
    }
    string password = s.ToString();
    return password;
}

// hashes for the simple “1234567890” text in all encodings
ASCII:   e807f1fcf82d132f9bb018ca6738a19f
UTF7:    e807f1fcf82d132f9bb018ca6738a19f
UTF8:    e807f1fcf82d132f9bb018ca6738a19f
Unicode: e15e31c3d8898c92ab172a4311be9e84
UTF32:   6a57502c29a5081f03cb70e0ad38ecc7

// hashes for the complex “中文” text in all encodings
ASCII:   ea03fcb8c47822bce772cf6c07d0ebbb
UTF7:    eb02105e5c51a33f21e8da7f8102cfda
UTF8:    a7bac2239fcdcb3a067903d8077c4a07
Unicode: 73c6c8cd2f94355ef015e5265d5e65b1
UTF32:   65fe91b81ed1107566f9f9f5ed4ccaf1 

 

All strings in .Net store their chars in 2 bytes by default. When hashing values we have to take this into account and use proper text Encoding. .Net supports 5 encodings: ASCII (7 bits per char), UTF7 (7 bits per char), UTF8 (8 bits = 1 byte per char), Unicode (UTF-16) (16 bits = 2 bytes per char) and UTF32 (32 bits = 4 bytes per char).

For text with only first 127 chars in the ASCII table ASCII, UTF7 and UTF8 encodings all return the same hash, but with UTF16 and UTF32 comes endianness so they don’t return the same hash values. Also note that the ASCII and UTF7 encoding aren’t recommended to be used anymore except in legacy apps. UTF8 should be used instead.

For text with complex chars we have to use Unicode or UTF32 or we loose char information.

Putting .Net and SQL Server together

 

It turns out that such a simple thing as hashing can become a serious issue if we’re not careful. Trouble always awaits when dealing with text and encodings. :)

When using SQL Server’s varchar data type the .Net encoding to go with is UTF8 since it’s the fastest and most optimized of the three (ASCII, UTF7, UTF8). When using the nvarchar data type to go is Unicode (UTF16) but we also have to know the texts endianness to create correct hashes. UTF32 is practically useless in this case because SQL Server doesn’t have a data type that stores text in 4 bytes/char so we’ll never get the same results if we use it.

This advice only applies when creating hashes both in .Net and SQL server and comparing them. If we’re creating hashes in .Net and only store them in a database then we don’t have to worry about this.

 

kick it on DotNetKicks.com

Print | posted on Tuesday, April 28, 2009 11:43 AM | Filed Under [ .Net SQL Server ]

Feedback

Gravatar

# re: Comparing SQL Server HASHBYTES function and .Net hashing

I'm glad you ran in to this. I found the same thing:

This:
select substring(master.dbo.fn_varbintohexstr(hashbytes('MD5', CONVERT(varbinary, 'test@test.com'))), 3, 32)

is different from this:

select substring(master.dbo.fn_varbintohexstr(hashbytes('MD5', CONVERT(varbinary, email))), 3, 32)

...when email is an nvarchar field.

My Solution: create a varchar (not nvarchar) field and set it to email. And then, convert.
8/26/2009 5:35 PM | Gecko Coder
Gravatar

# re: Comparing SQL Server HASHBYTES function and .Net hashing

This doesn't with something like this:

Αυθεντικόκείμενο

11/4/2009 5:44 PM | Codematrix
Gravatar

# re: Comparing SQL Server HASHBYTES function and .Net hashing

This blog entry was referenced in a German developer forum:
http://www.mycsharp.de/wbb2/thread.php?threadid=80354

Greets
Flo
1/13/2010 1:37 PM | Florian Reischl
Gravatar

# re: Comparing SQL Server HASHBYTES function and .Net hashing

Thanks for this. I've been pulling my hair out trying to figure out.

Here's the code for anyone who wants it:

string password = "test";
SHA1 hash = new SHA1CryptoServiceProvider();

hash.ComputeHash(Encoding.ASCII.GetBytes(password));
//at this point hash.hash will be the same as HashBytes('sha1', cast('test' as varchar(30)))

//OR

hash.ComputeHash(Encoding.Unicode.GetBytes(password));
//at this point hash.hash will be the same as HashBytes('sha1', cast('test' as Nvarchar(30)))
3/19/2010 12:36 AM | Eric North
Comments have been closed on this topic.

Powered by:
Powered By Subtext Powered By ASP.NET