I recently had a challenge given to me: improve an application's performance. I started digging around and found tables that looked like this:
Table1:
CountryID | StateID | ZIPID | PersonID
1 | 3 | 2 | 7
These would all be foreign-keyed to their respective tables: the CountryID of 1 refers to the row in the Country table with an ID of 1, and so on. Nothing seems too wrong here. Then there was this table:
Table2:
ApplicationIdentifier | PersonIdentifier | CustomContentDate | CustomContentString | CustomContentBit
1 | 1~3~2~7 | null | Some Content | null
This table was my winner. Note the PersonIdentifier column: it needs to be joined back to Table1 to get at some specific data, but the only way to do that is string manipulation. The value 1~3~2~7 maps to the same person as my Table1 example above. So they have views scattered around that look like this:
Select Left(PersonIdentifier, 1) as CountryID,
       Mid(PersonIdentifier, 3, 1) as StateID,
       Mid(PersonIdentifier, 5, 1) as ZIPID,
       Mid(PersonIdentifier, 7, 1) as PersonID, etc.
They would then join those views to Table1 and other tables, and those views would in turn be joined together by still more views. I have a stack of roughly 20 queries in front of me that are joined together to form one dataset. The way this is put together is brittle: the entire thing will fail if they so much as increase the length of one of their IDs. And so much for adding indexes to improve performance; the string manipulation required to make the joins rules that out.
That is not the last of their normalization offenses, but it's the one I'm going to talk about here. So you're in my situation and you need to speed up the queries: how do you do it without breaking all of their other applications? I opted to create some SQL functions. I accepted that there was nothing I could do to avoid the string parsing, but I could make it better. So, in the same database the table lives in, I created a function for each of the embedded ID fields, like this:
create function GetCountryFromPersonIdentifier(@PersonIdentifier varchar(128)) -- Used for CountryID
returns varchar(30) as
begin
    declare @firstPos int
    -- position of the first '~' delimiter
    set @firstPos = master.dbo.InString(@PersonIdentifier, '~', 1)
    -- the CountryID is everything before the first delimiter
    return Substring(@PersonIdentifier, 1, @firstPos - 1)
end
Note: this uses the InString() function I have documented in another article on this site.
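The interior fields work the same way, except the value sits between two delimiters, so you need the positions on both sides of it. Here is a sketch of the StateID variant (the name GetStateFromPersonIdentifier is mine, and I'm assuming InString(@s, '~', n) returns the position of the nth occurrence of '~'):

```sql
create function GetStateFromPersonIdentifier(@PersonIdentifier varchar(128)) -- Used for StateID
returns varchar(30) as
begin
    declare @firstPos int
    declare @lastPos int
    -- positions of the delimiters on either side of the StateID
    set @firstPos = master.dbo.InString(@PersonIdentifier, '~', 1)
    set @lastPos = master.dbo.InString(@PersonIdentifier, '~', 2)
    -- the field is the text strictly between the two '~' characters
    return Substring(@PersonIdentifier, @firstPos + 1, @lastPos - @firstPos - 1)
end
```

Getting the length right (@lastPos - @firstPos - 1, not + 1) is what lets this keep working if the IDs ever grow past a single digit.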
Then I modified the existing table, adding a computed column like this:
alter table [dbo].[Table2] add
    [CountryID] as ([dbo].[GetCountryFromPersonIdentifier]([PersonIdentifier]))
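A quick spot check confirms the new column behaves (the WHERE literal below is just the sample value from earlier):

```sql
-- each row's CountryID should be the text before the first '~'
select PersonIdentifier, CountryID
from dbo.Table2
where PersonIdentifier = '1~3~2~7'
-- for this row, CountryID should come back as 1
```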
After I was all done, I had the four columns necessary to properly join Table2 to Table1, like so:
Select * from Table1
inner join Table2
    on  Table1.CountryID = Table2.CountryID
    and Table1.StateID   = Table2.StateID
    and Table1.ZIPID     = Table2.ZIPID
    and Table1.PersonID  = Table2.PersonID
So from now on, no one has to write the dreaded left() and mid() statements again. And since the functions locate the ~ delimiters dynamically instead of hard-coding character positions, the client can now increase the length of their identifiers without breaking the system (well, at least without breaking my part of it). Unfortunately, I wasn't able to add indexes on my new columns. Indexing a computed column is supposed to be possible, but the version of SQL Server they are running doesn't appear to support it. Still, I could now go into their cascading stack of queries and collapse it into a single, nearly one-pass SQL statement.
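For what it's worth, on versions of SQL Server that do support indexes on computed columns, the usual blocker is determinism: the function feeding the column must be created WITH SCHEMABINDING, and a schema-bound function cannot reference objects in another database, so a local copy of InString would be needed first. A sketch of what that might look like (the _SB suffix and the local dbo.InString are my own assumptions, not something from this system):

```sql
-- a schema-bound version of the parsing function; it must call a local
-- dbo.InString, because schema-bound functions cannot reference objects
-- in another database such as master
create function dbo.GetCountryFromPersonIdentifier_SB(@PersonIdentifier varchar(128))
returns varchar(30) with schemabinding as
begin
    -- CountryID is everything before the first '~'
    return Substring(@PersonIdentifier, 1, dbo.InString(@PersonIdentifier, '~', 1) - 1)
end
go
-- with the computed column defined against the schema-bound function,
-- it can be indexed like an ordinary column
create index IX_Table2_CountryID on dbo.Table2 (CountryID)
```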
Perhaps I'll write more about this system and its challenges in the future, like how they managed to get Table2 to contain both the field definitions and the data for their different components.