Lesson learned – Real life performance tuning

Scroll

I encountered the following problem on a server with a customer. They were using a view to query data and when they called the view they added a where clause,

SELECT *
FROM dbo.v_MyView
WHERE Timestamp > DATEADD(DAY,-2,GETDATE())

but even with the where clause, all rows in the tables were read.

Views are metadata and they are expanded into the query each time they are queried, that makes the view available for the query optimization process, when you add a where clause to your view the optimizer creates a good plan for you, where the underlying tables are filtered early, but that didn’t happen in this view.

After looking at the view definition I discovered the view had a function returning the highest date taking 2 inputs in the select part of the query.

The function looked like this:

CREATE FUNCTION udf_MaxDate (@Date1 DATETIME2, @Date2 DATETIME2)
RETURNS DATETIME2
AS
BEGIN
    RETURN
    (SELECT MAX(LastUpdateDate)
      FROM (VALUES (@Date1),(@Date2)) AS UpdateDate(LastUpdateDate))  
END

And the view was similar to this:

CREATE VIEW dbo.Production_Function
AS(
SELECT P.ProductID, ActualCost,  
dbo.udf_MaxDate(TH.ModifiedDate, P.ModifiedDate) AS MaxDate, 
ProductNumber, 
P.Name
FROM Production.TransactionHistory TH INNER JOIN 
Production.Product P ON p.ProductID = TH.ProductID)

When the view was called the MaxDate was used at the predicate column:

SELECT * FROM dbo.Production_Function
WHERE MaxDate > '2008-09-02 00:00:00.000'

And the Query optimizer returned this plan:

And these stats:

Note that in the execution plan, the filtering happens after the compute scalar which is run on all rows in the result set in order to find the highest date of the 2 inputs.

Tooltips on the Compute scalar looked like this:

Looking at the defined values of the compute scalar we can see that here the function is called:

If we make a small change to our view to avoid this late filtering, we can remove the function and use a CASE instead

CREATE VIEW dbo.Production_NoFunction
AS
(
SELECT P.ProductID, 
    ActualCost,  
    CASE WHEN TH.ModifiedDate > P.ModifiedDate THEN TH.ModifiedDate
         WHEN TH.ModifiedDate < P.ModifiedDate THEN p.ModifiedDate
         ELSE NULL
         END AS MaxDate, 
    ProductNumber, 
    P.Name
FROM Production.TransactionHistory TH INNER JOIN 
Production.Product P ON p.ProductID = TH.ProductID

And then query view like this:

SELECT * FROM dbo.Production_NoFunction
WHERE MaxDate > '2008-09-02 00:00:00.000'

The Query optimizer returns this plan:

And the stats look like this:

The difference in CPU time Is 2.3 sec with the function and 0.106 sec with the case, that’s quite a difference on this small dataset.

Functions look harmless and are sometimes used as an easy fix in the query, but in fact, they can cause a bad performing Query, and when you use functions they can make it imposable to the optimizer to know how much is returned..

Let’s try and run these 2 queries in the same batch, one with function and one with the case statement

(Query 1)
SELECT * FROM dbo.Production_NoFunction 
WHERE MaxDate > '2008-09-02 00:00:00.000'
(Query2)
SELECT * FROM dbo.Production_Function
WHERE MaxDate > '2008-09-02 00:00:00.000'

The optimizer tells us that the view with the function has the lowest cost

But if we look at the execution plan XML in Query 2 we find one problem the optimizer didn’t tell us about... Implicit conversion

<ComputeScalar>
...
<ColumnReference Column="Expr1005" />
<ScalarOperator ScalarString="CONVERT_IMPLICIT(datetime2(7),[AdventureWorks2008R2].[Production].[Product].[ModifiedDate] as [P].[ModifiedDate],0)">
<Convert DataType="datetime2" Scale="7" Style="0" Implicit="true">
...

When the optimizer creates the expression to be used in the function later on in the plan we have an implicit_conversion, that’s because our function takes datetime2 and our column data is DateTime.

Let’s fix that and run the Query again.

ALTER FUNCTION udf_MaxDate (@Date1 DATETIME, @Date2 DATETIME)
RETURNS DATETIME
AS
BEGIN
    RETURN
    (SELECT MAX(LastUpdateDate)
      FROM (VALUES (@Date1),(@Date2)) AS UpdateDate(LastUpdateDate))  
END
SELECT * FROM dbo.Production_Function
WHERE MaxDate > '2008-09-02 00:00:00.000'

Now the compute scalars after selecting the tables are gone, but we still have the late filtering that’s because the view must have all the data prepared before it can make the filtering process.

Sometimes the plan doesn't tell the whole truth, and you can benefit from spending time looking at the execution plan, and sometimes it's necessary to dig deeper and look at the execution plan as XML.

Thanks for reading. If you have any questions, troubles with your SQL server, queries, or environment, please don’t hesitate to call me or one of my colleagues at Unit IT phone: 88 333 333

Ring til os