I encountered the following problem on a server with a customer. They were using a view to query data and when they called the view they added a where clause,
SELECT * FROM dbo.v_MyView WHERE Timestamp > DATEADD(DAY,-2,GETDATE())
but even with the where clause, all rows in the tables were read.
Views are metadata and they are expanded into the query each time they are queried, that makes the view available for the query optimization process, when you add a where clause to your view the optimizer creates a good plan for you, where the underlying tables are filtered early, but that didn’t happen in this view.
After looking at the view definition I discovered the view had a function returning the highest date taking 2 inputs in the select part of the query.
The function looked like this:
CREATE FUNCTION udf_MaxDate (@Date1 DATETIME2, @Date2 DATETIME2) RETURNS DATETIME2 AS BEGIN RETURN (SELECT MAX(LastUpdateDate) FROM (VALUES (@Date1),(@Date2)) AS UpdateDate(LastUpdateDate)) END
And the view was similar to this:
CREATE VIEW dbo.Production_Function AS( SELECT P.ProductID, ActualCost, dbo.udf_MaxDate(TH.ModifiedDate, P.ModifiedDate) AS MaxDate, ProductNumber, P.Name FROM Production.TransactionHistory TH INNER JOIN Production.Product P ON p.ProductID = TH.ProductID)
When the view was called the MaxDate was used at the predicate column:
SELECT * FROM dbo.Production_Function WHERE MaxDate > '2008-09-02 00:00:00.000'
And the Query optimizer returned this plan:
And these stats:
Note that in the execution plan, the filtering happens after the compute scalar which is run on all rows in the result set in order to find the highest date of the 2 inputs.
Tooltips on the Compute scalar looked like this:
Looking at the defined values of the compute scalar we can see that here the function is called:
If we make a small change to our view to avoid this late filtering, we can remove the function and use a CASE instead
CREATE VIEW dbo.Production_NoFunction AS ( SELECT P.ProductID, ActualCost, CASE WHEN TH.ModifiedDate > P.ModifiedDate THEN TH.ModifiedDate WHEN TH.ModifiedDate < P.ModifiedDate THEN p.ModifiedDate ELSE NULL END AS MaxDate, ProductNumber, P.Name FROM Production.TransactionHistory TH INNER JOIN Production.Product P ON p.ProductID = TH.ProductID
And then query view like this:
SELECT * FROM dbo.Production_NoFunction WHERE MaxDate > '2008-09-02 00:00:00.000'
The Query optimizer returns this plan:
And the stats look like this:
The difference in CPU time Is 2.3 sec with the function and 0.106 sec with the case, that’s quite a difference on this small dataset.
Functions look harmless and are sometimes used as an easy fix in the query, but in fact, they can cause a bad performing Query, and when you use functions they can make it imposable to the optimizer to know how much is returned..
Let’s try and run these 2 queries in the same batch, one with function and one with the case statement
(Query 1) SELECT * FROM dbo.Production_NoFunction WHERE MaxDate > '2008-09-02 00:00:00.000' (Query2) SELECT * FROM dbo.Production_Function WHERE MaxDate > '2008-09-02 00:00:00.000'
The optimizer tells us that the view with the function has the lowest cost
But if we look at the execution plan XML in Query 2 we find one problem the optimizer didn’t tell us about... Implicit conversion
<ComputeScalar> ... <ColumnReference Column="Expr1005" /> <ScalarOperator ScalarString="CONVERT_IMPLICIT(datetime2(7),[AdventureWorks2008R2].[Production].[Product].[ModifiedDate] as [P].[ModifiedDate],0)"> <Convert DataType="datetime2" Scale="7" Style="0" Implicit="true"> ...
When the optimizer creates the expression to be used in the function later on in the plan we have an implicit_conversion, that’s because our function takes datetime2 and our column data is DateTime.
Let’s fix that and run the Query again.
ALTER FUNCTION udf_MaxDate (@Date1 DATETIME, @Date2 DATETIME) RETURNS DATETIME AS BEGIN RETURN (SELECT MAX(LastUpdateDate) FROM (VALUES (@Date1),(@Date2)) AS UpdateDate(LastUpdateDate)) END SELECT * FROM dbo.Production_Function WHERE MaxDate > '2008-09-02 00:00:00.000'
Now the compute scalars after selecting the tables are gone, but we still have the late filtering that’s because the view must have all the data prepared before it can make the filtering process.
Sometimes the plan doesn't tell the whole truth, and you can benefit from spending time looking at the execution plan, and sometimes it's necessary to dig deeper and look at the execution plan as XML.
Thanks for reading. If you have any questions, troubles with your SQL server, queries, or environment, please don’t hesitate to call me or one of my colleagues at Unit IT phone: 88 333 333