Monday, November 16, 2009

How to remove Duplicate Records using SQL Server 2005

By using a Common Table Expression (CTE) in SQL Server 2005 we can easily remove duplicate records from a table, even when the table without a primary key.

The windowing function named DENSE_RANK is used to group the records together based on the Symbol, MarketType,Trading_Date , Trading_Day,[Open],High,Low,[Close] and Volume fields, and assign them with a value randomly. This means that if I have two records with the exact same Symbol, MarketType,Trading_Date , Trading_Day,[Open],High,Low,[Close] and Volume values, the first record will be ranked as 1, the second as 2, and so on.

;WITH DupliRows(Symbol, MarketType,Trading_Date , Trading_Day,[Open],High,Low,[Close],Volume, Ranking)

AS

(

SELECT

Symbol, MarketType,Trading_Date , Trading_Day,[Open],High,Low,[Close],Volume,

Ranking = DENSE_RANK() OVER(PARTITION BY Symbol, MarketType,Trading_Date , Trading_Day,[Open],High,Low,[Close],Volume ORDER BY NEWID() ASC)

FROM dbo.tbl_Futures ---The Table from where duplicate records will be remove
)

DELETE FROM DupliRows
--Select * from DupliRows
WHERE Ranking >1


Because a CTE acts as a virtual table, We able to process data modification statements against it, and the underlying table will be affected. In this case, We can remove any record from the DupliRows that is ranked higher than 1. This will remove all duplicate records.

No comments:

Post a Comment