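I'm trying to write a SQL query that finds rows with duplicate values in two specific columns and then updates the duplicated values to new consecutive values.

For example, given a table that initially contains the following rows:

id sequence data1 data2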
1 1
1 1
2 1
2 2
3 1
3 1
3 1

it should be updated to:

id sequence data1 data2
1 1
1 2
2 1
2 2
3 1
3 2
3 3

Here is the script I wrote to try to achieve this, but it seems to just change the sequence value to 1 for every row whose id has duplicates.

WITH DuplicateRecords AS (
   SELECT id, 
          sequence,
          ROW_NUMBER() OVER (PARTITION BY id, sequence ORDER BY (SELECT NULL)) AS RowNum
   FROM table1
   WHERE Id IN (
      SELECT id
      FROM table1
      GROUP BY id, sequence
      HAVING COUNT(*) > 1
   )
),
UPDATE table1
SET sequence = DR.RowNum
FROM table1
JOIN DuplicateRecords DR ON table1.id = DR.id
WHERE table1.id IN (
   SELECT id
   FROM table1
   GROUP BY id, sequence
   HAVING COUNT(*) > 1
)

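  • Try removing sequence from the PARTITION BY in ROW_NUMBER().

  • You forgot to fill in the expected and actual values.

  • How do you know which row gets which sequence? Is there a timestamp column or something else that indicates a natural order?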
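One way to see why every row ends up with 1: the UPDATE in the question joins to the CTE on id alone, so each table1 row matches every CTE row for that id. SQL Server documents the result of an UPDATE ... FROM as undefined when more than one source row matches a target row, so which RowNum gets written is arbitrary. The diagnostic below is only a sketch. The aliases t and DR and the output column names are made up for illustration, and it assumes the four columns together tell rows apart; otherwise identical rows collapse into one group.

```sql
-- Diagnostic sketch (T-SQL): count how many CTE rows each table1 row joins to
-- when the join condition is id alone, as in the question's UPDATE.
WITH DuplicateRecords AS (
    SELECT id,
           sequence,
           ROW_NUMBER() OVER (PARTITION BY id, sequence ORDER BY (SELECT NULL)) AS RowNum
    FROM table1
)
SELECT t.id,
       t.sequence,
       t.data1,
       t.data2,
       COUNT(*)       AS MatchingCteRows,   -- more than 1 means the UPDATE had several candidates
       MIN(DR.RowNum) AS LowestCandidate,
       MAX(DR.RowNum) AS HighestCandidate
FROM table1 AS t
JOIN DuplicateRecords AS DR
    ON t.id = DR.id                         -- same join condition as the question
GROUP BY t.id, t.sequence, t.data1, t.data2
ORDER BY t.id;
```

Any row reported with MatchingCteRows greater than 1 had more than one candidate value, and the engine was free to pick any of them.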


Best answer

Just update the CTE…

CREATE TABLE table1 (
    id INT,
    sequence INT,
    data1 VARCHAR(100), 
    data2 VARCHAR(100)  
);

INSERT INTO table1 (id, sequence, data1, data2)
VALUES
(1, 1, 'data1_value1', 'data2_value1'),
(1, 1, 'data1_value2', 'data2_value2'),
(2, 1, 'data1_value3', 'data2_value3'),
(2, 2, 'data1_value4', 'data2_value4'),
(3, 1, 'data1_value5', 'data2_value5'),
(3, 1, 'data1_value6', 'data2_value6'),
(3, 1, 'data1_value7', 'data2_value7');

7 rows affected

WITH DuplicateRecords AS (
    SELECT id, 
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY id, sequence, data1, data2) AS NewSequence, 
           sequence, 
           data1, 
           data2
    FROM table1
    WHERE id IN (
        SELECT id
        FROM table1
        GROUP BY id
        HAVING MIN(sequence) <> 1
            OR COUNT(DISTINCT sequence) <> COUNT(*)
            OR MAX(sequence) <> COUNT(*)
    )
)
UPDATE DuplicateRecords 
SET sequence = NewSequence;

5 rows affected

SELECT * FROM table1;

id sequence data1 data2
1 1 data1_value1 data2_value1
1 2 data1_value2 data2_value2
2 1 data1_value3 data2_value3
2 2 data1_value4 data2_value4
3 1 data1_value5 data2_value5
3 2 data1_value6 data2_value6
3 3 data1_value7 data2_value7

Edit: I tried to downvote my own answer, but that isn't allowed. If @MartinSmith posts the following as an answer, I'll delete this one…

WITH DuplicateRecords AS (
    SELECT id, 
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY id, sequence, data1, data2) AS NewSequence, 
           sequence, 
           data1, 
           data2
    FROM table1
)
UPDATE DuplicateRecords 
SET sequence = NewSequence
WHERE sequence <> NewSequence
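
Before running an UPDATE like this on real data, it may be worth previewing which rows it would touch. The following is a minimal sketch that reuses the same window definition in a plain SELECT; the CTE name Renumbered and the alias OldSequence are made up for illustration, and nothing is modified.

```sql
-- Preview sketch (T-SQL): list the rows whose sequence would change.
-- Read-only; run it before the UPDATE to sanity-check the renumbering.
WITH Renumbered AS (
    SELECT id,
           sequence,
           data1,
           data2,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY sequence, data1, data2) AS NewSequence
    FROM table1
)
SELECT id,
       sequence AS OldSequence,
       NewSequence,
       data1,
       data2
FROM Renumbered
WHERE sequence <> NewSequence      -- same filter as the UPDATE
ORDER BY id, NewSequence;
```

Run against the original sample rows (before any update), this should list three rows: the second row for id 1 and the second and third rows for id 3.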

  • @MartinSmith because I was so focused on fixing the errors in the OP's CTE that I couldn't see the forest for the trees…

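You need to match not only on id and sequence but also on data1 and data2 to make sure the correct duplicate row is updated; this prevents the same row from being updated more than once. The sequence then needs to be set to each row's ordinal position within the duplicate set for the same id.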

CREATE TABLE table1 (
    id INT,
    sequence INT,
    data1 VARCHAR(100), 
    data2 VARCHAR(100)  
);

INSERT INTO table1 (id, sequence, data1, data2)
VALUES
(1, 1, 'data1_value1', 'data2_value1'),
(1, 1, 'data1_value2', 'data2_value2'),
(2, 1, 'data1_value3', 'data2_value3'),
(2, 2, 'data1_value4', 'data2_value4'),
(3, 1, 'data1_value5', 'data2_value5'),
(3, 1, 'data1_value6', 'data2_value6'),
(3, 1, 'data1_value7', 'data2_value7');

WITH DuplicateRecords AS (
    SELECT id, 
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY id) AS NewSequence, 
           sequence, 
           data1, 
           data2
    FROM table1
    WHERE id IN (
        SELECT id
        FROM table1
        GROUP BY id, sequence
        HAVING COUNT(*) > 1
    )
)
UPDATE t
SET t.sequence = DR.NewSequence
FROM table1 t
INNER JOIN DuplicateRecords DR
    ON t.id = DR.id 
    AND t.sequence = DR.sequence
    AND t.data1 = DR.data1
    AND t.data2 = DR.data2;


SELECT * FROM table1;

id sequence data1 data2
1 1 data1_value1 data2_value1
1 2 data1_value2 data2_value2
2 1 data1_value3 data2_value3
2 2 data1_value4 data2_value4
3 1 data1_value5 data2_value5
3 2 data1_value6 data2_value6
3 3 data1_value7 data2_value7
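
The join above identifies a row by all four columns, so it only gives each row a single match if (id, sequence, data1, data2) is unique. A quick way to check that assumption is sketched below; the column alias Copies is made up for illustration.

```sql
-- Assumption check (T-SQL): find rows that are identical in all four columns.
-- If this returns any rows, the four-column join matches more than one CTE row
-- per table row and the UPDATE result is no longer well defined.
SELECT id,
       sequence,
       data1,
       data2,
       COUNT(*) AS Copies
FROM table1
GROUP BY id, sequence, data1, data2
HAVING COUNT(*) > 1;
```

With the sample data this returns no rows, because data1 and data2 differ on every row. If a real table can contain fully identical rows, updating through the CTE directly avoids the join altogether.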

  • This assumes (id, sequence, data1, data2) is unique.

  • Instead, just use an updatable CTE.

  • You should post that as an answer, @MatBailie.