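I am trying to write a SQL query that finds rows with duplicate values in two specific columns and then updates the duplicated values to new sequential values.
For example, given a table that starts with the following rows: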
ID | Sequence | Data1 | Data2 |
---|---|---|---|
1 | 1 | … | |
1 | 1 | | |
2 | 1 | | |
2 | 2 | | |
3 | 1 | | |
3 | 1 | | |
3 | 1 | | |
it should be updated to:
ID | Sequence | Data1 | Data2 |
---|---|---|---|
1 | 1 | … | |
1 | 2 | | |
2 | 1 | | |
2 | 2 | | |
3 | 1 | | |
3 | 2 | | |
3 | 3 | | |
Here is the script I wrote to try to achieve this, but it seems to just set the sequence value to 1 for every row whose id has duplicates.
WITH DuplicateRecords AS (
    SELECT id,
           sequence,
           ROW_NUMBER() OVER (PARTITION BY id, sequence ORDER BY (SELECT NULL)) AS RowNum
    FROM table1
    WHERE Id IN (
        SELECT id
        FROM table1
        GROUP BY id, sequence
        HAVING COUNT(*) > 1
    )
),
UPDATE table1
SET sequence = DR.RowNum
FROM table1
JOIN DuplicateRecords DR ON table1.id = DR.id
WHERE table1.id IN (
    SELECT id
    FROM table1
    GROUP BY id, sequence
    HAVING COUNT(*) > 1
)
Best answer
Just update the CTE…
CREATE TABLE table1 (
    id INT,
    sequence INT,
    data1 VARCHAR(100),
    data2 VARCHAR(100)
);
INSERT INTO table1 (id, sequence, data1, data2)
VALUES
    (1, 1, 'data1_value1', 'data2_value1'),
    (1, 1, 'data1_value2', 'data2_value2'),
    (2, 1, 'data1_value3', 'data2_value3'),
    (2, 2, 'data1_value4', 'data2_value4'),
    (3, 1, 'data1_value5', 'data2_value5'),
    (3, 1, 'data1_value6', 'data2_value6'),
    (3, 1, 'data1_value7', 'data2_value7');
7 rows affected
WITH DuplicateRecords AS (
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY id, sequence, data1, data2) AS NewSequence,
           sequence,
           data1,
           data2
    FROM table1
    WHERE id IN (
        SELECT id
        FROM table1
        GROUP BY id
        HAVING MIN(sequence) <> 1
            OR COUNT(DISTINCT sequence) <> COUNT(*)
            OR MAX(sequence) <> COUNT(*)
    )
)
UPDATE DuplicateRecords
SET sequence = NewSequence
5 rows affected
SELECT * FROM table1;
id | sequence | data1 | data2 |
---|---|---|---|
1 | 1 | data1_value1 | data2_value1 |
1 | 2 | data1_value2 | data2_value2 |
2 | 1 | data1_value3 | data2_value3 |
2 | 2 | data1_value4 | data2_value4 |
3 | 1 | data1_value5 | data2_value5 |
3 | 2 | data1_value6 | data2_value6 |
3 | 3 | data1_value7 | data2_value7 |
Edit: I tried to downvote my own answer, but that is not allowed. I will delete this if @MartinSmith adds the following as an answer…
WITH DuplicateRecords AS (
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY id, sequence, data1, data2) AS NewSequence,
           sequence,
           data1,
           data2
    FROM table1
)
UPDATE DuplicateRecords
SET sequence = NewSequence
WHERE sequence <> NewSequence
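As a quick sanity check after running either update, a query along these lines should return no rows once every id is numbered 1..N without repeats. This is only a sketch against the sample `table1` above, reusing the same three conditions as the `HAVING` clause in the first query; the column aliases are illustrative and not part of the original answer.

```sql
-- Sanity check (sketch): list any id whose sequence values are still
-- duplicated, do not start at 1, or have gaps.
-- An empty result means every id is numbered 1..N with no repeats.
SELECT id,
       COUNT(*)                 AS row_count,
       COUNT(DISTINCT sequence) AS distinct_sequences,
       MIN(sequence)            AS min_sequence,
       MAX(sequence)            AS max_sequence
FROM table1
GROUP BY id
HAVING COUNT(DISTINCT sequence) <> COUNT(*)
    OR MIN(sequence) <> 1
    OR MAX(sequence) <> COUNT(*);
```

Because `COUNT(DISTINCT sequence)` ignores NULLs, an id with a NULL sequence would also show up in this result.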
@MartinSmith Because I was so focused on fixing the errors in the OP's CTE that I couldn't see the forest for the trees…
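You need to match not only on id and sequence but also on data1 and data2 to make sure you update the correct duplicate rows; this prevents the same row from being updated more than once. The sequence needs to be set to the row number of each row within the set of duplicates for the same id.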
CREATE TABLE table1 (
    id INT,
    sequence INT,
    data1 VARCHAR(100),
    data2 VARCHAR(100)
);
INSERT INTO table1 (id, sequence, data1, data2)
VALUES
    (1, 1, 'data1_value1', 'data2_value1'),
    (1, 1, 'data1_value2', 'data2_value2'),
    (2, 1, 'data1_value3', 'data2_value3'),
    (2, 2, 'data1_value4', 'data2_value4'),
    (3, 1, 'data1_value5', 'data2_value5'),
    (3, 1, 'data1_value6', 'data2_value6'),
    (3, 1, 'data1_value7', 'data2_value7');
WITH DuplicateRecords AS (
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY id) AS NewSequence,
           sequence,
           data1,
           data2
    FROM table1
    WHERE id IN (
        SELECT id
        FROM table1
        GROUP BY id, sequence
        HAVING COUNT(*) > 1
    )
)
UPDATE t
SET t.sequence = DR.NewSequence
FROM table1 t
INNER JOIN DuplicateRecords DR
    ON t.id = DR.id
    AND t.sequence = DR.sequence
    AND t.data1 = DR.data1
    AND t.data2 = DR.data2;
SELECT * FROM table1;
id | sequence | data1 | data2 |
---|---|---|---|
1 | 1 | data1_value1 | data2_value1 |
1 | 2 | data1_value2 | data2_value2 |
2 | 1 | data1_value3 | data2_value3 |
2 | 2 | data1_value4 | data2_value4 |
3 | 1 | data1_value5 | data2_value5 |
3 | 2 | data1_value6 | data2_value6 |
3 | 3 | data1_value7 | data2_value7 |
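The join-based update can only single out one row per match if the combination (id, sequence, data1, data2) is unique and none of those columns is NULL, since `NULL = NULL` does not evaluate to true in a join condition. A minimal check, assuming the same sample `table1`, might look like the sketch below; the `copies` alias is illustrative only.

```sql
-- Rows whose full (id, sequence, data1, data2) combination repeats:
-- the join would match each of these to several CTE rows, so the
-- value written to them is not well defined.
SELECT id, sequence, data1, data2, COUNT(*) AS copies
FROM table1
GROUP BY id, sequence, data1, data2
HAVING COUNT(*) > 1;

-- Rows with a NULL in any join column: the equality join never
-- matches them, so the update would leave them unchanged.
SELECT *
FROM table1
WHERE id IS NULL
   OR sequence IS NULL
   OR data1 IS NULL
   OR data2 IS NULL;
```

If either query returns rows, updating through the CTE directly avoids the problem, because it does not need to join back to the table at all.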
That assumes (id, sequence, data1, data2) is unique.
Instead, just use an updatable CTE.
You should post that as an answer, @MatBailie.