How can I efficiently shuffle the values of a specific column in a table with more than 100,000 rows?
Table definition:
CREATE TABLE person
(
id integer NOT NULL,
first_name character varying,
last_name character varying,
CONSTRAINT person_pkey PRIMARY KEY (id)
)
To anonymize the data, I have to shuffle the values of the 'first_name' column (I am not allowed to create a new table).
My attempt:
with
first_names as (
select row_number() over (order by random()),
first_name as new_first_name
from person
),
ids as (
select row_number() over (order by random()),
id as ref_id
from person
)
update person
set first_name = new_first_name
from first_names, ids
where id = ref_id;
It takes several hours to complete.
Is there a more efficient way to do this?
Best Answer
The problem with Postgres is that every UPDATE means a DELETE + INSERT. Run the query as a SELECT instead of the UPDATE and check EXPLAIN ANALYZE to see what the CTEs are doing.
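As a sketch of that diagnostic step (not verbatim from the answer; the rn alias is added here only for readability), the question's statement can be turned into a SELECT and inspected with plain EXPLAIN, which shows the plan without executing it:
EXPLAIN
WITH first_names AS (
    SELECT row_number() OVER (ORDER BY random()) AS rn,
           first_name AS new_first_name
    FROM person
),
ids AS (
    SELECT row_number() OVER (ORDER BY random()) AS rn,
           id AS ref_id
    FROM person
)
SELECT p.id, new_first_name
FROM person p, first_names, ids
WHERE p.id = ref_id;
-- The plan reveals that nothing joins first_names to ids, so the result is a
-- cross join of roughly row_count^2 rows, which is why the UPDATE takes hours.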
The usual faster alternative is to rewrite the table:
CREATE TABLE new_table AS
SELECT ... ;
DROP TABLE old_table;
ALTER TABLE new_table RENAME TO old_table;
-- recreate indexes and constraints
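A hypothetical, more concrete version of that rewrite for this particular shuffle (the table name person_shuffled is made up here) might look like:
-- Build a shuffled copy in one pass, then swap it in place of the original.
CREATE TABLE person_shuffled AS
WITH first_names AS (
    SELECT row_number() OVER (ORDER BY random()) AS rn, first_name
    FROM person
),
ids AS (
    SELECT row_number() OVER (ORDER BY random()) AS rn, id, last_name
    FROM person
)
SELECT i.id, f.first_name, i.last_name
FROM ids i
JOIN first_names f ON f.rn = i.rn;

DROP TABLE person;
ALTER TABLE person_shuffled RENAME TO person;
-- Recreate the constraint, since CREATE TABLE AS does not copy it.
ALTER TABLE person ADD CONSTRAINT person_pkey PRIMARY KEY (id);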
Sorry, that's not an option for you :(
EDIT: After reading a_horse_with_no_name's comment, it looks like what you need is:
with
first_names as (
select row_number() over (order by random()) rn,
first_name as new_first_name
from person
),
ids as (
select row_number() over (order by random()) rn,
id as ref_id
from person
)
update person
set first_name = new_first_name
from first_names
join ids
on first_names.rn = ids.rn
where id = ref_id;
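Not from the original answer, but one way to sanity-check such a shuffle is to fingerprint the multiset of first names before and after the UPDATE; the two values should match, since only the pairing with id changes:
-- Deterministic fingerprint of all first_name values; run once before and
-- once after the UPDATE and compare the two hashes.
SELECT md5(string_agg(coalesce(first_name, ''), ',' ORDER BY first_name)) AS fingerprint,
       count(*) AS total_rows
FROM person;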
Again, questions about performance are much better when you provide EXPLAIN / ANALYZE results.
Original question: sql - PostgreSQL shuffle column values: https://stackoverflow.com/questions/33555524/