Problem description
I have a table in PostgreSQL containing names (more than 1 million rows), but it also has many duplicates. I select 3 fields: id, name, metadata.
I want to select them randomly with ORDER BY RANDOM() and LIMIT 1000, so I do this in many steps to save some memory in my PHP script.
But how can I do that so it only gives me a list with no duplicate names?
For example, [1,"Michael Fox","2003-03-03,34,M,4545"] will be returned but not [2,"Michael Fox","1989-02-23,M,5633"]. The name field is the most important: it must be unique in the list every time I run the select, and the selection must be random.
I tried GROUP BY name, but then it expects me to have id and metadata in the GROUP BY as well, or in an aggregate function, and I don't want them to be filtered in any way.
Does anyone know how to fetch many columns but apply DISTINCT to only one column?
Recommended answer
To do a distinct on only one (or n) column(s):
select distinct on (name)
name, col1, col2
from names
This will return an arbitrary one of the rows for each name. If you want to control which of the rows is returned, you need to add an ORDER BY:
select distinct on (name)
name, col1, col2
from names
order by name, col1
This returns, for each name, the first row when ordered by col1.
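Applied to the question, the same pattern can pick a random row per name rather than the first by col1. The following is a minimal sketch, assuming the table is called names (as in the queries above) and has the id, name and metadata columns mentioned in the question:

```sql
-- Sketch: one row per distinct name, chosen at random within each name.
-- Assumes a table "names" with columns id, name, metadata.
SELECT DISTINCT ON (name)
       id, name, metadata
FROM names
ORDER BY name, random();  -- name comes first to satisfy DISTINCT ON;
                          -- random() breaks ties within each name
```

Note that the result is still sorted by name; only the choice of row inside each group of duplicates is random.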
DISTINCT ON:
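SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first.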
The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s). The ORDER BY clause will normally contain additional expression(s) that determine the desired precedence of rows within each DISTINCT ON group.
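Because of this leftmost-expression rule, a DISTINCT ON (name) query cannot itself be ordered by random() alone. One way to get the random 1000-row sample the question asks for is to deduplicate in a subquery and shuffle in the outer query. This is a hedged sketch, again assuming a table names with columns id, name and metadata; the alias deduped is only illustrative:

```sql
-- Sketch: 1000 random rows with no duplicate names.
-- Assumes a table "names" with columns id, name, metadata.
SELECT id, name, metadata
FROM (
    -- Inner query: keep one randomly chosen row per name.
    SELECT DISTINCT ON (name)
           id, name, metadata
    FROM names
    ORDER BY name, random()
) AS deduped
-- Outer query: shuffle the deduplicated rows and take a sample.
ORDER BY random()
LIMIT 1000;
```

Each run sorts the whole deduplicated set, so on a table of this size it may be slow. The names within one result are unique, but separate runs can return overlapping names.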