Problem description
I have the DataFrame below:
Ref ° | Indice_1 | Indice_2 | 1 | 2 | indice_from | indice_from | indice_to | indice_to
---------------------------------------------------------------------------------------------------------------------------------------------
1 | 19 | 37.1 | 32 | 62 | ["20031,10031"] | ["13,11/12"] | ["40062,30062"] | ["14A,14"]
---------------------------------------------------------------------------------------------------------------------------------------------
2 | 19 | 37.1 | 44 | 12 | ["40062,30062"] | ["13,11/12"] | ["40062,30062"] | ["14A,14"]
---------------------------------------------------------------------------------------------------------------------------------------------
3 | 19 | 37.1 | 22 | 64 | ["20031,10031"] | ["13,11/12"] | ["20031,10031"] | ["13,11/12"]
---------------------------------------------------------------------------------------------------------------------------------------------
4 | 19 | 37.1 | 32 | 98 | ["20032,10032"] | ["13,11/12"] | ["40062,30062"] | ["13,11/12"]
I want to sort the values in the columns indice_from, indice_from, indice_to, and indice_to in ascending order, without touching the rest of the DataFrame's columns. Note that the two columns indice_from and indice_to sometimes contain a number followed by a letter, such as ["14,14A"]. In that case the result should always have the same structure: the plain number first, then the number + letter. For example, with the number 15, the second value should be 15 + letter, since 15 < 15 + letter; if the first value is 9, the second value should be 9 + letter, since 9 < 9 + letter.
New DataFrame:
Ref ° | Indice_1 | Indice_2 | 1 | 2 | indice_from | indice_from | indice_to | indice_to
---------------------------------------------------------------------------------------------------------------------------------------------
1 | 19 | 37.1 | 32 | 62 | ["10031,20031"] | ["11/12,13"] | ["30062,40062"] | ["14,14A"]
---------------------------------------------------------------------------------------------------------------------------------------------
2 | 19 | 37.1 | 44 | 12 | ["30062,40062"] | ["11/12,13"] | ["30062,40062"] | ["14,14A"]
---------------------------------------------------------------------------------------------------------------------------------------------
3 | 19 | 37.1 | 22 | 64 | ["10031,20031"] | ["11/12,13"] | ["10031,20031"] | ["11/12,13"]
---------------------------------------------------------------------------------------------------------------------------------------------
4 | 19 | 37.1 | 32 | 98 | ["10031,20031"] | ["11/12,13"] | ["30062,40062"] | ["11/12,13"]
Can someone please help me sort the values of the columns indice_from, indice_from, indice_to, and indice_to to obtain a new DataFrame like the second one above? Thank you.
Recommended answer
If I understand correctly, then
from pyspark.sql import functions as F

columns_to_sort = ['indice_from', 'indice_from', 'indice_to', 'indice_to']

# Sort each array column in ascending order; other columns are left untouched.
for c in columns_to_sort:
    df = (
        df
        .withColumn(
            c,
            F.sort_array(c)
        )
    )
will do the trick. Let me know if it doesn't.
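One caveat worth checking: `F.sort_array` orders the elements of an array column. If each cell is really an array holding a single comma-separated string (the sample shows values like ["20031,10031"]), there is only one element and nothing would be reordered. The following is a minimal sketch for that case, not a definitive implementation. It assumes that layout, and the helper names `natural_key` and `sort_cell` are hypothetical. It uses plain Python so the ordering rule is easy to verify: digit runs are compared as integers, so 9 sorts before 10 and 14 sorts before 14A.

```python
import re


def natural_key(token):
    """Split a token such as '14A' or '11/12' into digit and non-digit runs
    so that the numeric parts compare as integers rather than as text."""
    parts = re.findall(r"\d+|\D+", token.strip())
    # Tag each part so integers and strings are never compared directly.
    return [(0, int(p), "") if p.isdigit() else (1, 0, p) for p in parts]


def sort_cell(cell):
    """Sort the comma-separated tokens inside each string of a list cell.

    Assumes `cell` is a list of strings like ["20031,10031"].
    """
    return [",".join(sorted(s.split(","), key=natural_key)) for s in cell]


if __name__ == "__main__":
    for cell in (["20031,10031"], ["13,11/12"], ["14A,14"]):
        print(cell, "->", sort_cell(cell))
```

To apply this to the DataFrame, `sort_cell` could be wrapped with `pyspark.sql.functions.udf`, using `ArrayType(StringType())` as the return type, and passed to `withColumn` in place of `F.sort_array`. A Python UDF is typically slower than a built-in function, so this is only worth doing if the cells really have the single-string layout.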