Sorting column values in PySpark

Problem description

I have the following DataFrame:

Ref ° | Indice_1 | Indice_2 | 1  | 2  | indice_from     | indice_from  | indice_to       | indice_to
-------------------------------------------------------------------------------------------------------
1     | 19       | 37.1     | 32 | 62 | ["20031,10031"] | ["13,11/12"] | ["40062,30062"] | ["14A,14"]
2     | 19       | 37.1     | 44 | 12 | ["40062,30062"] | ["13,11/12"] | ["40062,30062"] | ["14A,14"]
3     | 19       | 37.1     | 22 | 64 | ["20031,10031"] | ["13,11/12"] | ["20031,10031"] | ["13,11/12"]
4     | 19       | 37.1     | 32 | 98 | ["20032,10032"] | ["13,11/12"] | ["40062,30062"] | ["13,11/12"]

I want to sort the values in the columns indice_from, indice_from, indice_to, and indice_to in ascending order, without touching the other columns of my DataFrame. Note that the 2 columns indice_from and indice_to sometimes contain a number followed by a letter, such as ["14,14A"]. In that case the result should always have the same structure: the plain number first, then the number with the letter.

For example, for the number 15 the second value should be 15 + letter, with 15 < 15 + letter. If the first value is 9, the second value should be 9 + letter, with 9 < 9 + letter.

New DataFrame:

Ref ° | Indice_1 | Indice_2 | 1  | 2  | indice_from     | indice_from  | indice_to       | indice_to
-------------------------------------------------------------------------------------------------------
1     | 19       | 37.1     | 32 | 62 | ["10031,20031"] | ["11/12,13"] | ["30062,40062"] | ["14,14A"]
2     | 19       | 37.1     | 44 | 12 | ["30062,40062"] | ["11/12,13"] | ["30062,40062"] | ["14,14A"]
3     | 19       | 37.1     | 22 | 64 | ["10031,20031"] | ["11/12,13"] | ["10031,20031"] | ["11/12,13"]
4     | 19       | 37.1     | 32 | 98 | ["10031,20031"] | ["11/12,13"] | ["30062,40062"] | ["11/12,13"]

Can someone help me sort the values of the columns indice_from, indice_from, indice_to, and indice_to to obtain a new DataFrame like the second one above? Thank you.

Recommended answer
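
The following is a minimal sketch of one possible approach, not a definitive implementation. It assumes that each of the four columns is an array of strings in which every element holds comma-separated values (for example ["14A,14"]), and that the columns have distinct names in the real DataFrame. The helper names natural_key, sort_joined, sort_cell, and sort_index_columns are hypothetical. Each value is split on the comma, the parts are ordered by their leading number and then by any trailing text (so 14 comes before 14A, and 11/12 before 13), and the parts are joined back together. The Spark part wraps the helper in a UDF and overwrites only the listed columns, so the rest of the DataFrame is left unchanged.

```python
import re

# Leading digits, followed by whatever text remains (e.g. "14A" -> "14", "A").
_LEADING_NUMBER = re.compile(r"(\d+)(.*)", re.DOTALL)


def natural_key(token):
    """Sort key: leading integer first, then the trailing text."""
    match = _LEADING_NUMBER.match(token)
    if match is None:
        # Assumption: tokens without leading digits go after numeric ones.
        return (1, 0, token)
    return (0, int(match.group(1)), match.group(2))


def sort_joined(value, sep=","):
    """Sort the comma-separated parts of one string, e.g. "14A,14" -> "14,14A"."""
    if value is None:
        return None
    tokens = [part.strip() for part in value.split(sep)]
    return sep.join(sorted(tokens, key=natural_key))


def sort_cell(cell):
    """Apply sort_joined to every string of an array cell such as ["14A,14"]."""
    if cell is None:
        return None
    return [sort_joined(value) for value in cell]


def sort_index_columns(df, column_names):
    """Return df with only the given array<string> columns sorted in place."""
    # Imported lazily so the pure-Python helpers above work without Spark.
    from pyspark.sql import functions as F
    from pyspark.sql.types import ArrayType, StringType

    sort_udf = F.udf(sort_cell, ArrayType(StringType()))
    for name in column_names:
        # withColumn with an existing name replaces that column only.
        df = df.withColumn(name, sort_udf(F.col(name)))
    return df
```

With distinct (hypothetical) column names, the call would look like sort_index_columns(df, ["indice_from_num", "indice_from_txt", "indice_to_num", "indice_to_txt"]). A Python UDF is used here because the ordering rule (number first, then number + letter) is not a plain lexicographic sort.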