Python Pandas用户警告:正在排序，因为未串联的轴未对齐

本文介绍了Python Pandas用户警告:正在排序，因为未串联的轴未对齐的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我正在做一些代码练习，并在执行此操作时合并数据帧收到用户警告

I'm doing some code practice and applying merging of data frames while doing thisgetting user warning

在这些代码行上:您能帮助解决此警告吗?

On these lines of code: Can you please help to get the solution of this warning.

placement_video = [self.read_sql_vdx_summary, self.read_sql_video_km]
placement_video_summary = reduce(lambda left, right: pd.merge(left, right, on='PLACEMENT', sort=False), placement_video)


placement_by_video = placement_video_summary.loc[:, ["PLACEMENT", "PLACEMENT_NAME", "COST_TYPE", "PRODUCT",
                                                     "VIDEONAME", "VIEW0", "VIEW25", "VIEW50", "VIEW75",
                                                     "VIEW100",
                                                     "ENG0", "ENG25", "ENG50", "ENG75", "ENG100", "DPE0",
                                                     "DPE25",
                                                     "DPE50", "DPE75", "DPE100"]]

# print (placement_by_video)

placement_by_video["Placement# Name"] = placement_by_video[["PLACEMENT",
                                                            "PLACEMENT_NAME"]].apply(lambda x: ".".join(x),
                                                                                     axis=1)

placement_by_video_new = placement_by_video.loc[:,
                         ["PLACEMENT", "Placement# Name", "COST_TYPE", "PRODUCT", "VIDEONAME",
                          "VIEW0", "VIEW25", "VIEW50", "VIEW75", "VIEW100",
                          "ENG0", "ENG25", "ENG50", "ENG75", "ENG100", "DPE0", "DPE25",
                          "DPE50", "DPE75", "DPE100"]]

placement_by_km_video = [placement_by_video_new, self.read_sql_km_for_video]
placement_by_km_video_summary = reduce(lambda left, right: pd.merge(left, right, on=['PLACEMENT', 'PRODUCT'], sort=False),
                                       placement_by_km_video)

#print (list(placement_by_km_video_summary))
#print(placement_by_km_video_summary)
#exit()
# print(placement_by_video_new)
"""Conditions for 25%view"""
mask17 = placement_by_km_video_summary["PRODUCT"].isin(['Display', 'Mobile'])
mask18 = placement_by_km_video_summary["COST_TYPE"].isin(["CPE", "CPM", "CPCV"])
mask19 = placement_by_km_video_summary["PRODUCT"].isin(["InStream"])
mask20 = placement_by_km_video_summary["COST_TYPE"].isin(["CPE", "CPM", "CPE+", "CPCV"])
mask_video_video_completions = placement_by_km_video_summary["COST_TYPE"].isin(["CPCV"])
mask21 = placement_by_km_video_summary["COST_TYPE"].isin(["CPE+"])
mask22 = placement_by_km_video_summary["COST_TYPE"].isin(["CPE", "CPM"])
mask23 = placement_by_km_video_summary["PRODUCT"].isin(['Display', 'Mobile', 'InStream'])
mask24 = placement_by_km_video_summary["COST_TYPE"].isin(["CPE", "CPM", "CPE+"])

choice25video_eng = placement_by_km_video_summary["ENG25"]
choice25video_vwr = placement_by_km_video_summary["VIEW25"]
choice25video_deep = placement_by_km_video_summary["DPE25"]

placement_by_km_video_summary["25_pc_video"] = np.select([mask17 & mask18, mask19 & mask20, mask17 & mask21],
                                                  [choice25video_eng, choice25video_vwr, choice25video_deep])


"""Conditions for 50%view"""
choice50video_eng = placement_by_km_video_summary["ENG50"]
choice50video_vwr = placement_by_km_video_summary["VIEW50"]
choice50video_deep = placement_by_km_video_summary["DPE50"]

placement_by_km_video_summary["50_pc_video"] = np.select([mask17 & mask18, mask19 & mask20, mask17 & mask21],
                                                  [choice50video_eng,
                                                   choice50video_vwr, choice50video_deep])

"""Conditions for 75%view"""

choice75video_eng = placement_by_km_video_summary["ENG75"]
choice75video_vwr = placement_by_km_video_summary["VIEW75"]
choice75video_deep = placement_by_km_video_summary["DPE75"]

placement_by_km_video_summary["75_pc_video"] = np.select([mask17 & mask18, mask19 & mask20, mask17 & mask21],
                                                  [choice75video_eng,
                                                   choice75video_vwr,
                                                   choice75video_deep])

"""Conditions for 100%view"""

choice100video_eng = placement_by_km_video_summary["ENG100"]
choice100video_vwr = placement_by_km_video_summary["VIEW100"]
choice100video_deep = placement_by_km_video_summary["DPE100"]
choicecompletions = placement_by_km_video_summary['COMPLETIONS']

placement_by_km_video_summary["100_pc_video"] = np.select([mask17 & mask22, mask19 & mask24, mask17 & mask21, mask23 & mask_video_video_completions],
                                                          [choice100video_eng, choice100video_vwr, choice100video_deep, choicecompletions])



"""conditions for 0%view"""

choice0video_eng = placement_by_km_video_summary["ENG0"]
choice0video_vwr = placement_by_km_video_summary["VIEW0"]
choice0video_deep = placement_by_km_video_summary["DPE0"]

placement_by_km_video_summary["Views"] = np.select([mask17 & mask18, mask19 & mask20, mask17 & mask21],
                                                   [choice0video_eng,
                                                    choice0video_vwr,
                                                    choice0video_deep])


#print (placement_by_km_video_summary)
#exit()

#final Table

placement_by_video_summary = placement_by_km_video_summary.loc[:,
                             ["PLACEMENT", "Placement# Name", "PRODUCT", "VIDEONAME", "COST_TYPE",
                              "Views", "25_pc_video", "50_pc_video", "75_pc_video","100_pc_video",
                              "ENGAGEMENTS","IMPRESSIONS", "DPEENGAMENTS"]]

#placement_by_km_video = [placement_by_video_summary, self.read_sql_km_for_video]
#placement_by_km_video_summary = reduce(lambda left, right: pd.merge(left, right, on=['PLACEMENT', 'PRODUCT']),
                                       #placement_by_km_video)


#print(placement_by_video_summary)
#exit()
# dup_col =["IMPRESSIONS","ENGAGEMENTS","DPEENGAMENTS"]

# placement_by_video_summary.loc[placement_by_video_summary.duplicated(dup_col),dup_col] = np.nan

# print ("Dhar",placement_by_video_summary)

'''adding views based on conditions'''
#filter maximum value from videos

placement_by_video_summary_new = placement_by_km_video_summary.loc[
    placement_by_km_video_summary.reset_index().groupby(['PLACEMENT', 'PRODUCT'])['Views'].idxmax()]
#print (placement_by_video_summary_new)
#exit()
# print (placement_by_video_summary_new)
# mask22 = (placement_by_video_summary_new.PRODUCT.str.upper ()=='DISPLAY') & (placement_by_video_summary_new.COST_TYPE=='CPE')

placement_by_video_summary_new.loc[mask17 & mask18, 'Views'] = placement_by_video_summary_new['ENGAGEMENTS']
placement_by_video_summary_new.loc[mask19 & mask20, 'Views'] = placement_by_video_summary_new['IMPRESSIONS']
placement_by_video_summary_new.loc[mask17 & mask21, 'Views'] = placement_by_video_summary_new['DPEENGAMENTS']

#print (placement_by_video_summary_new)
#exit()
placement_by_video_summary = placement_by_video_summary.drop(placement_by_video_summary_new.index).append(
    placement_by_video_summary_new).sort_index()

placement_by_video_summary["Video Completion Rate"] = placement_by_video_summary["100_pc_video"] / \
                                                      placement_by_video_summary["Views"]

placement_by_video_final = placement_by_video_summary.loc[:,
                           ["Placement# Name", "PRODUCT", "VIDEONAME", "Views",
                            "25_pc_video", "50_pc_video", "75_pc_video", "100_pc_video",
                            "Video Completion Rate"]]

推荐答案

tl; dr:

concat和append当前对非连接索引(例如，如果要添加行的列)进行排序.在大熊猫0.23中，这开始产生警告.传递参数sort=True使其静音.将来，默认值将更改为不排序，因此最好现在指定sort=True或False，或者最好确保您的非串联索引匹配.

concat and append currently sort the non-concatenation index (e.g. columns if you're adding rows) if the columns don't match. In pandas 0.23 this started generating a warning; pass the parameter sort=True to silence it. In the future the default will change to not sort, so it's best to specify either sort=True or False now, or better yet ensure that your non-concatenation indices match.

该警告是 pandas 0.23.0 中的新内容:

在将来的熊猫版本中，pandas.concat()和DataFrame.append()将不再对未串联的轴进行排序.当前行为与先前的行为相同(排序)，但是现在当未指定排序且未连接的轴未对齐时会发出警告，链接

In a future version of pandas pandas.concat() and DataFrame.append() will no longer sort the non-concatenation axis when it is not already aligned. The current behavior is the same as the previous (sorting), but now a warning is issued when sort is not specified and the non-concatenation axis is not aligned,link.

链接的非常古老的 github问题，由smcinerney发表评论:

这种记录是无证的，不需要的.当然，默认行为应该是不排序.

This sort is undocumented and unwanted. Certainly the default behavior should be no-sort.

一段时间后，参数sort在 pandas.concat 和 :

After some time the parameter sort was implemented in pandas.concat and DataFrame.append:

如果连接为外部"时未对齐轴，则对它进行排序.当前默认的排序默认值已弃用，在以后的熊猫版本中将更改为不排序".

Sort non-concatenation axis if it is not already aligned when join is 'outer'. The current default of sorting is deprecated and will change to not-sorting in a future version of pandas.

显式传递sort = True可以使警告和排序保持静音.显式传递sort = False可以使警告静音而不进行排序.

Explicitly pass sort=True to silence the warning and sort. Explicitly pass sort=False to silence the warning and not sort.

当join ='inner'时，此命令无效，该命令已经保留了非串联轴的顺序.

This has no effect when join='inner', which already preserves the order of the non-concatenation axis.

因此，如果两个DataFrame都以相同的顺序具有相同的列，则不会发出警告，也不会进行排序:

So if both DataFrames have the same columns in the same order, there is no warning and no sorting:

df1 = pd.DataFrame({"a": [1, 2], "b": [0, 8]}, columns=['a', 'b'])
df2 = pd.DataFrame({"a": [4, 5], "b": [7, 3]}, columns=['a', 'b'])

print (pd.concat([df1, df2]))
   a  b
0  1  0
1  2  8
0  4  7
1  5  3

df1 = pd.DataFrame({"a": [1, 2], "b": [0, 8]}, columns=['b', 'a'])
df2 = pd.DataFrame({"a": [4, 5], "b": [7, 3]}, columns=['b', 'a'])

print (pd.concat([df1, df2]))
   b  a
0  0  1
1  8  2
0  7  4
1  3  5

但是，如果数据帧具有不同的列或相同的列，但顺序不同，则如果未显式设置参数sort(sort=None是默认值)，pandas将返回警告:

But if the DataFrames have different columns, or the same columns in a different order, pandas returns a warning if no parameter sort is explicitly set (sort=None is the default value):

df1 = pd.DataFrame({"a": [1, 2], "b": [0, 8]}, columns=['b', 'a'])
df2 = pd.DataFrame({"a": [4, 5], "b": [7, 3]}, columns=['a', 'b'])

print (pd.concat([df1, df2]))

   a  b
0  1  0
1  2  8
0  4  7
1  5  3

print (pd.concat([df1, df2], sort=True))
   a  b
0  1  0
1  2  8
0  4  7
1  5  3

print (pd.concat([df1, df2], sort=False))
   b  a
0  0  1
1  8  2
0  7  4
1  3  5

如果DataFrames的列不同，但前几列对齐-它们将彼此正确分配(df1中的列a和b以及df2中的a和b在下面的示例中)，因为它们同时存在于两者中.对于在一个但不是两个DataFrame中都存在的其他列，将创建缺少的值.

If the DataFrames have different columns, but the first columns are aligned - they will be correctly assigned to each other (columns a and b from df1 with a and b from df2 in the example below) because they exist in both. For other columns that exist in one but not both DataFrames, missing values are created.

最后，如果您通过sort=True，则列将按字母数字顺序进行排序.如果sort=False和第二个DafaFrame的列不在第一列中，则它们将不进行排序地附加到末尾:

Lastly, if you pass sort=True, columns are sorted alphanumerically. If sort=False and the second DafaFrame has columns that are not in the first, they are appended to the end with no sorting:

df1 = pd.DataFrame({"a": [1, 2], "b": [0, 8], 'e':[5, 0]},
                    columns=['b', 'a','e'])
df2 = pd.DataFrame({"a": [4, 5], "b": [7, 3], 'c':[2, 8], 'd':[7, 0]},
                    columns=['c','b','a','d'])

print (pd.concat([df1, df2]))

   a  b    c    d    e
0  1  0  NaN  NaN  5.0
1  2  8  NaN  NaN  0.0
0  4  7  2.0  7.0  NaN
1  5  3  8.0  0.0  NaN

print (pd.concat([df1, df2], sort=True))
   a  b    c    d    e
0  1  0  NaN  NaN  5.0
1  2  8  NaN  NaN  0.0
0  4  7  2.0  7.0  NaN
1  5  3  8.0  0.0  NaN

print (pd.concat([df1, df2], sort=False))

   b  a    e    c    d
0  0  1  5.0  NaN  NaN
1  8  2  0.0  NaN  NaN
0  7  4  NaN  2.0  7.0
1  3  5  NaN  8.0  0.0

在您的代码中:

placement_by_video_summary = placement_by_video_summary.drop(placement_by_video_summary_new.index)
                                                       .append(placement_by_video_summary_new, sort=True)
                                                       .sort_index()

这篇关于Python Pandas用户警告:正在排序，因为未串联的轴未对齐的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持！