Problem description
I have a DataFrame (collected from accounting software) that looks like this.
Serial || Date || Particulars || Price
--------------------------------------
1      || 0308 || Andrew      || 100
2      || NaN  || Gloves      || NaN
3      || 0408 || Johnson     || 50
4      || NaN  || Wicket      || NaN
I want to merge each pair of consecutive rows and make a new column 'Product' holding the second row's 'Particulars' value. The expected output should look like this:
Serial || Date || Particulars || Price || Product
-------------------------------------------------
1      || 0308 || Andrew      || 100   || Gloves
3      || 0408 || Johnson     || 50    || Wicket
How do I achieve this with pandas?
These answers are predicated on the dataframe always presenting pairs of rows that follow the pattern presented by OP: the first row shows a person with a date and a price, and the second row shows a product whose Date and Price columns are NaN.
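To try the snippets in this answer, here is a minimal sketch that rebuilds OP's sample and checks the assumed pairing. Storing Date as a float is my assumption, based on the printed outputs (308.0, 408.0), and the name follows_pattern is just for illustration.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; Date is a float here (an assumption)
df = pd.DataFrame({
    "Serial": [1, 2, 3, 4],
    "Date": [308.0, np.nan, 408.0, np.nan],
    "Particulars": ["Andrew", "Gloves", "Johnson", "Wicket"],
    "Price": [100.0, np.nan, 50.0, np.nan],
})

# Person rows (even positions) carry a price, product rows (odd positions) do not
has_price = df["Price"].notna().to_numpy()
follows_pattern = bool(
    len(df) % 2 == 0 and has_price[::2].all() and not has_price[1::2].any()
)
print(follows_pattern)
```

If the check fails on real data, the approaches in this answer would likely pair the wrong rows, so it is worth running first.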
Use shift
then dropna
df.assign(Product=df.Particulars.shift(-1)).dropna()
   Serial   Date Particulars  Price Product
0       1  308.0      Andrew  100.0  Gloves
2       3  408.0     Johnson   50.0  Wicket
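As a self-contained sketch of the shift-then-dropna approach (the sample frame is rebuilt from the question, with Date assumed to be a float):

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (Date as float is an assumption)
df = pd.DataFrame({
    "Serial": [1, 2, 3, 4],
    "Date": [308.0, np.nan, 408.0, np.nan],
    "Particulars": ["Andrew", "Gloves", "Johnson", "Wicket"],
    "Price": [100.0, np.nan, 50.0, np.nan],
})

# Pull the next row's Particulars up beside each row, then drop rows with any NaN
result = df.assign(Product=df["Particulars"].shift(-1)).dropna()
print(result)
```

Note that a bare dropna() drops a row if any column is NaN, so a person row with a genuinely missing Date would disappear too. Something like dropna(subset=["Price"]) is narrower if that matters for your data.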
join
The exact same thing, just spelled differently
df.join(df.Particulars.shift(-1).rename('Product')).dropna()
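A quick sketch to confirm that the join spelling and the assign spelling give the same frame on OP's sample (rebuilt here, with Date assumed to be a float; via_assign and via_join are illustrative names):

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (Date as float is an assumption)
df = pd.DataFrame({
    "Serial": [1, 2, 3, 4],
    "Date": [308.0, np.nan, 408.0, np.nan],
    "Particulars": ["Andrew", "Gloves", "Johnson", "Wicket"],
    "Price": [100.0, np.nan, 50.0, np.nan],
})

shifted = df["Particulars"].shift(-1)

# assign names the new column through the keyword argument
via_assign = df.assign(Product=shifted).dropna()

# join needs a named Series, hence the rename
via_join = df.join(shifted.rename("Product")).dropna()

same = via_assign.equals(via_join)
print(same)
```

The rename is needed because the shifted Series is still called Particulars, and joining it under that name would collide with the existing column.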
Details
Per Request
df.Particulars.shift(-1)
moves every value of the Particulars column up one row
0     Gloves
1    Johnson
2     Wicket
3        NaN
Name: Particulars, dtype: object
When I assign this to the existing dataframe
df.assign(Product=df.Particulars.shift(-1))
it adds a column with the new name 'Product'
where the values are the shifted Particulars.
   Serial   Date Particulars  Price  Product
0       1  308.0      Andrew  100.0   Gloves
1       2    NaN      Gloves    NaN  Johnson
2       3  408.0     Johnson   50.0   Wicket
3       4    NaN      Wicket    NaN      NaN
All that's left is to drop the rows with the
NaN
values and we have what is presented above.
Inspired by @QuangHoang's answer
I don't need to depend on dropna
if I slice every other row
df.assign(Product=df.Particulars.shift(-1))[::2]
Or even more terse
df[::2].assign(Product=[*df.Particulars[1::2]])
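The list unpacking in the terse version matters. Here is a small sketch of why, on OP's sample (rebuilt here, Date assumed to be a float; people, products, aligned and positional are illustrative names):

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (Date as float is an assumption)
df = pd.DataFrame({
    "Serial": [1, 2, 3, 4],
    "Date": [308.0, np.nan, 408.0, np.nan],
    "Particulars": ["Andrew", "Gloves", "Johnson", "Wicket"],
    "Price": [100.0, np.nan, 50.0, np.nan],
})

people = df[::2]                    # rows labelled 0 and 2
products = df["Particulars"][1::2]  # values labelled 1 and 3

# Assigning the Series aligns on index labels: 0, 2 never match 1, 3
aligned = people.assign(Product=products)

# A plain list has no labels, so values are placed by position
positional = people.assign(Product=[*products])

print(aligned["Product"].isna().all())
print(positional)
```

So passing the sliced Series directly would give a Product column full of NaN, while the list (or .values, or .to_list()) lines the products up with the people by position.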
One way to do it
This was the first way I thought of and it's gross
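import numpy as np
i = np.flatnonzero(df.Price.notna())  # positions of the rows that carry a price
j = i + 1  # positions of the product rows right after them
df.iloc[i].assign(Product=df.iloc[j].Particulars.values)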
   Serial   Date Particulars  Price Product
0       1  308.0      Andrew  100.0  Gloves
2       3  408.0     Johnson   50.0  Wicket
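One thing to watch with the positional version: if the last row carries a price, j = i + 1 points past the end of the frame and iloc raises. Below is a sketch that wraps it with a guard. The function name pair_rows and the error message are hypothetical, and the sample is rebuilt from the question with Date assumed to be a float.

```python
import numpy as np
import pandas as pd

def pair_rows(frame):
    """Attach each priced row's following Particulars as Product (illustrative helper)."""
    i = np.flatnonzero(frame["Price"].notna())  # positions of priced rows
    j = i + 1                                   # positions of the rows after them
    if len(i) and j[-1] >= len(frame):
        raise ValueError("last priced row has no following product row")
    # to_numpy() drops the index so the values are placed by position
    return frame.iloc[i].assign(Product=frame.iloc[j]["Particulars"].to_numpy())

# Sample data rebuilt from the question (Date as float is an assumption)
df = pd.DataFrame({
    "Serial": [1, 2, 3, 4],
    "Date": [308.0, np.nan, 408.0, np.nan],
    "Particulars": ["Andrew", "Gloves", "Johnson", "Wicket"],
    "Price": [100.0, np.nan, 50.0, np.nan],
})

paired = pair_rows(df)
print(paired)
```

Unlike the shift and slicing versions, this one keys on where Price is present, so it only needs each priced row to be followed by its product row, whatever position it sits at.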