Problem description
I have a sample of a data frame that looks like this:
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
| | Date | Professional | Description |
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
| 0 | 2019-12-19 00:00:00 | Katie Cool | Travel to Space ... |
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
| 1 | 2019-12-20 00:00:00 | Jenn Blossoms | Review stuff; prepare cancellations of ... |
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
| 2 | 2019-12-27 00:00:00 | Jenn Blossoms | Review lots of stuff/o... |
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
| 3 | 2019-12-27 00:00:00 | Jenn Blossoms | Draft email to world leader... |
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
| 4 | 2019-12-30 00:00:00 | Jenn Blossoms | Review this thing. |
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
| 5 | 12-30-2019 Jenn Blossoms Telephone Call to A. Bell return her multiple | NaN | NaN |
| | voicemails. | | |
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
Most of row 5's data ended up in the Date cell.
I would like the sample to look like this:
+---+---------------------+---------------+-------------------------------------------------------------+
| | Date | Professional | Description |
+---+---------------------+---------------+-------------------------------------------------------------+
| 0 | 2019-12-19 00:00:00 | Katie Cool | Travel to Space ... |
+---+---------------------+---------------+-------------------------------------------------------------+
| 1 | 2019-12-20 00:00:00 | Jenn Blossoms | Review stuff; prepare cancellations of ... |
+---+---------------------+---------------+-------------------------------------------------------------+
| 2 | 2019-12-27 00:00:00 | Jenn Blossoms | Review lots of stuff/o... |
+---+---------------------+---------------+-------------------------------------------------------------+
| 3 | 2019-12-27 00:00:00 | Jenn Blossoms | Draft email to world leader... |
+---+---------------------+---------------+-------------------------------------------------------------+
| 4 | 2019-12-30 00:00:00 | Jenn Blossoms | Review this thing. |
+---+---------------------+---------------+-------------------------------------------------------------+
| 5 | 12-30-2019 | Jenn Blossoms | Telephone Call to A. Bell return her multiple |
| | | | voicemails. |
+---+---------------------+---------------+-------------------------------------------------------------+
I tried the following code:
date = dftopdata['Date'].str.extract('(\d{2}-\d{2}-\d{4})(\s\w+\s\w+)\s(\w+.*)')[0]
name = dftopdata['Date'].str.extract('(\d{2}-\d{2}-\d{4})(\s\w+\s\w+)\s(\w+.*)')[1]
description = dftopdata['Date'].str.extract('(\d{2}-\d{2}-\d{4})(\s\w+\s\w+)\s(\w+.*)')[2]
dftopdata.loc[pd.to_datetime(dftopdata['Date'],errors='coerce').isnull(),'Professional'] = name
dftopdata.loc[pd.to_datetime(dftopdata['Date'],errors='coerce').isnull(),'Description'] = description
dftopdata.loc[pd.to_datetime(dftopdata['Date'],errors='coerce').isnull(),'Date'] = date
But when I run the above code, the data frame sample looks like this:
+---+------------+---------------+--------------------------------------------+
| | Date | Professional | Description |
+---+------------+---------------+--------------------------------------------+
| 0 | 12/19/2019 | Katie Cool | Travel to space ... |
+---+------------+---------------+--------------------------------------------+
| 1 | 12/20/2019 | Jenn Blossoms | Review stuff; prepare cancellations of ... |
+---+------------+---------------+--------------------------------------------+
| 2 | 12/27/2019 | Jenn Blossoms | Review lots of stuff/o… |
+---+------------+---------------+--------------------------------------------+
| 3 | 12/27/2019 | Jenn Blossoms | Draft email to world leader... |
+---+------------+---------------+--------------------------------------------+
| 4 | 12/30/2019 | Jenn Blossoms | Review this thing. |
+---+------------+---------------+--------------------------------------------+
| 5 | NaN | NaN | NaN |
+---+------------+---------------+--------------------------------------------+
Recommended answer
You can use the str.split method to split the string into "words".
df['list_of_words'] = df['Date'].str.split()  # df is the question's data frame (dftopdata)
If there is a pattern that separates the Professional and Description parts within list_of_words, you can use it. For instance, if the first 2 words of list_of_words make up the name of the professional, then you can do:
df['Professional'] = df.apply(lambda x: ' '.join(x['list_of_words'][:2]), axis=1)
df['Description'] = df.apply(lambda x: ' '.join(x['list_of_words'][2:]), axis=1)
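Applied to the sample in the question, the same splitting idea needs two adjustments: only the malformed row should be touched, and the first word of that cell is the date rather than the name. The following is a minimal sketch under assumptions that are not stated in the original: the malformed cells are plain strings laid out as date, first name, last name, description, and the well-formed rows already have a non-null Professional. The sample frame is a hypothetical reconstruction of the question's data.

```python
import pandas as pd

# Hypothetical sample mirroring the question: the last row has everything in 'Date'.
df = pd.DataFrame({
    "Date": [
        pd.Timestamp("2019-12-19"),
        pd.Timestamp("2019-12-30"),
        "12-30-2019 Jenn Blossoms Telephone Call to A. Bell return her multiple voicemails.",
    ],
    "Professional": ["Katie Cool", "Jenn Blossoms", None],
    "Description": ["Travel to Space ...", "Review this thing.", None],
})

# Rows to repair: 'Date' holds a string and 'Professional' is missing.
is_text = df["Date"].map(lambda v: isinstance(v, str))
mask = is_text & df["Professional"].isna()

# Split only the broken cells on whitespace, at most 3 times:
# column 0 = date, 1 = first name, 2 = last name, 3 = rest of the text.
# Assumes every broken cell has at least four words.
parts = df.loc[mask, "Date"].str.split(n=3, expand=True)

# Assignments align on the row index, so only the masked rows change.
df.loc[mask, "Professional"] = parts[1] + " " + parts[2]
df.loc[mask, "Description"] = parts[3]
df.loc[mask, "Date"] = parts[0]

print(df)
```

Names with more or fewer than two words would break the fixed positions used here; in that case a regular expression with str.extract, as attempted in the question, may be a better fit.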