How can you create a new table with a Python script that uses two existing tables as input? For example, by performing a left join using pandas merge?
Some details:
Using Home > Edit queries you can utilize Python under Transform > Run Python Script. This opens a Run Python Script dialog box where you're told that 'dataset' holds the input data for this script. And you'll find the same phrase if you just click OK and look at the formula bar:
= Python.Execute("# 'dataset' holds the input data for this script#(lf)",[dataset=#"Changed Type"])
This also adds a new step under Applied Steps called Run Python script, where you can edit the Python script by clicking the gear symbol on the right.
How can you change that setup to reference more than one table?
Sample data
Here are two tables that can be stored as CSV files and loaded using Home > Get Data > Text/CSV
Table1
Date,Value1
2108-10-12,1
2108-10-13,2
2108-10-14,3
2108-10-15,4
2108-10-16,5
Table2
Date,Value2
2108-10-12,10
2108-10-13,11
2108-10-14,12
2108-10-15,13
2108-10-16,14
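If you want to experiment with the script outside Power BI first, the two sample tables can be loaded straight into pandas. This is only a sketch: the variable names TABLE1_CSV and TABLE2_CSV are made up for illustration, and the data is read from in-memory strings rather than from files. Date is read as text to mirror the Text data type used later in the Power Query Editor.

```python
import io

import pandas as pd

# Sample data copied from Table1 and Table2 above (hypothetical variable names)
TABLE1_CSV = """Date,Value1
2108-10-12,1
2108-10-13,2
2108-10-14,3
2108-10-15,4
2108-10-16,5
"""

TABLE2_CSV = """Date,Value2
2108-10-12,10
2108-10-13,11
2108-10-14,12
2108-10-15,13
2108-10-16,14
"""

# dtype={"Date": str} keeps the Date column as text instead of parsing it
df1 = pd.read_csv(io.StringIO(TABLE1_CSV), dtype={"Date": str})
df2 = pd.read_csv(io.StringIO(TABLE2_CSV), dtype={"Date": str})

print(df1)
print(df2)
```

With the files saved to disk, the io.StringIO(...) argument would simply be replaced by the path to each CSV file.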
This is the same challenge that has been described for R scripts here. That setup should work for Python too. However, I've found that that approach has one drawback: it stores the new joined or calculated table as an edited version of one of the previous tables. The following suggestion demonstrates how you can produce a completely new calculated table without altering the input tables (except changing the data type of the Date columns from Date to Text because of this).
Short answer:
In the Power Query editor, follow these steps:
1. Change the data type of the Date columns in both tables to Text.
2. Click Enter Data. Only click OK.
3. Activate the new Table3 and use Transform > Run Python Script. Only click OK.
4. Activate the formula bar and replace what's in it with = Python.Execute("# Python:",[df1=Table1, df2=Table2]). Click Enter.
5. If you're prompted to do so, click Edit Permission and Run in the next step.
6. Under Applied Steps, in the new step named Run Python Script, click the gear icon to open the Run Python Script editor.
7. Insert the snippet below and click OK.
Code:
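import pandas as pd
# Left join on Date: keep every row of df1 and add the matching columns from df2
df3 = pd.merge(df1, df2, how='left', on=['Date'])
# Compute Value3 from the merged columns so both factors come from the same row
df3['Value3'] = df3['Value1'] * df3['Value2']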
Next to df3, click Table, and that's it:
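To check what the snippet produces before pasting it into Power BI, here is a minimal sketch that runs in plain Python. The two dataframes are hand-built stand-ins for what Power BI would pass in as df1 and df2; building them by hand is an assumption for illustration and is not needed inside the Run Python Script editor. Value3 is computed from the merged columns of df3, which gives the same numbers as the snippet above for this sample data.

```python
import pandas as pd

dates = ["2108-10-12", "2108-10-13", "2108-10-14", "2108-10-15", "2108-10-16"]

# Stand-ins for Table1 and Table2; Date is plain text, as in step 1 above
df1 = pd.DataFrame({"Date": dates, "Value1": [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({"Date": dates, "Value2": [10, 11, 12, 13, 14]})

# Left join on Date, then multiply the two value columns row by row
df3 = pd.merge(df1, df2, how="left", on=["Date"])
df3["Value3"] = df3["Value1"] * df3["Value2"]

print(df3)
# Value3 column: 10, 22, 36, 52, 70
```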
The details:
The list above will have to be followed very carefully to get things working. So here are all of the dirty little details:
1. Load the tables as CSV files in Power BI Desktop using Get Data.
2. Click Edit Queries.
3. In Table1, click the symbol next to the Date column, select Text and click Replace Current.
4. Do the same for Table2.
5. On the Home tab, click Enter Data.
6. In the box that appears, do nothing except click OK.
7. This will insert an empty table named Table3 under Queries, and that's exactly what we want.
8. Go to the Transform tab and click Run Python Script.
9. This opens the Run Python Script editor. You can start writing your scripts right here, but that will make things unnecessarily complicated in the next steps. So do nothing but click OK.
10. In the formula bar you will see the formula = Python.Execute("# 'dataset' holds the input data for this script#(lf)",[dataset=#"Changed Type"]). Notice also that you've got a new step under Applied Steps named Run Python Script.
11. There are several interesting details in the screenshot above, but first we're going to break down the arguments of the function = Python.Execute("# 'dataset' holds the input data for this script#(lf)",[dataset=#"Changed Type"]).
The part "# 'dataset' holds the input data for this script#(lf)" simply inserts the comment that you can see in the Python Script Editor. So it's not important, but you can't just leave it blank either. I like to use something shorter like "# Python:".
The part [dataset=#"Changed Type"] is a pointer to the empty Table3 in the state it is in under the Changed Type step. So if the last thing you do before inserting a Python script is something other than changing data types, this part will look different. The table is then made available in your Python script as a pandas dataframe named dataset. With this in mind, we can make some very useful changes to the formula:
12. Change the formula bar to = Python.Execute("# Python:",[df1=Table1, df2=Table2]) and hit Enter. This will make Table1 and Table2 available for your Python scripts as two pandas dataframes named df1 and df2, respectively.
13. Click the gear (or is it a flower?) icon next to Run Python script under Applied Steps.
14. Insert the following snippet:
Code:
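import pandas as pd
# Left join on Date: keep every row of df1 and add the matching columns from df2
df3 = pd.merge(df1, df2, how='left', on=['Date'])
# Compute Value3 from the merged columns so both factors come from the same row
df3['Value3'] = df3['Value1'] * df3['Value2']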
This will join df1 and df2 on the Date column, and insert a new calculated column named Value3. Not too fancy, but with this setup you can do anything you want with your data in the world of Power BI and with the power of Python.
15. Click OK and you'll see this:
You'll see df3 listed under the input dataframes df1 and df2 in the blue square. If you've assigned any other dataframes as a step in your calculations in the Python script, they will be listed here too. In order to turn it into an accessible table for Power BI, just click Table as indicated by the green arrow.
16. And that's it:
Note that the data type of the Date column is set to Date by default, but you can change that to Text as explained earlier.
Click Home > Close & Apply to exit the Power Query Editor and go back to where it all started in Power BI Desktop.
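One more note on the left join from step 14, in case your own tables don't line up as neatly as the sample data. The sketch below uses a trimmed, hypothetical variant of the sample tables in which the second table is missing one date. It shows that a left join keeps the unmatched row with a missing value, and that multiplying columns taken from the two input dataframes pairs rows by index position rather than by Date. The names left, right, joined and by_position are made up for this illustration.

```python
import pandas as pd

# Hypothetical variant of the sample data: 'right' has no row for 2108-10-13
left = pd.DataFrame({
    "Date": ["2108-10-12", "2108-10-13", "2108-10-14"],
    "Value1": [1, 2, 3],
})
right = pd.DataFrame({
    "Date": ["2108-10-12", "2108-10-14"],
    "Value2": [10, 12],
})

# indicator=True adds a _merge column telling where each row came from
joined = pd.merge(left, right, how="left", on="Date", indicator=True)

# Multiplying the merged columns keeps both factors on the same Date
joined["Value3"] = joined["Value1"] * joined["Value2"]
print(joined)

# Multiplying the input columns aligns on the row index instead, so the
# second row pairs 2108-10-13 from 'left' with 2108-10-14 from 'right'
by_position = left["Value1"] * right["Value2"]
print(by_position)
```

For the unmatched date, Value2 and Value3 come out as NaN in joined, which is usually easier to spot and handle than a silently mismatched product.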