Problem description
I have a single .csv file containing multiple tables.
Using pandas, what would be the best strategy to get two DataFrames, inventory and HPBladeSystemRack, from this one file?
The input .csv looks like this:
Inventory
System Name         IP Address    System Status
dg-enc05                          Normal
dg-enc05_vc_domain                Unknown
dg-enc05-oa1        172.20.0.213  Normal
HP BladeSystem Rack
System Name         Rack Name     Enclosure Name
dg-enc05            BU40
dg-enc05-oa1        BU40          dg-enc05
dg-enc05-oa2        BU40          dg-enc05
The best I've come up with so far is to convert this .csv file into an Excel workbook (xlsx), split the tables into sheets and use:
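import pandas as pd

# One sheet per table; skiprows drops the title line(s) above each header row
inventory = pd.read_excel('path_to_file.xlsx', sheet_name='sheet1', skiprows=1)
HPBladeSystemRack = pd.read_excel('path_to_file.xlsx', sheet_name='sheet2', skiprows=2)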
However:
- This approach requires the xlrd module.
- Those log files have to be analyzed in real time, so it would be much better to find a way to analyze them as they come from the logs.
- The real logs have far more tables than those two.
If you know the table names beforehand, then something like this:
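import pandas as pd

# Read every row into three generic columns so title rows and data rows line up
df = pd.read_csv("jahmyst2.csv", header=None, names=range(3))
table_names = ["Inventory", "HP BladeSystem Rack", "Network Interface"]
# Each title row bumps the running count, so all rows of one table share a label
groups = df[0].isin(table_names).cumsum()
tables = {g.iloc[0, 0]: g.iloc[1:] for _, g in df.groupby(groups)}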
should work to produce a dictionary with keys as the table names and values as the subtables.
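To see why the grouping works, here is a minimal sketch of the intermediate labels. It assumes a comma-separated file; the SAMPLE string is a hypothetical, shortened stand-in for the real log, and the names is_title and labels are only for illustration.

```python
import io
import pandas as pd

# Hypothetical comma-separated stand-in for the log file (an assumption).
SAMPLE = """Inventory
System Name,IP Address,System Status
dg-enc05,,Normal
dg-enc05-oa1,172.20.0.213,Normal
HP BladeSystem Rack
System Name,Rack Name,Enclosure Name
dg-enc05,BU40,
"""

df = pd.read_csv(io.StringIO(SAMPLE), header=None, names=list(range(3)))
table_names = ["Inventory", "HP BladeSystem Rack"]

is_title = df[0].isin(table_names)  # True only on the rows that name a table
labels = is_title.cumsum()          # running count of title rows seen so far

print(labels.tolist())
```

Every row between two title rows gets the same label, which is what lets groupby cut the frame into one piece per table. The session that follows continues with the tables dictionary from the answer's own snippet.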
>>> list(tables)
['HP BladeSystem Rack', 'Inventory']
>>> for k,v in tables.items():
...     print("table:", k)
...     print(v)
...     print()
...
table: HP BladeSystem Rack
              0          1               2
6   System Name  Rack Name  Enclosure Name
7      dg-enc05       BU40             NaN
8  dg-enc05-oa1       BU40        dg-enc05
9  dg-enc05-oa2       BU40        dg-enc05

table: Inventory
                    0             1              2
1         System Name    IP Address  System Status
2            dg-enc05           NaN         Normal
3  dg-enc05_vc_domain           NaN        Unknown
4        dg-enc05-oa1  172.20.0.213         Normal
Once you've got that, you can promote the first row of each subtable to its column names, reset the index, and so on.
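That last step might look roughly like the sketch below. It again assumes comma-separated input; SAMPLE is a hypothetical, shortened stand-in for the log file, and promote_header is an illustrative helper name, not part of pandas.

```python
import io
import pandas as pd

# Hypothetical comma-separated stand-in for the log file (an assumption).
SAMPLE = """Inventory
System Name,IP Address,System Status
dg-enc05,,Normal
dg-enc05-oa1,172.20.0.213,Normal
HP BladeSystem Rack
System Name,Rack Name,Enclosure Name
dg-enc05,BU40,
"""


def promote_header(sub):
    """Return a copy of a raw subtable with its first row used as column names."""
    body = sub.iloc[1:].reset_index(drop=True)  # data rows, renumbered from 0
    body.columns = sub.iloc[0].tolist()         # first row supplies the names
    return body


df = pd.read_csv(io.StringIO(SAMPLE), header=None, names=list(range(3)))
table_names = ["Inventory", "HP BladeSystem Rack"]
groups = df[0].isin(table_names).cumsum()

# Same grouping as in the answer: the title row is the key, the rest is the subtable.
raw = {g.iloc[0, 0]: g.iloc[1:] for _, g in df.groupby(groups)}
tables = {name: promote_header(sub) for name, sub in raw.items()}

inventory = tables["Inventory"]
HPBladeSystemRack = tables["HP BladeSystem Rack"]
print(inventory)
```

All values stay as strings at this point, so numeric columns would still need an explicit conversion. A table with fewer than three columns would also end up with NaN column names that have to be dropped separately.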