Problem description
I have a lot of different tables (and other unstructured data) in an Excel workbook. I need to create a DataFrame from the range 'A3:D20' in 'Sheet2' of the workbook 'data.xlsx'.
All the examples I have come across drill down to sheet level, but none show how to pick out an exact range.
import openpyxl
import pandas as pd
wb = openpyxl.load_workbook('data.xlsx')
sheet = wb.get_sheet_by_name('Sheet2')
range = ['A3':'D20']  # <-- how do I specify this?
spots = pd.DataFrame(sheet.range)  # what should the exact syntax be?
print(spots)
Once I have this, I plan to look up some data in column A and find the corresponding value in column B.
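Once the range is in a DataFrame, the column A to column B lookup can be written as a boolean mask with .loc. A minimal sketch; the sample data, the column labels A and B, and the helper name lookup are hypothetical stand-ins for the real sheet:

```python
import pandas as pd

# Hypothetical stand-in for the A3:D20 block: keys in column "A", values in "B".
df = pd.DataFrame({"A": ["apple", "banana", "cherry"], "B": [1.5, 0.25, 3.0]})


def lookup(frame: pd.DataFrame, key, key_col: str = "A", value_col: str = "B"):
    """Return the value_col entry of the first row whose key_col equals key."""
    matches = frame.loc[frame[key_col] == key, value_col]
    if matches.empty:
        raise KeyError(key)
    return matches.iloc[0]


print(lookup(df, "banana"))  # 0.25
```

Raising KeyError for a missing key mirrors what dict and .loc do, rather than silently returning an empty Series.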
EDIT: I realised that openpyxl takes too long, so I have switched to pandas.read_excel('data.xlsx', 'Sheet2')
instead, which is much faster, at least at that stage.
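pandas.read_excel can read just the cells in A3:D20 by itself: usecols picks the columns, skiprows drops the rows above the range and nrows caps how many rows are read. A sketch, assuming the range has no header row (header=None); read_excel_range is a hypothetical helper name, openpyxl.utils.cell.range_boundaries converts 'A3:D20' into numeric bounds, and the workbook is generated with made-up values so the example runs on its own:

```python
import os
import tempfile

import openpyxl
import pandas as pd
from openpyxl.utils.cell import get_column_letter, range_boundaries


def read_excel_range(path, sheet_name, cell_range):
    """Hypothetical helper: read an A1-style range such as 'A3:D20' with pandas.

    Assumes the range has no header row, so columns are numbered 0..n-1.
    """
    min_col, min_row, max_col, max_row = range_boundaries(cell_range)
    return pd.read_excel(
        path,
        sheet_name=sheet_name,
        header=None,
        usecols=f"{get_column_letter(min_col)}:{get_column_letter(max_col)}",
        skiprows=min_row - 1,         # rows above the range
        nrows=max_row - min_row + 1,  # rows inside the range
    )


# Throwaway workbook with made-up values so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "data.xlsx")
wb = openpyxl.Workbook()
wb.active.title = "Sheet1"
ws = wb.create_sheet("Sheet2")
ws["A1"] = "unrelated title"
ws["A2"] = "unrelated notes"
for r in range(3, 21):     # rows 3..20
    for c in range(1, 5):  # columns A..D
        ws.cell(row=r, column=c, value=r * 10 + c)
ws["F5"] = "other data outside the range"
wb.save(path)

df = read_excel_range(path, "Sheet2", "A3:D20")
print(df.shape)  # (18, 4)
```

If the first row of the range holds column names, header=0 can be passed instead of header=None; nrows then counts only the data rows below that header.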
Edit2: For the time being, I have put my data in just one sheet and removed all other info, added column names, applied index_col
on my leftmost column, and then used wb.loc[], which solves it for me.
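The Edit2 approach (leftmost column as the index, then .loc) might look like this. A sketch, assuming a header row with hypothetical column names item, price and qty; the sheet is written to an in-memory buffer only so the example is self-contained:

```python
import io

import pandas as pd

# Hypothetical single-sheet layout: header row, keys in the leftmost column.
source = pd.DataFrame(
    {"item": ["apple", "banana", "cherry"], "price": [1.5, 0.25, 3.0], "qty": [10, 20, 5]}
)
buffer = io.BytesIO()
source.to_excel(buffer, sheet_name="Sheet1", index=False, engine="openpyxl")
buffer.seek(0)

# index_col=0 turns the leftmost column into the row index ...
df = pd.read_excel(buffer, sheet_name="Sheet1", index_col=0)

# ... so .loc can take a row label and a column name directly.
print(df.loc["banana", "price"])  # 0.25
```

With the keys as the index, no boolean mask is needed; a missing key raises KeyError from .loc.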
One way to do this is to use the openpyxl module.
Here's an example:
from openpyxl import load_workbook
import pandas as pd

wb = load_workbook(filename='data.xlsx', read_only=True)
ws = wb['Sheet2']

# Read the cell values into a list of lists
data_rows = []
for row in ws['A3':'D20']:
    data_cols = []
    for cell in row:
        data_cols.append(cell.value)
    data_rows.append(data_cols)

wb.close()  # read-only workbooks keep the file open until closed

# Transform into a DataFrame
df = pd.DataFrame(data_rows)
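A variant of the loop in this answer: Worksheet.iter_rows accepts numeric bounds and, with values_only=True, yields tuples of cell values, so the inner loop goes away. A sketch, assuming the first row of the range (row 3) holds column names; the workbook contents and column names are made up so the example runs on its own:

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook

# Throwaway workbook: a title in row 1, a header at row 3, data in rows 4..20.
path = os.path.join(tempfile.mkdtemp(), "data.xlsx")
wb_out = Workbook()
ws_out = wb_out.active
ws_out.title = "Sheet2"
ws_out["A1"] = "report title"
for c, name in enumerate(["key", "value", "x", "y"], start=1):
    ws_out.cell(row=3, column=c, value=name)
for r in range(4, 21):
    for c, v in enumerate([f"k{r}", r * 1.5, r, -r], start=1):
        ws_out.cell(row=r, column=c, value=v)
wb_out.save(path)

# Read A3:D20 as plain values instead of Cell objects.
wb = load_workbook(filename=path, read_only=True)
ws = wb["Sheet2"]
rows = list(ws.iter_rows(min_row=3, max_row=20, min_col=1, max_col=4, values_only=True))
wb.close()  # read-only workbooks keep the file open until closed

# Treat the first row of the range as the header (an assumption about the layout).
df = pd.DataFrame(rows[1:], columns=rows[0])
print(df.shape)  # (17, 4)
```

If the range has no header row, pd.DataFrame(rows) builds the same frame as the loop in the answer.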