Pandas read_html causes TypeError

Problem description

I'm using bs4 to parse an HTML page and extract a table (a sample table is given below), and I'm trying to load it into pandas. When I call pddataframe = pd.read_html(LOTable,skiprows=2, flavor=['bs4']) I get the error listed below, even though I can print the table prettified by bs4.

Any suggestions for how I can resolve this without having to get each td and read them in one by one?

<table cellpadding="5" cellspacing="0" class="borders" width="100%">
    <tr>
     <th colspan="2">
      Learning Outcomes
     </th>
    </tr>
    <tr>
     <td class="info" colspan="2">
      On successful completion of this module the learner will be able to:
     </td>
    </tr>
    <tr>
     <td style="width:10%;">
      LO1
     </td>
     <td>
      Demonstrate an awareness of the important role of Financial Accounting information as an input into the decision making process.
     </td>
    </tr>
    <tr>
     <td style="width:10%;">
      LO2
     </td>
     <td>
      Display an understanding of the fundamental accounting concepts, principles and conventions that underpin the preparation of Financial statements.
     </td>
    </tr>
    <tr>
     <td style="width:10%;">
      LO3
     </td>
     <td>
      Understand the various formats in which  information in relation to transactions or events is recorded and classified.
     </td>
    </tr>
    <tr>
     <td style="width:10%;">
      LO4
     </td>
     <td>
      Apply a knowledge of accounting concepts,conventions and techniques such as double entry to the  posting of  recorded information to the T accounts in the Nominal Ledger.
     </td>
    </tr>
    <tr>
     <td style="width:10%;">
      LO5
     </td>
     <td>
      Prepare and present the financial statements of a Sole Trader  in prescribed format from a Trial Balance  accompanies by notes with additional information.
     </td>
    </tr>
   </table> 

Error

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-20-12673b1a4bfc> in <module>()
     10         #Read table into pandas
     11         if first:
---> 12             pddataframe = pd.read_html(LOTable,skiprows=2, flavor=['bs4'])
     13             first = False
     14             pddataframe

C:\Program Files\Anaconda3\envs\LearningOutcomes\lib\site-packages\pandas\io\html.py in read_html(io, match, flavor, header, index_col, skiprows, attrs, parse_dates, tupleize_cols, thousands, encoding)
    872     _validate_header_arg(header)
    873     return _parse(flavor, io, match, header, index_col, skiprows,
--> 874                   parse_dates, tupleize_cols, thousands, attrs, encoding)

C:\Program Files\Anaconda3\envs\LearningOutcomes\lib\site-packages\pandas\io\html.py in _parse(flavor, io, match, header, index_col, skiprows, parse_dates, tupleize_cols, thousands, attrs, encoding)
    734             break
    735     else:
--> 736         raise_with_traceback(retained)
    737 
    738     ret = []

C:\Program Files\Anaconda3\envs\LearningOutcomes\lib\site-packages\pandas\compat\__init__.py in raise_with_traceback(exc, traceback)
    331         if traceback == Ellipsis:
    332             _, _, traceback = sys.exc_info()
--> 333         raise exc.with_traceback(traceback)
    334 else:
    335     # this version of raise is a syntax error in Python 3

**TypeError: 'NoneType' object is not callable**

Recommended answer

Thanks for the pointers in all the suggested answers and comments. My rookie mistake was that, after extracting the table with bs4, I kept it in a variable, so LOTable was a bs4 object rather than an HTML string. I was running pd.read_html(LOTable,skiprows=2, flavor='bs4') when I needed to run pd.read_html(LOTable.prettify(),skiprows=2, flavor='bs4')
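For anyone hitting the same error, here is a minimal sketch of the fix, not the exact code from the question. The shortened table, the names html, soup, lo_table and tables, and the use of "html.parser" are all assumptions made for illustration. A likely cause of the 'NoneType' object is not callable message is that bs4 treats attribute access on a Tag as a search for a child tag: LOTable.read looks for a <read> element and returns None, which pandas then tries to call. Turning the Tag into markup text first (with str() or .prettify()) avoids that. Recent pandas versions prefer literal HTML wrapped in StringIO, so the sketch does that too.

```python
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical, shortened version of the table from the question.
html = """
<table class="borders">
  <tr><th colspan="2">Learning Outcomes</th></tr>
  <tr><td>LO1</td><td>First outcome</td></tr>
  <tr><td>LO2</td><td>Second outcome</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns a bs4 Tag object, not a string. Passing this Tag
# straight to pd.read_html is what triggered the error in the question.
lo_table = soup.find("table", class_="borders")

# Convert the Tag to markup text (lo_table.prettify() would also work)
# and wrap it in StringIO so pandas receives a file-like object.
# No flavor is passed here, so pandas picks whichever parser is installed;
# flavor="bs4" as in the question additionally needs html5lib.
tables = pd.read_html(StringIO(str(lo_table)))

# read_html always returns a list of DataFrames, one per table found.
df = tables[0]
print(df)
```

Note that read_html returns a list, so the DataFrame itself is tables[0]. Arguments such as skiprows=2 from the original call can be added back once the basic call works.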
