BeautifulSoup in Python: getting the n-th tag of a type



Question

I have some HTML code that contains many <table> elements.

I'm trying to get the information in the second table. Is there a way to do this without using soup.findAll('table')?

When I do use soup.findAll('table'), I get an error:

ValueError: too many values to unpack

Is there a way to get the n-th tag of a given type, or some other way that does not require going through all the tables? Or should I see if I can add titles to the tables (like <table title="things">)?

There are also headers (<h4>title</h4>) above each table, if that helps.
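If each table really is preceded by its own <h4> header, one possible approach is to locate the header by its text and then take the next table after it. The following is only a sketch under assumptions: the HTML snippet and the header texts are made up for illustration, and it uses the bs4 package with find() and find_next().

```python
from bs4 import BeautifulSoup

# Hypothetical page: every table has an <h4> header directly above it.
html = """
<h4>first</h4><table><tr><td>a</td></tr></table>
<h4>second</h4><table><tr><td>b</td></tr></table>
<h4>third</h4><table><tr><td>c</td></tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# Find the header whose text is exactly "second" ...
header = soup.find("h4", string="second")

# ... then take the first <table> that follows it in document order.
table_after_header = header.find_next("table") if header is not None else None

cell_text = table_after_header.td.get_text()
print(cell_text)
```

This does not depend on how many tables come before the one you want, but it does depend on the header text staying the same.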

Thanks.

Edit

Here's what I was thinking when I asked the question:

I was unpacking the result into two values when it contained many more. I thought this would just give me the first two items from the list, but of course it kept giving me the error mentioned above. I was unaware that the return value was a list; I thought it was a special object or something, and I was basing my code on my friend's.

I thought the error meant there were too many tables on the page and that it couldn't handle all of them, so I was asking for a way to do it without the method I was using. I probably should have stopped assuming things.

Now I know it returns a list, and I can use it in a for loop or get a value from it with soup.findAll('table')[someNumber]. I also learned what unpacking is and how to use it. Thanks to everyone who helped.

Hopefully that clears things up. Now that I know what I'm doing, my question makes less sense than it did when I asked it, so I thought I'd put a note here on what I was thinking.

This question is now pretty old, but I can see that I was never really clear about what I was doing.

If it helps anyone, I was attempting to unpack the findAll(...) results without knowing how many there would be:

useless_table, table_i_want, another_useless_table = soup.findAll("table")

Since the page didn't always contain the number of tables I had guessed, and every value in the returned list has to be unpacked, I was receiving the ValueError:

ValueError: too many values to unpack
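The same failure can be reproduced without BeautifulSoup at all, since it is ordinary sequence unpacking. In the sketch below, the list of strings is only a stand-in for whatever soup.findAll("table") would return. It also shows starred unpacking (Python 3), which collects any leftover items instead of raising.

```python
# Stand-in for the result of soup.findAll("table"): four "tables".
tables = ["table0", "table1", "table2", "table3"]

# Unpacking into fewer names than there are items raises ValueError.
try:
    first, second = tables
except ValueError as exc:
    error_message = str(exc)

# A starred name absorbs the remaining items, so the count no longer has to match.
first, second, *rest = tables

print(error_message)
print(second, rest)
```

Starred unpacking still fails if there are fewer items than plain names, so indexing with a length check is usually the safer choice when the number of tables is unknown.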

So I was looking for a way to grab the second table (or the table at any given index) from the returned list without running into errors caused by the number of tables on the page.

Recommended answer

To get the second table from the call soup.findAll('table'), treat the result as a list and index it:

secondtable = soup.findAll('table')[1]
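Plain indexing raises IndexError when the page has fewer tables than expected. A minimal sketch of a more defensive version follows, assuming the bs4 package, where find_all is the current spelling of findAll. The helper name nth_table and the sample HTML are hypothetical. Passing limit lets the search stop as soon as enough tables have been found.

```python
from bs4 import BeautifulSoup

# Hypothetical page with three tables.
html = """
<table><tr><td>a</td></tr></table>
<table><tr><td>b</td></tr></table>
<table><tr><td>c</td></tr></table>
"""

soup = BeautifulSoup(html, "html.parser")


def nth_table(soup, n):
    """Return the <table> at zero-based index n, or None if there are too few."""
    # limit=n + 1 stops the search once the table we want has been reached.
    tables = soup.find_all("table", limit=n + 1)
    return tables[n] if len(tables) > n else None


second_table = nth_table(soup, 1)
missing_table = nth_table(soup, 10)

print(second_table.td.get_text())
print(missing_table)
```

Returning None rather than raising is a design choice for this sketch; callers that would rather fail loudly can index the list directly.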


08-28 22:06