python - Finding and storing a child of a parent tag in BeautifulSoup

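I am trying to find and store the child <orgname> of the parent <assignee>. So far my code iterates over the XML documents and already picks up certain other tags. I have it set up like this: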

for xml_string in separated_xml(infile): # Calls the output of the separated and read file to parse the data
    soup = BeautifulSoup(xml_string, "lxml")     # BeautifulSoup parses the data strings where the XML is converted to Unicode
    pub_ref = soup.findAll("publication-reference") # Beginning parsing at every instance of a publication

    lst = []  # Creating empty list to append into

    with open('./output.csv', 'ab') as f:
        writer = csv.writer(f, dialect = 'excel')

        for info in pub_ref:  # Looping over all instances of publication

# The final loop finds every instance of invention name, patent number, date, and country to print and append

            for inv_name, pat_num, org_name, date_num, country, city, state in zip(soup.findAll("invention-title"), soup.findAll("doc-number"), assign.find("orgname"), soup.findAll("date"), soup.findAll("country"), soup.findAll("city"), soup.findAll("state")):

                writer.writerow([inv_name.text, pat_num.text, org_name.text, date_num.text, country.text, city.text, state.text])

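I have ordered things this way because I need each invention name and patent number pair, together with the assignee's organization name. The problem is that other tags, such as those for attorneys and similar parties, also contain organizations: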

<agent sequence="01" rep-type="attorney">
<addressbook>
<orgname>Sawyer Law Group LLP</orgname>
<address>
<country>unknown</country>
</address>
</addressbook>
</agent>
</agents>
</parties>
<assignees>
<assignee>
<addressbook>
<orgname>International Business Machines Corporation</orgname>
<role>02</role>
<address>
<city>Armonk</city>
<state>NY</state>
<country>US</country>
</address>
</addressbook>
</assignee>
</assignees>

I only want the organization name under the <assignee> tag. I tried:

assign = soup.findAll("assignee")
org_name = assign.findAll("orgname")

But to no avail. It just throws:

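  "ResultSet object has no attribute '%s'. You're probably treating a
  list of items like a single item. Did you call find_all() when you
  meant to call find()?" % key

  AttributeError: ResultSet object has no attribute 'find'. You're
  probably treating a list of items like a single item. Did you call
  find_all() when you meant to call find()?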

How can I add these tags and find all organization names under the assignee tag?
It seems simple, but I can't figure it out.

Thanks in advance.

Best answer

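assign = soup.findAll("assignee") returns a list (a ResultSet), which is why calling org_name = assign.findAll("orgname") fails. You have to iterate over each element of assign and call .findAll("orgname") on it. However, there appears to be only one <orgname> in each <assignee>, so you can use .find instead of .findAll. Try a list comprehension that calls .find on each element of assign: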

orgnames = [item.find("orgname") for item in assign]

Or, to get the text directly, first check that an <orgname> exists inside the <assignee>:

orgnames = [item.find("orgname").text for item in assign if item.find("orgname")]
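
For reference, here is a minimal, self-contained sketch of the approach above, run against an abridged copy of the XML from the question. The "html.parser" backend is an assumption made only so the sketch runs without lxml installed (the question uses "lxml"), and the variable name xml_string stands in for one of the strings yielded by separated_xml(infile).

```python
from bs4 import BeautifulSoup

# Abridged sample modelled on the XML shown in the question.
xml_string = """
<parties>
<agents>
<agent sequence="01" rep-type="attorney">
<addressbook>
<orgname>Sawyer Law Group LLP</orgname>
</addressbook>
</agent>
</agents>
</parties>
<assignees>
<assignee>
<addressbook>
<orgname>International Business Machines Corporation</orgname>
<address>
<city>Armonk</city>
<state>NY</state>
<country>US</country>
</address>
</addressbook>
</assignee>
</assignees>
"""

# Assumption: "html.parser" is used here instead of "lxml" so that the
# example has no dependency beyond bs4 itself.
soup = BeautifulSoup(xml_string, "html.parser")

# find_all() (alias: findAll) returns a ResultSet, i.e. a list of Tags.
assign = soup.find_all("assignee")

# Calling .find() on the ResultSet itself reproduces the error
# reported in the question.
try:
    assign.find("orgname")
    raised = False
except AttributeError:
    raised = True

# Calling .find() on each Tag inside the ResultSet works, and the
# attorney's <orgname> is skipped because it is not under <assignee>.
orgnames = [
    item.find("orgname").text
    for item in assign
    if item.find("orgname") is not None
]

print(raised)
print(orgnames)
```

The resulting orgnames list can then be zipped with the other findAll results in the CSV loop, in place of assign.find("orgname").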