python - Get the clinical trial history for a specific NCT ID

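I need the complete history of a clinical trial, identified by its NCT ID, from the following site: https://clinicaltrials.gov/

Consider the NCT ID NCT03245346.

On that study's page I follow the History of Changes link, which opens a new page listing all history records for this NCT ID.

I can get this information with an HTML parser: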

from bs4 import BeautifulSoup
import requests

url = 'https://clinicaltrials.gov/ct2/archive/NCT03245346'
r = requests.get(url)
r.raise_for_status()  # fail early on an HTTP error
soup = BeautifulSoup(r.content, 'html.parser')

# Look up the history table by its CSS classes
tab = soup.find("table", {"class": "ct-data_table tr-data_table tr-tableStyle"})
print(tab)
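To show what turning such a table into plain rows involves, here is a minimal sketch. It swaps in the standard library's html.parser instead of BeautifulSoup so that it runs without extra installs. The class name TableRows and the sample markup are made up for illustration and are not copied from ClinicalTrials.gov.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Made-up sample markup (hypothetical, not the real page structure).
SAMPLE_HTML = """
<table>
  <tr><th>Version</th><th>Date</th></tr>
  <tr><td>1</td><td>2017-08-02</td></tr>
  <tr><td>2</td><td>2018-04-24</td></tr>
</table>
"""

parser = TableRows()
parser.feed(SAMPLE_HTML)
rows = parser.rows
print(rows)
```

On the real page the same idea would apply to r.text, but the row layout there depends on the site's markup, which is exactly the dependency the question is trying to avoid.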

However, to avoid depending on the HTML page format, I would like to know: is there an API that returns the complete history for a specific NCT ID?

Best answer

If you only want the table, you can try the pandas read_html() function:

import pandas as pd

url = "https://clinicaltrials.gov/ct2/archive/NCT03245346"

df = pd.read_html(url)[0]

df.head()

    0                               1
0   ClinicalTrials.gov Identifier:  NCT03245346
1   Study Title:                    Effects of Epidural Anesthesia and Analgesia o...
2   First Submitted:                August 2, 2017
3   Last Update Posted:             April 24, 2018

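This also works for the more "detailed" overview that you get when you click "Continue to the history of changes for this study" on the ClinicalTrials.gov archive site: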

url_detail = "https://clinicaltrials.gov/ct2/history/NCT03245346"

df = pd.read_html(url_detail)[0]
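To reuse both calls for other studies, the two URLs can be built from the NCT ID. The following is only a sketch: the URL patterns are taken from the snippets above and may not match what the site serves today, the function names history_urls and read_history_tables are hypothetical, and the check assumes an NCT ID is "NCT" followed by eight digits.

```python
import re

# URL patterns copied from the snippets above (assumed to still be valid).
ARCHIVE_URL = "https://clinicaltrials.gov/ct2/archive/{nct_id}"
HISTORY_URL = "https://clinicaltrials.gov/ct2/history/{nct_id}"

# Assumption: an NCT ID is "NCT" followed by exactly eight digits.
NCT_PATTERN = re.compile(r"NCT\d{8}")


def history_urls(nct_id):
    """Return the archive and history URLs for one NCT ID."""
    nct_id = nct_id.strip().upper()
    if not NCT_PATTERN.fullmatch(nct_id):
        raise ValueError(f"not a valid NCT ID: {nct_id!r}")
    return {
        "archive": ARCHIVE_URL.format(nct_id=nct_id),
        "history": HISTORY_URL.format(nct_id=nct_id),
    }


def read_history_tables(nct_id):
    """Read the first table of each page with pandas (needs network access)."""
    import pandas as pd  # imported here so history_urls() works without pandas

    return {name: pd.read_html(url)[0]
            for name, url in history_urls(nct_id).items()}


urls = history_urls("nct03245346")
print(urls["archive"])
print(urls["history"])
```

Calling read_history_tables("NCT03245346") would then return one DataFrame per page, provided the pages are reachable and still contain a table.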

However, if you are looking for something else, maybe we can figure that out as well.