Problem description
I have downloaded a zip file containing around 200,000 html files from Companies House.
Each file is in one of two formats: 1) inline XBRL (.html file extension) or 2) XBRL (.xml file extension). Looking at the most recent download available (6 December 2018), all the files seem to be in the former format (.html file extension).
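As a quick way to check which of the two formats a download contains, the sketch below tallies file names by extension. It is only an illustration: count_formats is a hypothetical helper, and it assumes the extension reliably indicates the format, as described above.

```r
# count_formats(): hypothetical helper that tallies file names by extension.
count_formats <- function(file_names) {
  ext <- tolower(tools::file_ext(file_names))
  c(inline_html = sum(ext == "html"),
    xbrl_xml    = sum(ext == "xml"),
    other       = sum(!ext %in% c("html", "xml")))
}

count_formats(c("Prod224_0060_00000295_20171130.html", "example.xml", "notes.txt"))
# inline_html = 1, xbrl_xml = 1, other = 1

# To inspect a zip without extracting it (the zip file name is a placeholder):
# count_formats(utils::unzip("companies_house_download.zip", list = TRUE)$Name)
```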
I'm using the XBRL package in R to try to parse these files.
Question 1: Is the XBRL package meant to parse inline XBRL (.html) files, or is it only supposed to work on XBRL (.xml) files? If it's the latter, can anyone tell me where to look for a way to parse inline XBRL files? I'm not entirely sure what the difference between inline and non-inline XBRL is.
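Roughly, the difference is this: a plain XBRL instance is a pure XML document whose root element is xbrli:xbrl and whose facts are child elements, while an inline XBRL file is a human-readable XHTML page in which individual numbers and text are wrapped in ix: tags (for example ix:nonFraction) that carry the XBRL concept, context and unit. The hypothetical helper below tells the two apart by looking for namespace URIs in the text. It is a heuristic, not a validator, and the two sample documents (including the ex: taxonomy prefix) are made up and deliberately incomplete.

```r
# xbrl_flavour(): crude, text-based check of which XBRL flavour a document is.
xbrl_flavour <- function(text) {
  text <- paste(text, collapse = "\n")   # also accepts readLines() output
  if (grepl("inlineXBRL", text, fixed = TRUE)) {
    "inline"    # inline XBRL namespace declared: XHTML page with ix:* tagged facts
  } else if (grepl("http://www.xbrl.org/2003/instance", text, fixed = TRUE)) {
    # Checked second because inline documents usually declare this namespace too.
    "xbrl"      # XBRL 2.1 instance namespace: plain XML instance document
  } else {
    "unknown"
  }
}

# Made-up minimal examples; the ex: taxonomy and its URI are hypothetical.
inline_doc <- paste0(
  '<html xmlns="http://www.w3.org/1999/xhtml" ',
  'xmlns:ix="http://www.xbrl.org/2013/inlineXBRL" xmlns:ex="http://example.com/taxonomy">',
  '<body><p>Turnover: <ix:nonFraction name="ex:Turnover" contextRef="c1" ',
  'unitRef="GBP" decimals="0">1,000</ix:nonFraction></p></body></html>')
xbrl_doc <- paste0(
  '<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance" ',
  'xmlns:ex="http://example.com/taxonomy">',
  '<ex:Turnover contextRef="c1" unitRef="GBP" decimals="0">1000</ex:Turnover>',
  '</xbrli:xbrl>')

xbrl_flavour(inline_doc)   # "inline"
xbrl_flavour(xbrl_doc)     # "xbrl"
# For a real file: xbrl_flavour(readLines(inst, warn = FALSE))
```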
Assuming the XBRL package is meant to be able to parse inline XBRL format files, I'm hitting an error telling me that the xbrl.frc.org.uk/FRS-102/2014-09-01/FRS-102-2014-09-01.xsd file does not exist. Here's my code:
install.packages("XBRL")  # only needed once
library(XBRL)
inst <- "./rawdata/Prod224_0060_00000295_20171130.html" # manually unzipped
options(stringsAsFactors = FALSE)
# Parse the file; remote files it downloads are cached in "XBRLcache"
xbrl.vars <- xbrlDoAll(inst, cache.dir = "XBRLcache", prefix.out = NULL, verbose = TRUE)
And the error:
Schema: ./rawdata/https://xbrl.frc.org.uk/FRS-102/2014-09-01/FRS-102-2014-09-01.xsd
Level: 1 ==> ./rawdata/https://xbrl.frc.org.uk/FRS-102/2014-09-01/FRS-102-2014-09-01.xsd
Error in XBRL::xbrlParse(file) :
./rawdata/https://xbrl.frc.org.uk/FRS-102/2014-09-01/FRS-102-2014-09-01.xsd does not exists. Aborting.
Question 2: Can someone explain what this means in basic terms? I'm new to XBRL. Do I need to find this xsd file and put it somewhere? It seems to be located at https://xbrl.frc.org.uk/FRS-102/2014-09-01/FRS-102-2014-09-01.xsd, but I have no idea what to do with it or where to put it.
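In basic terms: every XBRL or inline XBRL document names the taxonomy schema (.xsd) that defines its concepts in a schemaRef element, and the XBRL package tries to load that schema. Here the location is an absolute https:// URL, yet the error message shows it glued onto the local ./rawdata/ directory, so the package ends up looking for a local file that does not exist. The rough sketch below pulls the schema location(s) out of a document's text with a regular expression, just to show what the "Schema:" line refers to. schema_refs is a hypothetical helper, and a real XML parser would be more robust (for example, it assumes double-quoted attributes).

```r
# schema_refs(): regex-based sketch that returns the href of each schemaRef tag.
schema_refs <- function(text) {
  text <- paste(text, collapse = "\n")   # also accepts readLines() output
  tags <- regmatches(text, gregexpr("<[^>]*schemaRef[^>]*>", text))[[1]]
  sub('.*href="([^"]*)".*', "\\1", tags)  # keep only the href value
}

doc <- paste0(
  '<ix:references><link:schemaRef xlink:type="simple" ',
  'xlink:href="https://xbrl.frc.org.uk/FRS-102/2014-09-01/FRS-102-2014-09-01.xsd"/>',
  '</ix:references>')
schema_refs(doc)
# returns "https://xbrl.frc.org.uk/FRS-102/2014-09-01/FRS-102-2014-09-01.xsd"
# For a real file: schema_refs(readLines(inst, warn = FALSE))
```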
There's a similar question, but it doesn't seem to be fully answered, and the links in it are all in Spanish, which I don't speak.
Once I've been able to parse a single HTML XBRL file, my plan is to figure out how to parse all the XBRL files inside multiple zip files from that website.
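For the batch step, one possible structure is to unzip each archive into a temporary directory and parse every .html/.xml file in it, recording failures instead of aborting the whole run. parse_dir, parse_zip and parse_fun are hypothetical names. In real use parse_fun could wrap xbrlDoAll() with a shared cache.dir so downloaded schemas can be reused across files, but that wiring is an assumption and is only shown in comments.

```r
# parse_dir(): parse every .html/.xml file under 'dir' with 'parse_fun',
# collecting errors instead of stopping at the first bad file.
parse_dir <- function(dir, parse_fun, pattern = "\\.(html|xml)$") {
  files <- list.files(dir, pattern = pattern, full.names = TRUE,
                      recursive = TRUE, ignore.case = TRUE)
  results <- lapply(files, function(f) {
    tryCatch(parse_fun(f), error = function(e) e)
  })
  names(results) <- basename(files)
  failed <- vapply(results, inherits, logical(1), what = "error")
  list(ok = results[!failed], failed = results[failed])
}

# parse_zip(): extract one archive to a temporary directory, parse it,
# then delete the extracted files.
parse_zip <- function(zipfile, parse_fun) {
  exdir <- tempfile("xbrl_")
  dir.create(exdir)
  on.exit(unlink(exdir, recursive = TRUE), add = TRUE)
  utils::unzip(zipfile, exdir = exdir)
  parse_dir(exdir, parse_fun)
}

# Example wiring (assumed, not tested here):
# my_parser <- function(f) XBRL::xbrlDoAll(f, cache.dir = "XBRLcache", prefix.out = NULL)
# zips <- list.files("zips", pattern = "\\.zip$", full.names = TRUE)
# all_results <- lapply(zips, parse_zip, parse_fun = my_parser)
```

Keeping every parsed result in memory may not scale to 200,000 files; having parse_fun write each result to disk and return something small is one alternative.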
Recommended answer
I had exactly the same problem with the US SEC data.
I followed pdw's guidance exactly, and it worked!
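FYI, the line I changed was
if (substr(file.name, 1, 5) != "http:") {
which I replaced with
if (!(substr(file.name, 1, 5) %in% c("http:", "https"))) {
so that schema locations starting with https:// are treated as remote URLs instead of being prefixed with the local directory.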
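I made the edit with trace('XBRL', edit=TRUE), which opens the XBRL() function in an editor; the change only lasts for the current R session.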
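To see why this one-line change fixes the error, here is a simplified, standalone version of the check. resolve_ref is a hypothetical name; it mirrors only the condition quoted above and leaves out any other path handling the package itself does. The patch compares against "https" without a colon because substr(file.name, 1, 5) returns only five characters.

```r
# resolve_ref(): simplified illustration (not the package's actual code).
# A referenced file name is joined to the instance directory unless its first
# five characters mark it as a remote URL.
resolve_ref <- function(dname, file.name, schemes = c("http:", "https")) {
  if (!(substr(file.name, 1, 5) %in% schemes)) {
    file.name <- paste0(dname, "/", file.name)
  }
  file.name
}

url <- "https://xbrl.frc.org.uk/FRS-102/2014-09-01/FRS-102-2014-09-01.xsd"
resolve_ref("./rawdata", url, schemes = "http:")  # original check: only "http:"
# "./rawdata/https://xbrl.frc.org.uk/FRS-102/2014-09-01/FRS-102-2014-09-01.xsd"
resolve_ref("./rawdata", url)                     # patched check: "http:" or "https"
# "https://xbrl.frc.org.uk/FRS-102/2014-09-01/FRS-102-2014-09-01.xsd"
```

The first call reproduces exactly the bad path from the error message in the question.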