Problem description
I have a list of places that I would like to enrich with their GeoNames IDs. Since GeoNames IDs are already embedded in Wikidata, I chose to go directly via SPARQL, using the Wikidata endpoint.
My workflow:
- I imported the Excel file into OpenRefine and created a new project.
- In OpenRefine I created my graph, then downloaded it as RDF/XML. Here is a snapshot:
```xml
<rdf:Description rdf:about="http://localhost:3333/0">
  <rdfs:label>Aïre</rdfs:label>
  <crm:P1_is_identified_by>5A1CE163-105F-4BAF 8BF9</crm:P1_is_identified_by>
</rdf:Description>
```
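To check what the graph actually contains, the snapshot can be read with Python's standard library. This is a minimal sketch: it assumes the fragment is wrapped in an `rdf:RDF` root and that the `crm` prefix is bound to `http://purl.org/NET/cidoc-crm/core#` (the namespace used in the query below); `read_places` is a hypothetical helper name. It shows that `crm:P1_is_identified_by` holds a plain string literal, not an IRI.

```python
import xml.etree.ElementTree as ET

# rdf and rdfs are the W3C namespaces; the crm IRI is an assumption taken
# from the property IRI used in the SPARQL query.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
CRM = "http://purl.org/NET/cidoc-crm/core#"

SNAPSHOT = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:crm="{CRM}">
  <rdf:Description rdf:about="http://localhost:3333/0">
    <rdfs:label>Aïre</rdfs:label>
    <crm:P1_is_identified_by>5A1CE163-105F-4BAF 8BF9</crm:P1_is_identified_by>
  </rdf:Description>
</rdf:RDF>"""


def read_places(xml_text):
    """Return (subject IRI, label, identifier) per rdf:Description (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    places = []
    for desc in root.findall(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        label = desc.findtext(f"{{{RDFS}}}label")
        # Element text is an RDF literal; a resource object would be
        # written as rdf:resource="..." instead.
        identifier = desc.findtext(f"{{{CRM}}}P1_is_identified_by")
        places.append((subject, label, identifier))
    return places


print(read_places(SNAPSHOT))
```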
I then imported the RDF file into my local GraphDB and ran the federated query:
```sparql
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT *
WHERE {
  ?place <http://purl.org/NET/cidoc-crm/core#P1_is_identified_by> ?value;
         rdfs:label ?label_geo.
  SERVICE <https://query.wikidata.org/sparql> {
    ?value wdt:P31/wdt:P279* wd:Q515;
           rdfs:label ?label;
           wdt:P1566 ?id_value.
  }
}
LIMIT 10
```
No results.
The output should look like this:
|-------------------------|-------|------------|
| Object                  | Place | GeonamesID |
|-------------------------|-------|------------|
| 5A1CE163-105F-4BAF 8BF9 | Aïre  | 11048419   |
|-------------------------|-------|------------|
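A likely reason for the empty result: `?value` is bound to the literal `"5A1CE163-105F-4BAF 8BF9"`, but inside the `SERVICE` block it is used as the subject of `wdt:P31`. In RDF a literal cannot be a subject, so no Wikidata triple can match. Since the identifier is local, the only usable join key is the label, and Wikidata labels carry a language tag: a plain `"Aïre"` is not equal to `"Aïre"@fr`. The sketch below builds one query per place that matches on the language-tagged label; the helper name, the choice of French (`fr`) as label language, and keeping `wd:Q515` from the original query are all assumptions.

```python
def build_wikidata_query(label, lang="fr"):
    """Build a per-place Wikidata query matching a language-tagged label (hypothetical helper)."""
    # Escape backslashes and double quotes so the label is a valid SPARQL string literal.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    # wd:Q515 (city) is kept from the original query; small localities may need a broader class.
    return f"""PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?item ?id_value WHERE {{
  ?item rdfs:label "{escaped}"@{lang};
        wdt:P31/wdt:P279* wd:Q515;
        wdt:P1566 ?id_value.
}}
LIMIT 10"""


query = build_wikidata_query("Aïre")
print(query)
# Sending it is not done here; with requests it would look roughly like:
# requests.get("https://query.wikidata.org/sparql", params={"query": query},
#              headers={"Accept": "application/sparql-results+json"})
```

In the federated form the same idea would be `BIND(STRLANG(?label_geo, "fr") AS ?label_fr)` in the local part and `?item rdfs:label ?label_fr` inside `SERVICE`; whether GraphDB forwards those bindings to the remote endpoint efficiently is an open question.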
Any suggestions?
Many thanks.
Recommended answer
I solved the problem directly on the client side.
This is my pipeline:
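- I created an Excel sheet with the list of place names.
- I built a Python script that uses the values in the Excel sheet as query parameters for the GeoNames search API and saves the output in a .txt file, e.g. Aïre, https://www.geonames.org/11048419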
```python
import csv

import pandas as pd
import requests

url = 'http://api.geonames.org/searchJSON'

# Change the read_excel parameters according to your Excel sheet.
# Column A must have the header "place_name" for df.place_name to work.
df = pd.read_excel('grp.xlsx', sheet_name='Foglio14', usecols="A")

# Open the output file once; newline="" avoids blank lines between CSV rows on Windows.
with open("data14.txt", "a", newline="", encoding="utf-8") as myfile:
    writer = csv.writer(myfile)
    for place in df.place_name:
        # Replace "XXXXXXXX" with your GeoNames API username.
        params = {'username': "XXXXXXXX",
                  'name_equals': place,
                  'maxRows': "1"}
        response = requests.get(url, params=params)
        response.raise_for_status()
        data = response.json()
        # Write one row (name, https://www.geonames.org/<geonameId>) per hit.
        for hit in data.get("geonames", []):
            writer.writerow([hit["name"],
                             "https://www.geonames.org/{}".format(hit["geonameId"])])
```
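The script writes nothing for a name that GeoNames does not find, which is one reason the output stops lining up row by row with the input list. A variant that always emits exactly one row per input name, with a blank result when there is no hit, keeps that alignment. This is a sketch: `fetch` stands in for the GeoNames request so the logic runs without a network or username, and `rows_for_names`, `fake_fetch` and the name "Nowhere" are hypothetical.

```python
def rows_for_names(names, fetch):
    """Return one [query name, returned name, URL] row per input name (hypothetical helper).

    fetch(name) must return the decoded searchJSON response, e.g.
    {"geonames": [{"name": "Aïre", "geonameId": 11048419}]}.
    """
    rows = []
    for name in names:
        hits = fetch(name).get("geonames", [])
        if hits:
            # maxRows=1 in the script above, so only the first hit is used.
            hit = hits[0]
            rows.append([name, hit["name"],
                         "https://www.geonames.org/{}".format(hit["geonameId"])])
        else:
            # No match: keep an empty row so the output stays aligned with the input.
            rows.append([name, "", ""])
    return rows


def fake_fetch(name):
    """Offline stand-in for the API call, for illustration only."""
    known = {"Aïre": {"geonames": [{"name": "Aïre", "geonameId": 11048419}]}}
    return known.get(name, {"geonames": []})


for row in rows_for_names(["Aïre", "Nowhere"], fake_fetch):
    print(row)
```

With the real API, `fetch` could be `lambda n: requests.get(url, params={...}).json()`, and the returned rows can go straight to `csv.writer(...).writerows(...)`.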
- I copied the output of the .txt file into column B of the Excel sheet.
- I split the output values into two columns, e.g.:
|---------------------|-----------------------------------|
| ColA | ColB |
|---------------------|-----------------------------------|
| Aïre | https://www.geonames.org/11048419 |
|---------------------|-----------------------------------|
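Because the file is written with `csv.writer`, it can also be split with `csv.reader` rather than by hand, which handles names containing commas (the writer quotes them). A minimal sketch, assuming two-field rows as written by the script above; the numeric GeoNames ID is taken as the last path segment of the URL, and `split_output` is a hypothetical helper.

```python
import csv
import io


def split_output(lines):
    """Split "name,URL" rows into (name, URL, GeoNames ID) tuples (hypothetical helper)."""
    rows = []
    # skipinitialspace also accepts "Aïre, https://..." with a space after the comma.
    for row in csv.reader(lines, skipinitialspace=True):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        name, url = row
        # The numeric GeoNames ID is the last path segment of the URL.
        geonames_id = url.rstrip("/").rsplit("/", 1)[-1]
        rows.append((name, url, geonames_id))
    return rows


sample = io.StringIO(
    "Aïre,https://www.geonames.org/11048419\n"
    '"Foo, Bar",https://www.geonames.org/1\n'  # hypothetical row with a quoted comma
)
print(split_output(sample))
```

For the real file, pass the handle from `open("data14.txt", newline="", encoding="utf-8")` instead of the `StringIO` sample.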
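- Since there is no 1:1 correspondence between the place names and the results obtained, I aligned the values.
- In the Excel sheet I created a new, empty column B.
- In column B I wrote the formula:
=IF(ISNA(MATCH(A1;C:C;0));"";INDEX(C:C;MATCH(A1;C:C;0)))
and filled it down to the end of the list.
- Then I created a new, empty column C.
- In column C I wrote the formula:
=IFERROR(INDEX($E:$E;MATCH($B1;$D:$D;0));"")
and filled it down to the end of the list.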
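Taken together, the two formulas act as a lookup: for each name in column A, find the same name among the GeoNames results and return its URL, or an empty string when there is none. The same alignment can be approximated in Python with a dictionary. This sketch assumes only the first match matters (`MATCH(...;0)` returns the first exact match) and folds case because Excel's `MATCH` is not case-sensitive; wildcard matching is not emulated, and `align` is a hypothetical helper.

```python
def align(names, results):
    """Return (name, URL) per input name, like the INDEX/MATCH columns (hypothetical helper).

    results is a list of (returned name, URL) pairs in sheet order.
    """
    lookup = {}
    for returned_name, url in results:
        # setdefault keeps the first occurrence, as MATCH(...;0) returns the first match.
        # casefold() mimics MATCH being case-insensitive.
        lookup.setdefault(returned_name.casefold(), url)
    # "" plays the role of the ISNA/IFERROR fallback for names without a result.
    return [(name, lookup.get(name.casefold(), "")) for name in names]


results = [("Aïre", "https://www.geonames.org/11048419")]
print(align(["Aïre", "Nowhere"], results))
```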
Here is the final result:
[screenshot of the final Excel sheet]