Reading XML files in Databricks


Question

Hello,

I'm trying to read a directory full of XML files into a SQL DW. I first did this with Azure Functions, but was advised to switch to Databricks to reduce server load while using PolyBase. The volume is about 20,000 files per hour.

But I can't find any example of how to read an XML file in Python. I found some simple samples for plain Python, but when I try to import the libraries used in those scripts, the import fails.

Could anybody point me in the right direction?

For example, this is a script I received for running Python locally.

import os
import re
import xml.etree.ElementTree as ET

import pandas as pd

xmlfolder = 'Energy_RT'
xmlfiles = os.listdir(xmlfolder)

# Get attribute names (for now, all leaf elements of the XML structure)
firstfile = os.path.join(xmlfolder, xmlfiles[0])
root = ET.parse(firstfile).getroot()
attributes = [node.tag for node in root.iter() if len(node) == 0]
# Strip the '{namespace}' prefix from each tag
clean_attribute_names = [re.sub(r'\{.*\}', '', a) for a in attributes]

# Create the DataFrame (one row per file) and save it as CSV
df = pd.DataFrame(columns=clean_attribute_names, index=xmlfiles)
for xf in xmlfiles:
    root = ET.parse(os.path.join(xmlfolder, xf)).getroot()
    df.loc[xf] = [node.text for node in root.iter() if node.tag in attributes]
df.to_csv('out.csv')

Mounting, listing, and unmounting all work fine, but the XML-to-CSV conversion has me stumped.

I'm not a Python developer...

Recommended answer

What errors are you getting with the libraries?
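
While the exact import error is still unknown, here is a minimal sketch of the same XML-to-CSV conversion as the script in the question. It assumes (this is not confirmed in the question) that pandas is the import that fails, so it uses only the Python standard library. The helper names strip_namespace, leaf_values and xml_dir_to_csv, and the extra "file" column, are made up for illustration and are not part of any Databricks API.

```python
import csv
import os
import re
import xml.etree.ElementTree as ET


def strip_namespace(tag):
    """Drop a leading '{namespace}' prefix from an element tag."""
    return re.sub(r'^\{.*?\}', '', tag)


def leaf_values(path):
    """Return {leaf tag: text} for one XML file.

    If a tag occurs more than once, the last occurrence wins.
    """
    root = ET.parse(path).getroot()
    return {strip_namespace(node.tag): (node.text or '').strip()
            for node in root.iter() if len(node) == 0}


def xml_dir_to_csv(xml_dir, out_path):
    """Write one CSV row per .xml file in xml_dir; return the row count."""
    files = sorted(f for f in os.listdir(xml_dir) if f.lower().endswith('.xml'))
    rows = [dict(leaf_values(os.path.join(xml_dir, f)), file=f) for f in files]

    # Union of all column names in first-seen order, so files whose
    # leaves differ from the first file still fit into the table.
    columns = ['file']
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)

    with open(out_path, 'w', newline='', encoding='utf-8') as fh:
        writer = csv.DictWriter(fh, fieldnames=columns, restval='')
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Calling xml_dir_to_csv('Energy_RT', 'out.csv') would mirror the folder and output names used in the question. Unlike that script, this sketch does not depend on every file having exactly the same leaf elements as the first one, and it skips files that do not end in .xml.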


