I'm trying to create a schema for my new DataFrame. I have tried various combinations of brackets and keywords but have been unable to figure out how to make this work. My current attempt:

    from pyspark.sql.types import *

    schema = StructType([
        StructField("User", IntegerType()),
        ArrayType(StructType([
            StructField("user", StringType()),
            StructField("product", StringType()),
            StructField("rating", DoubleType())]))
    ])

This comes back with the error:

    elementType should be DataType
    Traceback (most recent call last):
      File "/usr/hdp/current/spark2-client/python/pyspark/sql/types.py", line 290, in __init__
        assert isinstance(elementType, DataType), "elementType should be DataType"
    AssertionError: elementType should be DataType

I have googled, but so far have found no good examples of an array of objects.

Solution

You need an additional StructField to hold the ArrayType property. Every entry in a StructType's field list must be a StructField, which is what gives the column its name; the ArrayType is then passed as that field's data type. This one should work:

    from pyspark.sql.types import *

    schema = StructType([
        StructField("User", IntegerType()),
        StructField("My_array", ArrayType(
            StructType([
                StructField("user", StringType()),
                StructField("product", StringType()),
                StructField("rating", DoubleType())
            ])
        ))
    ])

For more information check this link: http://nadbordrozd.github.io/blog/2016/05/22/one-weird-trick-that-will-fix-your-pyspark-schemas/
05-23 02:35