我是新手。我已经在pyspark中使用sql查询创建了一个数据框。我想使其成为永久性的 table ,以在将来的工作中获得优势。我用下面的代码

spark.sql("select b.ENTITYID as ENTITYID, cm.BLDGID as BldgID,cm.LEASID as LeaseID,coalesce(l.SUITID,(select EmptyDefault from EmptyDefault)) as SuiteID,(select CurrDate from CurrDate) as TxnDate,cm.INCCAT as IncomeCat,'??' as SourceCode,(Select CurrPeriod from CurrPeriod)as Period,coalesce(case when cm.DEPARTMENT ='@' then 'null' else cm.DEPARTMENT end, null) as Dept,'Lease' as ActualProjected ,fnGetChargeInd(cm.EFFDATE,cm.FRQUENCY,cm.BEGMONTH,(select CurrPeriod from CurrPeriod))*coalesce (cm.AMOUNT,0) as  ChargeAmt,0 as OpenAmt,null as Invoice,cm.CURRCODE as CurrencyCode,case when ('PERIOD.DATACLSD') is null then 'Open' else 'Closed' end as GLClosedStatus,'Unposted'as GLPostedStatus ,'Unpaid' as PaidStatus,cm.FRQUENCY as Frequency,0 as RetroPD from CMRECC cm join BLDG b on cm.BLDGID =b.BLDGID join LEAS l on cm.BLDGID =l.BLDGID and cm.LEASID =l.LEASID and (l.VACATE is null or l.VACATE >= ('select CurrDate from CurrDate')) and (l.EXPIR >= ('select CurrDate from CurrDate') or l.EXPIR < ('select RunDate from RunDate')) left outer join PERIOD on b.ENTITYID =  PERIOD.ENTITYID and ('select CurrPeriod from CurrPeriod')=PERIOD.PERIOD where ('select CurrDate from CurrDate')>=cm.EFFDATE  and (select CurrDate from CurrDate) <= coalesce(cm.EFFDATE,cast(date_add(( select min(cm2.EFFDATE) from CMRECC cm2 where cm2.BLDGID = cm.BLDGID and cm2.LEASID = cm.LEASID and cm2.INCCAT = cm.INCCAT and 'cm2.EFFDATE' > 'cm.EFFDATE'),-1) as timestamp)  ,case when l.EXPIR <(select RunDate from RunDate)then (Select RunDate from RunDate) else l.EXPIR end)").write.saveAsTable('FactChargeTempTable')

用于制作永久表,但我收到此错误
Job aborted due to stage failure: Task 11 in stage 73.0 failed 1 times, most recent failure: Lost task 11.0 in stage 73.0 (TID 2464, localhost): java.lang.RuntimeException: Unsupported data type NullType.

我不知道为什么会这样,我怎么解决。请指导

谢谢
凯莉安

最佳答案

您的Unsupported data type NullType错误表明您要保存的表的其中一列为NULL列。要变通解决此问题,您可以对表中的列进行NULL检查,并确保这些列之一不是全部为NULL。

请注意,如果一列中仅存在一行大多数为NULL,则Spark通常能够识别数据类型(例如StringType,IntegerType等),而不是NullType数据类型。

关于apache-spark - 获取java.lang.RuntimeException : Unsupported data type NullType when turning a dataframe into permanent hive table,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/40194578/

10-11 04:30