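My recent task has been to parse PMML back into R models. (I have searched extensively, and there is no library that does this conversion for you.) I am trying to convert a PMML document containing a multinomial logistic regression back into an R model, but I don't know how to map the coefficients stored in the PMML document to the coefficients held by the R model.
The PMML is as follows: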
<?xml version="1.0"?>
<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.dmg.org/PMML-4_2 http://www.dmg.org/v4-2/pmml-4-2.xsd">
<Header copyright="Copyright (c) 2014 hlin117" description="Generalized Linear Regression Model">
<Extension name="user" value="hlin117" extender="Rattle/PMML"/>
<Application name="Rattle/PMML" version="1.4"/>
<Timestamp>2014-06-23 13:04:17</Timestamp>
</Header>
<DataDictionary numberOfFields="13">
<DataField name="audit.train$TARGET_Adjusted" optype="continuous" dataType="double"/>
<DataField name="ID" optype="continuous" dataType="double"/>
<DataField name="Age" optype="continuous" dataType="double"/>
<DataField name="Employment" optype="categorical" dataType="string">
<Value value="Consultant"/>
<Value value="Private"/>
<Value value="PSFederal"/>
<Value value="PSLocal"/>
<Value value="PSState"/>
<Value value="SelfEmp"/>
<Value value="Volunteer"/>
</DataField>
<DataField name="Education" optype="categorical" dataType="string">
<Value value="Associate"/>
<Value value="Bachelor"/>
<Value value="College"/>
<Value value="Doctorate"/>
<Value value="HSgrad"/>
<Value value="Master"/>
<Value value="Preschool"/>
<Value value="Professional"/>
<Value value="Vocational"/>
<Value value="Yr10"/>
<Value value="Yr11"/>
<Value value="Yr12"/>
<Value value="Yr1t4"/>
<Value value="Yr5t6"/>
<Value value="Yr7t8"/>
<Value value="Yr9"/>
</DataField>
<DataField name="Marital" optype="categorical" dataType="string">
<Value value="Absent"/>
<Value value="Divorced"/>
<Value value="Married"/>
<Value value="Married-spouse-absent"/>
<Value value="Unmarried"/>
<Value value="Widowed"/>
</DataField>
<DataField name="Occupation" optype="categorical" dataType="string">
<Value value="Cleaner"/>
<Value value="Clerical"/>
<Value value="Executive"/>
<Value value="Farming"/>
<Value value="Home"/>
<Value value="Machinist"/>
<Value value="Military"/>
<Value value="Professional"/>
<Value value="Protective"/>
<Value value="Repair"/>
<Value value="Sales"/>
<Value value="Service"/>
<Value value="Support"/>
<Value value="Transport"/>
</DataField>
<DataField name="Income" optype="continuous" dataType="double"/>
<DataField name="Gender" optype="categorical" dataType="string">
<Value value="Female"/>
<Value value="Male"/>
</DataField>
<DataField name="Deductions" optype="continuous" dataType="double"/>
<DataField name="Hours" optype="continuous" dataType="double"/>
<DataField name="IGNORE_Accounts" optype="categorical" dataType="string">
<Value value="Canada"/>
<Value value="China"/>
<Value value="Columbia"/>
<Value value="Cuba"/>
<Value value="Ecuador"/>
<Value value="England"/>
<Value value="Fiji"/>
<Value value="Germany"/>
<Value value="Greece"/>
<Value value="Guatemala"/>
<Value value="Hong"/>
<Value value="Hungary"/>
<Value value="India"/>
<Value value="Indonesia"/>
<Value value="Iran"/>
<Value value="Ireland"/>
<Value value="Italy"/>
<Value value="Jamaica"/>
<Value value="Japan"/>
<Value value="Malaysia"/>
<Value value="Mexico"/>
<Value value="NewZealand"/>
<Value value="Nicaragua"/>
<Value value="Philippines"/>
<Value value="Poland"/>
<Value value="Portugal"/>
<Value value="Scotland"/>
<Value value="Singapore"/>
<Value value="Taiwan"/>
<Value value="UnitedStates"/>
<Value value="Vietnam"/>
<Value value="Yugoslavia"/>
</DataField>
<DataField name="RISK_Adjustment" optype="continuous" dataType="double"/>
</DataDictionary>
<GeneralRegressionModel modelName="General_Regression_Model" modelType="generalizedLinear" functionName="regression" algorithmName="glm" distribution="binomial" linkFunction="logit">
<MiningSchema>
<MiningField name="audit.train$TARGET_Adjusted" usageType="predicted"/>
<MiningField name="ID" usageType="active"/>
<MiningField name="Age" usageType="active"/>
<MiningField name="Employment" usageType="active"/>
<MiningField name="Education" usageType="active"/>
<MiningField name="Marital" usageType="active"/>
<MiningField name="Occupation" usageType="active"/>
<MiningField name="Income" usageType="active"/>
<MiningField name="Gender" usageType="active"/>
<MiningField name="Deductions" usageType="active"/>
<MiningField name="Hours" usageType="active"/>
<MiningField name="IGNORE_Accounts" usageType="active"/>
<MiningField name="RISK_Adjustment" usageType="active"/>
</MiningSchema>
<Output>
<OutputField name="Predicted_audit.train$TARGET_Adjusted" feature="predictedValue"/>
</Output>
<ParameterList>
<Parameter name="p0" label="(Intercept)"/>
<Parameter name="p1" label="ID"/>
<Parameter name="p2" label="Age"/>
<Parameter name="p3" label="EmploymentPrivate"/>
<Parameter name="p4" label="EmploymentPSFederal"/>
<Parameter name="p5" label="EmploymentPSLocal"/>
<Parameter name="p6" label="EmploymentPSState"/>
<Parameter name="p7" label="EmploymentSelfEmp"/>
<Parameter name="p8" label="EmploymentVolunteer"/>
<Parameter name="p9" label="EducationBachelor"/>
<Parameter name="p10" label="EducationCollege"/>
<Parameter name="p11" label="EducationDoctorate"/>
<Parameter name="p12" label="EducationHSgrad"/>
<Parameter name="p13" label="EducationMaster"/>
<Parameter name="p14" label="EducationPreschool"/>
<Parameter name="p15" label="EducationProfessional"/>
<Parameter name="p16" label="EducationVocational"/>
<Parameter name="p17" label="EducationYr10"/>
<Parameter name="p18" label="EducationYr11"/>
<Parameter name="p19" label="EducationYr12"/>
<Parameter name="p20" label="EducationYr1t4"/>
<Parameter name="p21" label="EducationYr5t6"/>
<Parameter name="p22" label="EducationYr7t8"/>
<Parameter name="p23" label="EducationYr9"/>
<Parameter name="p24" label="MaritalDivorced"/>
<Parameter name="p25" label="MaritalMarried"/>
<Parameter name="p26" label="MaritalMarried-spouse-absent"/>
<Parameter name="p27" label="MaritalUnmarried"/>
<Parameter name="p28" label="MaritalWidowed"/>
<Parameter name="p29" label="OccupationClerical"/>
<Parameter name="p30" label="OccupationExecutive"/>
<Parameter name="p31" label="OccupationFarming"/>
<Parameter name="p32" label="OccupationHome"/>
<Parameter name="p33" label="OccupationMachinist"/>
<Parameter name="p34" label="OccupationMilitary"/>
<Parameter name="p35" label="OccupationProfessional"/>
<Parameter name="p36" label="OccupationProtective"/>
<Parameter name="p37" label="OccupationRepair"/>
<Parameter name="p38" label="OccupationSales"/>
<Parameter name="p39" label="OccupationService"/>
<Parameter name="p40" label="OccupationSupport"/>
<Parameter name="p41" label="OccupationTransport"/>
<Parameter name="p42" label="Income"/>
<Parameter name="p43" label="GenderMale"/>
<Parameter name="p44" label="Deductions"/>
<Parameter name="p45" label="Hours"/>
<Parameter name="p46" label="IGNORE_AccountsChina"/>
<Parameter name="p47" label="IGNORE_AccountsColumbia"/>
<Parameter name="p48" label="IGNORE_AccountsCuba"/>
<Parameter name="p49" label="IGNORE_AccountsEcuador"/>
<Parameter name="p50" label="IGNORE_AccountsEngland"/>
<Parameter name="p51" label="IGNORE_AccountsFiji"/>
<Parameter name="p52" label="IGNORE_AccountsGermany"/>
<Parameter name="p53" label="IGNORE_AccountsGreece"/>
<Parameter name="p54" label="IGNORE_AccountsGuatemala"/>
<Parameter name="p55" label="IGNORE_AccountsHong"/>
<Parameter name="p56" label="IGNORE_AccountsHungary"/>
<Parameter name="p57" label="IGNORE_AccountsIndia"/>
<Parameter name="p58" label="IGNORE_AccountsIndonesia"/>
<Parameter name="p59" label="IGNORE_AccountsIran"/>
<Parameter name="p60" label="IGNORE_AccountsIreland"/>
<Parameter name="p61" label="IGNORE_AccountsItaly"/>
<Parameter name="p62" label="IGNORE_AccountsJamaica"/>
<Parameter name="p63" label="IGNORE_AccountsJapan"/>
<Parameter name="p64" label="IGNORE_AccountsMalaysia"/>
<Parameter name="p65" label="IGNORE_AccountsMexico"/>
<Parameter name="p66" label="IGNORE_AccountsNewZealand"/>
<Parameter name="p67" label="IGNORE_AccountsNicaragua"/>
<Parameter name="p68" label="IGNORE_AccountsPhilippines"/>
<Parameter name="p69" label="IGNORE_AccountsPoland"/>
<Parameter name="p70" label="IGNORE_AccountsPortugal"/>
<Parameter name="p71" label="IGNORE_AccountsScotland"/>
<Parameter name="p72" label="IGNORE_AccountsSingapore"/>
<Parameter name="p73" label="IGNORE_AccountsTaiwan"/>
<Parameter name="p74" label="IGNORE_AccountsUnitedStates"/>
<Parameter name="p75" label="IGNORE_AccountsVietnam"/>
<Parameter name="p76" label="IGNORE_AccountsYugoslavia"/>
<Parameter name="p77" label="RISK_Adjustment"/>
</ParameterList>
<FactorList>
<Predictor name="Employment"/>
<Predictor name="Education"/>
<Predictor name="Marital"/>
<Predictor name="Occupation"/>
<Predictor name="Gender"/>
<Predictor name="IGNORE_Accounts"/>
</FactorList>
<CovariateList>
<Predictor name="ID"/>
<Predictor name="Age"/>
<Predictor name="Income"/>
<Predictor name="Deductions"/>
<Predictor name="Hours"/>
<Predictor name="RISK_Adjustment"/>
</CovariateList>
<PPMatrix>
<PPCell value="1" predictorName="ID" parameterName="p1"/>
<PPCell value="1" predictorName="Age" parameterName="p2"/>
<PPCell value="Private" predictorName="Employment" parameterName="p3"/>
<PPCell value="PSFederal" predictorName="Employment" parameterName="p4"/>
<PPCell value="PSLocal" predictorName="Employment" parameterName="p5"/>
<PPCell value="PSState" predictorName="Employment" parameterName="p6"/>
<PPCell value="SelfEmp" predictorName="Employment" parameterName="p7"/>
<PPCell value="Volunteer" predictorName="Employment" parameterName="p8"/>
<PPCell value="Bachelor" predictorName="Education" parameterName="p9"/>
<PPCell value="College" predictorName="Education" parameterName="p10"/>
<PPCell value="Doctorate" predictorName="Education" parameterName="p11"/>
<PPCell value="HSgrad" predictorName="Education" parameterName="p12"/>
<PPCell value="Master" predictorName="Education" parameterName="p13"/>
<PPCell value="Preschool" predictorName="Education" parameterName="p14"/>
<PPCell value="Professional" predictorName="Education" parameterName="p15"/>
<PPCell value="Vocational" predictorName="Education" parameterName="p16"/>
<PPCell value="Yr10" predictorName="Education" parameterName="p17"/>
<PPCell value="Yr11" predictorName="Education" parameterName="p18"/>
<PPCell value="Yr12" predictorName="Education" parameterName="p19"/>
<PPCell value="Yr1t4" predictorName="Education" parameterName="p20"/>
<PPCell value="Yr5t6" predictorName="Education" parameterName="p21"/>
<PPCell value="Yr7t8" predictorName="Education" parameterName="p22"/>
<PPCell value="Yr9" predictorName="Education" parameterName="p23"/>
<PPCell value="Divorced" predictorName="Marital" parameterName="p24"/>
<PPCell value="Married" predictorName="Marital" parameterName="p25"/>
<PPCell value="Married-spouse-absent" predictorName="Marital" parameterName="p26"/>
<PPCell value="Unmarried" predictorName="Marital" parameterName="p27"/>
<PPCell value="Widowed" predictorName="Marital" parameterName="p28"/>
<PPCell value="Clerical" predictorName="Occupation" parameterName="p29"/>
<PPCell value="Executive" predictorName="Occupation" parameterName="p30"/>
<PPCell value="Farming" predictorName="Occupation" parameterName="p31"/>
<PPCell value="Home" predictorName="Occupation" parameterName="p32"/>
<PPCell value="Machinist" predictorName="Occupation" parameterName="p33"/>
<PPCell value="Military" predictorName="Occupation" parameterName="p34"/>
<PPCell value="Professional" predictorName="Occupation" parameterName="p35"/>
<PPCell value="Protective" predictorName="Occupation" parameterName="p36"/>
<PPCell value="Repair" predictorName="Occupation" parameterName="p37"/>
<PPCell value="Sales" predictorName="Occupation" parameterName="p38"/>
<PPCell value="Service" predictorName="Occupation" parameterName="p39"/>
<PPCell value="Support" predictorName="Occupation" parameterName="p40"/>
<PPCell value="Transport" predictorName="Occupation" parameterName="p41"/>
<PPCell value="1" predictorName="Income" parameterName="p42"/>
<PPCell value="Male" predictorName="Gender" parameterName="p43"/>
<PPCell value="1" predictorName="Deductions" parameterName="p44"/>
<PPCell value="1" predictorName="Hours" parameterName="p45"/>
<PPCell value="China" predictorName="IGNORE_Accounts" parameterName="p46"/>
<PPCell value="Columbia" predictorName="IGNORE_Accounts" parameterName="p47"/>
<PPCell value="Cuba" predictorName="IGNORE_Accounts" parameterName="p48"/>
<PPCell value="Ecuador" predictorName="IGNORE_Accounts" parameterName="p49"/>
<PPCell value="England" predictorName="IGNORE_Accounts" parameterName="p50"/>
<PPCell value="Fiji" predictorName="IGNORE_Accounts" parameterName="p51"/>
<PPCell value="Germany" predictorName="IGNORE_Accounts" parameterName="p52"/>
<PPCell value="Greece" predictorName="IGNORE_Accounts" parameterName="p53"/>
<PPCell value="Guatemala" predictorName="IGNORE_Accounts" parameterName="p54"/>
<PPCell value="Hong" predictorName="IGNORE_Accounts" parameterName="p55"/>
<PPCell value="Hungary" predictorName="IGNORE_Accounts" parameterName="p56"/>
<PPCell value="India" predictorName="IGNORE_Accounts" parameterName="p57"/>
<PPCell value="Indonesia" predictorName="IGNORE_Accounts" parameterName="p58"/>
<PPCell value="Iran" predictorName="IGNORE_Accounts" parameterName="p59"/>
<PPCell value="Ireland" predictorName="IGNORE_Accounts" parameterName="p60"/>
<PPCell value="Italy" predictorName="IGNORE_Accounts" parameterName="p61"/>
<PPCell value="Jamaica" predictorName="IGNORE_Accounts" parameterName="p62"/>
<PPCell value="Japan" predictorName="IGNORE_Accounts" parameterName="p63"/>
<PPCell value="Malaysia" predictorName="IGNORE_Accounts" parameterName="p64"/>
<PPCell value="Mexico" predictorName="IGNORE_Accounts" parameterName="p65"/>
<PPCell value="NewZealand" predictorName="IGNORE_Accounts" parameterName="p66"/>
<PPCell value="Nicaragua" predictorName="IGNORE_Accounts" parameterName="p67"/>
<PPCell value="Philippines" predictorName="IGNORE_Accounts" parameterName="p68"/>
<PPCell value="Poland" predictorName="IGNORE_Accounts" parameterName="p69"/>
<PPCell value="Portugal" predictorName="IGNORE_Accounts" parameterName="p70"/>
<PPCell value="Scotland" predictorName="IGNORE_Accounts" parameterName="p71"/>
<PPCell value="Singapore" predictorName="IGNORE_Accounts" parameterName="p72"/>
<PPCell value="Taiwan" predictorName="IGNORE_Accounts" parameterName="p73"/>
<PPCell value="UnitedStates" predictorName="IGNORE_Accounts" parameterName="p74"/>
<PPCell value="Vietnam" predictorName="IGNORE_Accounts" parameterName="p75"/>
<PPCell value="Yugoslavia" predictorName="IGNORE_Accounts" parameterName="p76"/>
<PPCell value="1" predictorName="RISK_Adjustment" parameterName="p77"/>
</PPMatrix>
<ParamMatrix>
<PCell parameterName="p0" df="1" beta="-12.0199804097759"/>
<PCell parameterName="p1" df="1" beta="3.62329433275629e-08"/>
<PCell parameterName="p2" df="1" beta="0.0380676635766761"/>
<PCell parameterName="p3" df="1" beta="0.756901134378277"/>
<PCell parameterName="p4" df="1" beta="0.375762595900717"/>
<PCell parameterName="p5" df="1" beta="0.50309824514625"/>
<PCell parameterName="p6" df="1" beta="0.470897191210805"/>
<PCell parameterName="p7" df="1" beta="-2.10284542055317"/>
<PCell parameterName="p8" df="1" beta="-15.5455611068614"/>
<PCell parameterName="p9" df="1" beta="0.0997435072074993"/>
<PCell parameterName="p10" df="1" beta="-1.22905386951777"/>
<PCell parameterName="p11" df="1" beta="-6.76667195830752"/>
<PCell parameterName="p12" df="1" beta="-1.01297363710822"/>
<PCell parameterName="p13" df="1" beta="-0.340407862763258"/>
<PCell parameterName="p14" df="1" beta="-15.8841924243017"/>
<PCell parameterName="p15" df="1" beta="3.18173392385448"/>
<PCell parameterName="p16" df="1" beta="-0.569821531302005"/>
<PCell parameterName="p17" df="1" beta="-3.3033217141108"/>
<PCell parameterName="p18" df="1" beta="-0.430994461878221"/>
<PCell parameterName="p19" df="1" beta="-17.0972305473487"/>
<PCell parameterName="p20" df="1" beta="-15.929168040244"/>
<PCell parameterName="p21" df="1" beta="-17.7483980280451"/>
<PCell parameterName="p22" df="1" beta="-16.1514804898207"/>
<PCell parameterName="p23" df="1" beta="-10.3889654044557"/>
<PCell parameterName="p24" df="1" beta="-0.690592385956069"/>
<PCell parameterName="p25" df="1" beta="2.53630505787246"/>
<PCell parameterName="p26" df="1" beta="1.41541804527502"/>
<PCell parameterName="p27" df="1" beta="1.49491086815453"/>
<PCell parameterName="p28" df="1" beta="0.174099244312997"/>
<PCell parameterName="p29" df="1" beta="1.01865424623088"/>
<PCell parameterName="p30" df="1" beta="1.73213477081248"/>
<PCell parameterName="p31" df="1" beta="-1.80877402327631"/>
<PCell parameterName="p32" df="1" beta="-12.4454410582178"/>
<PCell parameterName="p33" df="1" beta="-0.417346874910574"/>
<PCell parameterName="p34" df="1" beta="-12.475145396564"/>
<PCell parameterName="p35" df="1" beta="1.45214141089004"/>
<PCell parameterName="p36" df="1" beta="1.64050123149924"/>
<PCell parameterName="p37" df="1" beta="0.134775653612853"/>
<PCell parameterName="p38" df="1" beta="0.948585540443075"/>
<PCell parameterName="p39" df="1" beta="0.144171863863442"/>
<PCell parameterName="p40" df="1" beta="0.789971116324262"/>
<PCell parameterName="p41" df="1" beta="0.842781801750256"/>
<PCell parameterName="p42" df="1" beta="-9.63129083571953e-07"/>
<PCell parameterName="p43" df="1" beta="-0.52313575926474"/>
<PCell parameterName="p44" df="1" beta="0.00125611277933667"/>
<PCell parameterName="p45" df="1" beta="0.0109489183058056"/>
<PCell parameterName="p46" df="1" beta="-2.86790934232277"/>
<PCell parameterName="p47" df="1" beta="-10.4586048958891"/>
<PCell parameterName="p48" df="1" beta="-11.8078344468555"/>
<PCell parameterName="p49" df="1" beta="-8.15369086351991"/>
<PCell parameterName="p50" df="1" beta="-15.1509749621394"/>
<PCell parameterName="p51" df="1" beta="-12.6588234930477"/>
<PCell parameterName="p52" df="1" beta="7.44342418994783"/>
<PCell parameterName="p53" df="1" beta="-8.80415604321149"/>
<PCell parameterName="p54" df="1" beta="-0.909551298634999"/>
<PCell parameterName="p55" df="1" beta="3.21333791872318"/>
<PCell parameterName="p56" df="1" beta="-9.7080063371067"/>
<PCell parameterName="p57" df="1" beta="-9.94640566996892"/>
<PCell parameterName="p58" df="1" beta="-7.34469543656762"/>
<PCell parameterName="p59" df="1" beta="-10.1375079207868"/>
<PCell parameterName="p60" df="1" beta="4.03786237290128"/>
<PCell parameterName="p61" df="1" beta="-9.95289672035589"/>
<PCell parameterName="p62" df="1" beta="-11.2800534550324"/>
<PCell parameterName="p63" df="1" beta="-8.5259456003378"/>
<PCell parameterName="p64" df="1" beta="-11.1183864482514"/>
<PCell parameterName="p65" df="1" beta="-3.17790587178398"/>
<PCell parameterName="p66" df="1" beta="7.62183148791729"/>
<PCell parameterName="p67" df="1" beta="-9.29840834254978"/>
<PCell parameterName="p68" df="1" beta="5.87739404847556"/>
<PCell parameterName="p69" df="1" beta="-11.0988711939497"/>
<PCell parameterName="p70" df="1" beta="-5.78171399043641"/>
<PCell parameterName="p71" df="1" beta="-11.009822161619"/>
<PCell parameterName="p72" df="1" beta="-7.98831399897464"/>
<PCell parameterName="p73" df="1" beta="-14.2857685874083"/>
<PCell parameterName="p74" df="1" beta="4.89065048867447"/>
<PCell parameterName="p75" df="1" beta="-2.21686920486685"/>
<PCell parameterName="p76" df="1" beta="-10.0494769160447"/>
<PCell parameterName="p77" df="1" beta="0.0044395180546043"/>
</ParamMatrix>
</GeneralRegressionModel>
</PMML>
The coefficients held by the R model are as follows:
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.779e+00 1.108e+04 -0.001 0.999584
ID 3.922e-08 6.187e-08 0.634 0.526164
Age 2.705e-02 1.388e-02 1.949 0.051314 .
EmploymentPrivate 1.087e+00 6.774e-01 1.605 0.108550
EmploymentPSFederal 1.155e+00 1.050e+00 1.101 0.271105
EmploymentPSLocal 1.262e+00 8.811e-01 1.432 0.152036
EmploymentPSState 8.151e-01 1.011e+00 0.806 0.420221
EmploymentSelfEmp 2.217e-01 9.859e-01 0.225 0.822066
EmploymentVolunteer -1.667e+01 1.075e+04 -0.002 0.998764
EducationBachelor 4.297e-01 7.768e-01 0.553 0.580154
EducationCollege -1.234e+00 8.393e-01 -1.470 0.141592
EducationDoctorate 1.604e+00 1.697e+00 0.945 0.344690
EducationHSgrad -5.332e-01 7.613e-01 -0.700 0.483661
EducationMaster -3.705e-01 1.117e+00 -0.332 0.740081
EducationPreschool -1.306e+01 3.588e+03 -0.004 0.997096
EducationProfessional 1.600e+00 1.251e+00 1.279 0.200733
EducationVocational -3.887e-01 1.023e+00 -0.380 0.703998
EducationYr10 -2.121e+00 1.897e+00 -1.118 0.263626
EducationYr11 -3.222e-01 1.294e+00 -0.249 0.803322
EducationYr12 -4.786e+00 1.235e+01 -0.388 0.698298
EducationYr1t4 -1.588e+01 4.174e+03 -0.004 0.996965
EducationYr5t6 -1.779e+01 2.356e+03 -0.008 0.993976
EducationYr7t8 -1.659e+01 1.951e+03 -0.009 0.993214
EducationYr9 -1.672e+01 2.680e+03 -0.006 0.995022
MaritalDivorced -6.700e-01 8.277e-01 -0.809 0.418238
MaritalMarried 2.269e+00 5.238e-01 4.332 1.48e-05 ***
MaritalMarried-spouse-absent 1.299e+00 1.385e+00 0.938 0.348362
MaritalUnmarried 1.570e+00 9.025e-01 1.740 0.081926 .
MaritalWidowed 7.018e-01 1.209e+00 0.581 0.561438
OccupationClerical 1.060e+00 1.224e+00 0.866 0.386731
OccupationExecutive 1.851e+00 1.138e+00 1.627 0.103649
OccupationFarming 1.189e-01 1.530e+00 0.078 0.938065
OccupationHome -1.296e+01 6.601e+03 -0.002 0.998434
OccupationMachinist 2.869e-01 1.299e+00 0.221 0.825190
OccupationMilitary -1.318e+01 1.075e+04 -0.001 0.999022
OccupationProfessional 1.589e+00 1.187e+00 1.339 0.180656
OccupationProtective 1.099e+00 1.622e+00 0.678 0.497935
OccupationRepair 1.641e-01 1.204e+00 0.136 0.891597
OccupationSales 7.170e-01 1.205e+00 0.595 0.551929
OccupationService -5.600e-02 1.348e+00 -0.042 0.966858
OccupationSupport 8.431e-01 1.348e+00 0.626 0.531515
OccupationTransport 3.488e-01 1.242e+00 0.281 0.778911
Income 1.442e-06 3.112e-06 0.463 0.643050
GenderMale 1.510e-01 5.361e-01 0.282 0.778254
Deductions 1.476e-03 4.109e-04 3.593 0.000327 ***
Hours 2.116e-02 1.433e-02 1.476 0.139922
IGNORE_AccountsChina -2.048e+01 1.867e+04 -0.001 0.999125
IGNORE_AccountsColumbia -2.085e+01 1.294e+04 -0.002 0.998715
IGNORE_AccountsCuba -1.942e+01 1.544e+04 -0.001 0.998997
IGNORE_AccountsEcuador -1.701e+01 1.544e+04 -0.001 0.999121
IGNORE_AccountsEngland -1.418e+01 1.109e+04 -0.001 0.998980
IGNORE_AccountsGermany -4.952e-02 1.108e+04 0.000 0.999996
IGNORE_AccountsGreece -1.645e+01 1.544e+04 -0.001 0.999150
IGNORE_AccountsGuatemala -2.767e+00 1.459e+04 0.000 0.999849
IGNORE_AccountsHong -3.325e+00 1.557e+04 0.000 0.999830
IGNORE_AccountsIndia -1.506e+01 1.110e+04 -0.001 0.998918
IGNORE_AccountsIndonesia -1.692e+01 1.225e+04 -0.001 0.998897
IGNORE_AccountsIreland -3.329e+00 1.108e+04 0.000 0.999760
IGNORE_AccountsItaly -1.663e+01 1.304e+04 -0.001 0.998982
IGNORE_AccountsJamaica -2.174e+01 2.163e+04 -0.001 0.999198
IGNORE_AccountsJapan -1.577e+01 1.544e+04 -0.001 0.999185
IGNORE_AccountsMalaysia -1.903e+01 1.206e+04 -0.002 0.998741
IGNORE_AccountsMexico -9.440e+00 1.108e+04 -0.001 0.999320
IGNORE_AccountsNewZealand 1.773e-01 1.562e+04 0.000 0.999991
IGNORE_AccountsNicaragua -1.786e+01 1.200e+04 -0.001 0.998812
IGNORE_AccountsPhilippines -9.526e-01 1.108e+04 0.000 0.999931
IGNORE_AccountsPoland -1.878e+01 1.544e+04 -0.001 0.999030
IGNORE_AccountsPortugal -1.432e+00 1.557e+04 0.000 0.999927
IGNORE_AccountsSingapore -1.778e+01 1.225e+04 -0.001 0.998842
IGNORE_AccountsTaiwan -1.922e+01 1.259e+04 -0.002 0.998782
IGNORE_AccountsUnitedStates -2.519e+00 1.108e+04 0.000 0.999819
IGNORE_AccountsVietnam -1.984e+01 1.250e+04 -0.002 0.998734
IGNORE_AccountsYugoslavia -1.774e+01 1.544e+04 -0.001 0.999083
RISK_Adjustment 3.802e-03 6.819e-04 5.575 2.47e-08 ***
(The R script that generated this GLM model and the corresponding PMML is as follows:
library(pmml)
auditDF <- read.csv("http://rattle.togaware.com/audit.csv")
auditDF <- na.omit(auditDF)
target <- auditDF$TARGET_Adjusted
N <- length(target); M <- N - 500
i.train <- sample(N, M)
audit.train <- auditDF[i.train,]
audit.test <- auditDF[-i.train,]
glm.model <- glm(audit.train$TARGET_Adjusted ~ ., data = audit.train, family = "binomial")
glm.pmml <- pmml(glm.model, name = "glm model", data = audit.train)
xmlFile <- file.path(getwd(), "audit-glm.xml")
saveXML(glm.pmml, xmlFile)
Source: http://blog.revolutionanalytics.com/2011/03/predicting-r-models-with-pmml.html)
Best answer
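I suppose it all depends on what you want to do with the model once you get it back into R. I once helped someone create a pseudo-glm object that knows the coefficients for its variables and can be used with predict(). Many other functions require the fitting dataset to be present.
In case that is of interest to you: the function is called makeglm.R. You will just want to copy and paste that function into your R session. It is necessary to convert your data first, though. Here are some helper functions that do that.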
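# Build an empty data.frame stub whose columns match the PMML DataDictionary.
getdata <- function(xml, ns=attr(xml,"ns")) {
    names<-xpathSApply(xml, "//d:DataField/@name", namespaces = ns)
    vals<-xpathApply(xml, "//d:DataField", function(x) {
        if(xmlGetAttr(x, "optype")=="categorical") {
            # categorical fields become zero-length factors with the declared levels
            levels<-xpathSApply(x, "Value/@value")
            factor(character(0), levels=levels)
        } else if (xmlGetAttr(x, "optype")=="continuous"){
            # continuous fields become zero-length numeric columns
            numeric(0)
        }
    }, namespaces = ns)
    names(vals)<-names
    as.data.frame(vals)
}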
# Rebuild the model formula from the MiningSchema: the "predicted" field is the
# response and the "active" fields are the covariates.
getformula <- function(xml, ns=attr(xml,"ns")) {
    resp<-xpathSApply(xml, "//d:MiningField[@usageType=\"predicted\"]/@name",
        namespaces = ns)
    covar<-xpathSApply(xml, "//d:MiningField[@usageType=\"active\"]/@name",
        namespaces = ns)
    fmc<-paste(paste(resp, collapse=" + "), "~", paste(covar, collapse=" + "))
    as.formula(fmc)
}
# Collect the betas from ParamMatrix and group them by predictor via PPMatrix.
getestimates <- function(xml, ns=attr(xml,"ns")) {
    # betas named by parameter id (p0, p1, ...)
    betas <- setNames(as.numeric(xpathSApply(xml, "//d:PCell/@beta", namespaces = ns)),
        xpathSApply(xml, "//d:PCell/@parameterName", namespaces = ns))
    numericparam <- unname(xpathSApply(xml, "//d:CovariateList/d:Predictor/@name", namespaces = ns))
    factorparam <- unname(xpathSApply(xml, "//d:FactorList/d:Predictor/@name", namespaces = ns))
    # one row per PPCell: parameter id, level value (or "1"), predictor name
    values <- do.call(rbind, Map(function(x,y,z) data.frame(p=x, val=y, pred=z, stringsAsFactors=FALSE),
        unname(xpathSApply(xml,"//d:PPCell/@parameterName", namespaces = ns)),
        xpathSApply(xml, "//d:PPCell/@value", namespaces = ns),
        xpathSApply(xml, "//d:PPCell/@predictorName", namespaces = ns)))
    # factors: a vector of betas named by level
    lf<-Map(function(x) {
        vv <- values[values$pred==x, ]
        setNames(betas[vv$p], vv$val)
    }, factorparam)
    # numeric covariates: a single unnamed beta each
    ln<-Map(function(x) {
        vv <- values[values$pred==x, ]
        unname(betas[vv$p])
    }, numericparam)
    estimates<-c(lf,ln)
    # the intercept, if present, goes first as an unnamed element
    intercept<-getNodeSet(xml,"//d:Parameter[@label=\"(Intercept)\"]", namespaces = ns)
    if(length(intercept)) {
        estimates<-c(unname(betas[xmlGetAttr(intercept[[1]],"name")]), estimates)
    }
    estimates
}
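As an aside to the helpers above, here is a minimal sketch of how the PMML parameters line up with R's coefficient names: each Parameter label (for example "GenderMale") is the name R would show in coef(), and the matching PCell carries its beta. The three-parameter PMML fragment, the rounded beta values, and the two newdata rows are illustrative assumptions, not the full model from the question, so the resulting probabilities are not real model output.

```r
library(XML)

# Hypothetical cut-down fragment using the same element names as the PMML above.
pmml_txt <- '<PMML xmlns="http://www.dmg.org/PMML-4_2">
  <GeneralRegressionModel>
    <ParameterList>
      <Parameter name="p0" label="(Intercept)"/>
      <Parameter name="p1" label="Age"/>
      <Parameter name="p2" label="GenderMale"/>
    </ParameterList>
    <ParamMatrix>
      <PCell parameterName="p0" df="1" beta="-12.02"/>
      <PCell parameterName="p1" df="1" beta="0.038"/>
      <PCell parameterName="p2" df="1" beta="-0.523"/>
    </ParamMatrix>
  </GeneralRegressionModel>
</PMML>'

doc <- xmlParse(pmml_txt, asText = TRUE)
ns  <- c(d = "http://www.dmg.org/PMML-4_2")  # bind prefix "d" to the PMML namespace

# Parameter ids and their R-style labels.
param_id    <- xpathSApply(doc, "//d:Parameter", xmlGetAttr, "name",  namespaces = ns)
param_label <- xpathSApply(doc, "//d:Parameter", xmlGetAttr, "label", namespaces = ns)

# Betas keyed by parameter id.
cell_id   <- xpathSApply(doc, "//d:PCell", xmlGetAttr, "parameterName", namespaces = ns)
cell_beta <- as.numeric(xpathSApply(doc, "//d:PCell", xmlGetAttr, "beta", namespaces = ns))

# Join on the id to get a named vector shaped like coef(glm.model).
coefs <- setNames(cell_beta[match(param_id, cell_id)], param_label)

# Score two made-up rows by hand. model.matrix() applies R's default treatment
# contrasts, which yields the same column names as the Parameter labels.
newdata <- data.frame(
  Age    = c(30, 55),
  Gender = factor(c("Female", "Male"), levels = c("Female", "Male"))
)
X    <- model.matrix(~ Age + Gender, data = newdata)
eta  <- drop(X[, names(coefs), drop = FALSE] %*% coefs)  # linear predictor
prob <- plogis(eta)                                      # inverse of the logit link
```

Comparing a hand-computed prob like this against predict(..., type = "response") on the rebuilt object is one way to check that the coefficients were mapped correctly.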
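I am not at all familiar with the PMML format, but I put these together based on your sample document. I tried to extract from the data all the information needed to build the formula, the data.frame stub, and the parameter estimates, so that they can be passed to the makeglm() function. Once you have loaded that function and these helper functions, you can run
library(XML)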
mypmml <- xmlParse("pmml.xml")
attr(mypmml, "ns")<-"d"
dd <- getdata(mypmml)
ff <- getformula(mypmml)
ee <- getestimates(mypmml)
do.call(makeglm, c(list(ff, family="binomial", data=dd), ee))
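to actually run the function. This returns a glm object that you can use with predict(). I did have to change one thing in your sample data. For some reason you included the table name as part of the formula in your glm model,
glm(audit.train$TARGET_Adjusted ~ ., data = audit.train, ...)
rather than
glm(TARGET_Adjusted ~ ., data = audit.train, ...)
which can cause problems. So before reading the file in, I simply removed "audit.train$" from the XML file. More error checking could probably be added, but I am not even sure this is ultimately what you are after.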
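The prefix removal mentioned above can be sketched in base R as follows. This is only one way to do it, under the assumption that a plain text substitution is acceptable; the temporary file and its two lines are stand-ins for the real PMML file.

```r
# Stand-in for the PMML file on disk (two lines copied from the document above).
xml_file <- tempfile(fileext = ".xml")
writeLines(c(
  '<MiningField name="audit.train$TARGET_Adjusted" usageType="predicted"/>',
  '<MiningField name="Age" usageType="active"/>'
), xml_file)

# fixed = TRUE makes gsub() treat "." and "$" literally rather than as
# regular-expression metacharacters.
txt <- readLines(xml_file)
txt <- gsub("audit.train$", "", txt, fixed = TRUE)
writeLines(txt, xml_file)
```

Note that a substitution like this rewrites every occurrence, so in the full document the OutputField name that embeds the same prefix would change as well.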
Regarding "r - Converting PMML representing multinomial logistic regression back to R coefficients", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/24374503/