本文介绍了R:如何展开包含“列表”的行到几行...每个列表成员一个?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我确定有一个简单的解决方案,但我正在尝试找到它。任何帮助都非常感激。



我有一个包含2列的数据框; 亲和pep。
pro被格式化为因素,并包含220; 300; 4的条目;有时更多的数字(由;分隔)),有时只是一个数字(而不是;)。
pep列格式化为整数,并包含单个数字,例如20.
我想做的是扩展行pro:220; 300; 4和pep:20
到三行一,pro:220和pep:20,一个与pro:300和pep:20,一个与pro:4和pep:20。



我想为整个数据帧执行此操作,从而最终得到一个包含两个字符格式的列的数据帧,其中所有行最初都包含多个;分隔的数字已被扩展。



我希望避免循环,因为数据框架相当大(> 100000行)


$ b $我很抱歉,我没有能够以更具代表性的方式发布这个...我是新来的,并以代码格式丢失。



在来自西蒙的非常赞赏的请求中:

 > dput(head(dat,10))
structure(list(Protein.Group.IDs = structure(c(1095L,60L,299L,
242L,1091L,147L,161L,884L,783L,1040L ),.Label = c(0,
1,10,100,101,102,103,104 ,107,
108,109,11,110,111,112,113,114 b115,116,117,118,119,12,120,121,121; 920; 530,
530; 589,121; 920; 530; 589; 934,121; 920; 589,121; 920; 934,
122; 351,122; 351; 122; 351; 950; 224; 904,122; 351; 950; 687,
122; 901; 224; 904,122; 901; 351 351; 950,122; 901; 351; 950; 224,
122; 901; 351; 950; 224; 890; 904,122; 901; 351; 950; 224; 890; 904; 687,
122; 901; 351; 950; 890; 687,122; 901; 950,122; 901; 950; 904; 687,
950,123,124,125,126,127,127,952,128,
129,13,130 131,131,204,132,133,134,135,
136,137,138,139,14 140,140; 259; 436,141,
142,143,144,1 45,146,147,148,149,15,
150,151,152,153,154 ,156,157,158,
159,16,16; 331,16; 331; 329,16; 16; 331; 329; 910,
16; 331; 329; 910; 62,16; 331; 62,16; 331; 910,160,161 b $ b162,163,164,165,166,166; 743,167,167; 595,
168 ,17,170,170,48,171,172,173,174,
175,176,177 ,179,18,180,181,182,
183,184,185,186,187 188; 813,188; 813; 852,
189,19,19; 14,19; 6; 9; 14; 11,19; 884; 9; 14; 20; 26; 11; 1,
19; 9,19; 9; 14,190,190; 260,191,192 193,194,
195,196,197,198,199,2,20,20; 26,200 b $ b201,202,203,204,205,206,207,208,209,
21 4,210,211,212,213,214,215,216,
217 ,220,221,222,223,224,
224; 890,224; 890; 904,225 1,225; 221; 308,225; 295,
226,227,228,228; 396,228; 396; 73 73,229,
23,23; 137,23; 17; 137,230,231,232,233 b $ b235,236,237,238,239,24,240,241,242,
243,244,245,246,247,248,249,25,
250,251,252 ,256,255,256,257,258,
259,26,260,261 264,265,266,
267,268,269,27,270,271,272,273 541; 905,
273; 905,274,275,276,277,278,279,28,280,
281,281,192,282,283,284,285,286,287,288,
289 ,290,291,292,293,294,295,296,
297,298,299,3 30,300,301,302,303,304,
304,770,305,306,307,308 309,31,310,311,
312,313; 293,314,314; 658,315,316 ,318,
319,32,320,321,322,323,32 4,324; 34; 564; 637; 282; 229; 565,
324; 564; 282,324; 637; 229; 565,325,326 327,328,
328; 586,329,33,330,331,332,333,334,335,
336,337,338,339,34,340,341,342,343,
344 ,346,346,523,347,348,349,35,350,
351 353,353,277,354,355,356,
357,358,359,36,360 362,363,364,
365,366,367,368,369,37,370,371 ,
373,374,375,376,377,377; 938,378,379,38,
,381,382,382; 147,383,384,385,386,387,
388,389 ,39; 417,390,391,392,393,394,
395,396,397,398 ,399; 955,4,40,400,
401,402,403,404,405,406 ,408,409,
41,410,411,412,413,414,415,416,417 b $ b418,419,42,420,421,422,423 4,424,640,b $ b425,426,427,427; 930,428,429,43,430 ,
432,433,434,435,436,437,438,438; 178,439,
,440,441,442,443,444,445,446,447,
448,449,45 450,451,452,453,454,455,
456,457,458,459,46 ,461,462,463,
464,465,466,467,468,469,47 471,b7,47,478,477,471,
472,473,474,475,476 ,481,482,483,484,485,486,487,488,
488; 648,489 ,490,491,492,493,494,495,
496,497,498,499,5 50,500,501,502,503,
504,505,506,507,508,509 ,510,511,
512,513,514,515,516,516; 603; 845,516; 603; 845; 837 ,
517,518,519,52,520,521,522,523,524,
526,527,527; 509,528,529,53,530,531,
532,533,534,535,536,537,538,539,54,
541,542,543,544,545,546,547,
548,549,55 550,549,551,552,553,554,
555,556,557,558,559 560,561,562,
563,564,564; 282,564; 637,565,566,567 ,
568; 569,568; 569; 286,568; 569; 574,568; 569; 574; 286,568; 574,
57,573,574,575,576,
577,578,579 579; 577; 578,579; 577; 580,579; 577; 580; 578,
58,580,581,582 584,585,585; 609,586,
587,587; 167,587; 167; 595,587; 167; 595; 557 588,589,
59,590,591,592,593,594,595,596 $ b598,599,6,60,600,601,601; 10,602,603,
604 606,607,608,609,61,610,611,
612,613,614 615; 269,615; 926; 269,616,
617,61 62,627,628,62,62,62 ,623,636,635,636,637,63 ,638,639,64,
64; 72,640,641,642,643,643; 529,644 645,
646,647,648,649,65,650,651,652,653 ,655,656,657,658,659,66,660,661,
662,663 ,664,665,666,667,668,669,
67,670,671,672 674,675,676,677,
678,679,68,680,681,681; 97,682 683,689,689,689,689,689,698 b693,694,695,696,697,698,699,7,7; 25; 5,
7; ,7,752; 24,7; 752; 25; 24; 8,70,700,701,702,
703,704 705,706,707,708,709,71,710,
711,712,713,714 ,716,717,718,719,
72,72; 746; 944,72; 746; 944; 772 ; 772,72,727,728,727,727,727 729,
73,730,731,732,733,734,735,735; 522,735; 665 $ b735; 665; 522,735; 665; 876,735; 876,735; 876; 522,736,
737,738 74,746,746,742,743,744,
745,746 ,747,748,749,
75,750,751,752,752; 24,753,754 756,
757,758,759,76,76; 313,76; 313; 293,760,761,
762,763,764,765,766,767,768,769,77,
770 772,773,774,775,776,777,778,
779,78,780,781 783,784,785,786,
787,788,789,79 792,793,792,793,
793,863,794,795,796,797 ,
800,801,802,803,804,805,806,807,808,
,809,81,810,811,812,813,814,815 ,
815; 413,815; 777,815; 777; 339,815; 777; 838,815; 838,
816,817 ,818,818; 7; 752,818; 7; 752; 23; 25; 17; 8,819,
82,820,821 822,823,824,824,823,824,824 838,832,833,832,833,834,
835,836,837,838 ,842,
843,844,845,846,847,847; 560; 590,848,849,
852,853,853,420,854,
855,856,857 ,858,858; 638,858; 638; 409,859,
86,860,861,861; 593,862 863,864,865,871,863,864,865,866 872,873,873,
874,875,876,877,878,879,88 b882,883,884,884; 6,884; 6; 9,885,886,887,
888 189,889,89,890,890; 904,891,891; 953,
892,892; 941,893 894\" , 895, 896, 897, 898, 899,
9, 90, 900,901,901,224,902,903,904,905,
906,907,908 91,910,911,912,913,
914,915,916,917 922,921,922,923,924,924,924 ,925,
926,927,928,929,93,930,931,932,933,
934,938,936,937,938,939,94,940,941,
,945,946,947,948,949,95,
950,951,952,953 955,956,957,958,
959,96,960,961,962,963,964 ,966,
967,97,98,99,99; 392),class =factor),Mod..Peptide.ID = c(23L,
24L,25L,26L,27L,29L,30L,31L,32L,33L)),Names = c(Protein.Group.IDs,
Mod..Peptide.ID) ,row.names = c(318L,344L,380L,406L,409L,
417L,436L,462L,494L,505L),class =data.frame)

此致
的Mads

决方案

我已经长大了的真正的爱 data.table 对于这样的任务。这是非常简单的但是,首先,我们来做一些样例数据(你应该提供理想的)。

 #样本数据
set.seed (1)
df = data.frame(pep = replicate(3,paste(sample(999,3),collapse =;)),pro = sample(3),stringsAsFactors = FALSE)
'/ code>

现在我们使用 data.table 包做重塑在几行...

 #负载data.table包
需要(data.table)

#打开data.frame成data.table,它看起来像..
DT< - data.table(DF)
#PEP亲
#1: 266; 372; 572 1
#2:908; 202; 896 3
#3:944; 660; 628 2

# 。
dt [,list(pep = unlist(strsplit(pep,;))),by = pro]
#pro pep
#1:1 266
# 2:1 372
#3:1 572
#4:3 908
#5:3 202
#6:3 896
#7:2 944
#8:2 660
#9:2 628


I am sure there is a simple solution to this, but i am going nuts trying to find it. Any help is very much appreciated.

I have a data frame with 2 columns; "pro" and "pep".pro is formatted as factors and contains entries in the form 220;300;4 sometimes more numbers (seperated by ";") and sometimes just a single number (and no ";").The pep column is formatted as integers and contains single numbers, e.g. 20.What i would like to do is to "expand" e.g. the row pro: 220;300;4 and pep: 20to three rows one with pro: 220 and pep: 20, one with pro: 300 and pep: 20 and one with pro: 4 and pep: 20.

I want to do this for the whole data frame and thus end up with a data frame with two character formatted columns where all the rows originally containing multiple ";" seperated numbers have been expanded.

I would prefer to avoid loops since the data frame is fairly large (>100000 rows)

I am sorry that i havent been able to post this in a more case-representative way...i am new here and got lost in the code format.

On a much appreciated request from simon:

    > dput( head( dat , 10 ) )
structure(list(Protein.Group.IDs = structure(c(1095L, 60L, 299L,
242L, 1091L, 147L, 161L, 884L, 783L, 1040L), .Label = c("0",
"1", "10", "100", "101", "102", "103", "104", "105", "106", "107",
"108", "109", "11", "110", "111", "112", "113", "114", "114;680",
"115", "116", "117", "118", "119", "12", "120", "121", "121;920;530",
"121;920;530;589", "121;920;530;589;934", "121;920;589", "121;920;934",
"122;351", "122;351;950", "122;351;950;224;904", "122;351;950;687",
"122;901;224;904", "122;901;351", "122;901;351;950", "122;901;351;950;224",
"122;901;351;950;224;890;904", "122;901;351;950;224;890;904;687",
"122;901;351;950;890;687", "122;901;950", "122;901;950;904;687",
"122;950", "123", "124", "125", "126", "127", "127;952", "128",
"129", "13", "130", "131", "131;204", "132", "133", "134", "135",
"136", "137", "138", "139", "14", "140", "140;259;436", "141",
"142", "143", "144", "145", "146", "147", "148", "149", "15",
"150", "151", "152", "153", "154", "155", "156", "157", "158",
"159", "16", "16;331", "16;331;329", "16;331;329;62", "16;331;329;910",
"16;331;329;910;62", "16;331;62", "16;331;910", "160", "161",
"162", "163", "164", "165", "166", "166;743", "167", "167;595",
"168", "169", "17", "170", "170;48", "171", "172", "173", "174",
"175", "176", "177", "178", "179", "18", "180", "181", "182",
"183", "184", "185", "186", "187", "188", "188;813", "188;813;852",
"189", "19", "19;14", "19;6;9;14;11", "19;884;6;9;14;20;26;11;1",
"19;9", "19;9;14", "190", "190;260", "191", "192", "193", "194",
"195", "196", "197", "198", "199", "2", "20", "20;26", "200",
"201", "202", "203", "204", "205", "206", "207", "208", "209",
"21", "21;4", "210", "211", "212", "213", "214", "215", "216",
"217", "218", "219", "22", "220", "221", "222", "223", "224",
"224;890", "224;890;904", "225", "225;221", "225;221;308", "225;295",
"226", "227", "228", "228;396", "228;396;73", "228;73", "229",
"23", "23;137", "23;17;137", "230", "231", "232", "233", "234",
"235", "236", "237", "238", "239", "24", "240", "241", "242",
"242;171", "243", "244", "245", "246", "247", "248", "249", "25",
"250", "251", "252", "253", "254", "255", "256", "257", "258",
"259", "26", "260", "261", "262", "263", "264", "265", "266",
"267", "268", "269", "27", "270", "271", "272", "273", "273;541;905",
"273;905", "274", "275", "276", "277", "278", "279", "28", "280",
"281", "281;192", "282", "283", "284", "285", "286", "287", "288",
"289", "29", "290", "291", "292", "293", "294", "295", "296",
"297", "298", "299", "3", "30", "300", "301", "302", "303", "304",
"304;770", "305", "306", "307", "308", "309", "31", "310", "311",
"312", "313;293", "314", "314;658", "315", "316", "317", "318",
"319", "32", "320", "321", "322", "323", "324", "324;34;564;637;282;229;565",
"324;564;282", "324;637;229;565", "325", "326", "327", "328",
"328;586", "329", "33", "330", "331", "332", "333", "334", "335",
"336", "337", "338", "339", "34", "340", "341", "342", "343",
"344", "345", "346", "346;523", "347", "348", "349", "35", "350",
"351", "351;890", "352", "353", "353;277", "354", "355", "356",
"357", "358", "359", "36", "360", "361", "362", "363", "364",
"365", "366", "367", "368", "369", "37", "370", "371", "372",
"373", "374", "375", "376", "377", "377;938", "378", "379", "38",
"380", "381", "382", "382;147", "383", "384", "385", "386", "387",
"388", "389", "39", "39;417", "390", "391", "392", "393", "394",
"395", "396", "397", "398", "399", "399;955", "4", "40", "400",
"401", "402", "403", "404", "405", "406", "407", "408", "409",
"41", "410", "411", "412", "413", "414", "415", "416", "417",
"418", "419", "42", "420", "421", "422", "423", "424", "424;640",
"425", "426", "427", "427;930", "428", "429", "43", "430", "431",
"432", "433", "434", "435", "436", "437", "438", "438;178", "439",
"44", "440", "441", "442", "443", "444", "445", "446", "447",
"448", "449", "45", "450", "451", "452", "453", "454", "455",
"456", "457", "458", "459", "46", "460", "461", "462", "463",
"464", "465", "466", "467", "468", "469", "47", "470", "471",
"472", "473", "474", "475", "476", "477", "478", "479", "48",
"480", "481", "482", "483", "484", "485", "486", "487", "488",
"488;648", "489", "49", "490", "491", "492", "493", "494", "495",
"496", "497", "498", "499", "5", "50", "500", "501", "502", "503",
"504", "505", "506", "507", "508", "509", "51", "510", "511",
"512", "513", "514", "515", "516", "516;603;845", "516;603;845;837",
"517", "518", "519", "52", "520", "521", "522", "523", "524",
"525", "526", "527", "527;509", "528", "529", "53", "530", "531",
"532", "533", "534", "535", "536", "537", "538", "539", "54",
"540", "540;67", "541", "542", "543", "544", "545", "546", "547",
"548", "549", "55", "550", "550;549", "551", "552", "553", "554",
"555", "556", "557", "558", "559", "56", "560", "561", "562",
"563", "564", "564;282", "564;637", "565", "566", "567", "568",
"568;569", "568;569;286", "568;569;574", "568;569;574;286", "568;574",
"569", "57", "570", "571", "572", "573", "574", "575", "576",
"577", "578", "579", "579;577;578", "579;577;580", "579;577;580;578",
"58", "580", "581", "582", "583", "584", "585", "585;609", "586",
"587", "587;167", "587;167;595", "587;167;595;557", "588", "589",
"59", "590", "591", "592", "593", "594", "595", "596", "597",
"598", "599", "6", "60", "600", "601", "601;10", "602", "603",
"604", "605", "606", "607", "608", "609", "61", "610", "611",
"612", "613", "614", "615", "615;269", "615;926;269", "616",
"617", "618", "619", "62", "620", "621", "622", "623", "624",
"625", "626", "627", "628", "629", "63", "63;397", "630", "631",
"632", "633", "634", "635", "636", "637", "638", "639", "64",
"64;72", "640", "641", "642", "643", "643;529", "644", "645",
"646", "647", "648", "649", "65", "650", "651", "652", "653",
"654", "655", "656", "657", "658", "659", "66", "660", "661",
"662", "663", "663;819", "664", "665", "666", "667", "668", "669",
"67", "670", "671", "672", "673", "674", "675", "676", "677",
"678", "679", "68", "680", "681", "681;97", "682", "683", "684",
"685", "686", "687", "688", "689", "69", "690", "691", "692",
"693", "694", "695", "696", "697", "698", "699", "7", "7;25;5",
"7;752", "7;752;24", "7;752;25;24;8", "70", "700", "701", "702",
"703", "704", "705", "706", "707", "708", "709", "71", "710",
"711", "712", "713", "714", "715", "716", "717", "718", "719",
"72", "72;746;944", "72;746;944;772", "72;772", "72;927", "720",
"721", "722", "723", "724", "725", "726", "727", "728", "729",
"73", "730", "731", "732", "733", "734", "735", "735;522", "735;665",
"735;665;522", "735;665;876", "735;876", "735;876;522", "736",
"737", "738", "739", "74", "740", "741", "742", "743", "744",
"745", "746", "746;944", "746;944;772", "747", "748", "749",
"75", "750", "751", "752", "752;24", "753", "754", "755", "756",
"757", "758", "759", "76", "76;313", "76;313;293", "760", "761",
"762", "763", "764", "765", "766", "767", "768", "769", "77",
"770", "771", "772", "773", "774", "775", "776", "777", "778",
"779", "78", "780", "781", "782", "783", "784", "785", "786",
"787", "788", "789", "79", "790", "790;552", "791", "792", "793",
"793;863", "794", "795", "796", "797", "798", "799", "8", "80",
"800", "801", "802", "803", "804", "805", "806", "807", "808",
"808;21", "809", "81", "810", "811", "812", "813", "814", "815",
"815;413", "815;777", "815;777;339", "815;777;838", "815;838",
"816", "817", "818", "818;7;752", "818;7;752;23;25;17;8", "819",
"82", "820", "821", "822", "823", "824", "824;957", "825", "826",
"827", "828", "829", "83", "830", "831", "832", "833", "834",
"835", "836", "837", "838", "839", "84", "840", "841", "842",
"843", "844", "845", "846", "847", "847;560;590", "848", "849",
"85", "850", "850;817", "851", "852", "853", "853;420", "854",
"855", "856", "857", "858", "858;638", "858;638;409", "859",
"86", "860", "861", "861;593", "862", "863", "864", "865", "866",
"867", "868", "869", "869;614", "87", "870", "871", "872", "873",
"874", "875", "876", "877", "878", "879", "88", "880", "881",
"882", "883", "884", "884;6", "884;6;9", "885", "886", "887",
"888", "888;189", "889", "89", "890", "890;904", "891", "891;953",
"892", "892;941", "893", "894", "895", "896", "897", "898", "899",
"9", "90", "900", "901", "901;224", "902", "903", "904", "905",
"906", "907", "908", "909", "91", "910", "911", "912", "913",
"914", "915", "916", "917", "918", "918;947", "919", "92", "920;530;589",
"920;530;589;934", "921", "922", "923", "924", "924;576", "925",
"926", "927", "928", "929", "93", "930", "931", "932", "933",
"934", "935", "936", "937", "938", "939", "94", "940", "941",
"942", "943", "944", "945", "946", "947", "948", "949", "95",
"950", "951", "952", "953", "954", "955", "956", "957", "958",
"959", "96", "960", "961", "962", "963", "964", "965", "966",
"967", "97", "98", "99", "99;392"), class = "factor"), Mod..Peptide.ID = c(23L,
24L, 25L, 26L, 27L, 29L, 30L, 31L, 32L, 33L)), .Names = c("Protein.Group.IDs",
"Mod..Peptide.ID"), row.names = c(318L, 344L, 380L, 406L, 409L,
417L, 436L, 462L, 494L, 505L), class = "data.frame")

Kind RegardsMads

解决方案

I've grown to really love data.table for this kind of task. It is so very simple. But first, let's make some sample data (which you should provide idealy!)

#  Sample data
set.seed(1)
df = data.frame( pep = replicate( 3 , paste( sample(999,3) , collapse=";") ) , pro = sample(3) , stringsAsFactors = FALSE )

Now we use the data.table package to do the reshaping in a couple of lines...

#  Load data.table package
require(data.table)

#  Turn data.frame into data.table, which looks like..
dt <- data.table(df)
#           pep pro
#1: 266;372;572   1
#2: 908;202;896   3
#3: 944;660;628   2

# Transform it in one line like this...
dt[ , list( pep = unlist( strsplit( pep , ";" ) ) ) , by = pro ]
#   pro pep
#1:   1 266
#2:   1 372
#3:   1 572
#4:   3 908
#5:   3 202
#6:   3 896
#7:   2 944
#8:   2 660
#9:   2 628

这篇关于R:如何展开包含“列表”的行到几行...每个列表成员一个?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

09-03 06:52