问题描述
我有一个大的制表符分隔的文件,具有重复的产品,但颜色和数量不同。我试图合并基于键的数据,以便我最终得到一个产品,组合的颜色和数量由分隔符(在这种情况下,逗号)分隔。
I have a big tab-separated file with duplicate products but with different colours and amounts. I’m trying to merge the data based on the key so that I end up with one product and the combined colours and amounts separated by a delimiter (comma in this case).
我使用模块,以便我有更好的控制,并且因为它允许我输出具有不同分隔符(从分号到管道)的文件。
I'm using the Text::CSV
module so that I have better control, and because it allows me to output the file with a different delimiters (from semicolon to pipe).
我的问题是,如何正确合并数据?我不想只是简单地结合颜色和数量,但删除重复的值。所以我想用 Id
/ 金额
和 Id
/ 颜色
。但 Id
不是唯一的,那么我该如何做呢?
My question is, how do I merge the data properly? I don't want it simply to combine colours and amounts but remove duplicate values as well. So I was thinking a key/value with the Id
/Amount
and Id
/Colour
. But Id
isn't unique so how do I do this? Do I create an array or use hashes?
这里是一些示例源数据,其中的选项卡分隔符由分号替换;
。请注意,标记的行没有Color,因此在结果中不会合并空值。
Here is some sample source data, with the tab separators replaced by semicolons ;
. Note that the marked row has no Colour so the empty value is not combined in the result.
Cat_id;Cat_name;Id;Name;Amount;Colour;Bla;
101;Fruits;50020;Strawberry;500;Red;1;
101;Fruits;50020;Strawberry;1000;Red;1;
201;Vegetables;60090;Tomato;50;Green;1;
201;Vegetables;60080;Onion;1;Purple;1;
201;Vegetables;60090;Tomato;100;Red;1;
201;Vegetables;60010;Carrot;100;Purple;1;
201;Vegetables;60050;Broccoli;500;Green;1;
201;Vegetables;60050;Broccoli;1000;Green;1;
201;Vegetables;60090;Tomato;500;Yellow;1;
101;Fruits;50060;Apple;500;Green;1;
101;Fruits;50010;Grape;500;Red;1;
201;Vegetables;60010;Carrot;500;White;1;
201;Vegetables;60050;Broccoli;2000;Green;1;
201;Vegetables;60090;Tomato;1000;Red;1;
101;Fruits;50020;Strawberry;100;Red;1;
101;Fruits;50060;Apple;1000;Red;1;
201;Vegetables;60010;Carrot;250;Yellow;1;
101;Fruits;50010;Grape;100;White;1;
101;Fruits;50030;Banana;500;Yellow;1;
201;Vegetables;60010;Carrot;1000;Yellow;1;
101;Fruits;50030;Banana;1000;Green;1;
101;Fruits;50020;Strawberry;200;Red;1;
101;Fruits;50010;Grape;200;White;1;
201;Vegetables;60010;Carrot;50;Orange;1;
201;Vegetables;60080;Onion;2;White;1;
我想要的结果:
101;Fruits;50010;Grape;100,500,200;Red,White;1;
201;Vegetables;60090;Tomato;50,500,1000,10;Yellow,Green,Red;1;
101;Fruits;50060;Apple;500,1000;Red,Green;1;
201;Vegetables;60010;Carrot;250,50,500,1000,100;Orange,Yellow,White,Purple;1;
201;Vegetables;60050;Broccoli;1000,500,2000;Green;1;
101;Fruits;50020;Strawberry;100,1000,200,500;Red;1;
101;Fruits;50030;Banana;500,1000;Yellow,Green;1;
201;Vegetables;60080;Onion;2,1;White,Purple;1;
这是我的脚本。它没有完成(而不是工作),因为我不知道如何继续。我不认为这可以正常工作,因为我试图使用相同的键为不同的颜色。
This is my script so far. It's not finished (and not working) because I'm not sure how to continue. I don't think this can work right because I'm trying to use the same key for different colours.
use strict;
use warnings;
use Text::CSV;
use List::MoreUtils 'uniq';
my $inputfile = shift || die "Give input and output names!\n";
my $outputfile = shift || die "Give output name!\n";
open my $infile, '<', $inputfile or die "Sourcefile in use / not found :$!\n";
open my $outfile, '>', $outputfile or die "Outputfile in use :$!\n";
binmode($outfile, ":encoding(utf8)");
my $csv_in = Text::CSV->new({binary => 1,sep_char => ";",eol => $/});
my $csv_out = Text::CSV->new({binary => 1,sep_char => "|",always_quote => 1,eol => $/}); #,quote_null => 0 #
my %data;
while (my $elements = $csv_in->getline($infile)){
my $id = $elements->[2];
push @{ $data{$id} }, \@elements;
}
for my $id ( sort keys %data ){
my $set = $data{$id};
my @elements = @{ $set->[0] };
$elements[4] = join ',', uniq map { $_->[4] } @$set;
$elements[5] = join ',', uniq map { $_->[5] } @$set;
$csv_in->combine(@$elements);
$csv_out->print($outfile, $elements);
}
编辑:我使用data :: dumper进行测试
推荐答案
我没有时间写正确的评论,这个程序似乎做你所需要的。它使用模块。它不是一个核心模块,因此可能需要安装。我相信在合并字段中显示的金额和颜色是什么顺序?
I don't have time to write a proper commentary, but this program seems to do what you need. It uses the uniq
function from the List::MoreUtils
modules. It isn't a core module and so may need installing. I trust that it's not important what order the Amounts and Colours appear in the combined fields?
use strict;
use warnings;
use List::MoreUtils 'uniq';
print scalar <DATA>;
my %data;
while (<DATA>) {
chomp;
my @fields = split /;/;
my $id = $fields[2];
push @{ $data{$id} }, \@fields;
}
for my $id ( sort keys %data ) {
my $set = $data{$id};
my @fields = @{ $set->[0] };
$fields[4] = join ',', uniq map { $_->[4] } @$set;
$fields[5] = join ',', uniq map { $_->[5] } @$set;
print join(';', @fields, ''), "\n";
}
__DATA__
Cat_id;Cat_name;Id;Name;Amount;Colour;Bla;
101;Fruits;50020;Strawberry;500;Red;1;
101;Fruits;50020;Strawberry;1000;Red;1;
201;Vegetables;60090;Tomato;50;Green;1;
201;Vegetables;60080;Onion;1;Purple;1;
201;Vegetables;60090;Tomato;100;Red;1;
201;Vegetables;60010;Carrot;100;Purple;1;
201;Vegetables;60050;Broccoli;500;Green;1;
201;Vegetables;60050;Broccoli;1000;Green;1;
201;Vegetables;60090;Tomato;500;Yellow;1;
101;Fruits;50060;Apple;500;Green;1;
101;Fruits;50010;Grape;500;Red;1;
201;Vegetables;60010;Carrot;500;White;1;
201;Vegetables;60050;Broccoli;2000;Green;1;
201;Vegetables;60090;Tomato;1000;Red;1;
101;Fruits;50020;Strawberry;100;Red;1;
101;Fruits;50060;Apple;1000;Red;1;
201;Vegetables;60010;Carrot;250;Yellow;1;
101;Fruits;50010;Grape;100;White;1;
101;Fruits;50030;Banana;500;Yellow;1;
201;Vegetables;60010;Carrot;1000;Yellow;1;
101;Fruits;50030;Banana;1000;Green;1;
101;Fruits;50020;Strawberry;200;Red;1;
101;Fruits;50010;Grape;200;White;1;
201;Vegetables;60010;Carrot;50;Orange;1;
201;Vegetables;60080;Onion;2;White;1;
输出
Cat_id;Cat_name;Id;Name;Amount;Colour;Bla;
101;Fruits;50010;Grape;500,100,200;Red,White;1;
101;Fruits;50020;Strawberry;500,1000,100,200;Red;1;
101;Fruits;50030;Banana;500,1000;Yellow,Green;1;
101;Fruits;50060;Apple;500,1000;Green,Red;1;
201;Vegetables;60010;Carrot;100,500,250,1000,50;Purple,White,Yellow,Orange;1;
201;Vegetables;60050;Broccoli;500,1000,2000;Green;1;
201;Vegetables;60080;Onion;1,2;Purple,White;1;
201;Vegetables;60090;Tomato;50,100,500,1000;Green,Red,Yellow;1;
这篇关于基于重复键合并CSV行,并使用Perl Text :: CSV合并唯一值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!