CSV import in Rails - invalid byte sequence in UTF-8 with non-English characters

Problem description

I'm using the CSVMapper gem to import records from a CSV file into a Rails 3 model. (I chose this gem because it was the easiest way I found to do this.)

Anyway, the code I'm using to import the records is the following:

r = import('doc/socios_full.csv') do
    map_to Associate
    after_row lambda{|row, associate| associate.save }
    start_at_row 1
    [group,member,family_relationship_code,family_relationship_description,last_name,names,...]
    # The previous line is actually longer, with more attributes; it has been cut short for this example
end


It works very well, except when the parser encounters non-English characters such as ó, é, ñ, í or °. That's when I get the following error:

ArgumentError: invalid byte sequence in UTF-8
    from /home/bcb/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/csv.rb:1831:in `sub!'
    from /home/bcb/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/csv.rb:1831:in `block in shift'
    from /home/bcb/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/csv.rb:1825:in `loop'
    from /home/bcb/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/csv.rb:1825:in `shift'
    from /home/bcb/.rvm/rubies/ruby-1.9.2-p136/lib/ruby/1.9.1/csv.rb:1767:in `each'
    from /home/bcb/.rvm/gems/ruby-1.9.2-p136/gems/csv-mapper-0.5.1/lib/csv-mapper.rb:106:in `each_with_index'
    from /home/bcb/.rvm/gems/ruby-1.9.2-p136/gems/csv-mapper-0.5.1/lib/csv-mapper.rb:106:in `import'
    from (irb):63
    from /home/bcb/.rvm/gems/ruby-1.9.2-p136/gems/railties-3.0.9/lib/rails/commands/console.rb:44:in `start'
    from /home/bcb/.rvm/gems/ruby-1.9.2-p136/gems/railties-3.0.9/lib/rails/commands/console.rb:8:in `start'
    from /home/bcb/.rvm/gems/ruby-1.9.2-p136/gems/railties-3.0.9/lib/rails/commands.rb:23:in `<top (required)>'
    from script/rails:6:in `require'
    from script/rails:6:in `<main>'

I'm fairly certain these characters are the cause, because if I replace all of them the problem goes away until the parser finds another non-English character. The trouble is that the file has 50k records, so searching for every character I can think of and re-running the whole import each time is very time-consuming.

Is there a way to ignore these errors and allow the parser to go on? Or is there an easier way to import this CSV file?

Solution

I solved it with a different approach. Reading the file with Ruby's standard CSV library is a much easier way to import CSV files into a Rails 3 model than using an external gem:

    require 'csv'
    CSV.foreach('doc/socios_full.csv') do |row|
        record = Associate.new(
            :media_format   => row[0],
            :group => row[0],
            :member => row[1],
            :family_relationship_code => row[2],
            :family_relationship_description => row[3],
            :last_name => row[4],
            :names => row[5],
            ...
        )
        record.save!
    end

It works flawlessly, even with non-English characters (I just tried it on a 75k import file!). Hope it's helpful for someone.
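
If the same error still shows up with `CSV.foreach`, one possible cause is that the file is not actually saved as UTF-8. The sketch below assumes the file is ISO-8859-1 (Latin-1), which is only a guess based on the characters mentioned in the question; the sample data and the temporary file name are made up for illustration. It first checks whether the raw bytes are valid UTF-8, then passes the `encoding` option so that Ruby transcodes each line to UTF-8 before the CSV parser sees it.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical sample data with non-English characters, written out as
# ISO-8859-1 bytes (an assumption about how the real file was saved).
sample = "group,last_name\n1,Mu\u00F1oz Pe\u00F1a\n".encode('ISO-8859-1')

file = Tempfile.new(['socios_sample', '.csv'])
file.binmode          # write the bytes exactly as they are
file.write(sample)
file.close

# Interpreted as UTF-8, the Latin-1 bytes do not form a valid string.
# This is the condition behind "invalid byte sequence in UTF-8".
raw = File.read(file.path, encoding: 'UTF-8')
utf8_ok = raw.valid_encoding?

# "external:internal" encoding pair: read as ISO-8859-1, hand UTF-8 to CSV.
rows = []
CSV.foreach(file.path, encoding: 'ISO-8859-1:UTF-8') do |row|
  rows << row
end

file.unlink
```

On Ruby 1.9 the same option can be written as `:encoding => 'ISO-8859-1:UTF-8'`. If the real file turns out to use a different encoding, that name would take the place of `ISO-8859-1`; checking `valid_encoding?` on a sample of the file is a quick way to confirm whether it is UTF-8 at all.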
