yakczh posted on 2012-02-13 16:02

Hpricot doesn't support GB2312, and I still get an error even after converting with iconv.

if RUBY_VERSION =~ /1.9/
  Encoding.default_external = Encoding::UTF_8
  Encoding.default_internal = Encoding::UTF_8
end
require 'iconv'
require 'hpricot'
require 'open-uri'



site = Hash.new
site['url'] = 'http://zu.cq.soufun.com/house/c21000-d22000-g22-s31-kw%bd%f0%c9%bd%c3%fb%b6%bc/'
site['xpath'] = '//p[@class="housetitle"]/'

file = open(site['url'])
puts file.charset
content = Iconv.conv('UTF-8//IGNORE', file.charset, file.read)
doc = Hpricot(content)
doc.search(site['xpath']).each do |link|
  text = link.inner_text
  puts text
end

Error message:

in `<main>': "\x90" on GB2312 (Encoding::InvalidByteSequenceError)
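
A possible workaround, sketched under assumptions: byte 0x90 cannot start a character in GB2312 (its lead bytes run from 0xA1 to 0xF7) but is a valid lead byte in GBK, so the page probably declares gb2312 while containing GBK-only characters. Treating the declared charset as its superset GB18030 and transcoding with String#encode may avoid the error. `to_utf8` is a hypothetical helper, not part of Hpricot or open-uri, and the sample bytes come from the percent-encoded keyword in the URL above, not from the real page.

```ruby
# Hypothetical helper (not from the thread): transcode raw bytes to UTF-8,
# treating a declared GB2312/GBK charset as the superset GB18030.
def to_utf8(bytes, declared_charset)
  charset = declared_charset.to_s
  charset = 'GB18030' if charset =~ /\A(gb2312|gbk)\z/i
  bytes.dup.force_encoding(charset).encode(
    'UTF-8', :invalid => :replace, :undef => :replace, :replace => '?'
  )
end

# The eight bytes behind %bd%f0%c9%bd%c3%fb%b6%bc in the URL.
sample = [0xBD, 0xF0, 0xC9, 0xBD, 0xC3, 0xFB, 0xB6, 0xBC].pack('C*')

converted = to_utf8(sample, 'gb2312')
puts converted
puts converted.encoding
```

In the script above this would stand in for the Iconv.conv call, for example content = to_utf8(file.read, file.charset). Whether Hpricot then parses the page cleanly has not been verified here.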

yakczh posted on 2012-02-13 16:06

Last edited by yakczh on 2012-02-13 16:06

Without this header:

if RUBY_VERSION =~ /1.9/
  Encoding.default_external = Encoding::UTF_8
  Encoding.default_internal = Encoding::UTF_8
end

the error is different:

Ruby193/lib/ruby/gems/1.9.1/gems/hpricot-0.8.6-x86-mswin32/lib/hpricot/builder.rb:9:in `gsub': invalid byte sequence in GBK (ArgumentError)
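
The GBK in this message is likely Ruby's locale-derived default external encoding on a Chinese-language Windows install (an assumption, since the thread does not show the locale). A string that holds UTF-8 bytes but is tagged as GBK can be an invalid byte sequence, and regex-based methods such as gsub refuse to process it. A minimal sketch that reproduces the same class of error without Hpricot:

```ruby
# The three bytes of U+91D1 encoded as UTF-8.
utf8_bytes = [0xE9, 0x87, 0x91].pack('C*')

as_gbk  = utf8_bytes.dup.force_encoding('GBK')    # wrong label for these bytes
as_utf8 = utf8_bytes.dup.force_encoding('UTF-8')  # correct label

puts as_gbk.valid_encoding?    # false: 0x91 is a GBK lead byte with no trail byte
puts as_utf8.valid_encoding?   # true

begin
  # A regex substitution on the mislabeled string raises ArgumentError.
  as_gbk.gsub(/"/, '&quot;')
rescue ArgumentError => e
  puts e.message
end
```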
       

Sevk posted on 2012-02-13 21:12

yakczh posted on 2012-02-13 21:51

This header makes the code really ugly.

yakczh posted on 2012-02-14 09:56

Last edited by yakczh on 2012-02-14 09:56

After adding

if RUBY_VERSION =~ /1.9/
  Encoding.default_external = Encoding::UTF_8
  Encoding.default_internal = Encoding::UTF_8
end

do I still need to declare the file encoding with #encoding: utf-8 at the top of the file?
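
A short sketch of the distinction behind this question: the `# encoding:` magic comment sets the source encoding, which is the encoding given to string literals in that file, whereas Encoding.default_external and Encoding.default_internal affect how IO data is tagged and transcoded. Changing the defaults at runtime does not change the encoding of literals. On Ruby 1.9 the default source encoding is US-ASCII, so a file containing non-ASCII literals still needs the comment. From Ruby 2.0 onward the default is UTF-8. The variable names below are illustrative only.

```ruby
# encoding: utf-8
# The magic comment only takes effect on the first line of a file
# (or the second, after a shebang).

source_before = __ENCODING__   # source encoding of this file

Encoding.default_external = Encoding::UTF_8
Encoding.default_internal = Encoding::UTF_8

source_after = __ENCODING__    # not affected by the two assignments above
literal = 'abc'                # literals take the source encoding

puts "source encoding:  #{source_after}"
puts "literal encoding: #{literal.encoding}"
puts "default_external: #{Encoding.default_external}"
```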