hpricot does not support GB2312; it still raises an error after running the content through iconv
if RUBY_VERSION =~ /1.9/
  Encoding.default_external = Encoding::UTF_8
  Encoding.default_internal = Encoding::UTF_8
end
require 'iconv'
require 'hpricot'
require 'open-uri'
site=Hash.new
site['url']='http://zu.cq.soufun.com/house/c21000-d22000-g22-s31-kw%bd%f0%c9%bd%c3%fb%b6%bc/'
site['xpath']='//p[@class="housetitle"]/'
file=open(site['url'])
puts file.charset
content=Iconv.conv('UTF-8//IGNORE', file.charset, file.read)
doc = Hpricot(content)
doc.search(site['xpath']).each do |link|
  text = link.inner_text
  puts text
end
Error message:
in `<main>': "\x90" on GB2312 (Encoding::InvalidByteSequenceError)
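One possible way around this error, as a rough sketch rather than a confirmed fix: treat the downloaded bytes as binary, decode them as GBK (a superset of GB2312), and let String#encode drop bytes it cannot convert, instead of going through Iconv. The sample bytes below are the keyword from the URL above plus a stray \x90 standing in for the bad byte. The helper name to_utf8 is made up for illustration, and it is an assumption that the real page is actually GBK-encoded.

```ruby
# Sample input: the GBK bytes of the URL keyword, then a stray \x90
# (not a complete GBK character here) and a newline.
raw = "\xBD\xF0\xC9\xBD\xC3\xFB\xB6\xBC\x90\n".b

# Hypothetical helper: convert raw page bytes to UTF-8.
def to_utf8(bytes, charset)
  # Assumption: pages labelled GB2312 really contain GBK bytes.
  source = charset.to_s.upcase == 'GB2312' ? 'GBK' : charset.to_s
  bytes.dup.force_encoding(source).encode(
    'UTF-8',
    invalid: :replace, # malformed byte sequences
    undef:   :replace, # characters with no UTF-8 mapping
    replace: ''        # drop them instead of raising
  )
end

text = to_utf8(raw, 'gb2312')
puts text
```

With a real page, the bytes would presumably come from open(site['url'], 'rb') { |f| f.read } and the resulting string would be passed to Hpricot(...) in place of content.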
This post was last edited by yakczh on 2012-02-13 16:06
Without this header:
if RUBY_VERSION =~ /1.9/
  Encoding.default_external = Encoding::UTF_8
  Encoding.default_internal = Encoding::UTF_8
end
the error is different:
Ruby193/lib/ruby/gems/1.9.1/gems/hpricot-0.8.6-x86-mswin32/lib/hpricot/builder.rb:9:in `gsub': invalid byte sequence in GBK (ArgumentError)
This header makes the code really ugly.
This post was last edited by yakczh on 2012-02-14 09:56
After adding
if RUBY_VERSION =~ /1.9/
  Encoding.default_external = Encoding::UTF_8
  Encoding.default_internal = Encoding::UTF_8
end
do I still need to specify the file encoding with #encoding: utf-8 at the top of the file?
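As far as I understand it, the two are independent: the magic comment controls the encoding of string literals in the source file itself, while Encoding.default_external and Encoding.default_internal only affect IO. On Ruby 1.9 the source encoding defaults to US-ASCII, so a file containing non-ASCII literals still needs the comment. A small sketch to check this on your own interpreter (the literal is just the keyword from the URL):

```ruby
# encoding: utf-8
# The magic comment above has to be the first line of the file (or the
# second, after a shebang). It sets the encoding of literals in this file.

# These settings are separate: they apply to IO, not to string literals.
Encoding.default_external = Encoding::UTF_8
Encoding.default_internal = Encoding::UTF_8

literal = '金山名都'
puts "source encoding:  #{__ENCODING__}"
puts "literal encoding: #{literal.encoding}"
puts "default_external: #{Encoding.default_external}"
puts "default_internal: #{Encoding.default_internal}"
```

From Ruby 2.0 onward the source encoding already defaults to UTF-8, so there the comment is only needed for other encodings.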