Nokogiri can't extract the links
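require 'nokogiri'
require 'open-uri'
# require 'iconv'  # not used below; Iconv left the standard library in Ruby 2.0
url = 'http://zu.cq.soufun.com/house/c21000-d22000-g22-s31-kw%bd%f0%c9%bd%c3%fb%b6%bc/'
# select the <a> inside every <p class="housetitle">
xpath = "//p[@class='housetitle']/a"
# get the nokogiri document
# (URI.open instead of bare open: Kernel#open no longer opens URLs in Ruby 3.0+)
doc = Nokogiri::HTML(URI.open(url))
doc.xpath(xpath).each do |link| # CSS equivalent: doc.css("p.housetitle a").each do |link|
  puts link.content # link text
  puts link['href'] # link target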
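end

I also downloaded the page to my machine and used url="http://localhost/test.html", but still couldn't extract anything.

Is the XPath you're using, "//p[@class='housetitle']/a", wrong? I looked at the page and didn't see any element with class='housetitle'.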
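With the following code I can extract the URLs:
#!/usr/bin/env ruby
require 'nokogiri'
require 'open-uri'
#require 'iconv'
url='http://zu.cq.soufun.com/house/c21000-d22000-g22-s31-kw%bd%f0%c9%bd%c3%fb%b6%bc/'

# get the nokogiri document
# (URI.open instead of bare open: Kernel#open no longer opens URLs in Ruby 3.0+)
doc = Nokogiri::HTML(URI.open(url))

# "//div/a/@href" selects the href attribute nodes themselves, not the <a> elements
doc.xpath("//div/a/@href").each do |link|
  puts link.content # an attribute node's content is its value, i.e. the URL
  # link['href'] is not used here: link is already the href attribute, so [] returns nil
end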
Last edited by yakczh on 2012-05-09 12:22
<p class="housetitle">
<a href='/chuzu/3_3956185_1.htm' target="_blank"><strong>
龙脊金山名都 2房2厅1卫 2000/月 精装修
</strong></a>
View the page source and search for class="housetitle" directly; you'll find the markup above.