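I want to use CSS selectors or XPath in Python to get the link of each listing on a property list page. The problem is that under some listings' tags there is both the listing link and a link to the house's panorama view. How can I select only the listing links? This is my CSS selector attempt, but it also extracts the panorama links: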
```python
response.css('div.build_des.dingwei div.build_pic div.dingwei a::attr(href)').extract()
```
It returns:
```python
['/newhouse/property_330181_526025065_info.htm',
 '/newhouse/property_33_511129767_info.htm',
 '/newhouse/property_33_82149869_info.htm',
 '/newhouse/property_33_82149869_view720list.htm',
 '/newhouse/property_33_1161890609_info.htm',
 '/newhouse/property_33_1161890609_view720list.htm',
 '/newhouse/property_33_1166145658_info.htm',
 '/newhouse/property_330184_879503341_info.htm']
```
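A quick fallback is to filter the extracted list by URL pattern. This sketch assumes, based only on the eight URLs above, that listing pages always end in `_info.htm` and panorama pages in `_view720list.htm`; if the site uses other URL shapes, the filter needs adjusting.

```python
# The list returned by the selector above, copied verbatim.
hrefs = [
    "/newhouse/property_330181_526025065_info.htm",
    "/newhouse/property_33_511129767_info.htm",
    "/newhouse/property_33_82149869_info.htm",
    "/newhouse/property_33_82149869_view720list.htm",
    "/newhouse/property_33_1161890609_info.htm",
    "/newhouse/property_33_1161890609_view720list.htm",
    "/newhouse/property_33_1166145658_info.htm",
    "/newhouse/property_330184_879503341_info.htm",
]

# Assumption: every listing URL ends with "_info.htm"; panorama URLs do not.
listing_links = [h for h in hrefs if h.endswith("_info.htm")]
print(listing_links)
```

The same suffix rule can also go into the selector itself as an attribute match, e.g. `div.dingwei a[href$="_info.htm"]::attr(href)`, since cssselect (which Scrapy's selectors use) supports `$=`.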
Here is a fragment that contains both kinds of link:
```html
<div class="build_des dingwei">
<div class="build_pic">
<div class="dingwei">
<a href="/newhouse/property_33_1161890609_info.htm" target="_blank"><img src="/upload/newhouse/propertyinfo/mainlogo/20180611/15287064252420_150x113.jpg" title="海土字17241号" width="140" height="105"></a>
<span><a href="/newhouse/property_33_1161890609_view720list.htm" target="_blank"><div class="quanjing colorwht aligc tidt16">全景</div></a></span>
</div>
```
```python
from lxml import etree

text = '''
<div class="build_des dingwei">
<div class="build_pic">
<div class="dingwei">
<a href="/newhouse/property_33_1161890609_info.htm" target="_blank">
<img src="/upload/newhouse/propertyinfo/mainlogo/20180611/15287064252420_150x113.jpg"
title="海土字17241号" width="140" height="105">
</a>
<span>
<a href="/newhouse/property_33_1161890609_view720list.htm" target="_blank">
<div class="quanjing colorwht aligc tidt16">全景</div>
</a>
</span>
</div>
</div>
</div>
'''
html = etree.HTML(text)

# "//" (descendant axis) matches every <a> under div.dingwei, including the
# panorama link inside <span>; result[0] only happens to be the listing link.
result = html.xpath('//div[@class="build_des dingwei"]//div//div//a/@href')
print(result)

# "/" (child axis) matches only an <a> that is a direct child of div.dingwei,
# so the <a> nested inside <span> is skipped.
result = html.xpath('//div[@class="build_des dingwei"]/div/div/a/@href')
print(result)
```
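The `/` in XPath and `>` in CSS both boil down to one check: is the element directly enclosing this `<a>` the `div.dingwei`? As an illustration of that rule only, here is a stdlib-only sketch with `html.parser` that tracks open elements; `ListingLinkParser` and `VOID_TAGS` are hypothetical names, and a real selector library handles malformed HTML far more robustly.

```python
from html.parser import HTMLParser

# Hypothetical helper set: tags with no closing tag must not stay on the stack.
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


class ListingLinkParser(HTMLParser):
    """Collect href of <a> tags whose direct parent is a div with class dingwei."""

    def __init__(self):
        super().__init__()
        self.stack = []  # (tag, class attribute) of currently open elements
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and self.stack:
            parent_tag, parent_class = self.stack[-1]
            # Same test as CSS "div.dingwei > a": parent is a div with that class token.
            if parent_tag == "div" and "dingwei" in parent_class.split():
                self.links.append(attrs.get("href"))
        if tag not in VOID_TAGS:
            self.stack.append((tag, attrs.get("class") or ""))

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag (tolerates stray end tags).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


snippet = """
<div class="build_des dingwei"><div class="build_pic"><div class="dingwei">
<a href="/newhouse/property_33_1161890609_info.htm" target="_blank"><img src="x.jpg" width="140"></a>
<span><a href="/newhouse/property_33_1161890609_view720list.htm" target="_blank"><div class="quanjing">全景</div></a></span>
</div></div></div>
"""
parser = ListingLinkParser()
parser.feed(snippet)
print(parser.links)
```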