- import requests
- import os
- import time
- from lxml import etree
- url = 'https://wuhan.esf.fang.com/'
- headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36'}
- response = requests.get(url,headers=headers)
- result = response.text
- root = etree.HTML(result)
- title = root.xpath('//dd/p/text()')
- for each in title:
-     content = each.strip()
-     print(content)
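I'm trying to write a scraper for second-hand housing listings in Wuhan on 房天下 (Fang.com). I use XPath to pull each listing's layout, area, floor level, and similar details, but the result is one flat list: after removing whitespace with strip(), every piece of information prints on its own line. How can I group the details that belong to the same listing? In the page source, the text inside each listing's <p> node is broken into segments by <i> elements. How should I extract and merge them?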
- <p class="tel_shop">
-
- 3室2厅
-
- <i>|</i>
- 89.48㎡
-
-
- <i>|</i>
- 低层(共18层)
-
-
- <i>|</i>
- 南北向
-
-
- <i>|</i>
- 2009年建
-
-
- <i>|</i>
- <span class="people_name">
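One way to keep each listing's details together is to select the `<p class="tel_shop">` elements first and then read the text nodes of each one separately, rather than selecting all text nodes in a single query. Below is a minimal sketch that runs offline on a fragment modelled on the snippet above. The `SAMPLE` string, the closing tags, the placeholder agent name, and the helper names `listing_fields` and `extract_listings` are assumptions made for illustration; the real page may differ.

```python
from lxml import etree

# Sample fragment modelled on the page source quoted above. The closing tags
# and the "agent" placeholder are assumptions added so the fragment is complete.
SAMPLE = """
<dl><dd>
<p class="tel_shop">
    3室2厅
    <i>|</i>
    89.48㎡
    <i>|</i>
    低层(共18层)
    <i>|</i>
    南北向
    <i>|</i>
    2009年建
    <i>|</i>
    <span class="people_name">agent</span>
</p>
</dd></dl>
"""


def listing_fields(p):
    """Return the stripped, non-empty text nodes directly under one <p>."""
    # './text()' selects only text nodes that are direct children of <p>,
    # so the '|' inside <i> and the name inside <span> are skipped.
    return [t.strip() for t in p.xpath('./text()') if t.strip()]


def extract_listings(html):
    """Return one list of fields per <p class="tel_shop"> element."""
    root = etree.HTML(html)
    return [listing_fields(p) for p in root.xpath('//p[@class="tel_shop"]')]


listings = extract_listings(SAMPLE)
for fields in listings:
    # One line per listing instead of one line per field
    print(' | '.join(fields))
```

Because the query is evaluated relative to each `<p>`, there is no need to strip the `<i>|</i>` separators from the HTML beforehand. To try it on the live page, `response.text` could be passed to `extract_listings` in place of `SAMPLE`.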
- import requests
- import os
- import time
- from lxml import etree
- url = 'https://wuhan.esf.fang.com/'
- headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36'}
- response = requests.get(url,headers=headers)
- result = response.text
- result = result.replace('<i>|</i>','')
- root = etree.HTML(result)
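- # Select each listing's <p class="tel_shop"> element (no "/" before the predicate)
- title = root.xpath('//p[@class="tel_shop"]')
- for each in title:
-     # each is an element, not a string: join its own text nodes, then collapse whitespace
-     content = ' '.join(''.join(each.xpath('./text()')).split())
-     print(content)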
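Once the values of one listing are in a single list, they can be paired with labels to form one record per listing. The sketch below is only an illustration: the names in `FIELD_NAMES` and the helper `to_record` are hypothetical, and the field order is assumed from the single sample shown in the question.

```python
# Hypothetical labels, assumed from the order of values in the sample <p>.
FIELD_NAMES = ('layout', 'area', 'floor', 'orientation', 'year_built')


def to_record(fields):
    """Pair each extracted value with a label, in order."""
    # zip stops at the shorter sequence, so a listing with fewer values
    # yields fewer keys instead of raising an error.
    return dict(zip(FIELD_NAMES, fields))


sample = ['3室2厅', '89.48㎡', '低层(共18层)', '南北向', '2009年建']
record = to_record(sample)
print(record)
```

Note that this pairing is purely positional: if a listing omits a value in the middle, the later labels will shift, so it may be worth checking the length of each list before trusting the labels.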