The scraped result list isn't what I expected. How do I fix it?
from lxml import etree
import requests

url = "http://nj.sell.house365.com/district/"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36"
}
# Fetch the raw (static) HTML; content rendered later by JavaScript is not included
page_text = requests.get(url=url, headers=headers).text
tree = etree.HTML(page_text)
# Direct child <div>s of the table container
div_list = tree.xpath('//div[@class="mainContent__table clearfix"]/div')
for div in div_list:
    # text() returns a list of strings, so each iteration prints a list (possibly empty)
    title = div.xpath("./div/div/div/a/text()")
    print(title)
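The code is above. I want to scrape a clean list of titles, but the output isn't what I expected. Is something wrong with the scraping, or is it something else? How should I adjust it?

It depends on whether the site can be scraped statically. XPath only parses the static HTML that requests downloads, so anything rendered by JavaScript won't be there.

Change lines 13 and 14 (the title assignment and print(title)) to: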
    # Join the text nodes into one string, then strip the surrounding whitespace
    title = ''.join(div.xpath("./div/div/div/a/text()"))
    print(title.strip())

Quoting 鱼cpython学习者 (posted 2022-8-24 21:01):
Change lines 13 and 14 to:

Thanks