help
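Write a web scraper that fetches the page at http://books.toscrape.com and prints the page's title (the <title> tag) and the text of the first hyperlink (the first <a> tag):

import requests
from lxml import etree


def main():
    url = 'http://books.toscrape.com/'
    headers = {'user-agent': 'firefox'}
    r = requests.get(url, headers=headers, timeout=10)
    r.raise_for_status()  # stop on HTTP errors instead of parsing an error page
    # Pass bytes so lxml can honour the page's own charset declaration
    html = etree.HTML(r.content)
    # normalize-space() strips and collapses the whitespace around the title
    title = html.xpath('normalize-space(//title/text())')
    # (//a)[1] selects only the first <a> in the document; the earlier
    # '//a[@href]/text()' returned a list with the text of every link
    link = html.xpath('normalize-space((//a)[1])')
    print(title)
    print(link)


if __name__ == '__main__':
    main()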
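If requests and lxml are not installed, the same two values can be pulled out with only the standard library. This is a minimal sketch, assuming html.parser is acceptable in place of XPath; the names TitleAndFirstLinkParser and extract_title_and_first_link are hypothetical, chosen for illustration, and joining on whitespace only approximates XPath normalize-space().

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TitleAndFirstLinkParser(HTMLParser):
    """Hypothetical helper: collects the text of <title> and of the first <a>."""

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.link_parts = []
        self._in_title = False
        self._in_first_link = False
        self._seen_link = False

    def handle_starttag(self, tag, attrs):
        if tag == 'title':
            self._in_title = True
        elif tag == 'a' and not self._seen_link:
            # Only the first <a> in the document is tracked
            self._in_first_link = True
            self._seen_link = True

    def handle_endtag(self, tag):
        if tag == 'title':
            self._in_title = False
        elif tag == 'a':
            self._in_first_link = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        if self._in_first_link:
            self.link_parts.append(data)


def extract_title_and_first_link(html_text):
    """Hypothetical helper: return (title, first link text) from an HTML string."""
    parser = TitleAndFirstLinkParser()
    parser.feed(html_text)
    parser.close()
    # ' '.join(s.split()) approximates XPath normalize-space()
    title = ' '.join(''.join(parser.title_parts).split())
    link = ' '.join(''.join(parser.link_parts).split())
    return title, link


def main():
    # Live request: needs network access to books.toscrape.com
    req = Request('http://books.toscrape.com/', headers={'User-Agent': 'firefox'})
    with urlopen(req, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or 'utf-8'
        html_text = resp.read().decode(charset, errors='replace')
    title, link = extract_title_and_first_link(html_text)
    print(title)
    print(link)
```

Calling main() performs the live request and prints the two strings; extract_title_and_first_link() can be exercised on any HTML string without network access, which keeps the parsing logic easy to check in isolation.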