How do I scrape this presidents database?, Python Discussion, Programming Languages, FishC Forum

故里 posted on 2021-10-13 09:37:49

How do I scrape this presidents database?

https://www.worldpresidentsdb.com/Yang-Shangkun/
https://www.worldpresidentsdb.com/list/gender/female/

suchocolate posted on 2021-10-13 18:22:16

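import requests
from lxml import etree
from urllib.parse import urljoin


def main():
    base_url = 'https://www.worldpresidentsdb.com'
    list_url = f'{base_url}/list/gender/female/'
    # Send a user-agent header with every request.
    headers = {'user-agent': 'firefox'}

    # Fetch the list page and collect the link to each president's page.
    r = requests.get(list_url, headers=headers, timeout=10)
    html = etree.HTML(r.text)
    psts = html.xpath('//div[@class="list-group"]//@href')

    for pst in psts:
        # Resolve the href against the site root, then fetch the detail page.
        url = urljoin(base_url, pst)
        r = requests.get(url, headers=headers, timeout=10)
        html = etree.HTML(r.text)
        # Collect every text node inside <p> elements that are children of a <div>.
        info = html.xpath('//div/p//text()')
        print(info)


if __name__ == '__main__':
    main()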
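
The info list printed above comes straight from //text(), so it may well contain whitespace-only strings and padded fragments. Here is a minimal sketch of a cleanup step, assuming that is the case. clean_fragments is a hypothetical helper name, and the sample list is made-up data, not real output from the site:

```python
def clean_fragments(fragments):
    """Normalize whitespace in each text fragment and drop empty ones."""
    cleaned = []
    for fragment in fragments:
        # Collapse runs of whitespace (newlines, tabs) into single spaces.
        text = ' '.join(fragment.split())
        if text:
            cleaned.append(text)
    return cleaned


# Made-up sample resembling what html.xpath('//div/p//text()') could return.
sample = ['\n  Name: ', 'Example Person', '\n', '  Born:\t1900 ', '']
print(clean_fragments(sample))
```

In the loop, print(clean_fragments(info)) could then stand in for print(info).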
