Last edited by wcq15759797758 on 2022-5-3 15:18
Crawler Review (Part 1)
- import re
- import requests
- def main(url):
-     # Send a browser-like User-Agent instead of the default one from requests
-     headers = {
-         'User-Agent': ('Mozilla/5.0 (compatible; MSIE 9.0; '
-                        'Windows NT 6.1; Win64; x64; Trident/5.0)'),
-     }
-     response = requests.get(url=url, headers=headers)
-     response.encoding = 'utf-8'
-     html = response.text
-     # Named groups: name = title, year = release year, PF = rating.
-     # The year is assumed to be the first four-digit number after <br>.
-     obj = re.compile(r'<li>.*?<div class="item">.*?<span class="title">(?P<name>.*?)'
-                      r'</span>.*?<p class="">.*?<br>\s*(?P<year>\d{4})'
-                      r'.*?<span class="rating_num" property="v:average">(?P<PF>.*?)</span>', re.S)
-     results = obj.finditer(html)
-     for i in results:
-         item = {}
-         item['name'] = i.group('name')
-         item['year'] = i.group('year')
-         item['rating'] = i.group('PF')
-         print(item)
- if __name__ == '__main__':
-     # Top 250 is 10 pages of 25 entries: start = 0, 25, ..., 225
-     for page in range(0, 250, 25):
-         url = f'https://movie.douban.com/top250?start={page}'
-         # main() builds its own headers, so only the URL is passed
-         main(url=url)
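To check the named-group pattern without hitting the site, it can be run against a small hand-written fragment. The sketch below is only an illustration: `SAMPLE_HTML` and `parse_items` are hypothetical names, the fragment imitates the markup the pattern expects rather than being copied from the live page, and the `year` group is assumed to be the first four-digit number after `<br>`.

```python
import re

# Hypothetical fragment that imitates the structure the pattern looks for;
# it is not real page source.
SAMPLE_HTML = '''
<li>
  <div class="item">
    <span class="title">Example Movie</span>
    <p class="">
      Director: Someone<br>
      1994&nbsp;/&nbsp;Country&nbsp;/&nbsp;Drama
    </p>
    <span class="rating_num" property="v:average">9.7</span>
  </div>
</li>
'''

# re.S lets "." match newlines, so ".*?" can skip across lines between fields.
PATTERN = re.compile(
    r'<li>.*?<div class="item">.*?<span class="title">(?P<name>.*?)</span>'
    r'.*?<p class="">.*?<br>\s*(?P<year>\d{4})'
    r'.*?<span class="rating_num" property="v:average">(?P<PF>.*?)</span>',
    re.S,
)


def parse_items(html):
    """Return one dict per match, keyed by the group names name, year and PF."""
    return [m.groupdict() for m in PATTERN.finditer(html)]


if __name__ == '__main__':
    for item in parse_items(SAMPLE_HTML):
        print(item)
```

Keeping the parsing in its own function means the pattern can be adjusted and re-tested offline before any request is sent.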