Posted on 2021-8-10 12:43:44
Got it working. I just couldn't tell what your style was supposed to match; the cause was the regular expression.
- import re
- import requests
-
- url = 'https://movie.douban.com/top250'
- headers = {
-     'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36'
- }
- res0 = requests.get(url, headers=headers)
- res = res0.text
- res0.close()
- print('Fetch complete')
- # re.S lets '.' match newlines too, since each movie entry spans several lines of HTML
- regu = re.compile(
-     r'<span class="title">(?P<name>.*?)</span>.*?<span class="rating_num".*?>(?P<score>.*?)</span>.*?<span class="inq">(?P<summary>.*?)</span>', re.S)
- its = regu.finditer(res)
- # Earlier attempt, kept for reference: the summary group is empty, (?P<summary>), so it can only match an empty inq span
- # regu = re.compile(r'.*?<span class="title">(?P<name>.*?)</span>.*?<span class="rating_num" property="v:average">(?P<score>.*?)</span>.*?<span class="inq">(?P<summary>)</span>', re.S)
- # its = regu.finditer(res)
- # print(its)
- # The with block closes the file even if writing fails partway through
- with open('豆瓣.txt', 'w', encoding='utf-8') as f:
-     for it in its:
-         print('Got a match...')
-         # One line per movie: name/score/summary
-         newstr = it.group('name') + '/' + it.group('score') + \
-             '/' + it.group('summary') + '\n'
-         f.write(newstr)
- print('Done......')
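To show why the regular expression was the cause, here is a minimal sketch that needs no network access. It compares the commented-out pattern in the script (empty `summary` group) with the working one, using a made-up HTML fragment. The `SAMPLE` string and the `extract` helper are my own illustrative assumptions, not Douban's real markup and not part of the script.

```python
import re

# Made-up HTML fragment imitating one list entry; it was not fetched from the site.
SAMPLE = '''
<div class="item">
  <span class="title">Example Movie</span>
  <span class="rating_num" property="v:average">9.0</span>
  <span class="inq">An example one-line summary.</span>
</div>
'''

# Empty group: (?P<summary>) matches only the empty string, so the pattern
# requires an inq span with nothing between its opening and closing tags.
empty_group = re.compile(
    r'<span class="title">(?P<name>.*?)</span>.*?'
    r'<span class="rating_num".*?>(?P<score>.*?)</span>.*?'
    r'<span class="inq">(?P<summary>)</span>', re.S)

# Lazy group: (?P<summary>.*?) captures everything up to the next </span>.
lazy_group = re.compile(
    r'<span class="title">(?P<name>.*?)</span>.*?'
    r'<span class="rating_num".*?>(?P<score>.*?)</span>.*?'
    r'<span class="inq">(?P<summary>.*?)</span>', re.S)


def extract(pattern, html):
    """Return a list of (name, score, summary) tuples found in html."""
    return [(m.group('name'), m.group('score'), m.group('summary'))
            for m in pattern.finditer(html)]


broken = extract(empty_group, SAMPLE)
fixed = extract(lazy_group, SAMPLE)
print(broken)  # no matches, because the inq span is not empty
print(fixed)   # one tuple with the name, score and summary
```

With the empty group the loop body never runs, which would explain an empty output file; with `.*?` inside the group each entry yields one tuple.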