Last edited by 不尴尬 on 2018-2-1 21:52
```python
import random
import re
import time
from urllib.parse import urljoin

import requests

headers = {'User-Agent':
           'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:58.0) Gecko/20100101 Firefox/58.0'}


def fetch_html(url):
    """Download a page; the site serves GBK-encoded HTML."""
    res = requests.get(url, headers=headers, timeout=10)
    res.raise_for_status()
    res.encoding = 'gbk'
    return res.text


def get_chapter_data(url):
    """Return the cleaned body text of one chapter page."""
    html = fetch_html(url)
    match = re.search(r'<div class="yd_text2">(.*?)</div>', html, re.S)
    if match is None:  # page layout changed or the request was blocked
        return ''
    chapter_data = match.group(1).strip()
    # Remove <br /> BEFORE stripping spaces, otherwise '<br />' becomes '<br/>' and is never removed
    chapter_data = chapter_data.replace('<br />', '')
    chapter_data = chapter_data.replace('&nbsp;', '').replace(' ', '')
    return chapter_data


def get_chapter_infos(novel_url):
    """Return a list of (href, title) pairs from the novel's index page."""
    html = fetch_html(novel_url)
    return re.findall(r'<li><a href="(.*?)">(.*?)</a></li>', html, re.S)


if __name__ == '__main__':
    url = 'https://www.88dushu.com/xiaoshuo/71/71618/'
    chapter_info = get_chapter_infos(url)
    # print(chapter_info)
    # UTF-8 avoids UnicodeEncodeError on characters that GBK cannot represent
    with open('C:/Users/Administrator/Desktop/test(py)/废土崛起(1).txt', 'w', encoding='utf-8') as f:
        for href, title in chapter_info:
            chapter_data = get_chapter_data(urljoin(url, href))
            f.write(title + '\n')
            f.write(chapter_data + '\n\n')  # blank line so chapters don't run together
            print(title)
            time.sleep(random.randint(1, 3))  # pause 1 to 3 s between requests to go easy on the server
```
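If you want to check the two regular expressions without hitting the site every time, you can run the parsing steps offline. Below is a minimal sketch. It assumes the index page and chapter pages look roughly like the hypothetical `INDEX_HTML` and `CHAPTER_HTML` samples. Those samples are made up and were not copied from 88dushu.com. `parse_chapter_list` and `clean_chapter_text` are hypothetical helpers that reuse the same patterns as the scraper.

```python
import re
from urllib.parse import urljoin

# Hypothetical sample markup, ASSUMED to resemble the site's structure (not fetched from it)
INDEX_HTML = '''
<ul>
<li><a href="1001.html">第一章 开始</a></li>
<li><a href="1002.html">第二章 继续</a></li>
</ul>
'''

CHAPTER_HTML = '''
<div class="yd_text2">
&nbsp;&nbsp;&nbsp;&nbsp;第一段<br />
&nbsp;&nbsp;&nbsp;&nbsp;第二段<br />
</div>
'''


def parse_chapter_list(html, base_url):
    """Return (absolute_url, title) pairs using the <li><a href> regex."""
    pairs = re.findall(r'<li><a href="(.*?)">(.*?)</a></li>', html, re.S)
    # urljoin resolves relative hrefs against the index URL and leaves absolute ones unchanged
    return [(urljoin(base_url, href), title) for href, title in pairs]


def clean_chapter_text(html):
    """Extract the yd_text2 div and strip <br /> tags, &nbsp; entities and spaces."""
    match = re.search(r'<div class="yd_text2">(.*?)</div>', html, re.S)
    if match is None:
        return ''
    text = match.group(1).strip()
    text = text.replace('<br />', '')  # must run before the space removal below
    return text.replace('&nbsp;', '').replace(' ', '')


if __name__ == '__main__':
    base = 'https://www.88dushu.com/xiaoshuo/71/71618/'
    for link, title in parse_chapter_list(INDEX_HTML, base):
        print(title, link)
    print(clean_chapter_text(CHAPTER_HTML))
```

This prints each sample title with its absolute URL and then the cleaned sample text. The order of the `replace` calls matters. If plain spaces are stripped first, `<br />` turns into `<br/>` and the tag is never removed.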