I'm having some problems scraping a novel. Experts, please advise.

不尴尬 · Posted on 2018-2-1 21:20:18


Last edited by 不尴尬 on 2018-2-1 21:52

import requests
import re
import time
import random

headers = {'User-Agent':
           'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:58.0) Gecko/20100101 Firefox/58.0'}

def get_chapter_data(url):
    # Download one chapter page; return its text, or None if the text div is missing
    res = requests.get(url, headers=headers)
    res.encoding = 'gbk'
    html = res.text
    matches = re.findall(r'<div class="yd_text2">(.*?)</div>', html, re.S)
    if not matches:
        # findall returned an empty list; indexing [0] here raised the IndexError
        return None
    chapter_data = matches[0].strip()
    chapter_data = chapter_data.replace(' ', '')
    chapter_data = chapter_data.replace('<br />', '')
    return chapter_data

def get_chapter_infos(novel_url):
    # Return (href, title) pairs for every <li><a> link on the index page
    res = requests.get(novel_url, headers=headers)
    res.encoding = 'gbk'
    html = res.text
    chapter_infos = re.findall(r'<li><a href="(.*?)">(.*?)</a></li>', html, re.S)
    return chapter_infos

url = 'https://www.88dushu.com/xiaoshuo/71/71618/'
chapter_info = get_chapter_infos(url)
#print(chapter_info)
# The with block closes the file even if a request fails partway through
with open('C:/Users/Administrator/Desktop/test(py)/废土崛起(1).txt', 'w', encoding='gbk') as f:
    for chapter in chapter_info:
        chapter_data = get_chapter_data('https://www.88dushu.com/xiaoshuo/71/71618/%s' % chapter[0])
        if chapter_data is None:
            # Possibly a non-chapter link that the <li><a> pattern also matched; skip it
            print('skipped:', chapter[0])
            continue
        f.write(chapter[1])
        f.write('\n')
        f.write(chapter_data)
        print(chapter[1])
        time.sleep(random.randint(1, 3))


不尴尬 · Posted on 2018-2-1 21:49:29

[Image]

太阳花田 · Posted on 2018-2-1 22:01:55

Use the debugger and look! The error is telling you that the index is out of range!
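To make that error concrete: `re.findall` returns an empty list when the pattern matches nothing, so indexing the result with `[0]` raises `IndexError`. Below is a minimal sketch of that behavior and of one way to guard against it. The helper name `extract_chapter_text` and both HTML strings are made up for illustration; only the regular expression comes from the posted script.

```python
import re

# Same pattern as in the posted script
PATTERN = re.compile(r'<div class="yd_text2">(.*?)</div>', re.S)

def extract_chapter_text(html):
    """Return the chapter text, or None when the div is missing."""
    matches = PATTERN.findall(html)
    if not matches:  # findall returns [] when nothing matches
        return None
    return matches[0].strip()

# Hypothetical pages, for illustration only
good_page = '<div class="yd_text2">  Chapter text<br />more  </div>'
bad_page = '<div class="other">a page without the chapter div</div>'

print(extract_chapter_text(good_page))
print(extract_chapter_text(bad_page))

# The unguarded version fails on the second page
try:
    PATTERN.findall(bad_page)[0]
except IndexError as exc:
    print('IndexError:', exc)
```

Returning `None` is only one option; raising a clearer exception that includes the URL would work just as well.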

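不尴尬 · Posted on 2018-2-1 23:28:02

太阳花田 posted on 2018-2-1 22:01
Use the debugger and look! The error is telling you that the index is out of range!

I don't know how to use the debugger.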

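太阳花田 · Posted on 2018-2-2 09:52:15

不尴尬 posted on 2018-2-1 23:28
I don't know how to use the debugger.

In PyCharm, right-click and you'll find Debug right below Run!
First, left-click in the margin to the left of the line that raises the error so that a small red dot (a breakpoint) appears. The program will pause when it reaches that line, and you can look at the values of the variables in your program.
You can set several red dots in different places. Click the green triangle at the lower left, the one that looks a bit like a pause button, and the program will run on to the next red dot.
These are the most basic operations. Look the rest up on Baidu; I can't post screenshots here, so it's hard to explain.
Also: debugging is a fundamental skill. How did you get as far as web scraping without it? I'm honestly a bit impressed!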
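For readers without PyCharm, roughly the same inspection can be done in plain Python. The sketch below prints the values you would look at when paused at a breakpoint, and marks where the standard library's `pdb.set_trace()` could be dropped in to pause interactively. The function `report_unmatched`, the `fetch_html` parameter, and the fake pages are assumptions for illustration, so that the example runs without network access; in the real script `fetch_html` would wrap `requests.get`.

```python
import re

PATTERN = re.compile(r'<div class="yd_text2">(.*?)</div>', re.S)

def report_unmatched(chapter_infos, fetch_html):
    """Print diagnostics per link and return links whose page has no chapter div.

    fetch_html is any callable mapping a link to its HTML text.
    """
    unmatched = []
    for href, title in chapter_infos:
        html = fetch_html(href)
        matches = PATTERN.findall(html)
        # The same values you would inspect when paused at a breakpoint
        print(f'{href!r} {title!r}: {len(matches)} match(es), {len(html)} chars')
        if not matches:
            unmatched.append(href)
            # To pause here interactively instead, uncomment:
            # import pdb; pdb.set_trace()
    return unmatched

# Hypothetical stand-in data so the sketch runs offline
fake_pages = {
    '1.html': '<div class="yd_text2">text of chapter one</div>',
    '/about/': '<p>a page without the chapter div</p>',
}
fake_infos = [('1.html', 'Chapter 1'), ('/about/', 'About')]

bad_links = report_unmatched(fake_infos, fake_pages.get)
print('links without chapter text:', bad_links)
```

Running something like this against the real chapter list would show which link produces the empty match list, which is the same information the breakpoint approach gives.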
