Python crawler hangs partway through

淡痕抹夕 · Posted on 2020-5-26 16:00:24


import time
import requests
from bs4 import BeautifulSoup
import numpy as np
import pandas as pd

T = 5000  # minimum reply count for a thread to be kept
template_url = "https://tieba.baidu.com/f?kw=%E6%9D%91%E4%B8%8A%E6%98%A5%E6%A0%91&ie=utf-8&pn={}"


def extra_from_one_page(page_lst):
    '''Extract the threads on one page whose reply count exceeds T.'''
    tmp = []
    for i in page_lst:
        if int(i.find(class_='threadlist_rep_num').text) > T:
            dic = {}
            dic['num'] = int(i.find(class_='threadlist_rep_num').text)
            dic['name'] = i.find(class_='threadlist_title').text
            dic['address'] = 'https://tieba.baidu.com' + i.find(class_='threadlist_title').a['href']
            tmp.append(dic)
    return tmp


def search_n_pages(n):
    print('Crawling n pages')
    target = []
    for i in range(n):
        print('pages:', i)
        target_url = template_url.format(50 * i)  # pn advances by 50 per page
        res = requests.get(target_url)
        soup = BeautifulSoup(res.text, 'html.parser')
        page_lst = soup.find_all(class_='j_thread_list')
        page_lst.pop(0)  # drop the first matched element
        target.extend(extra_from_one_page(page_lst))
        time.sleep(0.2)  # short pause between requests
    return target


d = search_n_pages(649)
data = pd.DataFrame(d)
data.to_excel('村上春树吧.xlsx')

———————————————————————————————————————————————————————————————————————————————————————————————
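I wrote some code to crawl Baidu Tieba for threads whose popularity exceeds a threshold (the code filters on the reply count, threadlist_rep_num > T). I want to crawl the first 649 pages, but when I run it in Spyder it hangs at page 121. My network signal is good, so it is not a network problem. What should I do in this situation?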
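
One cause worth ruling out, assuming the stall happens inside the HTTP call: requests.get() has no timeout by default, so a single connection that stops sending data can block forever even when the network is otherwise fine. Below is a minimal sketch of a fetch helper with a timeout and a few retries. The name fetch_with_retry and its parameters (get, timeout, retries, backoff) are hypothetical and not part of the original post; the timeout values are arbitrary starting points.

```python
import time


def fetch_with_retry(url, get=None, timeout=(5, 15), retries=3, backoff=1.0):
    """Fetch url with a per-request timeout, retrying on network errors.

    get:     callable taking (url, timeout=...); defaults to requests.get.
    timeout: (connect, read) timeout in seconds, passed through to get.
    retries: total number of attempts before giving up.
    backoff: base pause in seconds; the pause grows with each failed attempt.
    """
    if get is None:
        import requests  # imported lazily so a stub can be passed in for testing
        get = requests.get
    for attempt in range(1, retries + 1):
        try:
            return get(url, timeout=timeout)
        except OSError as exc:  # requests.RequestException is a subclass of OSError
            print(f"attempt {attempt}/{retries} failed for {url}: {exc!r}")
            if attempt == retries:
                raise  # out of attempts: let the caller see the error
            time.sleep(backoff * attempt)  # wait a little longer each time
```

In search_n_pages, the line res=requests.get(target_url) would become res = fetch_with_retry(target_url). If a page then stalls, the loop reports a timeout and retries instead of hanging silently, which also shows whether page 121 is where the request itself gets stuck.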

zwhe · Posted on 2020-5-27 10:19:07
