Here is the source code of my scraper. Could someone help me debug it? Why can't I scrape anything at all?
import requests
import bs4


def open_url(url):
    # Send a browser-like User-Agent so the request is not rejected outright.
    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36'}
    res = requests.get(url, headers=headers, timeout=10)
    return res


def find_top(res):
    soup = bs4.BeautifulSoup(res.text, 'html.parser')

    # Topic titles
    title = []
    tag = soup.find_all('a', class_='S_txt1')
    for each in tag:
        title.append(each.text)

    # The number shown next to each topic
    number = []
    tag = soup.find_all('span', class_='number')
    for each in tag:
        number.append(each.text)

    # zip stops at the shorter list, so a length mismatch cannot raise IndexError.
    result = []
    for t, n in zip(title, number):
        result.append(t + n)
    return result


def main():
    host = 'https://d.weibo.com/231650'
    res = open_url(host)
    result = []
    result.extend(find_top(res))
    with open('wb.txt', 'w', encoding='utf-8') as f:
        for each in result:
            # Write one entry per line.
            f.write(each + '\n')


if __name__ == '__main__':
    main()
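All right, that was my oversight.
Weibo's anti-scraping measures are fairly hard to get around, and your code is quite basic, so every page it fetches is the one Weibo serves after its anti-scraping checks have kicked in, not the real content. You could take a look at these articles: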
https://blog.csdn.net/lwgkzl/article/details/89237060
https://blog.csdn.net/qq_38316655/article/details/80671358
P.S. These articles are all from before last year, and Weibo has probably changed since then, so treat them as reference only.
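Before changing anything else, it may help to confirm that the page you receive really is missing the elements you are looking for. The following is only a rough sketch using the standard library's html.parser rather than bs4; `ClassCounter` and `diagnose` are hypothetical helper names, and the two tag/class pairs are simply the ones from the code in the question.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags whose class attribute contains a given class name."""

    def __init__(self, wanted):
        super().__init__()
        # wanted: iterable of (tag, class_name) pairs, e.g. ('a', 'S_txt1')
        self.counts = {pair: 0 for pair in wanted}

    def handle_starttag(self, tag, attrs):
        # A valueless attribute has value None, hence the "or ''".
        classes = (dict(attrs).get('class') or '').split()
        for wanted_tag, wanted_class in self.counts:
            if tag == wanted_tag and wanted_class in classes:
                self.counts[(wanted_tag, wanted_class)] += 1


def diagnose(html, wanted=(('a', 'S_txt1'), ('span', 'number'))):
    """Return how many times each (tag, class) pair occurs in the HTML text."""
    counter = ClassCounter(wanted)
    counter.feed(html)
    return counter.counts
```

Calling `diagnose(res.text)` on the response from `open_url`, alongside printing `res.status_code` and `res.url`, should show whether the selectors match anything. If every count is zero, the parsing code is not the problem: the server simply did not return the page you see in the browser.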