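Scraping Weibo data, Newbie Discussion Area, Newbie Training Camp, 鱼C论坛

温木zou posted on 2020-7-6 17:36:19

Scraping Weibo data

Here is my crawler source code. Could someone help me figure out why it doesn't scrape anything at all?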
import requests
import bs4


def open_url(url):
    # Send a browser-like User-Agent with the request.
    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36'}
    res = requests.get(url, headers=headers)
    return res


def find_top(res):
    soup = bs4.BeautifulSoup(res.text, 'html.parser')

    # Collect the text of every <a class="S_txt1"> element.
    title = []
    for each in soup.find_all('a', class_='S_txt1'):
        title.append(each.text)

    # Collect the text of every <span class="number"> element.
    number = []
    for each in soup.find_all('span', class_='number'):
        number.append(each.text)

    # Pair the i-th title with the i-th number. Appending title + number
    # would add the two whole lists, concatenated, on every pass.
    result = []
    length = min(len(title), len(number))
    for i in range(length):
        result.append(title[i] + number[i])
    return result


def main():
    host = 'https://d.weibo.com/231650'
    res = open_url(host)

    result = []
    result.extend(find_top(res))

    # Write one entry per line; f.write() needs a str, not a list.
    with open('wb.txt', 'w', encoding='utf-8') as f:
        for each in result:
            f.write(each + '\n')


if __name__ == '__main__':
    main()
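
A side note on pairing the titles with the numbers: the same step can also be written with zip(), which stops at the shorter of the two lists. This is only a sketch; pair_up and the sample data are made up for illustration and are not part of the original script.

```python
def pair_up(titles, numbers):
    # zip() yields pairs until the shorter list runs out,
    # so unequal lengths cannot cause an IndexError.
    return [f'{t} {n}' for t, n in zip(titles, numbers)]


# Made-up sample data, not real Weibo output.
titles = ['topic A', 'topic B', 'topic C']
numbers = ['120', '85']

print(pair_up(titles, numbers))
```

Each entry is already a string, so it can go straight into f.write() with a trailing newline.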

Twilight6 posted on 2020-7-6 17:41:51

Uh... the header name in your headers dict is all lowercase...

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36'}

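温木zou posted on 2020-7-6 17:46:14

Twilight6 posted on 2020-7-6 17:41
Uh... the header name in your headers dict is all lowercase...

I saw it written the same way in 老污龟's crawler code.
So it should be fine, right?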
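
On the header-case question: HTTP header field names are case-insensitive, and requests keeps headers in a case-insensitive dictionary, so 'user-agent' and 'User-Agent' refer to the same header. A small offline check is sketched below. It uses the standard library's urllib.request rather than requests so that it runs without third-party packages; the example.com URL and the shortened UA string are placeholders.

```python
from urllib.request import Request

# Shortened placeholder User-Agent, not the full string from the post.
UA = 'Mozilla/5.0 (Windows NT 10.0; WOW64)'

# Build two requests that differ only in the case of the header name.
# No network traffic happens here; the requests are never sent.
lower = Request('https://example.com/', headers={'user-agent': UA})
title = Request('https://example.com/', headers={'User-Agent': UA})

# urllib normalizes header names when storing them,
# so both requests carry the same header.
print(lower.header_items())
print(lower.header_items() == title.header_items())
```

So the lowercase key is unlikely to be why nothing gets scraped.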

cj0313 posted on 2020-7-6 17:55:03

I don't know how.

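Twilight6 posted on 2020-7-6 18:35:09

温木zou posted on 2020-7-6 17:46
I saw it written the same way in 老污龟's crawler code.
So it should be fine, right?

Where? As far as I remember, it has to follow the standard form strictly.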

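温木zou posted on 2020-7-6 19:13:41

Twilight6 posted on 2020-7-6 18:35
Where? As far as I remember, it has to follow the standard form strictly.

In the Python discussion area, in the 极客 (Geek) crawler material.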

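Twilight6 posted on 2020-7-6 21:00:08

温木zou posted on 2020-7-6 19:13
In the Python discussion area, in the 极客 (Geek) crawler material.

OK, that was my mistake.

Weibo's anti-scraping measures are hard to get around. Your code is fairly basic, so the pages it fetches are what Weibo serves once its anti-scraping has kicked in, not the real content. You could take a look at these articles:

https://blog.csdn.net/lwgkzl/article/details/89237060

https://blog.csdn.net/qq_38316655/article/details/80671358

PS: These are all from last year or earlier, and Weibo has probably changed since then, so treat them as a reference only.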
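
One way to confirm that the fetched page is not the real one is to check whether the HTML contains the elements find_top() looks for at all. The sketch below does that with the standard library only. The helper names and both sample strings are made up for illustration; they are not real Weibo responses.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags with a given tag name and CSS class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        # The class attribute may hold several space-separated names.
        classes = (dict(attrs).get('class') or '').split()
        if self.css_class in classes:
            self.count += 1


def count_matches(html, tag, css_class):
    parser = ClassCounter(tag, css_class)
    parser.feed(html)
    return parser.count


def has_expected_content(html):
    # Same tag/class pairs that find_top() in the first post searches for.
    return (count_matches(html, 'a', 'S_txt1') > 0
            and count_matches(html, 'span', 'number') > 0)


# Hypothetical samples for illustration only.
real_like = '<a class="S_txt1 extra">topic</a><span class="number">123</span>'
blocked_like = '<html><body><p>Please log in</p></body></html>'

print(has_expected_content(real_like))
print(has_expected_content(blocked_like))
```

In the original script, printing res.status_code, res.url and the first few hundred characters of res.text before parsing gives the same kind of hint: if the expected tags are missing there, the selectors are not the problem.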

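温木zou posted on 2020-7-7 08:14:12

Twilight6 posted on 2020-7-6 21:00
OK, that was my mistake.

Weibo's anti-scraping measures are hard to get around. Your code is fairly basic, so the pages it fetches are what Weibo serves once its anti-scraping has kicked in, you could ...

While you're at it, could you recommend a tutorial that teaches how to deal with anti-scraping?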

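Twilight6 posted on 2020-7-7 08:16:42

温木zou posted on 2020-7-7 08:14
While you're at it, could you recommend a tutorial that teaches how to deal with anti-scraping?

I'm not too sure myself, but I'd suggest taking a look at the book 《Python 3网络爬虫开发实战》. It's pretty good.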
