Line 8 of this code has a problem:

    import requests
    from bs4 import BeautifulSoup

    url="https://s.weibo.com/top/summary?cate=realtimehot"
    headers={"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.134 Safari/537.36 Edg/103.0.1264.77"}

    response=requests.get(url,headers=headers)
    content=response.content.decode('utf-8')
    soup=BeautifulSoup(content,'lxml')
Error:

    Traceback (most recent call last):
      File "D:/py/访问微博热搜.py", line 8, in <module>
        content=response.content.decode('utf-8')
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xca in position 339: invalid continuation byte
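
The traceback says that a byte in `response.content` is not valid UTF-8, so the server may have sent the page in some other encoding. Below is a minimal offline sketch of the same kind of failure. The sample string and the choice of GBK are assumptions for illustration only; the traceback does not confirm which encoding the page actually used.

```python
# Stand-in for response.content: a short string encoded as GBK.
# Both the sample text and the GBK encoding are assumptions for illustration.
raw = "热搜".encode("gbk")

# Strict UTF-8 decoding fails on these bytes, like the call in the traceback.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError as exc:
    strict_ok = False
    print("strict decode failed:", exc)

# errors="replace" never raises; undecodable bytes become U+FFFD.
lossy = raw.decode("utf-8", errors="replace")

# Decoding with the encoding the bytes were written in recovers the text.
recovered = raw.decode("gbk")
print(recovered)
```

With `requests`, printing `response.encoding` and `response.apparent_encoding` shows which encoding the library assumed and which one it guesses from the bytes, which can help in choosing the codec.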
 
After I replaced line 8, the error went away:

    import requests
    from bs4 import BeautifulSoup

    url="https://s.weibo.com/top/summary/"
    headers={"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.134 Safari/537.36 Edg/103.0.1264.77"}

    response=requests.get(url,headers=headers)
    content=response.encoding='utf-8'
    soup=BeautifulSoup(content,'lxml')
    print(soup)
But this is what gets printed:

    <html><body><p>utf-8</p></body></html>
What I want is the page source.

Could someone explain what lines 8 and 9 do, and how I should fix this?
Thanks, everyone.
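
About line 8 of the second version: `content=response.encoding='utf-8'` is a chained assignment, so Python stores the string `'utf-8'` in both `response.encoding` and `content`. BeautifulSoup then parses the literal text `utf-8`, which is consistent with the `<p>utf-8</p>` output. Here is a small sketch of the effect. `FakeResponse` is a made-up, simplified stand-in for a real `requests` response so that the example runs offline.

```python
class FakeResponse:
    """Hypothetical, simplified imitation of requests.Response (no network)."""

    def __init__(self, content):
        self.content = content   # raw bytes of the body
        self.encoding = None     # encoding used by .text

    @property
    def text(self):
        # Decode the body using the configured encoding (UTF-8 by default).
        return self.content.decode(self.encoding or "utf-8")


response = FakeResponse("<html><body>hot list</body></html>".encode("utf-8"))

# Chained assignment: both targets receive the string "utf-8".
content = response.encoding = "utf-8"
chained_value = content
print(chained_value)  # utf-8

# Probably what was intended: set the encoding, then read the decoded body.
response.encoding = "utf-8"
content = response.text
print(content)
```

Line 9, `soup=BeautifulSoup(content,'lxml')`, parses whatever string is in `content` using the lxml parser, so it only reflects the page once `content` actually holds the HTML.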
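I'm not sure about line 8... Web pages generally use UTF-8 encoding.
Line 9 just parses the content with the lxml parser.
You still need to add a cookie, otherwise Weibo's anti-scraping measures block you. You can print the type of soup along the way to see what it is (it is a bs4 class).
With that, this works fine: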

    import requests
    from bs4 import BeautifulSoup

    url = "https://s.weibo.com/top/summary?cate=realtimehot"
    # Placeholder: paste the Cookie value copied from your own browser session.
    cookie = "PASTE_YOUR_COOKIE_HERE"
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.134 Safari/537.36 Edg/103.0.1264.77",
               "cookie": cookie}

    response = requests.get(url, headers=headers)
    content = response.text  # body decoded to str by requests
    soup = BeautifulSoup(content, 'lxml')
    print(type(soup))  # <class 'bs4.BeautifulSoup'>
    # Print the link text of every <td class="td-02"> cell.
    for i in soup.find_all("td", class_="td-02"):
        print(i.a.text)
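
The `cookie` variable in the code above has to hold a real Cookie header value. Rather than writing it into the script, one option is to read it from an environment variable, as sketched below. The variable name `WEIBO_COOKIE` and the helper `build_headers` are made up for this example, and the User-Agent string is shortened.

```python
import os

# Shortened User-Agent; any current browser string can be used here.
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def build_headers(cookie):
    """Return request headers, adding a Cookie entry only when one is given."""
    headers = {"User-Agent": USER_AGENT}
    if cookie:
        headers["Cookie"] = cookie
    return headers


# WEIBO_COOKIE is a hypothetical variable name; set it to the Cookie header
# value copied from a logged-in browser session.
headers = build_headers(os.environ.get("WEIBO_COOKIE", ""))
print(sorted(headers))
```

The resulting dict can be passed as `headers=headers` to `requests.get`, in the same way as in the code above.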
 
 