[Solved] Garbled text (mojibake) in a web scraper

isdkz · Posted on 2023-7-24 12:12:17

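The problem you are seeing is probably caused by the site using a character encoding different from the one your code assumes, and your code not handling that case. In Python, the usual fix is to decode the raw bytes yourself with response.content.decode('<encoding name>'). In this case the site is most likely using 'utf-8'.

So your code can be changed as follows: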

import requests

# Send a browser-like User-Agent with the request.
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari'
}

url = "https://www.ibiquges.net/92/92881/"

response = requests.get(url=url, headers=headers)

# Decode the raw response bytes explicitly as UTF-8.
print(response.content.decode('utf-8'))
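To see why decoding with the right codec matters, here is a small offline sketch that needs no network access. The sample string is made up for illustration, and ISO-8859-1 is only an assumed "wrong" codec: it is the one requests falls back to for text responses whose Content-Type header names no charset, which is one common way this kind of garbling shows up, though it may not be what happened in your case.

```python
# Hypothetical sample text standing in for a page's Chinese content.
original = "第一章 开始"

# What the server would send over the wire if the page is UTF-8.
raw = original.encode("utf-8")

# Decoding the same bytes with the wrong codec never fails here,
# it just silently produces garbled characters.
garbled = raw.decode("iso-8859-1")

# Decoding with the codec the bytes were written in restores the text.
fixed = raw.decode("utf-8")

print(repr(garbled))
print(fixed)
```

The bytes are identical in both cases. Only the codec used to interpret them differs, which is why calling decode('utf-8') on response.content fixes the output.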

If you are not sure which encoding the page uses, you can also try the chardet library, which guesses the encoding from the raw bytes:

import requests
import chardet

# Send a browser-like User-Agent with the request.
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari'
}

url = "https://www.ibiquges.net/92/92881/"

response = requests.get(url=url, headers=headers)

# chardet returns a guess; 'encoding' can be None when it cannot decide,
# so fall back to UTF-8 in that case.
encoding = chardet.detect(response.content)['encoding'] or 'utf-8'

print(response.content.decode(encoding))
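A detector's guess can still be wrong or missing, so another option is to try a short list of likely encodings in order. The sketch below is not chardet and not part of requests: decode_with_fallback is a hypothetical helper, and the candidate list ("utf-8", then "gb18030", a superset of GBK) is an assumption that fits many Chinese sites but may not fit yours.

```python
def decode_with_fallback(raw: bytes, candidates=("utf-8", "gb18030")) -> str:
    """Return raw decoded with the first candidate codec that succeeds.

    Hypothetical helper for illustration; the candidate list is an assumption.
    """
    for name in candidates:
        try:
            return raw.decode(name)
        except UnicodeDecodeError:
            # These bytes are not valid in this codec; try the next one.
            continue
    # Nothing matched: decode with the first codec and mark the bad bytes
    # with U+FFFD instead of raising.
    return raw.decode(candidates[0], errors="replace")


# Made-up sample text, encoded two different ways.
sample = "你好，世界"
print(decode_with_fallback(sample.encode("utf-8")))
print(decode_with_fallback(sample.encode("gbk")))
```

In a scraper you would pass response.content as raw. Put the strictest codec first: UTF-8 rejects most non-UTF-8 byte sequences, whereas a permissive codec placed first would accept almost anything and hide the real encoding.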

This way your code decodes the page content according to the encoding that was detected. Hope this helps!
