Problem fetching HTML with urllib
import urllib.request
from bs4 import BeautifulSoup
url = "https://www.baidu.com/s?ie=utf-8&medium=0&rtt=4&bsst=1&rsv_dl=news_t_sk&cl=2&wd=阿里巴巴"
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
'Chrome/95.0.4638.54 Safari/537.36'}
request = urllib.request.Request(url, headers=header)
response = urllib.request.urlopen(request)
bs = BeautifulSoup(response, "html.parser")
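Question: can urllib not handle Chinese characters in the URL? Is the requests module the only way to fetch this page, or could adding a parameter to Request make it work? I'm a bit confused.
Thanks in advance for any help!

https://blog.csdn.net/mouday/article/details/80278938

hrpzcf posted on 2021-10-31 16:34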
from urllib request import quote,unquote
There's no such library; adding it in Settings didn't find it either.

from urllib.request import quote,unquote

hrpzcf posted on 2021-10-31 17:32
from urllib.request import quote,unquote
Cannot find reference 'quote' in 'request.pyi'. Same thing, I still get this problem.

from urllib.parse import quote
keyword = '壁纸'
url = 'https://www.baidu.com/s?wd=' + quote(keyword)
print(url)

import urllib.request
# from urllib.request import quote
from urllib.parse import quote  # if the line above fails, try this one; both work for me
from bs4 import BeautifulSoup
cn = quote("阿里巴巴")
url = (
"https://www.baidu.com/s?ie=utf-8&medium=0&rtt=4&bsst=1&rsv_dl=news_t_sk&cl=2&wd=%s"
% cn
)
header = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/95.0.4638.54 Safari/537.36"
}
request = urllib.request.Request(url, headers=header)
response = urllib.request.urlopen(request)
# bs = BeautifulSoup(response, "html.parser")
print(response.read().decode("utf-8"))
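A note on why the replies above work: the Chinese keyword has to be percent-encoded before it goes into the URL, since the request line is sent as ASCII. The sketch below needs no network access. It shows what `quote` produces for the keyword and that `unquote` reverses it. It also checks whether `urllib.request` happens to re-export `quote`; that is a CPython implementation detail rather than documented API (an assumption worth verifying on your own interpreter), which would explain why the import runs fine while an IDE's `request.pyi` stub reports it as missing.

```python
import urllib.parse
import urllib.request
from urllib.parse import quote, unquote

keyword = "阿里巴巴"

# quote() encodes the text as UTF-8 and writes each byte as %XX.
encoded = quote(keyword)
print(encoded)

# unquote() reverses the encoding, so nothing is lost.
round_trip = unquote(encoded)
print(round_trip == keyword)

# Implementation detail, not documented API: urllib.request may expose
# the same function object because it imports it internally.
reexported = getattr(urllib.request, "quote", None)
print(reexported is urllib.parse.quote)
```

Importing from `urllib.parse` is the documented location, so it is the safer choice for both the interpreter and the IDE.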
from urllib.parse import quote

import urllib.request
from urllib.parse import quote
from bs4 import BeautifulSoup
wd = quote('阿里巴巴')
url = f"https://www.baidu.com/s?ie=utf-8&medium=0&rtt=4&bsst=1&rsv_dl=news_t_sk&cl=2&wd={wd}"
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
'Chrome/95.0.4638.54 Safari/537.36'}
req = urllib.request.Request(url, headers=header)
response = urllib.request.urlopen(req)
bs = BeautifulSoup(response, "html.parser")
print(bs)

hrpzcf posted on 2021-10-31 19:33
Thanks!
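On the original question of whether `Request` takes a parameter for this: as far as I can tell it does not, so the encoding has to happen on the URL string beforehand. A minimal sketch of one alternative to string concatenation, assuming you would rather build the query from a dict with `urlencode`. The helper name `build_search_url` is hypothetical, and the parameters are just a subset of those in the URL from the question. Nothing here touches the network.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_search_url(keyword: str) -> str:
    """Hypothetical helper: build the search URL with an encoded query."""
    # Subset of the query parameters used in the question's URL.
    params = {"ie": "utf-8", "rtt": "4", "cl": "2", "wd": keyword}
    # urlencode() percent-encodes every value (UTF-8 by default).
    return "https://www.baidu.com/s?" + urlencode(params)


url = build_search_url("阿里巴巴")
print(url)

# The result is pure ASCII, so it can be passed to urllib.request.Request.
print(url.isascii())

# Parsing the query back recovers the original keyword.
decoded = parse_qs(urlsplit(url).query)
print(decoded["wd"])
```

Compared with `quote` plus string formatting, `urlencode` also takes care of the `&` and `=` separators, which helps when a value itself contains those characters.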