wangwang123 posted on 2021-10-31 16:25:10

Problem fetching HTML with urllib


import urllib.request
from bs4 import BeautifulSoup  # third-party import needed for the BeautifulSoup call below

url = "https://www.baidu.com/s?ie=utf-8&medium=0&rtt=4&bsst=1&rsv_dl=news_t_sk&cl=2&wd=阿里巴巴"

header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                        'Chrome/95.0.4638.54 Safari/537.36'}

request = urllib.request.Request(url, headers=header)

response = urllib.request.urlopen(request)  # fails here: the URL still contains raw Chinese characters

bs = BeautifulSoup(response, "html.parser")

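Question: can urllib not handle Chinese characters? Is the requests module the only way to fetch this page, or can I add a parameter to Request so that it works? I'm a bit confused.

Thanks in advance for any help!

hrpzcf posted on 2021-10-31 16:34:55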

https://blog.csdn.net/mouday/article/details/80278938

wangwang123 posted on 2021-10-31 17:22:09

hrpzcf posted on 2021-10-31 16:34


from urllib request import quote,unquote: there is no such library, and I can't find it to add under Settings either

hrpzcf posted on 2021-10-31 17:32:12

from urllib.request import quote,unquote

wangwang123 posted on 2021-10-31 18:47:41

hrpzcf posted on 2021-10-31 17:32
from urllib.request import quote,unquote

Same thing, I still get this: Cannot find reference 'quote' in 'request.pyi'

suchocolate posted on 2021-10-31 19:30:08
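
A note on the warning above, as far as I understand it: quote is defined in urllib.parse. In CPython, urllib.request happens to import it for its own internal use, which would explain why the import runs for some people while the IDE's type stub (request.pyi) does not list it. A small check, assuming a standard CPython install (the variable names are just for illustration):

```python
import urllib.parse
import urllib.request

# urllib.parse is the documented home of quote/unquote.
canonical = urllib.parse.quote

# urllib.request may expose the same function as a side effect of its own
# imports; treat that as an implementation detail, not a public API.
leaked = getattr(urllib.request, "quote", None)
same_object = leaked is canonical

print("urllib.request.quote is urllib.parse.quote:", same_object)
```

Either way, importing from urllib.parse is the safer choice and should not trigger the warning.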

from urllib.parse import quote
keyword = '壁纸'
url = 'https://www.baidu.com/s?wd=' + quote(keyword)
print(url)
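
When there are several query parameters, urllib.parse.urlencode can build the whole query string in one go instead of quoting each value by hand. A minimal sketch; the parameter set below is only an example, not something Baidu requires:

```python
from urllib.parse import urlencode

# Query parameters as plain Python strings; values may contain Chinese.
params = {"ie": "utf-8", "wd": "壁纸"}

# urlencode percent-encodes each key and value (UTF-8 by default)
# and joins the pairs with "&".
url = "https://www.baidu.com/s?" + urlencode(params)
print(url)
```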

hrpzcf posted on 2021-10-31 19:33:04

import urllib.request
# from urllib.request import quote
from urllib.parse import quote  # if the line above doesn't work, try this one; both work fine for me

from bs4 import BeautifulSoup

cn = quote("阿里巴巴")
url = (
    "https://www.baidu.com/s?ie=utf-8&medium=0&rtt=4&bsst=1&rsv_dl=news_t_sk&cl=2&wd=%s"
    % cn
)

header = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/95.0.4638.54 Safari/537.36"
}

request = urllib.request.Request(url, headers=header)

response = urllib.request.urlopen(request)

# bs = BeautifulSoup(response, "html.parser")

print(response.read().decode("utf-8"))
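
The decode("utf-8") above assumes the page is UTF-8. If that is not guaranteed, the charset can be read from the Content-Type header, since response.headers is a Message object with a get_content_charset() method. A sketch under that assumption; decode_body is a hypothetical helper, and the header is built by hand here so the example runs without a network call:

```python
from email.message import Message

def decode_body(raw: bytes, headers: Message, fallback: str = "utf-8") -> str:
    """Decode a response body using the charset from its headers, if any."""
    # get_content_charset() returns None when no charset is declared.
    charset = headers.get_content_charset() or fallback
    return raw.decode(charset, errors="replace")

# Stand-in for response.headers, declaring a GBK-encoded page.
headers = Message()
headers["Content-Type"] = "text/html; charset=gbk"

text = decode_body("阿里巴巴".encode("gbk"), headers)
print(text)
```

With a real response this would be called as decode_body(response.read(), response.headers).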

redforce posted on 2021-10-31 19:33:41

from urllib.parse import quote

redforce posted on 2021-10-31 19:35:22

import urllib.request
from urllib.parse import quote
from bs4 import BeautifulSoup

wd = quote('阿里巴巴')

url = f"https://www.baidu.com/s?ie=utf-8&medium=0&rtt=4&bsst=1&rsv_dl=news_t_sk&cl=2&wd={wd}"

header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                        'Chrome/95.0.4638.54 Safari/537.36'}

req = urllib.request.Request(url, headers=header)

response = urllib.request.urlopen(req)

bs = BeautifulSoup(response, "html.parser")
print(bs)
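
On the original question of adding a parameter to Request: its signature (url, data, headers, origin_req_host, unverifiable, method) has no option for this, so the URL has to be ASCII before it is passed in. If the URL already has Chinese in it, one possible approach is to split it, re-encode the path and query, and put it back together. A rough sketch; to_ascii_url is a hypothetical helper, and it assumes the host name itself is already ASCII:

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit

def to_ascii_url(url: str) -> str:
    """Percent-encode non-ASCII characters in the path and query of a URL."""
    parts = urlsplit(url)
    # Keep "/" and existing "%" escapes intact in the path.
    path = quote(parts.path, safe="/%")
    # Decode the query into pairs, then re-encode every key and value.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    query = urlencode(pairs)
    # Assumption: parts.netloc is plain ASCII (no IDNA handling here).
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))

raw_url = "https://www.baidu.com/s?ie=utf-8&wd=阿里巴巴"
safe_url = to_ascii_url(raw_url)
print(safe_url)
```

The result can then be passed to urllib.request.Request(safe_url, headers=header) as in the code above.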

wangwang123 posted on 2021-11-1 00:28:12

hrpzcf posted on 2021-10-31 19:33


Thanks!