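from urllib.parse import urlencode
from urllib.request import Request, urlopen

from fake_useragent import UserAgent

ua = UserAgent()  # build once; constructing UserAgent() on every request is slow


def get_html(url):
    # Send a random browser User-Agent so the request looks like a normal browser
    headers = {'User-Agent': ua.random}
    request = Request(url, headers=headers)
    with urlopen(request, timeout=10) as response:
        return response.read()  # raw bytes, no decoding


def save_html(filename, html_bytes):
    # Write the bytes unchanged so the page's own charset declaration still applies
    with open(filename, 'wb') as f:
        f.write(html_bytes)


def main():
    content = input("Enter the forum (kw) to download: ")
    num = input("Enter how many pages to download: ")
    base_url = "https://tieba.baidu.com/f?ie=utf-8&{}"
    for pn in range(int(num)):
        # pn is an offset into the thread list; this code advances it by 50 per page
        args = {
            "pn": pn * 50,
            "kw": content
        }
        filename = "page_" + str(pn + 1) + ".html"
        args = urlencode(args)
        print("Downloading " + filename)
        html_bytes = get_html(base_url.format(args))
        save_html(filename, html_bytes)


if __name__ == '__main__':
    main()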
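The paging logic in main() can be checked offline before any request is sent. A minimal sketch, assuming the same base_url and the same step of 50 per page as the code above; the helper name page_urls is hypothetical and not part of the original script.

```python
from urllib.parse import urlencode

BASE_URL = "https://tieba.baidu.com/f?ie=utf-8&{}"


def page_urls(keyword, pages, step=50):
    """Return the list-page URLs main() would request (hypothetical helper)."""
    urls = []
    for page in range(pages):
        # urlencode keeps dict order and percent-encodes non-ASCII keywords as UTF-8
        query = urlencode({"pn": page * step, "kw": keyword})
        urls.append(BASE_URL.format(query))
    return urls


if __name__ == "__main__":
    for url in page_urls("python", 2):
        print(url)
```

Opening one of these URLs in a browser is a quick way to compare what the live page shows with what the script saved to disk.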
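Why do the HTML files look empty when I open them, even though they were downloaded successfully?
The files themselves do have content.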
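If a saved file is not empty but the browser shows a blank page, one possible explanation (an assumption, not confirmed in this thread) is that the page ships part of its markup inside HTML comments and relies on JavaScript to insert it at runtime; when those scripts fail to load from a local file, that markup stays invisible. The sketch below uses only the standard library to un-comment such blocks so the content can be inspected or parsed. The function names uncomment_html and read_saved_page, the sample string, and the UTF-8 default encoding are all illustrative assumptions.

```python
import re

# Matches an HTML comment and captures its body (non-greedy, across newlines)
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def uncomment_html(html_text):
    """Replace every <!-- ... --> with its inner text so hidden markup becomes visible."""
    return COMMENT_RE.sub(lambda m: m.group(1), html_text)


def read_saved_page(filename, encoding="utf-8"):
    # The script saved raw bytes; decode them (UTF-8 assumed) before running text regexes
    with open(filename, "rb") as f:
        return f.read().decode(encoding, errors="replace")


if __name__ == "__main__":
    sample = '<div><code><!--<a class="title">Thread 1</a>--></code></div>'
    print(uncomment_html(sample))
```

For a real file, uncomment_html(read_saved_page("page_1.html")) yields text that an HTML parser can then search for thread titles.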