    from urllib.parse import urlencode
    from urllib.request import Request, urlopen

    from fake_useragent import UserAgent


    def get_html(url):
        # Send the request with a random browser User-Agent and return the raw bytes.
        headers = {'User-Agent': UserAgent().random}
        request = Request(url, headers=headers)
        with urlopen(request) as response:
            return response.read()


    def save_html(filename, html_bytes):
        # Write the bytes unchanged so the page keeps its original encoding.
        with open(filename, 'wb') as f:
            f.write(html_bytes)


    def main():
        content = input("Enter the content to download: ")
        num = input("Enter the number of pages to download: ")
        base_url = "https://tieba.baidu.com/f?ie=utf-8&{}"
        for pn in range(int(num)):
            # "pn" is stepped by 50 per page; "kw" is the search keyword.
            args = urlencode({"pn": pn * 50, "kw": content})
            filename = "page_" + str(pn + 1) + ".html"
            print("Downloading " + filename)
            html_bytes = get_html(base_url.format(args))
            save_html(filename, html_bytes)


    if __name__ == '__main__':
        main()
The download succeeds, so why do the saved HTML files show no content when I open them?
The files themselves do have content.
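
Since the saved files are not empty, it may help to look at the raw bytes rather than at how the browser renders them. Below is a minimal sketch of that check. `summarize_html` is a hypothetical helper, not part of the script above; it reports the byte size, counts `<!--` openers, and returns a short text preview. One possible explanation, which is an assumption and not something confirmed in this thread, is that the page ships its thread list inside HTML comments that JavaScript unpacks later, so a locally opened copy looks blank even though the markup is there.

```python
def summarize_html(html_bytes, preview_chars=200):
    """Return basic facts about downloaded HTML bytes (hypothetical helper)."""
    # Decode leniently so unexpected bytes do not raise an exception.
    text = html_bytes.decode("utf-8", errors="replace")
    return {
        "size_bytes": len(html_bytes),        # 0 would mean a truly empty download
        "comment_blocks": text.count("<!--"),  # markup hidden in comments is not rendered
        "preview": text[:preview_chars],       # first characters, for a quick look
    }


if __name__ == "__main__":
    # Made-up sample bytes for illustration, not real Tieba output.
    sample = b"<html><body><code><!-- <ul><li>thread</li></ul> --></code></body></html>"
    info = summarize_html(sample)
    print(info["size_bytes"], info["comment_blocks"])
    print(info["preview"])
```

To try it on a real download, read a saved file in binary mode (`open(filename, 'rb').read()`) and pass the bytes in. A large `size_bytes` together with a nonzero `comment_blocks` would be consistent with the assumption above, although it would not prove it.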