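The async part itself should be fine. The files do get created, but the write step seems to be the problem: the created files are never written to and stay empty, and an error is raised during the write. Please advise. Source code attached.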
import requests
from bs4 import BeautifulSoup
import aiohttp
import aiofiles
import asyncio
import bs4
import re
from lxml import html

def getHtml(url,headers):
    try:
        r = requests.get(url,headers)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except Exception as e:
        print("Error fetching the page!",e)
        return ""

def getChater(html):
    domain = "https://www.bbiquge.net/book_84680/"
    resultls = {}
    gcbs = BeautifulSoup(html,"html.parser")
    for dd in gcbs.find('div',class_="zjbox").dl: # tag that holds the chapter list
        if isinstance(dd,bs4.element.Tag):
            if dd.name == "dd" and dd.string != None:
                #print(dd.a.string,dd.a.get("href"))
                resultls[dd.a.string]=domain + dd.a.get("href") # map chapter title to its link
    return resultls

async def getContent(url,filename): # called with entries of the dict: chapter title -> link
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            html =await resp.text()
            # bs = BeautifulSoup(html,"html.parser").find("div",id="content")
            # content = bs.find("div",id="content")
            async with open(filename+".txt","w",encoding="utf-8") as f:
                await f.write(BeautifulSoup(html,"html.parser").find("div",id="content").text.replace('\xa0'*4,'\n\n ')) # clean up the characters and tidy the formatting
            print("done!"+filename)



async def main():
    url = "https://www.bbiquge.net/book_84680/"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36 Edg/96.0.1054.53"
    }
    html = getHtml(url,headers)
    # get the chapter titles and links
    resultls = getChater(html)
    # download the chapter contents asynchronously
    tasks = []
    for item in resultls.items():
        tasks.append(asyncio.create_task(getContent(item[1],item[0])))
    await asyncio.wait(tasks)

if __name__ == "__main__":
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    loop.run_until_complete(main())
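Python runs synchronously by default, and async is not really its strong suit (I don't understand async well myself).
So I'm curious about your motivation: is this for practice? If not, plain synchronous scraping would do the job.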
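On the write error itself: a likely cause is the `async with open(...)` line in getContent. The built-in open() returns an ordinary file object that does not support the asynchronous context manager protocol, so `async with` fails right after open() has already created the file. That would match the symptom of empty files plus an error. Since aiofiles is already imported, the smallest change is probably `async with aiofiles.open(filename+".txt","w",encoding="utf-8") as f:`. The sketch below reproduces the failure and shows a standard-library alternative, so it runs without extra packages. The names broken_write, fixed_write and demo are made up for illustration, and asyncio.to_thread assumes Python 3.9 or later.

```python
import asyncio
import tempfile
from pathlib import Path


async def broken_write(path: Path, text: str) -> None:
    # Same pattern as in getContent: open() creates the file, then
    # `async with` fails because a plain file object is not an
    # asynchronous context manager. Nothing is ever written.
    async with open(path, "w", encoding="utf-8") as f:
        await f.write(text)


async def fixed_write(path: Path, text: str) -> None:
    # Run the blocking write in a worker thread so the event loop stays free.
    await asyncio.to_thread(path.write_text, text, encoding="utf-8")


async def demo(out_dir: Path) -> dict:
    broken = out_dir / "broken.txt"
    fixed = out_dir / "fixed.txt"
    error = None
    try:
        await broken_write(broken, "chapter text")
    except (TypeError, AttributeError) as exc:
        # TypeError on Python 3.11+, AttributeError on older versions.
        error = type(exc).__name__
    await fixed_write(fixed, "chapter text")
    return {
        "error": error,
        "broken_size": broken.stat().st_size,
        "fixed_text": fixed.read_text(encoding="utf-8"),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(asyncio.run(demo(Path(tmp))))
```

With aiofiles installed, the body of fixed_write could instead be `async with aiofiles.open(path, "w", encoding="utf-8") as f:` followed by `await f.write(text)`, which keeps the shape of the original code.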