
Original poster |
Posted on 2021-1-16 11:02:34
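```python
import requests
from bs4 import BeautifulSoup

url = 'https://wenku.baidu.com/view/6e47f32a846a561252d380eb6294dd88d1d23d72.html'
# Send a Googlebot User-Agent, as in the original post
header = {'User-Agent': 'Googlebot'}
res = requests.get(url, headers=header, timeout=10)
res.raise_for_status()  # stop early on HTTP errors
print(res.text)  # dump the raw HTML for inspection

plist = []
soup = BeautifulSoup(res.content, "html.parser")
if soup.title is not None:
    plist.append(soup.title.get_text())  # page title text, without the <title> tags
# Collect the text of every <div class="bd doc-reader">, one list entry per line
for div in soup.find_all('div', attrs={"class": "bd doc-reader"}):
    plist.extend(div.get_text().split('\n'))
# Strip spaces and form-feed characters ('\x0c')
plist = [c.replace(' ', '').replace('\x0c', '') for c in plist]

# 'with' closes the file even if a write fails; 'line' avoids shadowing the built-in str
with open('test.txt', 'w', encoding='utf-8') as file:
    for line in plist:
        file.write(line + '\n')
```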
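If `bs4` is not installed, roughly the same extraction can be done with the standard library's `html.parser`. This is a swapped-in technique, not what the script uses; `DocReaderText` is a hypothetical helper name, and it assumes the page really wraps the document text in `<div class="bd doc-reader">` with that exact class string (the same assumption the `find_all` call makes). Unlike `find_all` plus `get_text()`, it will not emit the text of nested matching divs twice.

```python
from html.parser import HTMLParser


class DocReaderText(HTMLParser):
    """Hypothetical helper: collect text inside <div class="bd doc-reader"> elements."""

    def __init__(self, target_class="bd doc-reader"):
        super().__init__()
        self.target_class = target_class  # compared against the full class attribute string
        self.depth = 0    # > 0 while inside a matching div; counts nested divs
        self.chunks = []  # raw text fragments found inside matching divs

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            self.depth += 1  # nested div inside a matching one
        elif dict(attrs).get("class") == self.target_class:
            self.depth = 1   # entering a matching div

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)

    def text(self):
        return "".join(self.chunks)


if __name__ == "__main__":
    parser = DocReaderText()
    parser.feed('<div class="bd doc-reader">line 1\nline 2</div><div>skip</div>')
    print(parser.text().split("\n"))
```

The resulting list can then go through the same space and `'\x0c'` cleanup as `plist` in the script.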
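The cleanup and saving steps don't touch the network, so they can be pulled out and checked on sample data. A minimal sketch; `clean_lines` and `save_lines` are hypothetical names, and the demo file name is arbitrary.

```python
import os
import tempfile


def clean_lines(lines):
    # Remove spaces and form feeds ('\x0c'), same as the two replace() calls in the script
    return [line.replace(" ", "").replace("\x0c", "") for line in lines]


def save_lines(lines, path):
    # One entry per line, UTF-8; the with-block closes the file automatically
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")


if __name__ == "__main__":
    cleaned = clean_lines(["第一 行", "page\x0c two", ""])
    print(cleaned)  # ['第一行', 'pagetwo', '']
    save_lines(cleaned, os.path.join(tempfile.gettempdir(), "test_demo.txt"))
```

Empty strings are kept, so blank lines from the page survive as blank lines in the output file, just as in the original script.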