I'm a beginner, so the code may be a bit messy.
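- import requests
- import time
- import re
- # Browser-like User-Agent sent with every request
- headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.7 Safari/537.36'}
- # Append mode: each run adds to the end of the existing file
- f=open('E:/doupo.txt','a+',encoding='utf-8')
- def get_info(url):
-     res=requests.get(url,headers=headers)
-     if res.status_code==200:
-         # Grab the text inside every <p>...</p> on the page
-         contents=re.findall('<p>(.*?)</p>',res.content.decode('utf-8'),re.S)
-         for content in contents:
-             f.write(content+'\n')
-     else:
-         # Skip pages that do not return 200
-         pass
- if __name__=='__main__':
-     urls=['http://www.doupoxs.com/doupocangqiong/{}.html'.format(str(i)) for i in range(2,1665)]
-     for url in urls:
-         get_info(url)
-         # Wait one second between requests
-         time.sleep(1)
-     f.close()
- # Do not run this more than once (the file is opened in append mode, so the text would be written again). The program is slow, so you will have to wait.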
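A possible way to tidy the script up, as a rough sketch rather than a finished version: keep the URL building and the `<p>` extraction in small functions that can be checked without touching the site, and write the file inside a `with` block so it is closed even if a request fails partway through. The names `build_urls`, `extract_paragraphs`, `save_chapters` and the `fetch` parameter are made up for this example and are not in the original code; the URL pattern, the regex and the one-second delay are taken from the post.

```python
import re
import time

# URL pattern from the post; page numbers 2..1664 as in the original range(2, 1665)
BASE_URL = 'http://www.doupoxs.com/doupocangqiong/{}.html'


def build_urls(start=2, stop=1665):
    """Return the chapter URLs for page numbers start..stop-1 (hypothetical helper)."""
    return [BASE_URL.format(i) for i in range(start, stop)]


def extract_paragraphs(html):
    """Return the text inside every <p>...</p>, same regex as the post."""
    return re.findall(r'<p>(.*?)</p>', html, re.S)


def save_chapters(urls, fetch, path, delay=1.0):
    """Append the paragraphs of each page to path.

    fetch is any function that takes a URL and returns the page text,
    or None when the page could not be downloaded (an assumption of this sketch).
    """
    with open(path, 'a', encoding='utf-8') as out:
        for url in urls:
            html = fetch(url)
            if html is not None:
                for paragraph in extract_paragraphs(html):
                    out.write(paragraph + '\n')
            # Pause between requests, as the original script does
            time.sleep(delay)
```

To use it against the real site, `fetch` could wrap the same `requests.get(url, headers=headers)` call as above and return `res.content.decode('utf-8')` when `res.status_code == 200`, otherwise `None`. Passing the download step in as a function means the parsing and file writing can be tried out on a few hand-written HTML strings first.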