import requests, os, urllib, re
import time
from random import randint          # os/urllib/re/time/randint are imported but not used in this snippet yet
from lxml import etree
from bs4 import BeautifulSoup

# send the request through a local proxy on 127.0.0.1:10809
proxies = {
    "http": "http://127.0.0.1:10809",
    "https": "http://127.0.0.1:10809",
}

target = "http://www.fangongheike.com/"
req = requests.get(url=target, proxies=proxies)
req.encoding = 'utf-8'
#print(req.text)

# parse the same response twice: lxml for XPath, BeautifulSoup for find_all
html1 = etree.HTML(req.text)
html2 = req.text
title = BeautifulSoup(html2, "html.parser")

bt = title.find_all('h3', class_='post-title')            # post titles on the front page
nr = html1.xpath("//div[@class='post hentry']//text()")   # all text nodes inside each post block
#print(bt)
#print(nr)
 
What I want to extract is exactly what this crawl already reaches: the blog post titles on the front page plus, matched to each title, the 战果展示 (results showcase) part and that sentence along the lines of "breached xxx on xx (date)". One level further down, the XPath lands directly on bare p tags, and I don't know how to handle that.
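One way to keep each title paired with the text of the same post is to loop over the post containers and run relative XPath queries inside each one. This is only a rough sketch, continuing from the html1 object above and assuming the structure really is a div with classes "post hentry" holding an h3 with class "post-title"; the contains() guards are my own guess in case the real class attributes carry extra values.

# rough sketch, not tested against the live page: iterate post by post so the
# title and its body paragraphs stay paired together
posts = html1.xpath("//div[contains(@class, 'post') and contains(@class, 'hentry')]")
for post in posts:
    # the leading "." keeps the query relative to this single post
    t = post.xpath(".//h3[contains(@class, 'post-title')]//text()")
    title_text = t[0].strip() if t else ''
    # every non-empty text node inside the post's p tags
    paras = [s.strip() for s in post.xpath(".//p//text()") if s.strip()]
    print(title_text)
    print('\n'.join(paras))
    print('-' * 40)

The same pairing can also be done with BeautifulSoup: loop over title.find_all('div', class_='post'), then call find('h3', class_='post-title') and find_all('p') on each post element instead of using relative XPath.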