Python web novel crawler, Newbie Exchange Area, Newbie Training Camp, FishC Forum (鱼C论坛)

s1986q posted on 2015-7-1 19:05:33

Python web novel crawler.

Last edited by s1986q on 2015-7-1 19:23

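# Python 3 version (Python 2's httplib is http.client in Python 3)
import http.client
import os
import re
from urllib.parse import urlsplit

OUT_DIR = r"e:\html"        # every chapter page is saved here
BASE = "http://www.shuhaha.com/Html/Book/66/66595/"
START_PAGE = "17182420.html" # first chapter page to start from

os.makedirs(OUT_DIR, exist_ok=True)

cookie = ""  # name=value from the last Set-Cookie, sent back on the next request


def getapost(url):
    """GET url and return the raw response body (bytes)."""
    global cookie
    parts = urlsplit(url)
    # Connect to the host that actually serves the URL
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80)
    headers = {"Cookie": cookie} if cookie else {}
    conn.request("GET", parts.path, headers=headers)
    resp = conn.getresponse()
    # Keep only the name=value part; keep the old cookie if none was sent
    cookie = resp.getheader("Set-Cookie", cookie).split(";")[0]
    body = resp.read()
    conn.close()
    return body


page = START_PAGE
while True:
    print(page)
    htm = getapost(BASE + page)
    # Save the raw bytes so the page's original encoding is preserved
    with open(os.path.join(OUT_DIR, page), "wb") as f:
        f.write(htm)
    # Each chapter page embeds: var nextpage="NNNNNNNN.html"
    m = re.search(rb'var nextpage="(\d+\.html)"', htm)
    if m is None:  # no next-chapter link: the last page has been reached
        print("Task complete!")
        break
    page = m.group(1).decode("ascii")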
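Note that re.findall returns a list (empty when nothing matches) and never raises, so the end of the book has to be detected by checking for a missing match rather than by catching an exception. Below is a minimal sketch of just the link-extraction step, assuming the chapter pages embed `var nextpage="NNNN.html"` exactly as the regex above expects; the helper name next_url and the sample HTML are hypothetical, and it can be checked offline without touching the site.

```python
import re
from urllib.parse import urljoin

# Assumed marker in each chapter page: var nextpage="NNNN.html"
NEXT_RE = re.compile(rb'var nextpage="(\d+\.html)"')


def next_url(html, base):
    """Return the absolute URL of the next chapter, or None on the last page.

    html is the raw page body (bytes); base is the book's directory URL
    (hypothetical helper for illustration).
    """
    m = NEXT_RE.search(html)
    if m is None:
        return None
    # urljoin resolves the relative file name against the directory URL
    return urljoin(base, m.group(1).decode("ascii"))


if __name__ == "__main__":
    base = "http://www.shuhaha.com/Html/Book/66/66595/"
    sample = b'<script>var nextpage="17182421.html";</script>'
    print(next_url(sample, base))
    # -> http://www.shuhaha.com/Html/Book/66/66595/17182421.html
    # findall does not raise on a page without the link; it returns []
    print(NEXT_RE.findall(b"<html>no link here</html>"))  # -> []
    print(next_url(b"<html>no link here</html>", base))    # -> None
```

Using urljoin instead of plain string concatenation also keeps working if the site ever switches to absolute links in nextpage.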

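Plenty of people have already written one of these; here is mine.
I'll post the program for processing the downloaded pages in a follow-up.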

wzdnzd posted on 2016-2-18 21:56:15

.........

whuer_py posted on 2016-5-6 08:54:47

Nice.

superFeng777 posted on 2016-9-12 10:17:38

Awesome work, OP! Quietly saving this, and then I'll adapt it!
