Crawler code:
import requests
import re
content = requests.get('https://book.douban.com/').text
pattern = re.compile('<li.*?cover.*?href="(.*?)".*?title="(.*?)".*?more-meta.*?author">(.*?)</span>.*?year">(.*?)</span>.*?<li>',re.S)
results = re.findall(pattern,content)
print(results)
for result in results:
    url,name,author,date = result
    author = re.sub('\s','',author)
    date = re.sub('\s','',date)
    print(url,name,author,date)
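When I run it, it just keeps running: no output and no error. After an hour it was still running. Could someone help me figure out where the problem is?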
One look shows the regex is written incorrectly. Compare it with the regex in the code below:
import requests
import re
content = requests.get('https://book.douban.com/').text
pattern = re.compile(r'<li class="">.*?<div class="cover">.*?<a href="(.+?)" title="(.+?)".*?<span class="author">(.+?)</span>.*?<span class="year">(.+?)</span>.*?</li>',re.S)
results = re.findall(pattern,content)
for result in results:
    url,name,author,date = result
    print(url)
    print(name)
    print(author)
    print(date)
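
To check a pattern like the one above without hitting the site on every run, it can help to test it against a small saved snippet first. The following is only a minimal sketch: `parse_books`, `BOOK_PATTERN`, and `SAMPLE_HTML` are names made up for illustration, and the sample markup is a hypothetical approximation of the page structure the regex assumes, not a copy of the live Douban page.

```python
import re

# Same pattern as in the reply above, compiled once.
# re.S lets '.' match newlines, since each book entry spans several lines.
BOOK_PATTERN = re.compile(
    r'<li class="">.*?<div class="cover">.*?<a href="(.+?)" title="(.+?)"'
    r'.*?<span class="author">(.+?)</span>.*?<span class="year">(.+?)</span>.*?</li>',
    re.S,
)


def parse_books(html):
    """Return a list of (url, name, author, date) tuples with whitespace trimmed."""
    books = []
    for url, name, author, date in BOOK_PATTERN.findall(html):
        # Collapse internal runs of whitespace in the author field, then trim both fields.
        author = re.sub(r'\s+', ' ', author).strip()
        books.append((url, name, author, date.strip()))
    return books


# Hypothetical sample markup (an assumption, NOT copied from the live site).
SAMPLE_HTML = '''
<ul>
<li class="">
  <div class="cover">
    <a href="https://book.douban.com/subject/1/" title="Example Book">
      <img src="cover.jpg">
    </a>
  </div>
  <div class="info">
    <p class="more-meta">
      <span class="author">
        Jane Doe
      </span>
      <span class="year">
        2018-5
      </span>
    </p>
  </div>
</li>
</ul>
'''

if __name__ == "__main__":
    for book in parse_books(SAMPLE_HTML):
        print(book)
```

For the live page, the same function could be fed `requests.get('https://book.douban.com/', timeout=10).text`. Passing `timeout` makes a stalled connection raise an error instead of waiting indefinitely. The site may also reject requests that lack a browser-like `User-Agent` header, so check `response.status_code` before parsing.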