|
|
发表于 2018-5-18 23:09:53
|
显示全部楼层
- import requests,re
- content=requests.get('https://book.douban.com/').text
- # print(content)
- #pattern=re.compile('<li.*?cover.*?href="(.*?)".*?title="(.*?)".*?more-meta.*?author">(.*?)</span>.*?year">(.*?)</span>.*?</li>',re.S)
- pattern = re.compile(r'''<div class=".*?">
- <div class="title">
- <a class="" href="(.*?)"
- title=".*?">(.*?)</a>.*?<div class="author">
- (.*?)
- </div>.*?<span class="year">
- (.*?)
- </span>''',re.S)
- results =re.findall(pattern,content)
- # print(results)
- # print("ok")
- for result in results:
- url,name,author,date=result
- author=re.sub('\s','',author)
- #date=re.sub('\s','',date)
- #print(url,name,author.date)
- print(result)
复制代码
应该是可以跑出来吧,就是很慢很慢,正则还是不要写的这么省略吧,
新手代码,勿喷 |
|