Posted on 2018-2-4 16:18:23
BeautifulSoup is so easy to use, so why not use it? I find regular expressions much harder to read.
import re

from bs4 import BeautifulSoup


def parse_one_page(html):
    soup = BeautifulSoup(html, 'html.parser')
    # The ranking list sits in <dl class="board-wrapper">; each <dd> is one movie.
    body = soup.find('dl', {'class': 'board-wrapper'})
    # Poster URLs are stored in the data-src attribute of the <img> tag.
    p = r'https?://.*\.jpg.*'
    for k in body.find_all('dd'):
        poster = k.find('img', {'data-src': re.compile(p)})
        star = k.find('p', {'class': 'star'})
        release_time = k.find('p', {'class': 'releasetime'})
        score = k.find('p', {'class': 'score'})
        print(poster['alt'])  # the alt text holds the movie title
        print(poster['data-src'], end='')
        # Drop the padding spaces around the cast line.
        for n in star.get_text().split(' '):
            if n != '':
                print(n, end='')
        print(release_time.get_text())
        print('Score: ' + score.get_text())
        print("'''''''''''''''''''''''''''''''''''''")
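If you would rather collect the results than print them, something like the following might work. It is only a rough sketch: `extract_movies` and `SAMPLE_HTML` are names I made up, and the sample markup is a guess modeled on the class names in the code above, not the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page may be structured differently.
SAMPLE_HTML = """
<dl class="board-wrapper">
  <dd>
    <img data-src="http://example.com/poster1.jpg@160w" alt="Movie A">
    <p class="star">
        Starring: Actor One, Actor Two
    </p>
    <p class="releasetime">Release date: 1993-01-01</p>
    <p class="score"><i class="integer">9.</i><i class="fraction">6</i></p>
  </dd>
</dl>
"""


def extract_movies(html):
    """Return one dict per <dd> entry instead of printing each field."""
    soup = BeautifulSoup(html, "html.parser")
    board = soup.find("dl", class_="board-wrapper")
    if board is None:
        # The list container is missing, e.g. an error page was returned.
        return []
    movies = []
    for dd in board.find_all("dd"):
        # attrs={"data-src": True} matches any <img> that has a data-src attribute.
        img = dd.find("img", attrs={"data-src": True})
        movies.append({
            "title": img["alt"],
            "poster": img["data-src"],
            # get_text(strip=True) trims the whitespace around each text piece.
            "stars": dd.find("p", class_="star").get_text(strip=True),
            "release": dd.find("p", class_="releasetime").get_text(strip=True),
            "score": dd.find("p", class_="score").get_text(strip=True),
        })
    return movies


if __name__ == "__main__":
    for movie in extract_movies(SAMPLE_HTML):
        print(movie)
```

Returning a list of dicts keeps the parsing separate from the output, so the same function can feed a file writer or a database later on.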