一枚丶学渣 posted on 2020-8-9 16:07:51

Python web scraper

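Below is the code I wrote following a tutorial. When I run it, nothing is printed at all. Could someone more experienced tell me what's wrong?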


import urllib.request
import re

def open_url(url):
    req = urllib.request.Request(url)
    req.add_header('User-Agent','Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.18362')
    page = urllib.request.urlopen(req)
    html = page.read().decode('utf-8')

    return html

def get_img(html):
    p=r'<img width="560" height="860" class="BDE_Image" style="cursor: url(//tb2.bdstatic.com/tb/static-pb/img/cur_zin.cur), pointer;" src=[^"]+\.jpg"'
    imglist = re.findall(p,html)
    n=1
    for each in imglist:
      n += 1
      if n==5:
            break
      print("++++++++")
      print(each)

if __name__ == '__main__':
    url="https://tieba.baidu.com/p/6769674730"
    get_img(open_url(url))
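Apart from depending on one exact attribute order, the first pattern cannot match for two concrete reasons. The parentheses in `url(...)` are unescaped, so `re` treats them as a capture group and expects `url//...` with no literal `(`. There is also no opening `"` after `src=`, so `[^"]+` can never start matching. A minimal check, assuming a hypothetical `<img>` tag that copies the attributes hard-coded in the question (the live page markup may differ):

```python
import re

# Hypothetical tag mirroring the attributes hard-coded in the question's pattern;
# the real Tieba markup may differ (attribute order, extra attributes).
sample = ('<img width="560" height="860" class="BDE_Image" '
          'style="cursor: url(//tb2.bdstatic.com/tb/static-pb/img/cur_zin.cur), pointer;" '
          'src="https://example.com/a.jpg">')

# The pattern from the question, unchanged.
original = r'<img width="560" height="860" class="BDE_Image" style="cursor: url(//tb2.bdstatic.com/tb/static-pb/img/cur_zin.cur), pointer;" src=[^"]+\.jpg"'

# Same literal prefix, escaped with re.escape so "(", ")" and "." match literally,
# followed by the missing opening quote and a capture group around the URL.
prefix = ('<img width="560" height="860" class="BDE_Image" '
          'style="cursor: url(//tb2.bdstatic.com/tb/static-pb/img/cur_zin.cur), pointer;" src="')
fixed = re.escape(prefix) + r'([^"]+\.jpg)"'

print(re.findall(original, sample))  # []
print(re.findall(fixed, sample))     # ['https://example.com/a.jpg']
```

Escaping the literal part and restoring the quote makes the pattern match this sample, but it is still tied to one exact attribute layout, which is why a much shorter pattern like the one in the reply is less fragile.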

Twilight6 posted on 2020-8-9 16:23:29


Your regex isn't matching anything. Change it to this:

import urllib.request
import re

def open_url(url):
    req = urllib.request.Request(url)
    req.add_header('User-Agent','Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.18362')
    page = urllib.request.urlopen(req)
    html = page.read().decode('utf-8')

    return html

def get_img(html):
    p=r'<img class="BDE_Image" src="(.+?)"'
    imglist = re.findall(p,html)
    n=1
    for each in imglist:
      n += 1
      if n==5:
            break
      print("++++++++")
      print(each)

if __name__ == '__main__':
    url="https://tieba.baidu.com/p/6769674730"
    get_img(open_url(url))
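The replacement pattern still assumes `class="BDE_Image"` comes immediately before `src` inside the tag. Below is a sketch that swaps the regex for the standard library's `html.parser.HTMLParser`, so attribute order no longer matters. `BDEImageParser` and `get_img_urls` are hypothetical names and the sample HTML is made up. The cap of three items mirrors the original loop: its counter starts at 1 and breaks when it reaches 5, so it prints at most three URLs.

```python
from html.parser import HTMLParser
from itertools import islice


class BDEImageParser(HTMLParser):
    """Collect the src of every <img> whose class list contains BDE_Image."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        classes = (attrs.get("class") or "").split()
        if "BDE_Image" in classes and attrs.get("src"):
            self.urls.append(attrs["src"])


def get_img_urls(html):
    """Hypothetical helper: return matching image URLs in document order."""
    parser = BDEImageParser()
    parser.feed(html)
    parser.close()
    return parser.urls


if __name__ == "__main__":
    # Made-up markup; attribute order differs between tags on purpose.
    sample = (
        '<img class="BDE_Image" src="https://example.com/1.jpg">'
        '<img src="https://example.com/2.jpg" width="560" class="BDE_Image">'
        '<img class="avatar" src="https://example.com/face.jpg">'
        '<img class="BDE_Image" src="https://example.com/3.jpg">'
        '<img class="BDE_Image" src="https://example.com/4.jpg">'
    )
    # Print at most three URLs, like the original loop does.
    for url in islice(get_img_urls(sample), 3):
        print("++++++++")
        print(url)
```

To use it with the thread's code, `get_img_urls(open_url(url))` would replace the `re.findall` call.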

极品召唤兽 posted on 2020-8-9 16:31:07

Twilight6 posted on 2020-8-9 16:23
Your regex isn't matching anything. Change it to this:

Whoa... that's seriously impressive...