swanseabrian posted on 2021-12-5 23:20:49

Python requests scraping: regex only matches one line, what is going on?

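Could someone help me work out what is wrong with this regex? The code is below.
From this line:
JSON_DATA.push(["48","603806","福斯特","196","9,838","-2,852","1247536.86","10.34","-3.31"]);
I want to extract:

603806   福斯特

The page contains many lines of data like this one; my code is posted below.
How should the regex be written so that it matches all of them? Thanks.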
import requests
import re

def get_html(url):
    try:
        resp = requests.get(url)
        return resp.text
    except Exception as e:
        print(e)

if __name__ == "__main__":
    url = 'http://fund.jrj.com.cn/action/fhs/list.jspa?thisReportDate=0'
    html = get_html(url)
    print(html)

    pattern = re.compile(r'(?<=("))[\u4e00-\u9fa5]+(?=")', re.S)
    searchObj = pattern.search(html)
    print(searchObj.group())

swanseabrian posted on 2021-12-6 10:53:14

Could someone take a look? Thanks.

specail posted on 2021-12-6 11:20:30

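import requests
import re

def get_html(url):
    try:
        resp = requests.get(url)
        return resp.text
    except Exception as e:
        print(e)
        return ""  # empty string, so the regex call below does not fail on None

if __name__ == "__main__":
    url = 'http://fund.jrj.com.cn/action/fhs/list.jspa?thisReportDate=0'
    html = get_html(url)

    # Group 1 is the 6-digit code, group 2 is the name; the dot is escaped to match literally.
    pattern = re.compile(r'JSON_DATA\.push\(\["\d+","(\d{6})","(.*?)".*?\]\);', re.S)

    # finditer() yields one match object per JSON_DATA.push(...) entry.
    for each in pattern.finditer(html):
        print(each.group(1), each.group(2))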
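
For anyone who wants to try this without requesting the page, here is a minimal offline sketch. The `sample` string is an assumption: its first row is the line quoted in the question and its second row is invented for illustration. It contrasts `pattern.search`, which stops at the first match, with `pattern.finditer`, which yields one match object per occurrence.

```python
import re

# Hypothetical sample modeled on the page's JSON_DATA.push(...) lines;
# the second row is invented purely for illustration.
sample = '''
JSON_DATA.push(["48","603806","福斯特","196","9,838","-2,852","1247536.86","10.34","-3.31"]);
JSON_DATA.push(["49","000001","示例股","10","100","-5","1000.00","1.00","-0.10"]);
'''

pattern = re.compile(r'JSON_DATA\.push\(\["\d+","(\d{6})","(.*?)".*?\]\);', re.S)

# search() returns only the first match (or None if nothing matches).
first = pattern.search(sample)
first_row = (first.group(1), first.group(2)) if first else None

# finditer() yields every non-overlapping match, so all rows are collected.
all_rows = [(m.group(1), m.group(2)) for m in pattern.finditer(sample)]

print(first_row)
print(all_rows)
```

Here `first_row` holds a single (code, name) pair, while `all_rows` holds one pair per row in the sample, which is probably why the original `search` call appeared to match only one line.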

swanseabrian posted on 2021-12-6 12:16:09

specail posted on 2021-12-6 11:20
import requests
import re

I have never heard of a regex needing a loop. How does that work, mate?
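
On the loop question: a compiled pattern's `search` method returns at most one match, so collecting every occurrence needs some form of iteration. If the explicit loop feels unfamiliar, `findall` does the scanning internally and returns all matches as a list. A minimal sketch, assuming a made-up `text` string and a simplified pattern (neither is taken from the real page):

```python
import re

# Hypothetical input with two invented entries, for illustration only.
text = 'push(["1","603806","福斯特"]); push(["2","000001","示例股"]);'

# Simplified pattern: a quoted 6-digit code followed by a quoted name.
pattern = re.compile(r'"(\d{6})","(.*?)"')

# findall() scans the whole string and returns every match at once.
# With two capture groups, each list item is a (code, name) tuple.
rows = pattern.findall(text)

for code, name in rows:
    print(code, name)
```

The loop at the end is only for printing; the matching itself is finished once `findall` returns.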