Help with a regex problem in a scraper, Python Discussion, Programming Languages, 鱼C论坛

nanrenne posted on 2020-3-24 23:05:30

Help with a regex problem in a scraper

Last edited by nanrenne on 2020-3-24 23:08

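Why doesn't my regex match the Chinese part?

import re

def fenxi(html):
    # Capture the data-hint-title value of each div; re.S lets "." span newlines.
    zhen_url = re.compile('<div.*?data-hint-title="(.*?)"', re.S)
    zhen = zhen_url.findall(html)
    print(zhen)
    # re.compile('')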

Below is the original page content:
<div class="book-wrapper show-none" data-index="1121670" data-id="0" data-toggle="hintpoint" data-hint-title="语文一年级下册（部编版）" style="width: 72.9429px; height: 111px; left: 13.5286px; top: 0px;"><div class="book-face"><div class="book-qrurl" style="width: 61px; height: 61px; left: 0px; top: 19px; padding: 6px; border: 0px;"><canvas width="61" height="61"></canvas></div></div><img class="book-img"

Of course, there are many more divs above it.

Found the answer: the page source is all JS.
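
A quick check, as a sketch: running the pattern from the post against a shortened copy of the markup quoted above (with a made-up extra div in front) does capture the Chinese title. That is consistent with the conclusion that the element is simply absent from the downloaded source. The `sample` string and the `has_title_attr` helper are illustrative and not from the original post.

```python
import re

# Pattern from the original post; re.S lets "." match newlines too.
TITLE_RE = re.compile('<div.*?data-hint-title="(.*?)"', re.S)

# Shortened copy of the quoted markup, preceded by a made-up extra div.
sample = (
    '<div class="header">\n</div>\n'
    '<div class="book-wrapper show-none" data-index="1121670" '
    'data-hint-title="语文一年级下册（部编版）" style="width: 72.9429px;">'
)


def has_title_attr(html):
    """Return True if the attribute name occurs anywhere in the text."""
    return 'data-hint-title="' in html


titles = TITLE_RE.findall(sample)
print(titles)                  # the regex itself handles Chinese text
print(has_title_attr(sample))  # attribute present in real markup
print(has_title_attr('<script>/* JS only */</script>'))  # absent in a JS-only source
```

If `findall` returns an empty list on the real download, checking `has_title_attr` on that text first tells you whether the regex or the source is at fault.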

Stubborn posted on 2020-3-25 00:59:25

Last edited by Stubborn on 2020-3-25 01:00

data-hint-title="语文一年级下册（部编版）"

If this attribute is unique, just use .*? directly:

data-hint-title="(.*?)"

If it is not unique, consider adding a prefix; the details depend on the actual source.
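
To illustrate the two cases, here is a small sketch. The `html` string is invented for the example: the second element and its title are hypothetical, and only the `book-wrapper` class and the first title come from the markup quoted in the question.

```python
import re

# Invented sample: the attribute appears on two different kinds of element.
html = (
    '<div class="book-wrapper show-none" data-id="0" '
    'data-hint-title="语文一年级下册（部编版）"></div>'
    '<span class="tip" data-hint-title="hypothetical tooltip"></span>'
)

# Case 1: the attribute is unique enough, so match it directly.
simple = re.findall(r'data-hint-title="(.*?)"', html)

# Case 2: not unique, so anchor on a prefix. [^>]* keeps the match
# inside a single opening tag instead of running across tags.
prefixed = re.findall(
    r'<div class="book-wrapper[^>]*?data-hint-title="(.*?)"', html
)

print(simple)
print(prefixed)
```

Here the direct pattern returns both titles, while the prefixed one returns only the title on the `book-wrapper` div.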

nanrenne posted on 2020-3-26 09:43:21

Stubborn posted on 2020-3-25 00:59
data-hint-title="语文一年级下册（部编版）"

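If this attribute is unique, just use .*? directly

It looks like I also had an extra single quote. OK, thanks. I can't scrape this one anyway, though, because it involves JS.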

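wp231957 posted on 2020-3-26 09:58:10

nanrenne posted on 2020-3-26 09:43
It looks like I also had an extra single quote. OK, thanks. I can't scrape this one anyway, though, because it involves JS.

Could you post the URL?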

kkk999de posted on 2020-3-26 11:58:58

If the page source is all JS, how do you scrape it?

wp231957 posted on 2020-3-26 12:06:40

kkk999de posted on 2020-3-26 11:58
If the page source is all JS, how do you scrape it?

Beats me, haha
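
On the question of a source that is all JS: the data often still arrives as text, either inside an inline script or through a separate JSON request that the browser's network panel reveals. The thread never shows the site, so the following is only a hypothetical sketch. The variable name `bookList`, the `title` field, and the sample `page` are all assumptions. If the data is built purely at runtime, a browser-automation tool would be the usual fallback.

```python
import json
import re

# Hypothetical page source: the titles live in a JS variable, not in divs.
page = '''
<html><body><div id="app"></div>
<script>
var bookList = [{"id": 1121670, "title": "语文一年级下册（部编版）"}];
render(bookList);
</script>
</body></html>
'''


def extract_titles(source):
    """Pull the JSON array assigned to the (assumed) bookList variable."""
    match = re.search(r'var bookList\s*=\s*(\[.*?\]);', source, re.S)
    if match is None:
        return []  # the data is not embedded in this form
    books = json.loads(match.group(1))
    return [book["title"] for book in books]


print(extract_titles(page))
```

The lazy `\[.*?\];` pattern is enough for a flat array like this one; nested arrays or strings containing `];` would need a real parser.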
