鱼C论坛

Views: 1771 | Replies: 6

[Solved] Why doesn't this regex match anything? The pattern is built as "string" + variable + "string"

Posted on 2022-9-18 16:35:07
50 鱼币 (fish coins)
Last edited by fdfanmo on 2022-9-20 15:27
import urllib.request
import re
import sys
url="https://pornchil.com/after-hours-exposed-siterip/#more-98114"
headers = {
"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36"
}
# Build the Request object
request = urllib.request.Request(url=url,headers=headers)
# Open the URL
response = urllib.request.urlopen(request)
# Read the response and decode it into the page source
source_code = response.read().decode("utf-8")
print(source_code)


f=open("G:\\after-hours-exposed-siterip.txt","r")
movie_name = f.readlines()

# Contents read from after-hours-exposed-siterip.txt
  18. after-hours-exposed-siterip=[
  19. 20171011_public_rooftop_blowjob_in_old_town_riga_latvia,
  20. 20200923_double_teen_blowjob_doing_makeup_then_cumblast_croatia_vacation_1,
  21. 20200429_pov_dildoing_and_pussy_eating_vanessa_klein_pov_misspussycat_1,
  22. 20200318_19yo_pretty_blonde_mia_back_for_a_nice_afternoon_blowjob,
  23. 20200115_19yo_mia_sucking_me_off_and_2_private_sex_tapes_from_her_phone,
  24. 20190619_teen_jete_first_forest_blowjob_and_mouth_cum_drool,
  25. 20190227_barely_18_alina_huge_tits_and_sucking_me_off_titty_fuck_until_mouth_cum,
  26. 20190206_nervous_18yo_alina_blowjob_handjob_combo_huge_tits_cumed_and_glaz,
  27. 20190213_dream_night_with_my_18yo_blonde_latvian_dream_girl_one_night_on_a_cruis,
  28. 20171011_public_rooftop_blowjob_in_old_town_riga_latvia,
  29. 20210825_real_highschool_cheerleader_nervously_gives_perfect_nice.blowjob_pov_di,
  30. 20210818_super_double_blowjob_miss_pussycat_and_spinner_blake,
  31. 20210428_pov_lesbian_miss_pussycat_ice_and_poprocks_pussy_licking,
  32. 20210210_big_boobed_paula_giving_sexy_pov_blowjob,
  33. 20210127_new_girl_18yo_kelly_anne_pov_pussy_licking_striptease_with_miss_pussyca,
  34. ]


  35. for value in movie_name:
  36.     re_str = value.replace("_",".")
  37.     print(re_str)
  38.     url_link1 = re.search(rf"https.+{20171011.public.rooftop.blowjob.in.old.town.riga.latvia}+.mp4.html", source_code)
  39.     print(url_link1)

This is the result I want:
use each entry in after-hours-exposed-siterip to match its full URL in the page source, for example
"https://rapidgator.net/file/d563e2e78556baa8f282bf97e3eb493d/20171011_public_rooftop_blowjob_in_old_town_riga_latvia.mp4.html"
The pattern has the form "string" + variable + "string".
Attachments:
取完整網址.rar (1.7 KB, downloads: 1)
source_code.rar (31.17 KB, downloads: 0)
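A note on the loop in the question: inside an rf-string, whatever sits between { and } is evaluated as a Python expression, so a literal file name pasted there is not valid; the loop variable itself has to go inside the braces. The sketch below shows one way to do that. The page snippet, the names, and example.com are made-up stand-ins (assumptions, not data from the thread), and re.escape is used so that any regex metacharacter in a name is matched literally.

```python
import re

# Hypothetical stand-in for the downloaded page source.
sample_source = (
    '<a href="https://example.com/file/aaa111/20200101_first_clip.mp4.html">1</a>\n'
    '<a href="https://example.com/file/bbb222/20200202_second_clip.mp4.html">2</a>\n'
)

# Hypothetical stand-in for the lines returned by f.readlines().
names = ["20200101_first_clip\n", "20200202_second_clip\n"]

matches = []
for value in names:
    name = value.strip()  # readlines() keeps the trailing newline
    # {re.escape(name)} interpolates the variable into the pattern;
    # the dots in ".mp4.html" are escaped so they match literal dots.
    pattern = rf'https://[^\s"<>]+/{re.escape(name)}\.mp4\.html'
    found = re.search(pattern, sample_source)
    matches.append(found.group(0) if found else None)

print(matches)
```

Each name ends up paired with the URL that carries its own hash segment, because the part before the file name is matched rather than hard-coded.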

Posted on 2022-9-18 16:35:08 | Best answer
If you read after-hours-exposed-siterip.txt straight from your .rar file, change the final for loop as follows:
for value in movie_name[1:]:
    url_link1 = rf'https://rapidgator.net/file/d563e2e78556baa8f282bf97e3eb493d/{value.strip()}.mp4.html'
    print(url_link1)
Original poster | Posted on 2022-9-19 12:05:29
Last edited by fdfanmo on 2022-9-19 13:45
月下孤井 posted on 2022-9-18 19:40
If you read after-hours-exposed-siterip.txt straight from your .rar file, change the final for loop as follows:
for value i ...


Thanks for the reply.
However, this doesn't actually match the URLs in the page source:
for value in movie_name[1:]:
    #print(value)
    url_link1 = (rf'https://rapidgator.net/file/d563e2e78556baa8f282bf97e3eb493d/{value.strip()}.mp4.html',value)
    print(url_link1)

The https:// prefix of each URL varies,
so it can't be hard-coded like this.
It has to be something like url_link1 = (rf'https.+{value.strip()}.mp4.html',value)
because this part changes from link to link: https://rapidgator.net/file/d563e2e78556baa8f282bf97e3eb493d/
The correct URLs actually look like this:
  1. https://rapidgator.net/file/d563e2e78556baa8f282bf97e3eb493d/20171011_public_rooftop_blowjob_in_old_town_riga_latvia.mp4.html
  2. https://rapidgator.net/file/d341631ca5bdcd6a2a2da0250e67bba6/20200923_double_teen_blowjob_doing_makeup_then_cumblast_croatia_vacation_1.mp4.html
  3. https://rapidgator.net/file/2228be05d86c849886481330dfb5ded7/20200923_double_teen_blowjob_doing_makeup_then_cumblast_croatia_vacation_1.mp4.html
  4. https://rapidgator.net/file/35d4b3afd82d25b1bc070d9b62f188b7/20200429_pov_dildoing_and_pussy_eating_vanessa_klein_pov_misspussycat_1.mp4.html
  5. https://rapidgator.net/file/599ac1013ca0f02f266b49ea843a6332/20200429_pov_dildoing_and_pussy_eating_vanessa_klein_pov_misspussycat_1.mp4.html
  6. https://rapidgator.net/file/ee15c7c7096530d3e8d4bba67a1edc49/20200318_19yo_pretty_blonde_mia_back_for_a_nice_afternoon_blowjob.mp4.html
  7. https://rapidgator.net/file/d2c0b201eff3c4023c75dcfcb3f30d84/20200318_19yo_pretty_blonde_mia_back_for_a_nice_afternoon_blowjob.mp4.html
  8. https://rapidgator.net/file/882c39edab324fb5fe7b7cb649332623/20200115_19yo_mia_sucking_me_off_and_2_private_sex_tapes_from_her_phone.mp4.html
  9. https://rapidgator.net/file/d5be9c3e704eb4be0279f3f7bf0207ed/20200115_19yo_mia_sucking_me_off_and_2_private_sex_tapes_from_her_phone.mp4.html
  10. https://rapidgator.net/file/9b52ee6c6919b3dde75fa3d634f96594/20190619_teen_jete_first_forest_blowjob_and_mouth_cum_drool.mp4.html
  11. https://rapidgator.net/file/f4241073b85c95fcf319cdff5f5f5df3/20190619_teen_jete_first_forest_blowjob_and_mouth_cum_drool.mp4.html
  12. https://rapidgator.net/file/9696ea95a04fc3a3f529d538d9f02a2f/20190227_barely_18_alina_huge_tits_and_sucking_me_off_titty_fuck_until_mouth_cum.mp4.html
  13. https://rapidgator.net/file/8758eaee65b050c1e0f4df947f189e89/20190213_dream_night_with_my_18yo_blonde_latvian_dream_girl_one_night_on_a_cruise_ship.mp4.html
  14. https://rapidgator.net/file/d563e2e78556baa8f282bf97e3eb493d/20171011_public_rooftop_blowjob_in_old_town_riga_latvia.mp4.html
  15. https://rapidgator.net/file/159cfe51e119ce49eaf0bda4ee7f4872/20210818_super_double_blowjob_miss_pussycat_and_spinner_blake.mp4.html
  16. https://rapidgator.net/file/79e4d923a5457801e9e770a4eb4d330d/20210428_pov_lesbian_miss_pussycat_ice_and_poprocks_pussy_licking.mp4.html
  17. https://rapidgator.net/file/0bbc6358a61f2d9aa96387ad64a01104/20210210_big_boobed_paula_giving_sexy_pov_blowjob.mp4.html
  18. https://rapidgator.net/file/d0af4ec4da4063c347d0ac5160fabebf/20210127_new_girl_18yo_kelly_anne_pov_pussy_licking_striptease_with_miss_pussycat.mp4.html

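With the current code, however, the output looks like the list below. These links don't lead to working download pages, because the hash segment is hard-coded, so the correct links can't be obtained this way.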
  1. https://rapidgator.net/file/d563e2e78556baa8f282bf97e3eb493d/20171011_public_rooftop_blowjob_in_old_town_riga_latvia.mp4.html
  2. https://rapidgator.net/file/d563e2e78556baa8f282bf97e3eb493d/20200923_double_teen_blowjob_doing_makeup_then_cumblast_croatia_vacation_1.mp4.html
  3. https://rapidgator.net/file/d563e2e78556baa8f282bf97e3eb493d/20200429_pov_dildoing_and_pussy_eating_vanessa_klein_pov_misspussycat_1.mp4.html
  4. https://rapidgator.net/file/d563e2e78556baa8f282bf97e3eb493d/20200318_19yo_pretty_blonde_mia_back_for_a_nice_afternoon_blowjob.mp4.html
  5. https://rapidgator.net/file/d563e2e78556baa8f282bf97e3eb493d/20200115_19yo_mia_sucking_me_off_and_2_private_sex_tapes_from_her_phone.mp4.html
  6. https://rapidgator.net/file/d563e2e78556baa8f282bf97e3eb493d/20190619_teen_jete_first_forest_blowjob_and_mouth_cum_drool.mp4.html
  7. https://rapidgator.net/file/d563e2e78556baa8f282bf97e3eb493d/20190227_barely_18_alina_huge_tits_and_sucking_me_off_titty_fuck_until_mouth_cum.mp4.html
  8. https://rapidgator.net/file/d563e2e78556baa8f282bf97e3eb493d/20190206_nervous_18yo_alina_blowjob_handjob_combo_huge_tits_cumed_and_glaz.mp4.html
  9. https://rapidgator.net/file/d563e2e78556baa8f282bf97e3eb493d/20190213_dream_night_with_my_18yo_blonde_latvian_dream_girl_one_night_on_a_cruis.mp4.html
  10. https://rapidgator.net/file/d563e2e78556baa8f282bf97e3eb493d/20171011_public_rooftop_blowjob_in_old_town_riga_latvia.mp4.html
  11. https://rapidgator.net/file/d563e2e78556baa8f282bf97e3eb493d/20210825_real_highschool_cheerleader_nervously_gives_perfect_nice.blowjob_pov_di.mp4.html
  12. https://rapidgator.net/file/d563e2e78556baa8f282bf97e3eb493d/20210818_super_double_blowjob_miss_pussycat_and_spinner_blake.mp4.html
  13. https://rapidgator.net/file/d563e2e78556baa8f282bf97e3eb493d/20210428_pov_lesbian_miss_pussycat_ice_and_poprocks_pussy_licking.mp4.html
  14. https://rapidgator.net/file/d563e2e78556baa8f282bf97e3eb493d/20210210_big_boobed_paula_giving_sexy_pov_blowjob.mp4.html
  15. https://rapidgator.net/file/d563e2e78556baa8f282bf97e3eb493d/20210127_new_girl_18yo_kelly_anne_pov_pussy_licking_striptease_with_miss_pussyca.mp4.html
Posted on 2022-9-19 22:31:29
fdfanmo posted on 2022-9-19 12:05
Thanks for the reply.
However, this doesn't actually match the URLs in the page source:
for value in movie_name[1:]:

import urllib.request
from lxml import etree

url = "https://pornchil.com/after-hours-exposed-siterip/#more-98114"
headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36"
}
# Build the Request object
request = urllib.request.Request(url=url, headers=headers)
# Open the URL
response = urllib.request.urlopen(request)
# Read the response and decode it into the page source
source_code = response.read().decode("utf-8")
url_link = etree.HTML(source_code).xpath(r'//div[@class="entry-content"]/h6[7]/a/@href')
print(url_link)
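Original poster | Posted on 2022-9-20 15:06:41
Last edited by fdfanmo on 2022-9-20 15:18
月下孤井 posted on 2022-9-19 22:31
import urllib.request
from lxml import etree

Thanks for taking the time to reply.
Your code scrapes every link.
The difficulty is that not every link needs to be downloaded.
I only need the full URL for the names read from after-hours-exposed-siterip.txt.
That is why the script has these lines:
f=open("G:\\after-hours-exposed-siterip.txt","r")
movie_name = f.readlines()

The file contains the following:
# Contents read from after-hours-exposed-siterip.txt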
after-hours-exposed-siterip=[
20171011_public_rooftop_blowjob_in_old_town_riga_latvia,
20200923_double_teen_blowjob_doing_makeup_then_cumblast_croatia_vacation_1,
20200429_pov_dildoing_and_pussy_eating_vanessa_klein_pov_misspussycat_1,
20200318_19yo_pretty_blonde_mia_back_for_a_nice_afternoon_blowjob,
20200115_19yo_mia_sucking_me_off_and_2_private_sex_tapes_from_her_phone,
20190619_teen_jete_first_forest_blowjob_and_mouth_cum_drool,
20190227_barely_18_alina_huge_tits_and_sucking_me_off_titty_fuck_until_mouth_cum,
20190206_nervous_18yo_alina_blowjob_handjob_combo_huge_tits_cumed_and_glaz,
20190213_dream_night_with_my_18yo_blonde_latvian_dream_girl_one_night_on_a_cruis,
20171011_public_rooftop_blowjob_in_old_town_riga_latvia,
20210825_real_highschool_cheerleader_nervously_gives_perfect_nice.blowjob_pov_di,
20210818_super_double_blowjob_miss_pussycat_and_spinner_blake,
20210428_pov_lesbian_miss_pussycat_ice_and_poprocks_pussy_licking,
20210210_big_boobed_paula_giving_sexy_pov_blowjob,
20210127_new_girl_18yo_kelly_anne_pov_pussy_licking_striptease_with_miss_pussyca,
]

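Because of this requirement, I need a regex to do the matching,
and it has to be a regex that includes a variable,
since the pattern must match each name read from the file.
That means the regex can't be hard-coded,
and I haven't been able to work out how to get around this.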
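One possible way to meet this requirement without building a separate pattern per name is to collect every .mp4.html link once and then keep only those whose file name starts with a name from the text file. This is a sketch under assumptions: links_for_names, the sample page snippet, and the example.com URLs are all hypothetical, and prefix matching is chosen here so that a shortened name can still find its link and so that several mirrors of the same file are all returned.

```python
import re
from collections import defaultdict


def links_for_names(source_code, wanted_names):
    """Map each wanted name to every .mp4.html link whose file name starts with it."""
    # Group 1 is the whole URL, group 2 the file name before ".mp4.html".
    link_re = re.compile(r'(https?://[^\s"<>]+/([^/\s"<>]+)\.mp4\.html)')
    result = defaultdict(list)
    for link, file_name in link_re.findall(source_code):
        for name in wanted_names:
            if file_name.startswith(name):
                result[name].append(link)
    return dict(result)


# Hypothetical page snippet: two mirrors of one file plus one unwanted file.
sample_source = (
    '<a href="https://example.com/file/aaa111/20200101_first_clip_full.mp4.html">a</a> '
    '<a href="https://example.com/file/ccc333/20200101_first_clip_full.mp4.html">b</a>\n'
    '<a href="https://example.com/file/bbb222/20200303_other_clip.mp4.html">c</a>\n'
)

# Names as they would look after strip(); the first one is deliberately shortened.
wanted = ["20200101_first_clip", "20209999_not_on_page"]

result = links_for_names(sample_source, wanted)
print(result)
```

Names that do not occur on the page are simply absent from the returned dictionary.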
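Posted on 2022-9-20 17:21:00
fdfanmo posted on 2022-9-20 15:06
Thanks for taking the time to reply.
Your code scrapes every link.
The difficulty is that not every link needs to be downloaded.

I can't run or debug this on my side, because my program can't reach overseas sites. What method do you use to scrape overseas websites? If you can show me, I'll debug your program step by step.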
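Original poster | Posted on 2022-9-21 10:12:38
月下孤井 posted on 2022-9-20 17:21
I can't run or debug this on my side, because my program can't reach overseas sites. What method do you use to scrape overseas websites? If you can show me, I ...

Overseas IPs are probably blocked on your side.
This server also isn't very stable;
I sometimes get status code 500 myself.
A friend has already solved this problem for me.
I'm posting the source here as a note.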
import urllib.request
import re
url="https://pornchil.com/after-hours-exposed-siterip/#more-98114"
headers = {
"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36"
}
# Build the Request object
request = urllib.request.Request(url=url,headers=headers)
# Open the URL
response = urllib.request.urlopen(request)
# Read the response and decode it into the page source
source_code = response.read().decode("utf-8")
#print(source_code)

# Read the wanted names, one per line, and close the file afterwards
with open("G:\\after-hours-exposed-siterip.txt","r") as f:
    movie_name = f.readlines()

def newStrRe(kw):
    # Turn "-" and "_" into ".", which in a regex matches any single character
    return re.sub('(-|_)','.',kw)

for item in movie_name:
    # print("item"+item)
    # print("item.strip()"+item.strip())
    # strip() drops the trailing newline before the name goes into the pattern
    sult = re.findall(rf'http.+{newStrRe(item.strip())}.mp4.html',source_code)
    if sult:
        print(sult[0])
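The code above works by letting "." stand in for the separators, but "." matches any character and the greedy "http.+" can pull in everything from the first "http" on a line. A slightly stricter variant is sketched below. It assumes, which the thread does not confirm, that "-", "_" and "." are the only separators that differ between the text file and the page; build_pattern, the sample snippet, and example.com are hypothetical.

```python
import re


def build_pattern(name):
    """Compile a pattern for one name that tolerates '-', '_' or '.' as separator."""
    # Split on the separators, escape each piece so it is matched literally,
    # then rejoin with a class that matches only those three separators.
    pieces = [re.escape(piece) for piece in re.split(r"[-_.]", name.strip())]
    body = "[-_.]".join(pieces)
    # The URL part stops at whitespace, quotes, and angle brackets,
    # so a match cannot run across neighbouring links.
    return re.compile(rf'https?://[^\s"<>]+/{body}\.mp4\.html')


# Hypothetical page snippet using mixed separators in the file name.
sample_source = (
    '<a href="https://example.com/file/abc123/2020_nice.clip-one.mp4.html">link</a>'
)

found = build_pattern("2020_nice_clip_one\n").search(sample_source)
missing = build_pattern("2020_nice_clip_two\n").search(sample_source)
print(found.group(0) if found else None)
print(missing)
```

Because the name is escaped piece by piece, a name that contains other regex metacharacters still matches only itself.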