FishC Forum

Views: 1058 | Replies: 4

[Solved] Help: XPath query returns an empty result

Posted on 2020-3-8 20:31:27

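I'm new to Python and have just started learning web scraping, so I wanted to try scraping Haokan Video (haokan.baidu.com). Because the site loads its content dynamically, I first found the video links in the browser's Network tab, extracted them with a regex and downloaded the videos. Then I noticed that the links I had scraped were not the ultra-HD (超清) versions. So I switched approaches: first extract each video's page URL from the portal feed, then request each video page in turn and use XPath to pull the ultra-HD link out of its HTML. But the XPath query never returns the right value; it just comes back empty. How can I fix this? To be clear: I want the ultra-HD link, not the standard-definition one that the regex matches directly.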
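Step one of the pipeline described above, pulling each video's page URL out of the feed API response, can be made a little more forgiving than indexing straight into the dict. A minimal sketch, assuming the `data -> response -> videos -> url` layout that the original code relies on; the function name `extract_video_page_urls` and the sample payload are hypothetical:

```python
import json


def extract_video_page_urls(feed_text: str) -> list[str]:
    """Return the 'url' of every entry in data.response.videos.

    Missing keys yield an empty list instead of a KeyError, and entries
    without a URL are skipped.
    """
    feed = json.loads(feed_text)
    videos = feed.get("data", {}).get("response", {}).get("videos", [])
    return [v["url"] for v in videos if isinstance(v, dict) and v.get("url")]


# Hypothetical payload in the assumed shape
sample = json.dumps({
    "data": {"response": {"videos": [
        {"url": "https://example.com/v?vid=1"},
        {"title": "entry without a url"},
    ]}}
})
print(extract_video_page_urls(sample))  # ['https://example.com/v?vid=1']
```

With the code below, `extract_video_page_urls(response.text)` could stand in for the manual loop that builds `urls`.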
Here is my code. I'm just starting out, so it's messy; sorry about that.

import requests
from lxml import etree


# Feed API that returns a batch of videos from the "yingshi" (film/TV) tab as JSON
url = "https://haokan.baidu.com/videoui/api/videorec?tab=yingshi&act=pcFeed&pd=pc&num=5&shuaxin_id=1583468488484"

headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko",
    "cookie": "BIDUPSID=46B223B8A7EAECCA159168C5EF538730; PSTM=1580279175; BAIDUID=46B223B8A7EAECCA1D299D6661AB1F78:FG=1; BDUSS=wyaHJzcUFsOWdtYkVMQ0FneWFCfjVUOTcyTXZMRHJmUWJtbWlrOW1abW1zbGhlSVFBQUFBJCQAAAAAAAAAAAEAAAC9Y9sXYTE1Mzg4NjU0AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAKYlMV6mJTFeY0; Hm_lvt_4aadd610dfd2f5972f1efee2653a2bc5=1581861785,1581914947,1582809107; COMMON_LID=8d29357345ecabe02bcad89b3bd55ab6; reptileData=%7B%22data%22%3A%227e946299daa801da748c32b90adc9d7587f263b748089c6f531c52f9288451033df59d8bcd2c41c5ea90515a8a59833e572833311e74af94f7e2aa1794a63686198f2e47c079610fbaf9740df8731397ca5a287539edecd7194117534d906ad3d67750c2ddd9d756c1352c874ec21387f39c3f33f49f4f9d63f2d0f530ed45645ab8b1fe1841446ca7f3e2bdc2badd48%22%2C%22key_id%22%3A%2230%22%2C%22sign%22%3A%22b3439eb1%22%7D; PC_TAB_LOG=haokan_website_page; Hm_lpvt_4aadd610dfd2f5972f1efee2653a2bc5=1582809193",
}

response = requests.get(url, headers=headers)
response.raise_for_status()
json_data = response.json()  # equivalent to json.loads(response.text)

# Collect the page URL of every video in the feed
video_list = json_data['data']['response']['videos']
urls = [video['url'] for video in video_list]

# Request each video page and look for the third entry of the player's quality menu
for page_url in urls:
    page = requests.get(page_url, headers=headers)
    html_ele = etree.HTML(page.text)
    # This absolute path looks like one copied from the browser's DevTools, which show
    # the DOM after JavaScript has run. The <hk-controls>/<hk-definition> player elements
    # are most likely created by that JavaScript and are not in the HTML that requests
    # downloads, so xpath() matches nothing and returns an empty list.
    result = html_ele.xpath('/html/body/div/div/div[1]/div[1]/div/hk-controls/hk-definition/ul/li[3]')
    print(result)
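An empty list from `xpath()` means the path matched nothing in the HTML that `requests` downloaded; it is not an error. A quick way to check whether XPath is even the problem is to look for the tag in the raw text first. In the sketch below, the `<hk-definition>` tag name is taken from the original XPath, the two HTML strings are hypothetical stand-ins for the raw server response and the browser-rendered DOM, and `tag_in_source` / `quality_menu_items` are hypothetical helper names. It also uses a relative query (`//hk-definition//li`) instead of the long absolute path.

```python
from lxml import etree


def tag_in_source(page_html: str, tag: str) -> bool:
    """Cheap check: does the raw HTML contain an opening tag with this name?"""
    return f"<{tag.lower()}" in page_html.lower()


def quality_menu_items(page_html: str) -> list[str]:
    """Return the text of each <li> under <hk-definition>, or [] if there is none.

    A relative query (//...) survives layout changes that break a long absolute
    path copied from DevTools.
    """
    tree = etree.HTML(page_html)
    if tree is None:  # guard: lxml may return None when there is nothing to parse
        return []
    return [li.xpath("string()").strip() for li in tree.xpath("//hk-definition//li")]


# Hypothetical stand-ins: what requests may receive vs. what the browser shows after JavaScript runs
raw_html = "<html><body><div id='player'></div><script>/* player code */</script></body></html>"
rendered_html = (
    "<html><body><div id='player'><hk-controls><hk-definition><ul>"
    "<li>SD</li><li>HD</li><li>UHD</li>"
    "</ul></hk-definition></hk-controls></div></body></html>"
)

for label, html in (("raw", raw_html), ("rendered", rendered_html)):
    print(label, tag_in_source(html, "hk-definition"), quality_menu_items(html))
```

If `tag_in_source(page.text, "hk-definition")` is `False` for the real page, no XPath will find the element, and the page has to be rendered first or the link found elsewhere (for example in the JSON requests visible in the Network tab).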
Best answer
2020-3-9 06:26:13
Last edited by wp231957 on 2020-3-9 08:55

https://fishc.com.cn/forum.php?m ... &extra=page%3D4
Want to know what Xiaojiayu (小甲鱼) has been up to lately? Visit -> ilovefishc.com
Posted on 2020-3-8 20:52:50 from FishC Mobile
Use Selenium.
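The Selenium suggestion works because a real browser runs the page's JavaScript before you read the DOM. A sketch, assuming Selenium 4 with Chrome and a matching driver installed; the CSS selector reuses the `hk-definition` tag from the original XPath, and `render_page` / `video_sources` are hypothetical helper names. Whether the rendered `<video>` `src` is the ultra-HD stream depends on which quality the player has selected, which this sketch does not control.

```python
from html.parser import HTMLParser


def render_page(page_url: str, wait_seconds: float = 10.0) -> str:
    """Load page_url in headless Chrome and return the DOM after JavaScript has run."""
    # Imported here so video_sources() below can be used without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(page_url)
        # Block until the player has inserted its quality menu (tag name assumed from the original XPath);
        # raises TimeoutException if it never appears
        WebDriverWait(driver, wait_seconds).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "hk-definition li"))
        )
        return driver.page_source
    finally:
        driver.quit()


class _VideoSrcCollector(HTMLParser):
    """Collect src attributes from <video> and <source> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("video", "source"):
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def video_sources(page_html: str) -> list[str]:
    """Return every video/source src found in page_html, in document order."""
    collector = _VideoSrcCollector()
    collector.feed(page_html)
    collector.close()
    return collector.sources


# Usage (needs a browser, so left commented out; some_video_page_url is hypothetical):
# html = render_page(some_video_page_url)
# print(video_sources(html))
```

If the player streams through Media Source Extensions, the `src` may be a `blob:` URL rather than a downloadable link, in which case the real stream URL has to come from the network requests instead.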
Original poster, posted on 2020-3-9 15:27:05
Quoting wp231957 (2020-3-9 06:26):
https://fishc.com.cn/forum.php?mod=viewthread&tid=159724&extra=page%3D4

Thanks for the pointer.
FishC Studio (粤ICP备18085999号-1 | 粤公网安备 44051102000585号)

GMT+8, 2024-11-24 14:50

Powered by Discuz! X3.4

© 2001-2023 Discuz! Team.
