FishC Forum (鱼C论坛)

Views: 2376 | Replies: 18

Help: my crawler can't locate the tr elements under tbody!

Posted on 2020-6-3 16:35:46
Here are two selectors:
A = response.css('.TabContent .reaultList:nth-child(2) .table_Head table tbody')
B = response.css('.wrap_c table tbody')

Results:
A:  data='<tbody id="tenderBulletin"></tbody>'
B: data='<tbody><tr height="19" class="firstRow" '>

Problem:
A = response.css('.TabContent .reaultList:nth-child(2) .table_Head table tbody tr')
B = response.css('.wrap_c table tbody tr')

B locates the tr elements, but A returns an empty result: it cannot find the tr elements under tbody.

Page elements as seen in the browser's F12 developer tools:

A: [screenshots not preserved]

B: [screenshots not preserved]
Want to know what 小甲鱼 has been up to lately? Visit -> ilovefishc.com

Posted on 2020-6-3 16:37:38
Please post the complete code.

OP | Posted on 2020-6-3 16:43:21
import re
import scrapy
from Spider_Object_Test.items import InsuranceItem


class CebPubService_Zb_Spider(scrapy.Spider):
    name = 'dadi_cebpubservice_zb'
    start_urls = ['http://www.cebpubservice.com/ctpsp_iiss/searchbusinesstypebeforedooraction/getSearch.do']



    def parse(self, response):
        # Request the start URL again and hand the response to cooperate_parse.
        next_page = self.start_urls[0]
        if next_page is not None:
            next_full_url = response.urljoin(next_page)
            yield scrapy.Request(next_full_url, callback=self.cooperate_parse)

    def cooperate_parse(self, response):
        item = InsuranceItem()
        item['company_id'] = '600006'
        item['type'] = "中国招标投标公共服务平台 -- 招标"
        # Placeholders for the values to be extracted from each row.
        tp_fullname = ''
        tp_name = ''
        tp_start_time = ''
        tp_end_time = ''
        tp_platform_record = ''
        tp_platform_web = ''
        tp_classify = ''
        # Select the tbody that should hold the result rows and print the match.
        dl = response.css('.TabContent .reaultList:nth-child(2) .table_Head table tbody')
        print(dl)

OP | Posted on 2020-6-3 16:44:16

That's the complete code.

Posted on 2020-6-3 16:49:01
叶小贤 posted on 2020-6-3 16:43
import re
import scrapy
from Spider_Object_Test.items import InsuranceItem

Spider_Object_Test is your own module, right?

OP | Posted on 2020-6-3 16:53:15
Twilight6 posted on 2020-6-3 16:49
Spider_Object_Test is your own module, right?

Yes. It defines the fields that get written to the database.

class InsuranceItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    type = scrapy.Field()
    company_id = scrapy.Field()
    record_name = scrapy.Field()
    sale_name = scrapy.Field()
    record_id = scrapy.Field()
    classify = scrapy.Field()
    items = scrapy.Field()
    insurance_type = scrapy.Field()
    record_type = scrapy.Field()
    items_id = scrapy.Field()
    platform_fullname = scrapy.Field()
    platform = scrapy.Field()
    start_time = scrapy.Field()
    end_time = scrapy.Field()
    platform_web = scrapy.Field()
    platform_record = scrapy.Field()
    record_time = scrapy.Field()

    file_urls = scrapy.Field()
    files = scrapy.Field()
    file_name = scrapy.Field()
    file_id = scrapy.Field()
    file_update_date = scrapy.Field()
    file_type = scrapy.Field()

Posted on 2020-6-3 17:13:26
叶小贤 posted on 2020-6-3 16:53
Yes. It defines the fields that get written to the database.

class InsuranceItem(scrapy.Item):

OK, this is beyond what I've learned. I can't help you, sorry.

OP | Posted on 2020-6-3 17:15:33
Twilight6 posted on 2020-6-3 17:13
OK, this is beyond what I've learned. I can't help you, sorry.

No problem, thank you.

Posted on 2020-6-3 17:16:21

I haven't really learned this kind of data extraction, and I didn't read your post carefully at first. Sorry.

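OP | Posted on 2020-6-3 17:22:15
Twilight6 posted on 2020-6-3 17:16
I haven't really learned this kind of data extraction, and I didn't read your post carefully at first. Sorry.

No worries, mate. How do you scrape data?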

Posted on 2020-6-3 17:42:23
叶小贤 posted on 2020-6-3 17:22
No worries, mate. How do you scrape data?

With BeautifulSoup.

OP | Posted on 2020-6-3 17:46:42

Could you check whether BeautifulSoup can locate the tr elements under tbody from my code above?

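Posted on 2020-6-3 17:48:12
叶小贤 posted on 2020-6-3 17:46
Could you check whether BeautifulSoup can locate the tr elements under tbody from my code above?


I haven't learned scraping with Scrapy, so I don't know which variable holds the page source you fetched.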

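Posted on 2020-6-3 17:48:54
叶小贤 posted on 2020-6-3 17:46
Could you check whether BeautifulSoup can locate the tr elements under tbody from my code above?

If you use something other than Scrapy, I can help you work on it. Do you want to extract the whole node, or the text inside the node?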

Posted on 2020-6-3 17:51:21

Is it not being called? (Sorry, I've only just learned classes.)

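OP | Posted on 2020-6-3 19:15:14
Twilight6 posted on 2020-6-3 17:48
If you use something other than Scrapy, I can help you work on it. Do you want to extract the whole node, or the text inside the node?


After working on this for ages I finally understand it: the tr rows inside tbody are generated from a POST request. I wrote that request myself and found that it returns data like this: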
{"message":"","success":true,"object":{"returnlist":[{"guid":null,"businessId":"769e2fbcaa3847dfbfc33763729bb248","tenderProjectCode":"M4401000017202209220","tenderProjectName":null,"transactionPlatfCode":"M4401000017","businessObjectName":"东莞深能源樟洋电力有限公司2×180MW燃气-蒸汽联合循环发电机组2020-2022年度财产保险招标公告","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-06-03","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":"国义招标采购平台","regionCode":null,"regionName":"广东省","tenderName":null,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-06-10","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null},{"guid":null,"businessId":"d4d5f8cdaaaa467793cffe9749a26b15","tenderProjectCode":"M1401000155200694002","tenderProjectName":null,"transactionPlatfCode":"1401000004P","businessObjectName":"长治市长北干线快速通道建设工程PPP项目(长治市长北干线改扩建工程)工程一切保险服务二次招标公告","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-06-01","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":"山西省招标投标公共服务平台","regionCode":null,"regionName":"山西省 长治市","tenderName":null,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-06-22","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null},{"guid":null,"businessId":"1f100e2078374a0990d628f490626bc1","tenderProjectCode":"M1400000026003201017","tenderProjectName":null,"transactionPlatfCode":"1401000004P","businessObjectName":"山西阳光发电有限责任公司2020-2021年度运营期保险项目招标公告","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-05-28","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":"山西省招标投标公共服务平台","regionCode":null,"regionName":"山西省 
阳泉市","tenderName":null,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-06-18","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null},{"guid":null,"businessId":"7035427bbc1b43039bda76c5dfb63f10","tenderProjectCode":"M1401000155200694002","tenderProjectName":null,"transactionPlatfCode":"1401000004P","businessObjectName":"长治市长北干线快速通道建设工程PPP项目(长治市长北干线改扩建工程)工程一切保险服务招标延期公告","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-05-26","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":"山西省招标投标公共服务平台","regionCode":null,"regionName":"山西省 长治市","tenderName":null,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-06-01","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null},{"guid":null,"businessId":"548f1f69b9e84ed6bed3c69d35e6565b","tenderProjectCode":"M4401000017202063206","tenderProjectName":null,"transactionPlatfCode":"M4401000017","businessObjectName":"2020年运营期保险服务(第二次招标)变更公告变更公告","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-05-25","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":"国义招标采购平台","regionCode":null,"regionName":"广东省","tenderName":null,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-06-15","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null},{"guid":null,"businessId":"b47baad8f11a43bab0beeb655e5c6f19","tenderProjectCode":"M3400000022002060026","tenderProjectName":null,"transactionPlatfCode":"M3400000022","businessObjectName":"阜阳卷烟材料厂2020年团体补充医疗保险项目招标公告(二次)","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-05-22","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":"优质采电子交易平台","regionCode":null,"regionName":"安徽省","tenderName":null
,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-05-28","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null},{"guid":null,"businessId":"19f26fe9b4a94c0a9ac9817e54eee509","tenderProjectCode":"M1401000155200923002","tenderProjectName":null,"transactionPlatfCode":"1401000004P","businessObjectName":"长治市高铁东站及道路配套设施工程PPP项目工程一切险服务招标延期公告","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-05-18","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":"山西省招标投标公共服务平台","regionCode":null,"regionName":"山西省 长治市","tenderName":null,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-06-09","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null},{"guid":null,"businessId":"5fb27aa3ee5444d0a9cdee2d1680edb6","tenderProjectCode":"M4401000017201710171","tenderProjectName":null,"transactionPlatfCode":"M4401000017","businessObjectName":"明阳阳江沙扒300MW科研示范项目建筑安装工程一切险变更公告变更公告","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-05-15","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":"国义招标采购平台","regionCode":null,"regionName":"广东省","tenderName":null,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-05-29","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null},{"guid":null,"businessId":"2c0e658ede394d50ad029a89c1b65927","tenderProjectCode":"M4401000017201617161","tenderProjectName":null,"transactionPlatfCode":"M4401000017","businessObjectName":"2020年运营期保险服务招标公告","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-05-14","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":"国义招标采购平台","regionCode":null,"regionName":"广东省","tenderName":null,"tenderAgencyName":null,"industriesType":"保
险","industriesCode":null,"bulletinEndTime":"2020-05-29","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null},{"guid":null,"businessId":"e622bebd46bd4e1e8feacaf719a77de2","tenderProjectCode":"M4400000020014339001","tenderProjectName":null,"transactionPlatfCode":"M4400000020","businessObjectName":"佛山市佛铁实业有限公司2020-2021年度汽车保险服务项目招标公告","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-05-13","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":"广东电子招标平台","regionCode":null,"regionName":"广东省","tenderName":null,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-06-04","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null},{"guid":null,"businessId":"258b5286ccc6479390a7778e2e098f9e","tenderProjectCode":"M3400000022006718009","tenderProjectName":null,"transactionPlatfCode":"M3400000022","businessObjectName":"安徽皖南烟叶有限责任公司2020-2022年度保险服务供应商采购项目[02包]招标公告【二次】","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-05-13","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":"优质采电子交易平台","regionCode":null,"regionName":"安徽省","tenderName":null,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-05-19","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null},{"guid":null,"businessId":"9fa9c042f4af4020add1ddd138c2ca21","tenderProjectCode":"M1401000155200694002","tenderProjectName":null,"transactionPlatfCode":"1401000004P","businessObjectName":"长治市长北干线快速通道建设工程PPP项目(长治市长北干线改扩建工程)工程一切保险服务招标控制价","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-05-11","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":null,"regionCode":null,"regionName":"山西省 
长治市","tenderName":null,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-05-27","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null},{"guid":null,"businessId":"e788e72e6bc54d399adf6e9608b72f36","tenderProjectCode":"M1401000155200923002","tenderProjectName":null,"transactionPlatfCode":"1401000004P","businessObjectName":"长治市高铁东站及道路配套设施工程PPP项目工程一切险服务招标公告","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-05-11","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":null,"regionCode":null,"regionName":"山西省 长治市","tenderName":null,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-06-02","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null},{"guid":null,"businessId":"4d38ab21daa44999870f2512ab3cb154","tenderProjectCode":"M4401000017201710171","tenderProjectName":null,"transactionPlatfCode":"M4401000017","businessObjectName":"明阳阳江沙扒300MW科研示范项目建筑安装工程一切险招标公告","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-05-08","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":"国义招标采购平台","regionCode":null,"regionName":"广东省","tenderName":null,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-05-15","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null},{"guid":null,"businessId":"d1affa613f1b4326b1540bac7da17d62","tenderProjectCode":"M1401000155200694002","tenderProjectName":null,"transactionPlatfCode":"1401000004P","businessObjectName":"长治市长北干线快速通道建设工程PPP项目(长治市长北干线改扩建工程)工程一切保险服务招标公告","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-05-06","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":null,"regionCode":null,"regionName":"山西省 
长治市","tenderName":null,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-05-27","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null}],"page":{"pageNo":1,"totalPage":3,"totalCount":36,"row":15,"rowNo":0,"pn":0}}}
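
Since the rows arrive as JSON rather than HTML, they can be read with the standard `json` module instead of CSS selectors. The following is only a sketch: the sample is a trimmed copy of two records from the response above, and the output keys (`name`, `platform`, `region`, `start_time`, `end_time`) are hypothetical names chosen for illustration, not the OP's actual item mapping.

```python
import json

# Trimmed sample of the response quoted above: two records, a few fields each.
SAMPLE = """
{"message": "", "success": true,
 "object": {
   "returnlist": [
     {"businessId": "2c0e658ede394d50ad029a89c1b65927",
      "businessObjectName": "2020年运营期保险服务招标公告",
      "transactionPlatfName": "国义招标采购平台",
      "regionName": "广东省",
      "receiveTime": "2020-05-14",
      "bulletinEndTime": "2020-05-29"},
     {"businessId": "e788e72e6bc54d399adf6e9608b72f36",
      "businessObjectName": "长治市高铁东站及道路配套设施工程PPP项目工程一切险服务招标公告",
      "transactionPlatfName": null,
      "regionName": "山西省 长治市",
      "receiveTime": "2020-05-11",
      "bulletinEndTime": "2020-06-02"}
   ],
   "page": {"pageNo": 1, "totalPage": 3, "totalCount": 36, "row": 15, "rowNo": 0, "pn": 0}
 }}
"""


def parse_bulletins(text):
    """Return (rows, total_pages) extracted from one JSON response body."""
    payload = json.loads(text)
    if not payload.get("success"):
        return [], 0
    obj = payload["object"]
    rows = []
    for rec in obj["returnlist"]:
        rows.append({
            # Output key names are illustrative assumptions.
            "name": rec["businessObjectName"],
            # Some records carry null here, so fall back to an empty string.
            "platform": rec["transactionPlatfName"] or "",
            "region": rec["regionName"],
            "start_time": rec["receiveTime"],
            "end_time": rec["bulletinEndTime"],
        })
    return rows, obj["page"]["totalPage"]


rows, total_pages = parse_bulletins(SAMPLE)
```

Inside a Scrapy callback, the same function could presumably be fed `response.text`. The `totalPage` value tells the spider how many further pages remain to be requested.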

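OP | Posted on 2020-6-3 19:16:55
Twilight6 posted on 2020-6-3 17:48
If you use something other than Scrapy, I can help you work on it. Do you want to extract the whole node, or the text inside the node?

Then, to extract some of the data in there, I still have to write another POST request.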
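
A rough sketch of building such POST requests, one per page, using only the standard library so that it runs without Scrapy. The endpoint URL and the form-field names (`pageNo`, `row`) are assumptions: the thread does not show them, so the real ones would have to be copied from the request in the browser's Network tab (F12). Nothing is sent here; the requests are only constructed.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint: the thread does not show the real POST URL.
LIST_URL = "http://example.com/list.do"


def build_page_request(page_no, rows_per_page=15):
    """Build (but do not send) a form-encoded POST request for one result page."""
    # Field names are assumptions; copy the real ones from the browser.
    form = {"pageNo": page_no, "row": rows_per_page}
    body = urlencode(form).encode("utf-8")
    return Request(
        LIST_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# The response quoted earlier reports totalPage = 3.
requests_to_send = [build_page_request(n) for n in range(1, 3 + 1)]
```

In a Scrapy spider the usual equivalent would be to yield `scrapy.FormRequest(url, formdata={...}, callback=...)` with string values in `formdata`, once per page.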

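Posted on 2020-6-3 20:03:39
There's a problem with your A; it is locating the wrong thing.
If you're not very familiar with CSS selectors, use pseudo-class selectors sparingly and prefer ID selectors.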
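
For reference, the result posted at the top of the thread shows that A did match `<tbody id="tenderBulletin"></tbody>`, so the ID form of the selector in Scrapy would be `response.css('#tenderBulletin tr')`. It is shorter, but it would presumably still come back empty if the downloaded HTML contains no rows. The sketch below checks that with the standard library's `html.parser`; the HTML string is a minimal stand-in built from the tbody shown in the OP's output, not the real page.

```python
from html.parser import HTMLParser


class RowCounter(HTMLParser):
    """Count <tr> start tags that appear inside the tbody with a given id."""

    def __init__(self, tbody_id):
        super().__init__()
        self.tbody_id = tbody_id
        self.inside = False   # currently inside the target tbody?
        self.found = False    # was the target tbody seen at all?
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == "tbody" and dict(attrs).get("id") == self.tbody_id:
            self.inside = True
            self.found = True
        elif tag == "tr" and self.inside:
            self.rows += 1

    def handle_endtag(self, tag):
        if tag == "tbody" and self.inside:
            self.inside = False


# Minimal stand-in for the raw HTML, based on the tbody in the OP's output.
RAW_HTML = '<table><tbody id="tenderBulletin"></tbody></table>'

counter = RowCounter("tenderBulletin")
counter.feed(RAW_HTML)
```

Here `counter.found` is true while `counter.rows` stays at zero: the element is located, but it has no rows to return, which is consistent with the OP's finding that the rows are filled in from a separate POST request.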

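Posted on 2020-6-3 20:06:50
叶小贤 posted on 2020-6-3 19:16
Then, to extract some of the data in there, I still have to write another POST request.

OK, hehe. Keep it up, and thanks for the hard work. Next time I'll learn some CSS too.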