叶小贤 posted on 2020-6-3 16:35:46

Help: crawler cannot locate the tr elements under tbody!

Here are two selectors:
A = response.css('.TabContent .reaultList:nth-child(2) .table_Head table tbody')
B = response.css('.wrap_c table tbody')

Results:
A: data='<tbody id="tenderBulletin"></tbody>'
B: data='<tbody><tr height="19" class="firstRow" '>

Problem:
A = response.css('.TabContent .reaultList:nth-child(2) .table_Head table tbody tr')
B = response.css('.wrap_c table tbody tr')

B locates the tr elements, but A returns an empty result: the tr elements under tbody are not found.

Inspecting the page elements with F12:

A:
B:
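
F12 shows the DOM after the page's scripts have run, while the spider only sees the HTML the server sent. One way to compare the two is to count the rows in the raw HTML. The following is a minimal sketch using only the standard library; the two HTML strings are hypothetical stand-ins modeled on the A and B results above, and `RowCounter` and `count_rows` are illustrative names.

```python
from html.parser import HTMLParser


class RowCounter(HTMLParser):
    """Count <tr> start tags that appear inside any <tbody>."""

    def __init__(self):
        super().__init__()
        self.in_tbody = False
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == "tbody":
            self.in_tbody = True
        elif tag == "tr" and self.in_tbody:
            self.rows += 1

    def handle_endtag(self, tag):
        if tag == "tbody":
            self.in_tbody = False


def count_rows(html):
    """Return the number of rows found under tbody in the given HTML text."""
    parser = RowCounter()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical stand-ins for the raw HTML behind selectors A and B.
raw_a = '<table><tbody id="tenderBulletin"></tbody></table>'
raw_b = '<table><tbody><tr height="19" class="firstRow"><td>x</td></tr></tbody></table>'

print(count_rows(raw_a))  # 0
print(count_rows(raw_b))  # 1
```

If the count for the real page source is 0, no selector can find rows there, and the data has to come from whatever request the page makes to fill the table.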

Twilight6 posted on 2020-6-3 16:37:38

Please post the complete code.

叶小贤 posted on 2020-6-3 16:43:21

import re
import scrapy
from Spider_Object_Test.items import InsuranceItem


class CebPubService_Zb_Spider(scrapy.Spider):
    name = 'dadi_cebpubservice_zb'
    start_urls = ['http://www.cebpubservice.com/ctpsp_iiss/searchbusinesstypebeforedooraction/getSearch.do']

    def parse(self, response):
        # start_urls is a list; urljoin() needs a single URL string
        next_page = self.start_urls[0]
        if next_page is not None:
            next_full_url = response.urljoin(next_page)
            yield scrapy.Request(next_full_url, callback=self.cooperate_parse)

    def cooperate_parse(self, response):
        item = InsuranceItem()
        item['company_id'] = '600006'
        item['type'] = "中国招标投标公共服务平台 -- 招标"
        tp_fullname = ''
        tp_name = ''
        tp_start_time = ''
        tp_end_time = ''
        tp_platform_record = ''
        tp_platform_web = ''
        tp_classify = ''
        # Select the tbody that is expected to hold the result rows
        dl = response.css('.TabContent .reaultList:nth-child(2) .table_Head table tbody')
        print(dl)

叶小贤 posted on 2020-6-3 16:44:16

Twilight6 posted on 2020-6-3 16:37
Please post the complete code.

I've posted the complete code.

Twilight6 posted on 2020-6-3 16:49:01

叶小贤 posted on 2020-6-3 16:43
import re
import scrapy
from Spider_Object_Test.items import InsuranceItem


Spider_Object_Test is your own module, right?

叶小贤 posted on 2020-6-3 16:53:15

Twilight6 posted on 2020-6-3 16:49
Spider_Object_Test is your own module, right?

Yes, it holds the fields to be sent to the database.

class InsuranceItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    type = scrapy.Field()
    company_id = scrapy.Field()
    record_name = scrapy.Field()
    sale_name = scrapy.Field()
    record_id = scrapy.Field()
    classify = scrapy.Field()
    items = scrapy.Field()
    insurance_type = scrapy.Field()
    record_type = scrapy.Field()
    items_id = scrapy.Field()
    platform_fullname = scrapy.Field()
    platform = scrapy.Field()
    start_time = scrapy.Field()
    end_time = scrapy.Field()
    platform_web = scrapy.Field()
    platform_record = scrapy.Field()
    record_time = scrapy.Field()

    file_urls = scrapy.Field()
    files = scrapy.Field()
    file_name = scrapy.Field()
    file_id = scrapy.Field()
    file_update_date = scrapy.Field()
    file_type = scrapy.Field()

Twilight6 posted on 2020-6-3 17:13:26

叶小贤 posted on 2020-6-3 16:53
Yes, it holds the fields to be sent to the database.

class InsuranceItem(scrapy.Item):


OK, this is beyond what I know, so I can't help you. Sorry.

叶小贤 posted on 2020-6-3 17:15:33

Twilight6 posted on 2020-6-3 17:13
OK, this is beyond what I know, so I can't help you. Sorry.

No problem, thank you.

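Twilight6 posted on 2020-6-3 17:16:21

叶小贤 posted on 2020-6-3 17:15
No problem, thank you.

I haven't really learned this kind of data extraction, and I didn't read your post carefully at first. Sorry.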

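叶小贤 posted on 2020-6-3 17:22:15

Twilight6 posted on 2020-6-3 17:16
I haven't really learned this kind of data extraction, and I didn't read your post carefully at first. Sorry.

No worries, mate. How do you scrape data yourself?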

xiaosi4081 posted on 2020-6-3 17:42:23

叶小贤 posted on 2020-6-3 17:22
No worries, mate. How do you scrape data yourself?

With BeautifulSoup.

叶小贤 posted on 2020-6-3 17:46:42

xiaosi4081 posted on 2020-6-3 17:42
With BeautifulSoup.

Could you take a look and see whether BeautifulSoup can locate the tr elements under tbody in my code above?

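xiaosi4081 posted on 2020-6-3 17:48:12

叶小贤 posted on 2020-6-3 17:46
Could you take a look and see whether BeautifulSoup can locate the tr elements under tbody in my code above?

I haven't learned scraping with scrapy, so I don't know which variable holds the page source you fetched.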

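Twilight6 posted on 2020-6-3 17:48:54

叶小贤 posted on 2020-6-3 17:46
Could you take a look and see whether BeautifulSoup can locate the tr elements under tbody in my code above?

I can help if we use a different tool. Do you want to extract the whole node, or just the text inside it?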

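xiaosi4081 posted on 2020-6-3 17:51:21

叶小贤 posted on 2020-6-3 16:44
I've posted the complete code.

Is the class never called? (Sorry, I've only just started learning classes.)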

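叶小贤 posted on 2020-6-3 19:15:14

Twilight6 posted on 2020-6-3 17:48
I can help if we use a different tool. Do you want to extract the whole node, or just the text inside it?

After working on it for a long while, I think I finally understand: the tr rows in the tbody are generated from a POST request. I wrote a request myself and found that it returns data like this: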
{"message":"","success":true,"object":{"returnlist":[
{"guid":null,"businessId":"769e2fbcaa3847dfbfc33763729bb248","tenderProjectCode":"M4401000017202209220","tenderProjectName":null,"transactionPlatfCode":"M4401000017","businessObjectName":"东莞深能源樟洋电力有限公司2×180MW燃气-蒸汽联合循环发电机组2020-2022年度财产保险招标公告","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-06-03","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":"国义招标采购平台","regionCode":null,"regionName":"广东省","tenderName":null,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-06-10","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null},
{"guid":null,"businessId":"d4d5f8cdaaaa467793cffe9749a26b15","tenderProjectCode":"M1401000155200694002","tenderProjectName":null,"transactionPlatfCode":"1401000004P","businessObjectName":"长治市长北干线快速通道建设工程PPP项目(长治市长北干线改扩建工程)工程一切保险服务二次招标公告","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-06-01","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":"山西省招标投标公共服务平台","regionCode":null,"regionName":"山西省 长治市","tenderName":null,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-06-22","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null},
{"guid":null,"businessId":"1f100e2078374a0990d628f490626bc1","tenderProjectCode":"M1400000026003201017","tenderProjectName":null,"transactionPlatfCode":"1401000004P","businessObjectName":"山西阳光发电有限责任公司2020-2021年度运营期保险项目招标公告","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-05-28","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":"山西省招标投标公共服务平台","regionCode":null,"regionName":"山西省 阳泉市","tenderName":null,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-06-18","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null},
{"guid":null,"businessId":"7035427bbc1b43039bda76c5dfb63f10","tenderProjectCode":"M1401000155200694002","tenderProjectName":null,"transactionPlatfCode":"1401000004P","businessObjectName":"长治市长北干线快速通道建设工程PPP项目(长治市长北干线改扩建工程)工程一切保险服务招标延期公告","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-05-26","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":"山西省招标投标公共服务平台","regionCode":null,"regionName":"山西省 长治市","tenderName":null,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-06-01","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null},
{"guid":null,"businessId":"548f1f69b9e84ed6bed3c69d35e6565b","tenderProjectCode":"M4401000017202063206","tenderProjectName":null,"transactionPlatfCode":"M4401000017","businessObjectName":"2020年运营期保险服务(第二次招标)变更公告变更公告","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-05-25","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":"国义招标采购平台","regionCode":null,"regionName":"广东省","tenderName":null,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-06-15","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null},
{"guid":null,"businessId":"b47baad8f11a43bab0beeb655e5c6f19","tenderProjectCode":"M3400000022002060026","tenderProjectName":null,"transactionPlatfCode":"M3400000022","businessObjectName":"阜阳卷烟材料厂2020年团体补充医疗保险项目招标公告(二次)","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-05-22","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":"优质采电子交易平台","regionCode":null,"regionName":"安徽省","tenderName":null,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-05-28","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null},
{"guid":null,"businessId":"19f26fe9b4a94c0a9ac9817e54eee509","tenderProjectCode":"M1401000155200923002","tenderProjectName":null,"transactionPlatfCode":"1401000004P","businessObjectName":"长治市高铁东站及道路配套设施工程PPP项目工程一切险服务招标延期公告","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-05-18","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":"山西省招标投标公共服务平台","regionCode":null,"regionName":"山西省 长治市","tenderName":null,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-06-09","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null},
{"guid":null,"businessId":"5fb27aa3ee5444d0a9cdee2d1680edb6","tenderProjectCode":"M4401000017201710171","tenderProjectName":null,"transactionPlatfCode":"M4401000017","businessObjectName":"明阳阳江沙扒300MW科研示范项目建筑安装工程一切险变更公告变更公告","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-05-15","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":"国义招标采购平台","regionCode":null,"regionName":"广东省","tenderName":null,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-05-29","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null},
{"guid":null,"businessId":"2c0e658ede394d50ad029a89c1b65927","tenderProjectCode":"M4401000017201617161","tenderProjectName":null,"transactionPlatfCode":"M4401000017","businessObjectName":"2020年运营期保险服务招标公告","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-05-14","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":"国义招标采购平台","regionCode":null,"regionName":"广东省","tenderName":null,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-05-29","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null},
{"guid":null,"businessId":"e622bebd46bd4e1e8feacaf719a77de2","tenderProjectCode":"M4400000020014339001","tenderProjectName":null,"transactionPlatfCode":"M4400000020","businessObjectName":"佛山市佛铁实业有限公司2020-2021年度汽车保险服务项目招标公告","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-05-13","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":"广东电子招标平台","regionCode":null,"regionName":"广东省","tenderName":null,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-06-04","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null},
{"guid":null,"businessId":"258b5286ccc6479390a7778e2e098f9e","tenderProjectCode":"M3400000022006718009","tenderProjectName":null,"transactionPlatfCode":"M3400000022","businessObjectName":"安徽皖南烟叶有限责任公司2020-2022年度保险服务供应商采购项目招标公告【二次】","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-05-13","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":"优质采电子交易平台","regionCode":null,"regionName":"安徽省","tenderName":null,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-05-19","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null},
{"guid":null,"businessId":"9fa9c042f4af4020add1ddd138c2ca21","tenderProjectCode":"M1401000155200694002","tenderProjectName":null,"transactionPlatfCode":"1401000004P","businessObjectName":"长治市长北干线快速通道建设工程PPP项目(长治市长北干线改扩建工程)工程一切保险服务招标控制价","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-05-11","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":null,"regionCode":null,"regionName":"山西省 长治市","tenderName":null,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-05-27","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null},
{"guid":null,"businessId":"e788e72e6bc54d399adf6e9608b72f36","tenderProjectCode":"M1401000155200923002","tenderProjectName":null,"transactionPlatfCode":"1401000004P","businessObjectName":"长治市高铁东站及道路配套设施工程PPP项目工程一切险服务招标公告","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-05-11","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":null,"regionCode":null,"regionName":"山西省 长治市","tenderName":null,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-06-02","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null},
{"guid":null,"businessId":"4d38ab21daa44999870f2512ab3cb154","tenderProjectCode":"M4401000017201710171","tenderProjectName":null,"transactionPlatfCode":"M4401000017","businessObjectName":"明阳阳江沙扒300MW科研示范项目建筑安装工程一切险招标公告","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-05-08","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":"国义招标采购平台","regionCode":null,"regionName":"广东省","tenderName":null,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-05-15","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null},
{"guid":null,"businessId":"d1affa613f1b4326b1540bac7da17d62","tenderProjectCode":"M1401000155200694002","tenderProjectName":null,"transactionPlatfCode":"1401000004P","businessObjectName":"长治市长北干线快速通道建设工程PPP项目(长治市长北干线改扩建工程)工程一切保险服务招标公告","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-05-06","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":null,"regionCode":null,"regionName":"山西省 长治市","tenderName":null,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-05-27","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null}
],"page":{"pageNo":1,"totalPage":3,"totalCount":36,"row":15,"rowNo":0,"pn":0}}}
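
Since the response is JSON, the fields can be read directly once the body is decoded, with no CSS selector involved. The following is a rough sketch using the standard `json` module. The sample string is a trimmed copy of the first record above, and the choice of keys in `WANTED`, as well as the name `extract_rows`, are assumptions made for illustration.

```python
import json

# Trimmed copy of the response above: one record, a few of its keys.
sample = '''{"message":"","success":true,"object":{"returnlist":[
{"businessId":"769e2fbcaa3847dfbfc33763729bb248",
 "businessObjectName":"东莞深能源樟洋电力有限公司2×180MW燃气-蒸汽联合循环发电机组2020-2022年度财产保险招标公告",
 "receiveTime":"2020-06-03","transactionPlatfName":"国义招标采购平台",
 "regionName":"广东省","bulletinEndTime":"2020-06-10"}],
"page":{"pageNo":1,"totalPage":3,"totalCount":36,"row":15,"rowNo":0,"pn":0}}}'''

# Keys to keep; which fields matter is an assumption, adjust as needed.
WANTED = ("businessObjectName", "transactionPlatfName", "receiveTime", "bulletinEndTime")


def extract_rows(text):
    """Return (rows, page_info) decoded from the JSON body of the response."""
    payload = json.loads(text)
    if not payload.get("success"):
        return [], {}
    obj = payload["object"]
    rows = [{key: rec.get(key) for key in WANTED} for rec in obj["returnlist"]]
    return rows, obj["page"]


rows, page = extract_rows(sample)
print(rows[0]["receiveTime"], page["totalPage"])  # 2020-06-03 3
```

Inside a Scrapy callback the same function could be fed `response.text`.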

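叶小贤 posted on 2020-6-3 19:16:55

Twilight6 posted on 2020-6-3 17:48
I can help if we use a different tool. Do you want to extract the whole node, or just the text inside it?

Then, to extract a few of the fields in it, I still have to write another POST request.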
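
For reference, a form-encoded POST for each result page might be built roughly as follows. This is only a sketch: the thread does not show the real endpoint or form fields, so `LIST_URL` and the field names `pageNo` and `row` are hypothetical placeholders to be replaced with whatever the browser's Network tab (F12) shows. The requests are built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint: the thread does not show the real one.
LIST_URL = "http://example.invalid/list.do"


def build_page_request(page_no, rows_per_page=15):
    """Build (but do not send) a form-encoded POST for one result page."""
    # Assumed field names; copy the real ones from the browser's Network tab.
    form = {"pageNo": page_no, "row": rows_per_page}
    body = urlencode(form).encode("utf-8")
    return Request(
        LIST_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# "totalPage": 3 is taken from the "page" object in the response above.
requests_to_send = [build_page_request(n) for n in range(1, 3 + 1)]
print(requests_to_send[0].get_method(), requests_to_send[0].data)
```

In a Scrapy spider the closest equivalent would be yielding `scrapy.FormRequest(url, formdata=..., callback=...)`, which sends a POST when `formdata` is given; note that `formdata` values should be strings.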

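_谪仙 posted on 2020-6-3 20:03:39

Your selector A is the problem; it locates the wrong element.
If you aren't very familiar with CSS selectors, use pseudo-class selectors sparingly and prefer ID selectors.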
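
To illustrate the ID-first advice: in Scrapy, the tbody from the first post could be addressed as `response.css('#tenderBulletin tr')`. The sketch below shows the same idea with the standard library's `ElementTree` path syntax rather than CSS. The markup is a hypothetical, well-formed stand-in for the page; only the id `tenderBulletin` comes from the thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the page; only the id "tenderBulletin"
# is taken from the thread.
page = ET.fromstring(
    '<div class="TabContent">'
    '<div class="reaultList"/>'
    '<div class="reaultList"><div class="table_Head"><table>'
    '<tbody id="tenderBulletin">'
    '<tr><td>row 1</td></tr><tr><td>row 2</td></tr>'
    '</tbody>'
    '</table></div></div>'
    '</div>'
)

# Positional path: breaks if another block is added before the second div.
by_position = page.findall("./div[2]/div/table/tbody/tr")

# ID-based path: finds the tbody wherever it sits in the tree.
by_id = page.findall(".//tbody[@id='tenderBulletin']/tr")

print(len(by_position), len(by_id))  # 2 2
```

An ID-based selector is shorter and less brittle, but it still returns nothing if the tbody is empty in the HTML that was actually fetched.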

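Twilight6 posted on 2020-6-3 20:06:50

叶小贤 posted on 2020-6-3 19:16
Then, to extract a few of the fields in it, I still have to write another POST request.

OK, haha. Keep it up, and well done on the hard work. I'll learn some CSS myself next time.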