Help: my crawler cannot locate the tr elements under tbody!
Consider the following two snippets:
A = response.css('.TabContent .reaultList:nth-child(2) .table_Head table tbody')
B = response.css('.wrap_c table tbody')
Results:
A: data='<tbody id="tenderBulletin"></tbody>'
B: data='<tbody><tr height="19" class="firstRow" '>
The problem:
A = response.css('.TabContent .reaultList:nth-child(2) .table_Head table tbody tr')
B = response.css('.wrap_c table tbody tr')
B locates the tr elements, but A returns an empty result: the tr elements under its tbody cannot be found.
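Result A above already shows a tbody with nothing inside it, so one quick check is whether the downloaded HTML contains any rows at all, as opposed to rows that the browser adds later. The sketch below uses only the standard library and hand-written HTML samples modeled on results A and B (not the real page). The `RowCounter` class, the `count_rows` helper, and the `demo` id on the second sample are illustrative assumptions.

```python
from html.parser import HTMLParser


class RowCounter(HTMLParser):
    """Count <tr> start tags that appear inside the <tbody> with a given id."""

    def __init__(self, tbody_id):
        super().__init__()
        self.tbody_id = tbody_id
        self.inside = False  # True while parsing inside the target <tbody>
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == "tbody" and dict(attrs).get("id") == self.tbody_id:
            self.inside = True
        elif tag == "tr" and self.inside:
            self.rows += 1

    def handle_endtag(self, tag):
        if tag == "tbody":
            self.inside = False


def count_rows(html, tbody_id):
    """Return how many <tr> tags the raw HTML holds under the given tbody id."""
    counter = RowCounter(tbody_id)
    counter.feed(html)
    return counter.rows


# Hand-written samples modeled on results A and B (assumed, not the real page).
raw_a = '<table><tbody id="tenderBulletin"></tbody></table>'
raw_b = ('<table><tbody id="demo"><tr height="19" class="firstRow">'
         '<td>x</td></tr></tbody></table>')

print(count_rows(raw_a, "tenderBulletin"))
print(count_rows(raw_b, "demo"))
```

If the count is zero for the raw response body while the browser shows rows, no selector can find them in that response, and the data has to come from somewhere else.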
Page elements as seen in the browser developer tools (F12):
A:
B:

Please post the complete code.

import re
import scrapy
from Spider_Object_Test.items import InsuranceItem


class CebPubService_Zb_Spider(scrapy.Spider):
    name = 'dadi_cebpubservice_zb'
    start_urls = ['http://www.cebpubservice.com/ctpsp_iiss/searchbusinesstypebeforedooraction/getSearch.do']

    def parse(self, response):
        # start_urls is a list, and urljoin expects a string, so take the first entry
        next_page = self.start_urls[0]
        if next_page is not None:
            next_full_url = response.urljoin(next_page)
            yield scrapy.Request(next_full_url, callback=self.cooperate_parse)

    def cooperate_parse(self, response):
        item = InsuranceItem()
        item['company_id'] = '600006'
        item['type'] = "中国招标投标公共服务平台 -- 招标"
        tp_fullname = ''
        tp_name = ''
        tp_start_time = ''
        tp_end_time = ''
        tp_platform_record = ''
        tp_platform_web = ''
        tp_classify = ''
        dl = response.css('.TabContent .reaultList:nth-child(2) .table_Head table tbody')
        print(dl)

Twilight6 posted on 2020-6-3 16:37
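Please post the complete code.
I have posted the complete code.

叶小贤 posted on 2020-6-3 16:43
import re
import scrapy
from Spider_Object_Test.items import InsuranceItem
Spider_Object_Test is your own module, right?

Twilight6 posted on 2020-6-3 16:49
Spider_Object_Test is your own module, right?
Yes, it defines the fields that get written to the database.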
class InsuranceItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    type = scrapy.Field()
    company_id = scrapy.Field()
    record_name = scrapy.Field()
    sale_name = scrapy.Field()
    record_id = scrapy.Field()
    classify = scrapy.Field()
    items = scrapy.Field()
    insurance_type = scrapy.Field()
    record_type = scrapy.Field()
    items_id = scrapy.Field()
    platform_fullname = scrapy.Field()
    platform = scrapy.Field()
    start_time = scrapy.Field()
    end_time = scrapy.Field()
    platform_web = scrapy.Field()
    platform_record = scrapy.Field()
    record_time = scrapy.Field()
    file_urls = scrapy.Field()
    files = scrapy.Field()
    file_name = scrapy.Field()
    file_id = scrapy.Field()
    file_update_date = scrapy.Field()
    file_type = scrapy.Field()

叶小贤 posted on 2020-6-3 16:53
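Yes, it defines the fields that get written to the database.
class InsuranceItem(scrapy.Item):
OK, this is beyond what I know, so I can't help you. Sorry.

Twilight6 posted on 2020-6-3 17:13
OK, this is beyond what I know, so I can't help you. Sorry.
No problem, thank you.

叶小贤 posted on 2020-6-3 17:15
No problem, thank you.
I haven't really studied this kind of data extraction, and I didn't read your post carefully at first. Sorry.

Twilight6 posted on 2020-6-3 17:16
I haven't really studied this kind of data extraction, and I didn't read your post carefully at first. Sorry.
No worries. How do you scrape data yourself?

叶小贤 posted on 2020-6-3 17:22
No worries. How do you scrape data yourself?
I use BeautifulSoup.

xiaosi4081 posted on 2020-6-3 17:42
I use BeautifulSoup.
Could you check whether BeautifulSoup can locate the tr elements under the tbody in my code above?

叶小贤 posted on 2020-6-3 17:46
Could you check whether BeautifulSoup can locate the tr elements under the tbody in my code above?
I haven't learned scraping with Scrapy, so I don't know which variable holds the page source you fetched.

叶小贤 posted on 2020-6-3 17:46
Could you check whether BeautifulSoup can locate the tr elements under the tbody in my code above?
I could help if you used other tools. Do you want to extract the whole node, or only the text inside it?

叶小贤 posted on 2020-6-3 16:44
I have posted the complete code.
Is it never called? (Sorry, I have only just started learning classes.)

Twilight6 posted on 2020-6-3 17:48
I could help if you used other tools. Do you want to extract the whole node, or only the text inside it?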
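After working at it for quite a while I finally understand: the tr rows inside the tbody are generated by a POST request. I wrote that request myself and found that it returns data like this: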
{"message":"","success":true,"object":{"returnlist":[{"guid":null,"businessId":"769e2fbcaa3847dfbfc33763729bb248","tenderProjectCode":"M4401000017202209220","tenderProjectName":null,"transactionPlatfCode":"M4401000017","businessObjectName":"东莞深能源樟洋电力有限公司2×180MW燃气-蒸汽联合循环发电机组2020-2022年度财产保险招标公告","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-06-03","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":"国义招标采购平台","regionCode":null,"regionName":"广东省","tenderName":null,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-06-10","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null},{"guid":null,"businessId":"d4d5f8cdaaaa467793cffe9749a26b15","tenderProjectCode":"M1401000155200694002","tenderProjectName":null,"transactionPlatfCode":"1401000004P","businessObjectName":"长治市长北干线快速通道建设工程PPP项目(长治市长北干线改扩建工程)工程一切保险服务二次招标公告","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-06-01","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":"山西省招标投标公共服务平台","regionCode":null,"regionName":"山西省 长治市","tenderName":null,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-06-22","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null},{"guid":null,"businessId":"1f100e2078374a0990d628f490626bc1","tenderProjectCode":"M1400000026003201017","tenderProjectName":null,"transactionPlatfCode":"1401000004P","businessObjectName":"山西阳光发电有限责任公司2020-2021年度运营期保险项目招标公告","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-05-28","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":"山西省招标投标公共服务平台","regionCode":null,"regionName":"山西省 
阳泉市","tenderName":null,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-06-18","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null},{"guid":null,"businessId":"7035427bbc1b43039bda76c5dfb63f10","tenderProjectCode":"M1401000155200694002","tenderProjectName":null,"transactionPlatfCode":"1401000004P","businessObjectName":"长治市长北干线快速通道建设工程PPP项目(长治市长北干线改扩建工程)工程一切保险服务招标延期公告","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-05-26","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":"山西省招标投标公共服务平台","regionCode":null,"regionName":"山西省 长治市","tenderName":null,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-06-01","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null},{"guid":null,"businessId":"548f1f69b9e84ed6bed3c69d35e6565b","tenderProjectCode":"M4401000017202063206","tenderProjectName":null,"transactionPlatfCode":"M4401000017","businessObjectName":"2020年运营期保险服务(第二次招标)变更公告变更公告","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-05-25","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":"国义招标采购平台","regionCode":null,"regionName":"广东省","tenderName":null,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-06-15","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null},{"guid":null,"businessId":"b47baad8f11a43bab0beeb655e5c6f19","tenderProjectCode":"M3400000022002060026","tenderProjectName":null,"transactionPlatfCode":"M3400000022","businessObjectName":"阜阳卷烟材料厂2020年团体补充医疗保险项目招标公告(二次)","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-05-22","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":"优质采电子交易平台","regionCode":null,"regionName":"安徽省","tenderName":null
,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-05-28","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null},{"guid":null,"businessId":"19f26fe9b4a94c0a9ac9817e54eee509","tenderProjectCode":"M1401000155200923002","tenderProjectName":null,"transactionPlatfCode":"1401000004P","businessObjectName":"长治市高铁东站及道路配套设施工程PPP项目工程一切险服务招标延期公告","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-05-18","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":"山西省招标投标公共服务平台","regionCode":null,"regionName":"山西省 长治市","tenderName":null,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-06-09","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null},{"guid":null,"businessId":"5fb27aa3ee5444d0a9cdee2d1680edb6","tenderProjectCode":"M4401000017201710171","tenderProjectName":null,"transactionPlatfCode":"M4401000017","businessObjectName":"明阳阳江沙扒300MW科研示范项目建筑安装工程一切险变更公告变更公告","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-05-15","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":"国义招标采购平台","regionCode":null,"regionName":"广东省","tenderName":null,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-05-29","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null},{"guid":null,"businessId":"2c0e658ede394d50ad029a89c1b65927","tenderProjectCode":"M4401000017201617161","tenderProjectName":null,"transactionPlatfCode":"M4401000017","businessObjectName":"2020年运营期保险服务招标公告","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-05-14","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":"国义招标采购平台","regionCode":null,"regionName":"广东省","tenderName":null,"tenderAgencyName":null,"industriesType":"保
险","industriesCode":null,"bulletinEndTime":"2020-05-29","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null},{"guid":null,"businessId":"e622bebd46bd4e1e8feacaf719a77de2","tenderProjectCode":"M4400000020014339001","tenderProjectName":null,"transactionPlatfCode":"M4400000020","businessObjectName":"佛山市佛铁实业有限公司2020-2021年度汽车保险服务项目招标公告","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-05-13","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":"广东电子招标平台","regionCode":null,"regionName":"广东省","tenderName":null,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-06-04","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null},{"guid":null,"businessId":"258b5286ccc6479390a7778e2e098f9e","tenderProjectCode":"M3400000022006718009","tenderProjectName":null,"transactionPlatfCode":"M3400000022","businessObjectName":"安徽皖南烟叶有限责任公司2020-2022年度保险服务供应商采购项目招标公告【二次】","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-05-13","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":"优质采电子交易平台","regionCode":null,"regionName":"安徽省","tenderName":null,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-05-19","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null},{"guid":null,"businessId":"9fa9c042f4af4020add1ddd138c2ca21","tenderProjectCode":"M1401000155200694002","tenderProjectName":null,"transactionPlatfCode":"1401000004P","businessObjectName":"长治市长北干线快速通道建设工程PPP项目(长治市长北干线改扩建工程)工程一切保险服务招标控制价","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-05-11","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":null,"regionCode":null,"regionName":"山西省 
长治市","tenderName":null,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-05-27","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null},{"guid":null,"businessId":"e788e72e6bc54d399adf6e9608b72f36","tenderProjectCode":"M1401000155200923002","tenderProjectName":null,"transactionPlatfCode":"1401000004P","businessObjectName":"长治市高铁东站及道路配套设施工程PPP项目工程一切险服务招标公告","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-05-11","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":null,"regionCode":null,"regionName":"山西省 长治市","tenderName":null,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-06-02","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null},{"guid":null,"businessId":"4d38ab21daa44999870f2512ab3cb154","tenderProjectCode":"M4401000017201710171","tenderProjectName":null,"transactionPlatfCode":"M4401000017","businessObjectName":"明阳阳江沙扒300MW科研示范项目建筑安装工程一切险招标公告","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-05-08","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":"国义招标采购平台","regionCode":null,"regionName":"广东省","tenderName":null,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-05-15","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null},{"guid":null,"businessId":"d1affa613f1b4326b1540bac7da17d62","tenderProjectCode":"M1401000155200694002","tenderProjectName":null,"transactionPlatfCode":"1401000004P","businessObjectName":"长治市长北干线快速通道建设工程PPP项目(长治市长北干线改扩建工程)工程一切保险服务招标公告","largeType":null,"largeNum":null,"smallType":null,"timeStamp":null,"receiveTime":"2020-05-06","auditStatus":null,"status":null,"tendererCode":null,"tenderAgencyCode":null,"transactionPlatfName":null,"regionCode":null,"regionName":"山西省 
长治市","tenderName":null,"tenderAgencyName":null,"industriesType":"保险","industriesCode":null,"bulletinEndTime":"2020-05-27","schemaVersion":"V60.02","platformType":null,"type":"0","rowGuid":null}],"page":{"pageNo":1,"totalPage":3,"totalCount":36,"row":15,"rowNo":0,"pn":0}}}
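Since the rows arrive as JSON rather than HTML, they can be read with the `json` module instead of CSS selectors. The sketch below uses a hand-trimmed sample with the same shape as the response above (two entries, a few keys each). The helper names and the mapping of `receiveTime` and `bulletinEndTime` to start and end times are assumptions for illustration, not something the thread confirms.

```python
import json

# Hand-trimmed sample with the same shape as the response above.
sample = """
{"message": "", "success": true, "object": {"returnlist": [
  {"businessId": "769e2fbcaa3847dfbfc33763729bb248",
   "businessObjectName": "东莞深能源樟洋电力有限公司2×180MW燃气-蒸汽联合循环发电机组2020-2022年度财产保险招标公告",
   "transactionPlatfName": "国义招标采购平台",
   "regionName": "广东省",
   "receiveTime": "2020-06-03",
   "bulletinEndTime": "2020-06-10"},
  {"businessId": "e788e72e6bc54d399adf6e9608b72f36",
   "businessObjectName": "长治市高铁东站及道路配套设施工程PPP项目工程一切险服务招标公告",
   "transactionPlatfName": null,
   "regionName": "山西省 长治市",
   "receiveTime": "2020-05-11",
   "bulletinEndTime": "2020-06-02"}],
 "page": {"pageNo": 1, "totalPage": 3, "totalCount": 36,
          "row": 15, "rowNo": 0, "pn": 0}}}
"""


def extract_rows(payload):
    """Pick a few fields out of every entry in object.returnlist."""
    rows = []
    for entry in payload["object"]["returnlist"]:
        rows.append({
            "name": entry.get("businessObjectName"),
            # Some entries carry null here, so fall back to an empty string.
            "platform": entry.get("transactionPlatfName") or "",
            "region": entry.get("regionName"),
            "start_time": entry.get("receiveTime"),    # assumed mapping
            "end_time": entry.get("bulletinEndTime"),  # assumed mapping
        })
    return rows


def remaining_pages(payload):
    """Page numbers that still have to be requested, per object.page."""
    page = payload["object"]["page"]
    return list(range(page["pageNo"] + 1, page["totalPage"] + 1))


payload = json.loads(sample)
rows = extract_rows(payload)
print(len(rows), rows[0]["start_time"], remaining_pages(payload))
```

In a Scrapy callback the same functions could be fed `json.loads(response.text)`. How the page number is passed in the POST body is not shown in the thread, so that part is left out.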
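
Twilight6 posted on 2020-6-3 17:48
I could help if you used other tools. Do you want to extract the whole node, or only the text inside it?
So to extract the few fields I need, I still have to write another POST request.

Your selector A is the problem; it targets the wrong element.
If you are not very familiar with CSS selectors, use pseudo-class selectors sparingly and prefer ID selectors.

叶小贤 posted on 2020-6-3 19:16
So to extract the few fields I need, I still have to write another POST request.
OK, hehe. Keep it up, and thanks for the hard work. Next time I will learn some CSS too.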