Some questions about writing a web scraper
The URL is: https://ned.ipac.caltech.edu/byname?objname=PKS%200002-478&hconst=67.8&omegam=0.308&omegav=0.692&wmap=4&corr_z=1
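How do I scrape the data circled in red on this page?

https://ned.ipac.caltech.edu/ffs/sticky/CmdSrv

Press F12 and look for the link I posted above.

from selenium import webdriver
from selenium.webdriver.common.by import By
import time

driver = webdriver.Chrome()
driver.get('https://ned.ipac.caltech.edu/byname?objname=PKS%200002-478&hconst=67.8&omegam=0.308&omegav=0.692&wmap=4&corr_z=1')
time.sleep(5)  # crude wait for the page's JavaScript to finish rendering
# Switch to the "Photometry & SED (47)" tab
driver.find_element(By.ID, 'ui-id-10').click()
time.sleep(5)
# The tab contains two tables, so take the first one
tab1 = driver.find_elements(By.XPATH, '//div[@class="fixedDataTableLayout_rowsContainer"]')[0]
# Every table cell has the class public_fixedDataTableCell_cellContent
cells = tab1.find_elements(By.CLASS_NAME, 'public_fixedDataTableCell_cellContent')
# The right-hand side was lost in the original post; collecting each cell's text is the presumed intent
txt = [cell.text for cell in cells]
print(txt)

南归 posted on 2021-5-15 10:18
Press F12 and look for the link I posted above.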
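I don't quite follow.

Spend some more time working out how to use F12 yourself....

南归 posted on 2021-5-15 18:56
Spend some more time working out how to use F12 yourself....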
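Is this the kind of page that loads its data via AJAX?

https://www.hualigs.cn/image/609fad3a06eca.jpg
Open the page and wait for it to finish loading. Then clear the captured network log, click "Photometry & SED (47)", and search the requests for "Gamma-Ray"; you will get the view shown in the screenshot.

The poster above has already found the data endpoint for you. Send a request to that endpoint with the same parameters and you will get the data.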
import requests

url = 'https://ned.ipac.caltech.edu/ffs/sticky/CmdSrv'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36'
}
# Form fields copied from the request captured in the browser's network panel
data = {
    'request': '{"startIdx":0,"pageSize":1000,"ffSessionId":"FF-Session-1621083370564","filters":"","source":"http://ned.ipac.caltech.edu/cgi-bin/objsearch?extend=no&out_csys=Equatorial&out_equinox=J2000.0&obj_sort=RA+or+Longitude&of=xml_qlphot&zv_breaker=30000.0&list_limit=5&img_stamp=YES&objname=PKS+0002-478&hconst=67.8&omegam=0.308&omegav=0.692&wmap=4&corr_z=1&objid=76355","alt_source":"http://ned.ipac.caltech.edu/cgi-bin/objsearch?extend=no&out_csys=Equatorial&out_equinox=J2000.0&obj_sort=RA+or+Longitude&of=xml_qlphot&zv_breaker=30000.0&list_limit=5&img_stamp=YES&objname=PKS+0002-478&hconst=67.8&omegam=0.308&omegav=0.692&wmap=4&corr_z=1&objid=76355","META_INFO":{"title":"qlphot","tbl_id":"tbl_id-c50b9-6","col.Refcode.PrefWidth":"20","col.Spectral Region.PrefWidth":"14","col.Band.PrefWidth":"17","col.Apparent Mag or Flux.PrefWidth":"16","col.Reference code.PrefWidth":"12","selectInfo":"false--0"},"tbl_id":"tbl_id-c50b9-6","id":"IpacTableFromSource"}',
    'cmd': 'tableSearch'
}
# POST the form fields to the endpoint and print the JSON response
res = requests.post(url, headers=headers, data=data)
print(res.json())

YunGuo posted on 2021-5-15 21:06
The poster above has already found the data endpoint for you. Send a request to that endpoint with the same parameters and you will get the data.
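When I change the search target, does the information in data have to be obtained automatically?

snowJR posted on 2021-5-16 10:12
When I change the search target, does the information in data have to be obtained automatically?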
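Just change the query parameters in data. Give me an example of something else you want to search for, and I will work out the query parameters.

YunGuo posted on 2021-5-16 19:19
Just change the query parameters in data. Give me an example of something else you want to search for, and I will work out the query parameters.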
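For example, I now want to look up the information for 0106+013.

YunGuo posted on 2021-5-16 19:19
Just change the query parameters in data. Give me an example of something else you want to search for, and I will work out the query parameters.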
The original search URL is this one:
https://ned.ipac.caltech.edu/

snowJR posted on 2021-5-17 07:30
The original search URL is this one:
https://ned.ipac.caltech.edu/
import re
import time
from urllib import parse

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36'
}


def parser(datas):
    # Print each row of the returned table
    for data in datas:
        print(data)


def get_data(objid, keyword):
    url = 'https://ned.ipac.caltech.edu/ffs/sticky/CmdSrv'
    # Session id suffix: the current time in milliseconds
    ff = str(int(time.time() * 1000))
    data = {
        'request': '{"startIdx":0,"pageSize":1000,"ffSessionId":"FF-Session-'+ff+'","filters":"","source":"http://ned.ipac.caltech.edu/cgi-bin/objsearch?extend=no&out_csys=Equatorial&out_equinox=J2000.0&obj_sort=RA+or+Longitude&of=xml_qlphot&zv_breaker=30000.0&list_limit=5&img_stamp=YES&objname='+keyword+'&hconst=67.8&omegam=0.308&omegav=0.692&wmap=4&corr_z=1&objid='+objid+'","alt_source":"http://ned.ipac.caltech.edu/cgi-bin/objsearch?extend=no&out_csys=Equatorial&out_equinox=J2000.0&obj_sort=RA+or+Longitude&of=xml_qlphot&zv_breaker=30000.0&list_limit=5&img_stamp=YES&objname='+keyword+'&hconst=67.8&omegam=0.308&omegav=0.692&wmap=4&corr_z=1&objid='+objid+'","META_INFO":{"title":"qlphot","tbl_id":"tbl_id-c50b9-6","col.Refcode.PrefWidth":"20","col.Spectral Region.PrefWidth":"14","col.Band.PrefWidth":"17","col.Apparent Mag or Flux.PrefWidth":"16","col.Reference code.PrefWidth":"12","selectInfo":"false--0"},"tbl_id":"tbl_id-c50b9-6","id":"IpacTableFromSource"}',
        'cmd': 'tableSearch'
    }
    res = requests.post(url, headers=headers, data=data)
    return res.json()['tableData']['data']


def get_objid(keyword):
    url = f'https://ned.ipac.caltech.edu/byname?objname={keyword}&hconst=67.8&omegam=0.308&omegav=0.692&wmap=4&corr_z=1'
    res = requests.get(url, headers=headers)
    # re.findall returns a list; take the first objid found in the page
    # (this raises IndexError if the page contains no objid)
    objid = re.findall('objid=(.*?)"', res.text)[0]
    datas = get_data(objid, keyword)
    parser(datas)


if __name__ == '__main__':
    word = input('Enter a keyword: ')
    # URL-encode the object name before putting it into the query string
    key_word = parse.quote(word)
    get_objid(key_word)

YunGuo posted on 2021-5-17 21:19
Thank you so much!!
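
For anyone adapting the script above to other objects: the long hand-concatenated 'request' string is easy to break, so here is a minimal sketch that builds the same form fields with json.dumps and urlencode instead. The function names build_source_url and build_payload are made up for this illustration, and the field values are simply the ones from the captured request in this thread. Whether the server accepts any other session id or table id has not been verified here, so treat this as a starting point rather than a confirmed recipe. It takes the plain object name (not already URL-encoded) and makes no network calls itself.

```python
import json
import time
from urllib.parse import urlencode

# Base URL taken from the "source" field of the captured request.
OBJSEARCH_URL = 'http://ned.ipac.caltech.edu/cgi-bin/objsearch'


def build_source_url(objname, objid):
    """Rebuild the objsearch URL that the captured request sends as "source"."""
    params = [
        ('extend', 'no'),
        ('out_csys', 'Equatorial'),
        ('out_equinox', 'J2000.0'),
        ('obj_sort', 'RA or Longitude'),
        ('of', 'xml_qlphot'),
        ('zv_breaker', '30000.0'),
        ('list_limit', '5'),
        ('img_stamp', 'YES'),
        ('objname', objname),
        ('hconst', '67.8'),
        ('omegam', '0.308'),
        ('omegav', '0.692'),
        ('wmap', '4'),
        ('corr_z', '1'),
        ('objid', objid),
    ]
    # urlencode turns spaces into "+" and a literal "+" into "%2B".
    return OBJSEARCH_URL + '?' + urlencode(params)


def build_payload(objname, objid, session_ms=None):
    """Return the form fields ('request' and 'cmd') for the CmdSrv endpoint."""
    if session_ms is None:
        # Same idea as the script above: current time in milliseconds.
        session_ms = int(time.time() * 1000)
    source = build_source_url(objname, objid)
    request = {
        'startIdx': 0,
        'pageSize': 1000,
        'ffSessionId': f'FF-Session-{session_ms}',
        'filters': '',
        'source': source,
        'alt_source': source,
        # Copied as-is from the captured request; meaning not verified.
        'META_INFO': {
            'title': 'qlphot',
            'tbl_id': 'tbl_id-c50b9-6',
            'col.Refcode.PrefWidth': '20',
            'col.Spectral Region.PrefWidth': '14',
            'col.Band.PrefWidth': '17',
            'col.Apparent Mag or Flux.PrefWidth': '16',
            'col.Reference code.PrefWidth': '12',
            'selectInfo': 'false--0',
        },
        'tbl_id': 'tbl_id-c50b9-6',
        'id': 'IpacTableFromSource',
    }
    # Compact separators mirror the captured request's JSON formatting.
    return {'request': json.dumps(request, separators=(',', ':')), 'cmd': 'tableSearch'}


if __name__ == '__main__':
    payload = build_payload('PKS 0002-478', '76355')
    print(payload['request'][:120])
```

The returned dict can be passed as data= to requests.post, in the same way as the data dict in the scripts above. Because json.dumps handles the quoting, an object name such as 0106+013 no longer needs to be spliced into the string by hand.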