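鱼cpython学习者 posted on 2020-9-16 22:31:26

Scraping cookies

I'm trying to scrape a video site, and its response headers include something called set_cookies. What is it for? And how do I scrape what's in set_cookies so I can set my cookies?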

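弱弱的佳佳 posted on 2020-9-17 09:04:54

Set-Cookie is a response header the server sends to set a cookie in the browser. Once the cookie is set, the browser automatically attaches it whenever it requests a URL that matches the cookie's conditions.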
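
As a rough sketch of what such a header carries, the standard-library `http.cookies.SimpleCookie` can parse one. The header string below is invented for illustration and does not come from any real site. Attributes such as `Domain` and `Path` are what scope the cookie to particular URLs.

```python
from http.cookies import SimpleCookie

# Hypothetical Set-Cookie header value, made up for this example.
raw = "sessionid=abc123; Domain=example.com; Path=/video; Max-Age=3600; HttpOnly"

cookie = SimpleCookie()
cookie.load(raw)  # parse the header string into Morsel objects

morsel = cookie["sessionid"]
name = morsel.key          # cookie name
value = morsel.value       # cookie value
domain = morsel["domain"]  # hosts the cookie is sent to
path = morsel["path"]      # URL path prefix the cookie is sent to
print(name, value, domain, path)
```

With these attributes, a browser would be expected to send the cookie only to URLs on that domain whose path starts with `/video`.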

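鱼cpython学习者 posted on 2020-9-17 12:05:33

弱弱的佳佳 posted on 2020-9-17 09:04
Set-Cookie is a response header the server sends to set a cookie in the browser. Once the cookie is set, whenever the browser requests a matching URL ...

Then how do I know which URLs count as matching?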

suchocolate posted on 2020-9-17 12:20:51

set-cookie: the server asks the browser to store the cookie and send it back on later visits. Examples:
# 1) urllib: save with MozillaCookieJar
import http.cookiejar
import urllib.request

filename = 'cookies.txt'
cookie = http.cookiejar.MozillaCookieJar(filename)
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
r = opener.open('http://www.baidu.com')
cookie.save(ignore_discard=True, ignore_expires=True)


# --------------------------------------------------------------------------------
# 2) urllib: save and load with LWPCookieJar
import http.cookiejar
import urllib.request

# Save
filename = 'cookies.txt'
cookie = http.cookiejar.LWPCookieJar(filename)
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
r = opener.open('http://www.baidu.com')
cookie.save(ignore_discard=True, ignore_expires=True)

# Load
cookie = http.cookiejar.LWPCookieJar()
cookie.load('cookies.txt', ignore_discard=True, ignore_expires=True)
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
r = opener.open('http://www.baidu.com')
print(r.read().decode('utf-8'))


# --------------------------------------------------------------------------------
# 3) requests: save and load
# Save
import requests

r = requests.get('https://www.baidu.com')
with open('cookie.txt', 'w') as f:
    for k, v in r.cookies.items():
        print(k, '=', v)
        f.write(k + '#' + v + '\n')  # one cookie per line so it can be read back



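# Load from a text file
import requests
from requests.cookies import RequestsCookieJar

jar = RequestsCookieJar()
with open('cookie.txt', 'r') as f:
    for item in f:
        item = item.strip()  # drop the trailing newline
        if not item:
            continue
        k, v = item.split('#', 1)  # split on the first '#' only
        jar.set(k, v)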
r = requests.get('https://www.baidu.com', cookies=jar)
print(r.status_code)


# --------------------------------------------------------------------------------
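# 4) selenium: save and load
# Assumed setup (not in the original post): a Chrome driver and a target url
import pickle
import time
from selenium import webdriver

driver = webdriver.Chrome()
url = 'https://www.baidu.com'  # assumption: same site as the examples above

# Save as pickle
driver.get(url)
time.sleep(10)  # give the page time to load so the cookies are set
with open('cookies.pkl', 'wb') as f:
    pickle.dump(driver.get_cookies(), f)

# Save as text
with open('cookie.txt', 'w') as f:
    for item in driver.get_cookies():
        data = item['name'] + '#' + item['value']
        f.write(data + '\n')  # one cookie per line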


# --------------------------------------------------------------------------------

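# Load from pickle
with open('cookies.pkl', 'rb') as f:
    cookies = pickle.load(f)
driver.get(url)  # open the site first: add_cookie only works for the current domain
for cookie in cookies:
    driver.add_cookie(cookie)
driver.get(url)  # reload so the request carries the cookies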

# To load from a text file, follow the requests loading example above
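
For the text-file route, one possible sketch is a small helper that turns the file back into cookie dicts. It assumes one `name#value` pair per line. `read_cookie_file` and the demo contents are hypothetical names and data, not part of the original post.

```python
import os
import tempfile


def read_cookie_file(path):
    """Parse lines of the form name#value into dicts with 'name' and 'value' keys."""
    cookies = []
    with open(path, 'r') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            name, value = line.split('#', 1)  # split on the first '#' only
            cookies.append({'name': name, 'value': value})
    return cookies


# Demo on a throwaway file with invented contents.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'cookie.txt')
    with open(demo_path, 'w') as f:
        f.write('BDORZ#27315\nsession#a#b\n')
    demo = read_cookie_file(demo_path)
print(demo)
```

Each resulting dict could then be passed to `driver.add_cookie(...)` after `driver.get(url)` has opened the site, since `add_cookie` takes a dict with at least `name` and `value`.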