FishC Forum (鱼C论坛)


[Solved] Scraping cookies

Posted on 2020-9-16 22:31:26

I'm scraping a video website, and its response headers include a Set-Cookie entry. What is it for? And how do I grab what's in Set-Cookie and use it to set my cookies?

Posted on 2020-9-17 09:04:54
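The Set-Cookie response header is how the server plants a cookie in the browser. Once the cookie is stored, the browser automatically sends it back whenever it requests a URL that matches the cookie's conditions.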
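To see what one Set-Cookie header actually carries, the standard-library class `http.cookies.SimpleCookie` can parse its raw text (as copied from the browser's developer tools). This is a minimal sketch; the header value below is made up, not taken from the video site in the question.

```python
from http.cookies import SimpleCookie

# Made-up example of a single Set-Cookie header value (not from the site in the question)
raw_header = "sid=abc123; Domain=example.com; Path=/; Max-Age=3600; HttpOnly"

parsed = SimpleCookie()
parsed.load(raw_header)  # parse "name=value" plus its attributes

morsel = parsed["sid"]  # one Morsel object per cookie name
print("name:", morsel.key)
print("value:", morsel.value)
print("domain:", morsel["domain"])  # which hosts the browser sends it to
print("path:", morsel["path"])  # which URL paths it applies to
print("max-age:", morsel["max-age"])  # lifetime in seconds, kept as a string
print("httponly:", morsel["httponly"])  # flag: hidden from page JavaScript

# The pair that goes back to the server in the Cookie request header
cookie_header = f"{morsel.key}={morsel.value}"
print(cookie_header)
```

Only `name=value` is sent back; the attributes stay in the browser and decide when the cookie is sent.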
Want to know what 小甲鱼 (Xiaojiayu) has been up to lately? Visit -> ilovefishc.com

Original poster | Posted on 2020-9-17 12:05:33
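Quoting 弱弱的佳佳 (posted on 2020-9-17 09:04):
The Set-Cookie response header is what the server sends to plant a cookie in the browser; once planted, when the browser visits a URL that meets the conditions ...

So how can I tell which URLs meet the conditions?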
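The conditions come from the cookie's own attributes. With a Domain attribute the cookie goes to that domain and its subdomains; without one, browsers send it only back to the host that set it. Path limits it to URLs under that path, Secure limits it to HTTPS, and an expired cookie is no longer sent. Python's `http.cookiejar.CookieJar` applies similar rules, so a small offline sketch can show which URLs a cookie would be attached to. The example.com URLs and cookie values are made up, and `FakeResponse` and `cookies_sent_to` are hypothetical helpers (`extract_cookies()` only needs the response's `info()` method).

```python
import email.message
import http.cookiejar
import urllib.request


class FakeResponse:
    """Hypothetical stand-in for an HTTP response; extract_cookies() only calls info()."""

    def __init__(self, headers):
        self._headers = headers

    def info(self):
        return self._headers


def cookies_sent_to(jar, url):
    """Return the Cookie header the jar would attach to a request for url (None if none)."""
    req = urllib.request.Request(url)
    jar.add_cookie_header(req)
    return req.get_header("Cookie")


jar = http.cookiejar.CookieJar()

# Pretend https://www.example.com/video/list answered with two Set-Cookie headers
headers = email.message.Message()  # assigning the same name twice appends, not replaces
headers["Set-Cookie"] = "sid=abc123; Domain=example.com; Path=/"  # example.com and subdomains
headers["Set-Cookie"] = "player=html5; Path=/video"  # no Domain: tied to www.example.com
origin = urllib.request.Request("https://www.example.com/video/list")
jar.extract_cookies(FakeResponse(headers), origin)

for url in [
    "https://www.example.com/video/123",  # both cookies match
    "https://www.example.com/",  # "/" is outside /video, so only sid
    "https://api.example.com/video/1",  # another host: only the Domain cookie
    "https://www.other.com/",  # different site: nothing is sent
]:
    print(url, "->", cookies_sent_to(jar, url))
```

This is why a cookie jar (or a `requests.Session`) is usually simpler than copying cookies by hand: it stores every Set-Cookie and applies these rules on each request.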

Posted on 2020-9-17 12:20:51 | This post is the best answer
Set-Cookie: the server asks the browser to store the cookie and send it back on the next visit. Examples:
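```python
# 1) urllib: save cookies with MozillaCookieJar
import http.cookiejar
import urllib.request

filename = 'cookies.txt'
cookie = http.cookiejar.MozillaCookieJar(filename)
# The processor stores every Set-Cookie the server sends into `cookie`
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
r = opener.open('http://www.baidu.com')
# Also keep session cookies and expired cookies, so nothing is silently dropped
cookie.save(ignore_discard=True, ignore_expires=True)
```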
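Example 1 only saves. Loading the file back uses `load()` on the same class with the same flags. The file is in the tab-separated Netscape cookies.txt format (domain, include-subdomains flag, path, secure flag, expiry, name, value). A minimal offline sketch that writes a made-up one-cookie file instead of contacting a site:

```python
import http.cookiejar
import os
import tempfile

# Made-up cookies.txt content in the Netscape format that MozillaCookieJar.save() writes.
# Fields are tab-separated: domain, include subdomains, path, secure, expiry, name, value.
# An empty expiry field marks a session cookie.
sample = (
    "# Netscape HTTP Cookie File\n"
    ".example.com\tTRUE\t/\tFALSE\t\tsid\tabc123\n"
)

path = os.path.join(tempfile.mkdtemp(), "cookies.txt")
with open(path, "w") as f:
    f.write(sample)

jar = http.cookiejar.MozillaCookieJar()
# Same flags as save(): without ignore_discard the session cookie would be skipped
jar.load(path, ignore_discard=True, ignore_expires=True)

for c in jar:
    print(c.domain, c.path, c.name, c.value)
```

With a real file you would load `cookies.txt` and pass the jar to `urllib.request.HTTPCookieProcessor`, as in the loading part of example 2.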


```python
# 2) urllib: save and load with LWPCookieJar
import http.cookiejar
import urllib.request

# Save
filename = 'cookies.txt'
cookie = http.cookiejar.LWPCookieJar(filename)
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
r = opener.open('http://www.baidu.com')
cookie.save(ignore_discard=True, ignore_expires=True)

# Load: the saved cookies are sent automatically on matching requests
cookie = http.cookiejar.LWPCookieJar()
cookie.load('cookies.txt', ignore_discard=True, ignore_expires=True)
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
r = opener.open('http://www.baidu.com')
print(r.read().decode('utf-8'))
```


```python
# 3) requests: save and load
# Save: one "name#value" pair per line
import requests

r = requests.get('https://www.baidu.com')
with open('cookie.txt', 'w') as f:
    for k, v in r.cookies.items():
        print(k, '=', v)
        f.write(k + '#' + v + '\n')  # newline so each cookie gets its own line

# Load from the text file
import requests
from requests.cookies import RequestsCookieJar

jar = RequestsCookieJar()
with open('cookie.txt', 'r') as f:
    for line in f:
        line = line.strip()  # drop the trailing newline
        if not line:
            continue
        k, v = line.split('#', 1)  # split only at the first '#'
        jar.set(k, v)
r = requests.get('https://www.baidu.com', cookies=jar)
print(r.status_code)
```
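The `name#value` text format breaks if a value ever contains `#` or a line break. A sketch of a sturdier alternative using the standard `json` module; the dict below is a made-up stand-in for `r.cookies.get_dict()` (a `RequestsCookieJar` method), and `requests` also accepts a plain dict for `cookies=`. A name-to-value dict drops domain, path and expiry, which is usually fine when re-sending to the same site.

```python
import json
import os
import tempfile

# Made-up stand-in for r.cookies.get_dict() from requests
saved = {"sid": "abc123", "token": "a#b=c"}  # '#' and '=' are legal in cookie values

path = os.path.join(tempfile.mkdtemp(), "cookies.json")

# Save: JSON handles escaping, so no custom separator is needed
with open(path, "w", encoding="utf-8") as f:
    json.dump(saved, f, ensure_ascii=False)

# Load: back to a plain dict, ready for requests.get(url, cookies=loaded)
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

print(loaded)
```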


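```python
# 4) selenium: save and load
import pickle
import time

from selenium import webdriver

driver = webdriver.Chrome()  # assumption: Chrome is installed and usable by Selenium
url = 'https://www.baidu.com'  # placeholder: replace with the target site

# Save as pickle
driver.get(url)
time.sleep(10)  # time to log in by hand before the cookies are read
with open('cookies.pkl', 'wb') as f:
    pickle.dump(driver.get_cookies(), f)

# Save as text, one "name#value" pair per line
with open('cookie.txt', 'w') as f:
    for item in driver.get_cookies():
        f.write(item['name'] + '#' + item['value'] + '\n')

# Load from pickle
# Selenium only accepts cookies for the domain currently open, so visit the site first
driver.get(url)
with open('cookies.pkl', 'rb') as f:
    for cookie in pickle.load(f):
        driver.add_cookie(cookie)
driver.refresh()  # reload so the page is requested with the restored cookies

# To load from the text file, reuse the requests loading code above
```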
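A common follow-up is handing a Selenium session (for example after logging in by hand) over to `requests`. `driver.get_cookies()` returns a list of dicts with at least `name` and `value` keys, so a small helper can turn it into the dict that `requests` accepts. The helpers `selenium_to_requests` and `cookie_header` and the sample list are hypothetical, standing in for a real driver.

```python
def selenium_to_requests(selenium_cookies):
    """Turn driver.get_cookies() output into a {name: value} dict for requests.

    Hypothetical helper: keeps only name and value, so domain/path/expiry are dropped.
    """
    return {c["name"]: c["value"] for c in selenium_cookies}


def cookie_header(cookie_dict):
    """Build the text of a Cookie request header from a {name: value} dict."""
    return "; ".join(f"{k}={v}" for k, v in cookie_dict.items())


# Made-up sample shaped like driver.get_cookies() output
sample = [
    {"name": "sid", "value": "abc123", "domain": ".example.com", "path": "/",
     "secure": True, "httpOnly": True},
    {"name": "lang", "value": "zh-CN", "domain": "www.example.com", "path": "/",
     "secure": False, "httpOnly": False},
]

cookies = selenium_to_requests(sample)
print(cookies)
print(cookie_header(cookies))
# With requests installed: requests.get(url, cookies=cookies)
```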
