FishC Forum

Problem scraping Taobao data

Posted on 2021-4-16 23:13:37

This script saves the data:
import requests

headers = {
    'authority': 's.taobao.com',
    'cache-control': 'max-age=0',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Opera/9.80 (Windows NT 6.0) Presto/2.12.388 Version/12.14',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-user': '?1',
    'sec-fetch-dest': 'document',
    'accept-language': 'zh-CN,zh;q=0.9',
    'cookie': 'thw=cn; x=e%3D1%26p%3D*%26s%3D0%26c%3D0%26f%3D0%26g%3D0%26t%3D0; ali_ab=223.74.49.30.1566551985403.0; hng=CN%7Czh-CN%7CCNY%7C156; cna=LfydFoSmaEkCAd9KMUJlVvaW; tracknick=t_1490707425300_0517; _cc_=WqG3DMC9EA%3D%3D; enc=UpJ65BctJWnfHhZN%2BexMnnQXnYaSSCY%2FyTwknOShA39CkIUkPMLCc%2BBLKyW2XfKABKeep9l264%2BzCn1JTP9zKA%3D%3D; miid=2038313907639986985; sgcookie=E100G%2FmiaHBaWGqV8xr1D5U2chudrikwysCpGo%2B8GgVz37lHWQeYLAq1Rl9Sy%2BiAmo88%2F4ZdToXFocorWqtIFIgeMw%3D%3D; UM_distinctid=17877b99fb22ea-0a9c81556e3efd-79391a30-1fa400-17877b99fb36ec; mt=ci%3D-1_1; _m_h5_tk=ab1f7e86eb1d6b0e0d829fbb6d8f6828_1618331300394; _m_h5_tk_enc=8877efea7066111e65c8f7250501cfab; alitrackid=www.taobao.com; lastalitrackid=www.taobao.com; __guid=154677242.3171584363069999600.1618322627981.9897; xlly_s=1; CNZZDATA1277450732=1378885484-1568777263-https%253A%252F%252Fwww.taobao.com%252F%7C1618322710; JSESSIONID=75BC51EB7A81EBE48D35F20AA51B953F; monitor_count=7; tfstk=cxghB7OA9WD7HMqiceaIOw6GY9xOZB7zn4us74jJNVVxA4uNijSN0H1psJqa2g1..; l=eBLOFykgjzST7Y7SBOfZnurza779IIRAguPzaNbMiOCP995p5SKCW6aW3q89CnGVh6VBR3uPfIu3BeYB4QAonxv92j-la_Hmn; isg=BA0NWR3kUZmccfWg-JtqdwAmHCiH6kG8gIEUjE-SS6QRRi34Fzg8jHzUsNoghll0',
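    'referer': 'https://s.taobao.com/',
}
# The original dict also held "http": '106.56.102.107:8888'. That is a proxy
# address, not an HTTP header; requests expects it in the `proxies` argument.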


keyword = input('Enter a search keyword: ')

params = (
    ('q', keyword),
    ('sort', 'sale-desc'),  # sort by sales, descending
)

response = requests.get('https://s.taobao.com/search', headers=headers, params=params)
with open('淘宝数据临时.txt','w',encoding='utf-8') as f:
    f.write(response.text)
#NB. Original query string below. It seems impossible to parse and
#reproduce query strings 100% accurately so the one below is given
#in case the reproduced version is not "correct".
# response = requests.get('https://s.taobao.com/search?q=%E9%A3%9F%E8%99%AB%E8%8D%89&sort=sale-desc', headers=headers)

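This script extracts the data:
import re

# Read the file once. The original called f.read() inside every findall;
# after the first call the file position is at the end, so each later
# f.read() returned '' and only biaoti received any matches.
with open('淘宝数据临时.txt', 'r', encoding='utf-8') as f:
    text = f.read()

biaoti = re.findall(r'"raw_title":"(.*?)"', text)      # titles
jiage = re.findall(r'"view_price":"(.*?)"', text)      # prices
dizhi = re.findall(r'"item_loc":"(.*?)"', text)        # locations
xiaoliang = re.findall(r'"view_sales":"(.*?)"', text)  # sales
dianpu = re.findall(r'"nick":"(.*?)"', text)           # shop names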


 
But in the end only biaoti contains any data; jiage, dizhi and the others are all empty.
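
One likely cause, independent of Taobao itself, is how the file is read: a file object's `read()` consumes the whole stream, so a second `read()` on the same handle returns an empty string. The sketch below reproduces that symptom with an in-memory file. The sample string is made up for illustration; only the field names come from the post.

```python
import io
import re

# Made-up sample standing in for the saved page; the keys match the post.
sample = '{"raw_title":"Venus flytrap","view_price":"9.90","item_loc":"Guangdong"}'

f = io.StringIO(sample)   # behaves like an open text file
first = f.read()          # consumes the whole stream
second = f.read()         # position is already at the end, so this is ''

titles_ok = re.findall(r'"raw_title":"(.*?)"', first)
prices_empty = re.findall(r'"view_price":"(.*?)"', second)

# Reading once and reusing the string finds every field.
prices_ok = re.findall(r'"view_price":"(.*?)"', first)

print(titles_ok, prices_empty, prices_ok)
```

If the fields are still missing after reading the file only once, the saved page itself probably does not contain them.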
Posted on 2021-4-17 14:46:08 From FishC Mobile
Try Selenium instead.
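Posted on 2021-4-17 15:45:43
There are two possibilities:
1. Much of Taobao's data is encrypted, so it is best not to scrape it.
2. Taobao now requires you to log in before you can search.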
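
Either possibility can be checked against the saved file before any parsing is attempted. The following is a minimal sketch: `count_fields` and `has_search_data` are hypothetical helper names, and the sample strings are invented. A page with none of the expected keys is consistent with a login or verification page, though the check alone cannot say which.

```python
import re

# JSON keys the extraction script in the first post looks for.
FIELDS = ('raw_title', 'view_price', 'item_loc', 'view_sales', 'nick')

def count_fields(text):
    """Return how many times each expected key appears in the text."""
    return {name: len(re.findall(r'"%s":"' % re.escape(name), text))
            for name in FIELDS}

def has_search_data(text):
    """True only if every expected key occurs at least once."""
    return all(count_fields(text).values())

# Invented examples: one page with item data, one without.
with_data = ('{"raw_title":"A","view_price":"1.00","item_loc":"B",'
             '"view_sales":"10","nick":"shop"}')
without_data = '<html><body>Please log in</body></html>'

print(count_fields(with_data))
print(has_search_data(with_data), has_search_data(without_data))
```

To run it on the real output, read 淘宝数据临时.txt once into a string and pass that string to `count_fields`.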