FishC Forum


Problem scraping Taobao data

Posted on 2021-4-16 23:13:37

This script saves the data:
    import requests

    headers = {
        'authority': 's.taobao.com',
        'cache-control': 'max-age=0',
        'upgrade-insecure-requests': '1',
        'user-agent': 'Opera/9.80 (Windows NT 6.0) Presto/2.12.388 Version/12.14',
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'sec-fetch-site': 'same-origin',
        'sec-fetch-mode': 'navigate',
        'sec-fetch-user': '?1',
        'sec-fetch-dest': 'document',
        'accept-language': 'zh-CN,zh;q=0.9',
        'cookie': 'thw=cn; x=e%3D1%26p%3D*%26s%3D0%26c%3D0%26f%3D0%26g%3D0%26t%3D0; ali_ab=223.74.49.30.1566551985403.0; hng=CN%7Czh-CN%7CCNY%7C156; cna=LfydFoSmaEkCAd9KMUJlVvaW; tracknick=t_1490707425300_0517; _cc_=WqG3DMC9EA%3D%3D; enc=UpJ65BctJWnfHhZN%2BexMnnQXnYaSSCY%2FyTwknOShA39CkIUkPMLCc%2BBLKyW2XfKABKeep9l264%2BzCn1JTP9zKA%3D%3D; miid=2038313907639986985; sgcookie=E100G%2FmiaHBaWGqV8xr1D5U2chudrikwysCpGo%2B8GgVz37lHWQeYLAq1Rl9Sy%2BiAmo88%2F4ZdToXFocorWqtIFIgeMw%3D%3D; UM_distinctid=17877b99fb22ea-0a9c81556e3efd-79391a30-1fa400-17877b99fb36ec; mt=ci%3D-1_1; _m_h5_tk=ab1f7e86eb1d6b0e0d829fbb6d8f6828_1618331300394; _m_h5_tk_enc=8877efea7066111e65c8f7250501cfab; alitrackid=www.taobao.com; lastalitrackid=www.taobao.com; __guid=154677242.3171584363069999600.1618322627981.9897; xlly_s=1; CNZZDATA1277450732=1378885484-1568777263-https%253A%252F%252Fwww.taobao.com%252F%7C1618322710; JSESSIONID=75BC51EB7A81EBE48D35F20AA51B953F; monitor_count=7; tfstk=cxghB7OA9WD7HMqiceaIOw6GY9xOZB7zn4us74jJNVVxA4uNijSN0H1psJqa2g1..; l=eBLOFykgjzST7Y7SBOfZnurza779IIRAguPzaNbMiOCP995p5SKCW6aW3q89CnGVh6VBR3uPfIu3BeYB4QAonxv92j-la_Hmn; isg=BA0NWR3kUZmccfWg-JtqdwAmHCiH6kG8gIEUjE-SS6QRRi34Fzg8jHzUsNoghll0',
        'referer': 'https://s.taobao.com/',
    }

    # Proxy settings belong in their own dict, not in the headers.
    # This entry only applies to http:// URLs; add an 'https' key to route
    # the https:// request below through the proxy as well.
    proxies = {'http': 'http://106.56.102.107:8888'}

    a = input('Enter a keyword: ')

    # Query parameters: the search keyword, sorted by sales in descending order.
    params = (
        ('q', a),
        ('sort', 'sale-desc'),
    )

    response = requests.get('https://s.taobao.com/search', headers=headers,
                            params=params, proxies=proxies)

    # Save the raw response body so it can be parsed by a separate script.
    with open('淘宝数据临时.txt', 'w', encoding='utf-8') as f:
        f.write(response.text)

    # NB. Original query string below. It seems impossible to parse and
    # reproduce query strings 100% accurately so the one below is given
    # in case the reproduced version is not "correct".
    # response = requests.get('https://s.taobao.com/search?q=%E9%A3%9F%E8%99%AB%E8%8D%89&sort=sale-desc', headers=headers)



This script extracts the data:
    import re

    with open('淘宝数据临时.txt', 'r', encoding='utf-8') as f:
        biaoti = re.findall(r'"raw_title":"(.*?)"', f.read())      # titles
        jiage = re.findall(r'"view_price":"(.*?)"', f.read())      # prices
        dizhi = re.findall(r'"item_loc":"(.*?)"', f.read())        # locations
        xiaoliang = re.findall(r'"view_sales":"(.*?)"', f.read())  # sales figures
        dianpu = re.findall(r'"nick":"(.*?)"', f.read())           # shop names




But in the end only biaoti contains any data; jiage, dizhi, and the rest are all empty.
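
One cause that fits this symptom, regardless of what Taobao returns: the extraction script calls `f.read()` five times on the same open file. The first call consumes the whole file, so each later call returns an empty string and `re.findall` has nothing to match. The sketch below reproduces this with an in-memory file (`io.StringIO`) and then shows the usual fix of reading once into a variable. The sample text and the names `sample` and `text` are made up for illustration and are not real Taobao data.

```python
import io
import re

# Made-up sample standing in for the saved page; not real Taobao data.
sample = '"raw_title":"Venus flytrap","view_price":"9.90","item_loc":"Guangdong"'

# Reproduce the symptom: each f.read() continues from the current position.
f = io.StringIO(sample)
first = re.findall(r'"raw_title":"(.*?)"', f.read())    # reads the whole content
second = re.findall(r'"view_price":"(.*?)"', f.read())  # reads '' (already at the end)
print(first, second)

# Fix: read once, then run every pattern against the same string.
f = io.StringIO(sample)
text = f.read()
biaoti = re.findall(r'"raw_title":"(.*?)"', text)
jiage = re.findall(r'"view_price":"(.*?)"', text)
dizhi = re.findall(r'"item_loc":"(.*?)"', text)
print(biaoti, jiage, dizhi)
```

If the price and location lists are still empty after reading the file only once, the saved page itself probably does not contain those fields.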

Posted on 2021-4-17 14:46:08 from FishC Mobile
Try Selenium instead.
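
A minimal sketch of what this suggestion could look like, assuming Selenium 4 is installed and a Chrome browser with a matching driver is available. `build_search_url` and `fetch_with_selenium` are hypothetical helper names; the URL and query parameters are taken from the original script. A fresh browser session may still be shown a login page, so treat this as a starting point rather than a working scraper.

```python
from urllib.parse import urlencode


def build_search_url(keyword, sort='sale-desc'):
    """Build the same search URL the original requests call produced."""
    return 'https://s.taobao.com/search?' + urlencode({'q': keyword, 'sort': sort})


def fetch_with_selenium(keyword):
    """Load the search page in a real browser and return the rendered HTML."""
    # Imported here so build_search_url works even without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get(build_search_url(keyword))
        return driver.page_source
    finally:
        # Always close the browser, even if loading the page fails.
        driver.quit()
```

Calling `fetch_with_selenium('食虫草')` returns the page HTML as a string, which can be written to 淘宝数据临时.txt in place of `response.text`.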

Posted on 2021-4-17 15:45:43
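There are two possibilities:
1. Much of Taobao's data is now encrypted, so it is best not to scrape it.
2. Taobao now requires you to log in before you can search.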
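
Either possibility can be checked against the saved file before changing the scraper. The sketch below counts how often each expected field name appears in a piece of text, assuming the field names used in the original regular expressions. `count_fields`, `EXPECTED_KEYS`, and the sample string are hypothetical and only illustrate the idea.

```python
# Field names taken from the regular expressions in the original script.
EXPECTED_KEYS = ['raw_title', 'view_price', 'item_loc', 'view_sales', 'nick']


def count_fields(text):
    """Return how many times each expected JSON key appears in the text."""
    return {key: text.count('"%s":' % key) for key in EXPECTED_KEYS}


# Made-up sample; in practice pass the contents of the saved file instead.
sample = '{"raw_title":"a","view_price":"1.00","raw_title":"b"}'
counts = count_fields(sample)
missing = [key for key, n in counts.items() if n == 0]
print(counts)
print(missing)
```

To apply it to the real data, pass in the contents of 淘宝数据临时.txt read with `encoding='utf-8'`. If every count is zero, the response likely did not contain search results at all (for example, a login page was returned); if only some are zero, those fields may be named differently or delivered in another form.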