17623095765 posted on 2021-4-16 23:13:37

Problem scraping Taobao data

This script saves the data:
import requests

headers = {
    'authority': 's.taobao.com',
    'cache-control': 'max-age=0',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Opera/9.80 (Windows NT 6.0) Presto/2.12.388 Version/12.14',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-user': '?1',
    'sec-fetch-dest': 'document',
    'accept-language': 'zh-CN,zh;q=0.9',
    'cookie': 'thw=cn; x=e%3D1%26p%3D*%26s%3D0%26c%3D0%26f%3D0%26g%3D0%26t%3D0; ali_ab=223.74.49.30.1566551985403.0; hng=CN%7Czh-CN%7CCNY%7C156; cna=LfydFoSmaEkCAd9KMUJlVvaW; tracknick=t_1490707425300_0517; _cc_=WqG3DMC9EA%3D%3D; enc=UpJ65BctJWnfHhZN%2BexMnnQXnYaSSCY%2FyTwknOShA39CkIUkPMLCc%2BBLKyW2XfKABKeep9l264%2BzCn1JTP9zKA%3D%3D; miid=2038313907639986985; sgcookie=E100G%2FmiaHBaWGqV8xr1D5U2chudrikwysCpGo%2B8GgVz37lHWQeYLAq1Rl9Sy%2BiAmo88%2F4ZdToXFocorWqtIFIgeMw%3D%3D; UM_distinctid=17877b99fb22ea-0a9c81556e3efd-79391a30-1fa400-17877b99fb36ec; mt=ci%3D-1_1; _m_h5_tk=ab1f7e86eb1d6b0e0d829fbb6d8f6828_1618331300394; _m_h5_tk_enc=8877efea7066111e65c8f7250501cfab; alitrackid=www.taobao.com; lastalitrackid=www.taobao.com; __guid=154677242.3171584363069999600.1618322627981.9897; xlly_s=1; CNZZDATA1277450732=1378885484-1568777263-https%253A%252F%252Fwww.taobao.com%252F%7C1618322710; JSESSIONID=75BC51EB7A81EBE48D35F20AA51B953F; monitor_count=7; tfstk=cxghB7OA9WD7HMqiceaIOw6GY9xOZB7zn4us74jJNVVxA4uNijSN0H1psJqa2g1..; l=eBLOFykgjzST7Y7SBOfZnurza779IIRAguPzaNbMiOCP995p5SKCW6aW3q89CnGVh6VBR3uPfIu3BeYB4QAonxv92j-la_Hmn; isg=BA0NWR3kUZmccfWg-JtqdwAmHCiH6kG8gIEUjE-SS6QRRi34Fzg8jHzUsNoghll0',
    'referer': 'https://s.taobao.com/',
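    # Note: the proxy address '106.56.102.107:8888' is not a request header;
    # to use it, pass it through the proxies= argument of requests.get().
}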


keyword = input('Enter a keyword: ')

params = (
    ('q', keyword),
    ('sort', 'sale-desc'))

response = requests.get('https://s.taobao.com/search', headers=headers, params=params)
with open('淘宝数据临时.txt','w',encoding='utf-8') as f:
    f.write(response.text)
#NB. Original query string below. It seems impossible to parse and
#reproduce query strings 100% accurately so the one below is given
#in case the reproduced version is not "correct".
# response = requests.get('https://s.taobao.com/search?q=%E9%A3%9F%E8%99%AB%E8%8D%89&sort=sale-desc', headers=headers)



This script extracts the data from the saved file:
import re

with open('淘宝数据临时.txt','r',encoding='utf-8') as f:
    biaoti=re.findall(r'"raw_title":"(.*?)"',f.read())  # titles
    jiage=re.findall(r'"view_price":"(.*?)"',f.read())  # prices
    dizhi=re.findall(r'"item_loc":"(.*?)"',f.read())  # locations
    xiaoliang=re.findall(r'"view_sales":"(.*?)"',f.read())  # sales
    dianpu=re.findall(r'"nick":"(.*?)"',f.read())  # shop names




But in the end only biaoti contains data; jiage, dizhi, and the rest are all empty.
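
One likely cause, judging from the extraction script itself: every f.read() after the first returns an empty string, because the first call already moved the file position to the end. That would explain why only the first list is filled. A minimal sketch of reading once and reusing the text follows; the helper name extract_fields and the sample string are made up for illustration, while the field names are the ones from the post.

```python
import io
import re

# Variable names and JSON keys as used in the post.
FIELDS = {
    'biaoti': 'raw_title',      # title
    'jiage': 'view_price',      # price
    'dizhi': 'item_loc',        # location
    'xiaoliang': 'view_sales',  # sales
    'dianpu': 'nick',           # shop name
}


def extract_fields(text):
    """Run every pattern against the same string, which was read only once."""
    return {name: re.findall(r'"%s":"(.*?)"' % key, text)
            for name, key in FIELDS.items()}


# Demonstration of the pitfall with an in-memory file (made-up sample data).
sample = '{"raw_title":"demo item","view_price":"9.90","item_loc":"demo city"}'
f = io.StringIO(sample)
first = f.read()    # the whole content
second = f.read()   # '' because the position is already at the end

result = extract_fields(first)
```

With the saved file, the same idea would be to call f.read() once inside the with block, keep the result in a variable, and pass that variable to extract_fields.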

wp231957 posted on 2021-4-17 14:46:08

Try Selenium instead.
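
A rough sketch of what the Selenium route could look like, assuming Selenium and a Chrome driver are installed. The function names build_search_url and fetch_page_source are hypothetical, and whether Taobao shows a login or verification page to an automated browser is not something this sketch can promise.

```python
from urllib.parse import urlencode


def build_search_url(keyword):
    """Build the same search URL that the requests script asks for."""
    query = urlencode({'q': keyword, 'sort': 'sale-desc'})
    return 'https://s.taobao.com/search?' + query


def fetch_page_source(url):
    """Open the URL in a real browser and return the rendered HTML."""
    # Imported here so the URL helper still works without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumes Chrome and a matching driver exist
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on errors
```

The returned HTML could then be written to 淘宝数据临时.txt in place of response.text, so the extraction script stays the same.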

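xiaosi4081 posted on 2021-4-17 15:45:43

There are two possibilities:
1. A lot of Taobao's content is encrypted, so it is best not to scrape it.
2. Taobao now requires you to log in before you can search.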
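
To check which case applies, one option is to count how often each expected key appears in the saved page text. The sketch below is only a heuristic: it assumes that a login or blocked page would not contain these keys, and the names count_fields and looks_like_search_results are made up for illustration.

```python
import re

# JSON keys taken from the regular expressions in the question.
FIELD_KEYS = ['raw_title', 'view_price', 'item_loc', 'view_sales', 'nick']


def count_fields(text):
    """Return how many times each expected JSON key appears in the text."""
    return {key: len(re.findall(r'"%s":"' % key, text)) for key in FIELD_KEYS}


def looks_like_search_results(text):
    """Heuristic: accept the page only if every expected key is present."""
    return all(count > 0 for count in count_fields(text).values())
```

If every count is zero, the saved file is probably not a results page at all; if the counts are nonzero but the extracted lists are empty, the problem is more likely in the extraction code.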