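13691255250 posted on 2025-11-6 23:47:47

Question for the experts: how can a crawler switch between different addresses to check whether an item is in stock?

Could anyone explain how to have a crawler send requests for different addresses and scrape whether the item is in stock? Thanks!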

FishC posted on 2025-11-6 23:48:53

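Hello! Happy to help. Checking stock status at different addresses with a crawler mainly comes down to sending HTTP requests and parsing the HTML that comes back. The Python example below shows one simple way to do it.

Basic steps:

[*]Use the requests library to send a request to each address.
[*]Use the BeautifulSoup library to parse the page content.
[*]Search the parsed content for a keyword that indicates stock (such as "有货" or "In Stock").
[*]Loop over the addresses and print the result for each.


Implementation:
First, make sure the required libraries are installed (run this on the command line):
pip install requests beautifulsoup4

Then use the code below as a reference. It walks through a list of addresses and checks whether each page contains the stock keyword: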

import requests
from bs4 import BeautifulSoup

# URLs to check (replace with real product pages)
urls = [
    "https://example.com/product1",
    "https://example.com/product2",
]

# Check each URL in turn
for url in urls:
    try:
        # Send a GET request; the timeout keeps an unresponsive host from hanging the loop
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # raise on 4xx/5xx responses

        # Parse the HTML
        soup = BeautifulSoup(response.text, "html.parser")

        # Look for the stock keyword in the page text ("有货" means "in stock")
        if "有货" in soup.get_text():
            print(f"{url}: in stock")
        else:
            print(f"{url}: out of stock")
    except requests.RequestException as e:
        print(f"{url}: request failed - {e}")


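About the code:
- The code defines a list of URLs; edit the entries in urls as needed.
- It requests each address in a loop and decides the stock status with a plain text search.
- If a request fails, the exception is caught and reported.

Notes:
- Adjust the keyword to match what the target site actually shows (some sites use "库存" or "In Stock", for example).
- Follow the site's robots.txt rules and avoid sending requests too frequently, or your IP may be blocked.
- If the page content is loaded dynamically by JavaScript, you may need a tool such as Selenium.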
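
The keyword adjustment in the first note can be kept separate from the fetching code. The following is a minimal sketch, not a definitive implementation: `classify_stock` and both keyword tuples are assumptions made for illustration, and the real phrases depend on the target site. It checks sold-out phrases first so that a page mentioning both kinds of phrase is not reported as available.

```python
# Hypothetical keyword lists; replace them with the phrases the target site uses.
IN_STOCK = ("有货", "in stock")
SOLD_OUT = ("无货", "此商品暂时售完", "out of stock")


def classify_stock(page_text, in_stock=IN_STOCK, sold_out=SOLD_OUT):
    """Return "sold out", "in stock", or "unknown" for a page's visible text."""
    text = page_text.lower()  # lets "In Stock" match the lowercase keyword
    # Sold-out phrases are checked first, so they win when both kinds appear.
    if any(keyword in text for keyword in sold_out):
        return "sold out"
    if any(keyword in text for keyword in in_stock):
        return "in stock"
    return "unknown"
```

In the loop above, it would be called as classify_stock(soup.get_text()) in place of the single "有货" test.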

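I hope this example helps! If you can share more details (such as the specific site), I can refine the code further.

This reply was generated by the FishC AI assistant. If it did not answer your question correctly, please follow up.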

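不二如是 posted on 2025-11-7 09:06:12

Target page: https://item.jd.com/100051636820.html
Data to scrape: the stock-status text on the page, such as "此商品暂时售完" (temporarily sold out) or "有货" (in stock).
Variable parameter: the delivery address (for example "天津和平区五大道街道", Wudadao Subdistrict, Heping District, Tianjin).

Approach:

[*]Access the interface with requests and a session;
[*]Call JD's internal stock interface (no need to actually open the page);
[*]Change the area request parameter (the region code) to tell whether the item is in stock at each address.


Use the JD stock interface to query the stock status for different regions.
Example request:

https://c0.3.cn/stock?skuId=100051636820&area=1_2800_2858_0
In the response, the "StockStateName" field gives the stock status (for example "有货" (in stock), "无货" (out of stock), or "此商品暂时售完" (temporarily sold out)).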
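
The area value in the example request is a string of numeric IDs joined by underscores. A minimal sketch of building the request URL from its parts is shown here; `build_stock_url` and `STOCK_ENDPOINT` are names chosen for illustration, and the sketch assumes the interface needs no parameters beyond the two visible in the example.

```python
from urllib.parse import urlencode

# Endpoint copied from the example request above.
STOCK_ENDPOINT = "https://c0.3.cn/stock"


def build_stock_url(sku_id, area_ids):
    """Join the area IDs with underscores and build the query URL."""
    area = "_".join(str(part) for part in area_ids)
    query = urlencode({"skuId": sku_id, "area": area})
    return f"{STOCK_ENDPOINT}?{query}"


print(build_stock_url("100051636820", (1, 2800, 2858, 0)))
```

Called with the IDs from the example, this reproduces the example URL.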

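Then turn this idea into code:
import time

import requests

def check_stock(sku, area):
    """Return the StockStateName text for one SKU and area code."""
    url = f"https://c0.3.cn/stock?skuId={sku}&area={area}"
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(url, headers=headers, timeout=5)
    response.raise_for_status()  # fail early on an HTTP error status
    return response.json()["stock"]["StockStateName"]

sku = "100051636820"
# Delivery area name -> area code
areas = {
    "Chaoyang District, Beijing": "1_72_2819_0",
    "Heping District, Tianjin": "2_2820_51975_0",
    "Pudong New Area, Shanghai": "3_60835_0_0",
}

for name, code in areas.items():
    print(f"{name}: {check_stock(sku, code)}")
    time.sleep(1)  # pause between requests
This gives a quick way to scrape the stock status for different addresses.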
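
The loop above stops at the first request that fails. A minimal sketch of a wrapper that records the failure for one area and carries on with the rest is given here. `check_all` is a hypothetical helper, not part of the code above; it relies on the exceptions raised by requests deriving from `OSError`, and it also catches `ValueError` and `KeyError` for responses that are not JSON or lack the expected field.

```python
import time


def check_all(check, sku, areas, delay=1.0):
    """Call check(sku, code) for each area; record failures instead of raising."""
    results = {}
    for index, (name, code) in enumerate(areas.items()):
        if index:
            time.sleep(delay)  # pause between requests, not before the first one
        try:
            results[name] = check(sku, code)
        except (OSError, ValueError, KeyError) as exc:
            # Network errors, undecodable bodies, and missing fields end up here.
            results[name] = f"lookup failed: {type(exc).__name__}"
    return results
```

With the function and data defined above, it would be called as check_all(check_stock, sku, areas).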

快速收敛 posted on 2025-11-7 13:24:36

"很刑" (a pun on "很行", "impressive", hinting that this could get you into legal trouble)

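13691255250 posted on 2025-11-7 23:19:29

This post was last edited by 13691255250 on 2025-11-7 23:22

不二如是 posted on 2025-11-7 09:06
Target page: https://item.jd.com/100051636820.html
Data to scrape: "此商品暂时售完" or "有货" on the page ...

Could you explain how you found this address: https://c0.3.cn/stock?skuId=100051636820&area=1_2800_2858_0

So far I have only managed to find the request in the screenshot below, whose response contains the stock information I want. Any guidance would be appreciated.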


https://api.m.jd.com/?appid=pc-item-soa&functionId=pc_detailpage_wareBusiness&client=pc&clientVersion=1.0.0&t=1762528374989&body=%7B%22skuId%22%3A100051636820%2C%22cat%22%3A%2227983%2C27987%2C28021%22%2C%22area%22%3A%228_621_3291_61479%22%2C%22shopId%22%3A%221000463356%22%2C%22venderId%22%3A1000463356%2C%22paramJson%22%3A%22%7B%5C%22platform2%5C%22%3A%5C%221%5C%22%2C%5C%22specialAttrStr%5C%22%3A%5C%22p0ppppppppppppppppppppppppp%5C%22%2C%5C%22skuMarkStr%5C%22%3A%5C%2200%5C%22%7D%22%2C%22num%22%3A1%2C%22bbTraffic%22%3A%22%22%2C%22canvasType%22%3A1%2C%22giftServiceIsSelected%22%3A%22%22%2C%22customInfoId%22%3A%220%22%7D&h5st=20251107231259035%3Bwazm363t6qjtwqh5%3Bfb5df%3Btk03w90f31b6e18nmy1hcF0UGcXK6NJvBMRKXgEfPdy8GVbhEg1T7fp-ETbuUbDPAtGP--SzVsdz01Ayi_LpvvAB3R19%3B470c4b970fb95d2c6ae8c2914a1fad134d5acc1e8f41ba0dc441f24b6e8fad6b%3B5.2%3B1762528375035%3BfZRCXZfTzd_WxZfZnZPVsY7ZBh-f1ZPU5cqEt4LFrRqVrVOGncrE-h-T-h6I-hfZXxfTB5_ZzUrJ-hfZXx-Zr9eIwN7V7UeI7IeUAQ_IuZ_VuJuUpdOJvZ_IqZeIrN7ZB5_ZxIdG6YLIqYfZB5hW-B_VoZLUsRbI9c_I_EOI-MLToB_VqRbVpdeIwheT_MrU-h-T-VKJroLJ_YfZB5hW-h_ZB5_ZtN6J-hfZXx-Zup-VzN_ECMbG4IrKsB7ZB5_ZrYfZB5hW-RrNuN8Ln8_HCYfZnZPGyQ7GAY6ZBh-f1Z-VupLHKYfZnZ-IxYfZB5hWkgfZXVeZnZPVwN6J-hfZBh-f1ROVB5_ZxdOE-YfZBhfZXxfT0h-T-ZOVsY7ZBhfZB5hW-h_VVUOEbEMReQ8ZB5_Z0kbIzc7F-hfZBh-f1heZnZfTsY7ZBhfZB5hWxh-T-FOE-YfZBhfZXx-Vuh-T-JOE-YfZBhfZXxfVB5_ZsN6J-hfZBh-f1heZnZfUsY7ZBhfZB5hWs9eZnZvVsY7ZBhfZB5hW-9_WwpfV-h-T-dOE-YfZBhfZXxfVB5_Z2E6ZBhfZB5hWsh-T-VaG-hfZBh-f1heZnZfG-hfZBh-f1heZnZfIqYfZBhfZX1aZnZfIzMbEpM7ZBh-f1taZB5BZpAbD44qG-h-T-ZeF-hfZBh-fmg-T-haF-hfZXx-ZtJeDB1eUrpLHKgvTxpfVwhfMTgvFqkbIz8rM-h-T-dLEuYfZB5xD%3B4366184d25aada9f36357cc3ed44451a7b0adafc96f63b06821a6472779b5a35%3BgRaW989Gy8bE_oLE7w-Gy8rFvM7MtoLI4wrJ1R6G88bG_wPD9k7J1RLHxgKJ&x-api-eid-token=jdd03UQHSEQQF6MSRYLEUCIV6AJ2QYVF3CHLUYX34VMRMSRKHFAGV2SUNNXQVHHAR7NLYB73BB4B4NQ2LHUYJSDDD6NDVUEAAAAM2L3QAIBAAAAAACVGNCOOX2C4YVAX&loginType=3&scval=100051636820&uuid=181111935.1035289842.1762436140.1762436142.1762528330.2
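
The body parameter of a request like the one above is URL-encoded JSON, and the area value sits inside it. A minimal sketch for decoding it with the standard library follows. `sample_url` is a shortened stand-in that keeps only the skuId and area fields of the captured request, and `decode_body` is a name chosen for illustration.

```python
import json
from urllib.parse import parse_qs, urlsplit

# Shortened stand-in for the captured request: body keeps only skuId and area.
sample_url = (
    "https://api.m.jd.com/?appid=pc-item-soa"
    "&functionId=pc_detailpage_wareBusiness"
    "&body=%7B%22skuId%22%3A100051636820%2C%22area%22%3A%228_621_3291_61479%22%7D"
)


def decode_body(url):
    """Return the JSON object carried in the URL's body query parameter."""
    query = parse_qs(urlsplit(url).query)  # parse_qs also undoes the percent-encoding
    return json.loads(query["body"][0])


body = decode_body(sample_url)
print(body["skuId"], body["area"])
```

The other query parameters of the captured request, such as h5st, are left untouched by this sketch.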

wxw2015 posted on 2025-11-28 22:36:47

不二如是 posted on 2025-11-7 09:06
Target page: https://item.jd.com/100051636820.html
Data to scrape: "此商品暂时售完" or "有货" on the page ...

Error:
Traceback (most recent call last):
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python313\Lib\site-packages\urllib3\connection.py", line 198, in _new_conn
    sock = connection.create_connection(
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python313\Lib\site-packages\urllib3\util\connection.py", line 60, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python313\Lib\socket.py", line 977, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: getaddrinfo failed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python313\Lib\site-packages\urllib3\connectionpool.py", line 787, in urlopen
    response = self._make_request(
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python313\Lib\site-packages\urllib3\connectionpool.py", line 488, in _make_request
    raise new_e
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python313\Lib\site-packages\urllib3\connectionpool.py", line 464, in _make_request
    self._validate_conn(conn)
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python313\Lib\site-packages\urllib3\connectionpool.py", line 1093, in _validate_conn
    conn.connect()
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python313\Lib\site-packages\urllib3\connection.py", line 753, in connect
    self.sock = sock = self._new_conn()
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python313\Lib\site-packages\urllib3\connection.py", line 205, in _new_conn
    raise NameResolutionError(self.host, self, e) from e
urllib3.exceptions.NameResolutionError: <urllib3.connection.HTTPSConnection object at 0x0000023B91C0F770>: Failed to resolve 'c0.3.cn' ( getaddrinfo failed)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python313\Lib\site-packages\requests\adapters.py", line 644, in send
    resp = conn.urlopen(
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python313\Lib\site-packages\urllib3\connectionpool.py", line 841, in urlopen
    retries = retries.increment(
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python313\Lib\site-packages\urllib3\util\retry.py", line 519, in increment
    raise MaxRetryError(_pool, url, reason) from reason# type: ignore
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='c0.3.cn', port=443): Max retries exceeded with url: /stock?skuId=100051636820&area=1_72_2819_0 (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x0000023B91C0F770>: Failed to resolve 'c0.3.cn' ( getaddrinfo failed)"))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:/Users/Administrator/AppData/Local/Programs/Python/Python313/3.py", line 17, in <module>
    print(f"{name}:{check_stock(sku, code)}")
File "C:/Users/Administrator/AppData/Local/Programs/Python/Python313/3.py", line 6, in check_stock
    data = requests.get(url, headers=headers, timeout=5).json()
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python313\Lib\site-packages\requests\api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python313\Lib\site-packages\requests\api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python313\Lib\site-packages\requests\sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python313\Lib\site-packages\requests\sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python313\Lib\site-packages\requests\adapters.py", line 677, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='c0.3.cn', port=443): Max retries exceeded with url: /stock?skuId=100051636820&area=1_72_2819_0 (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x0000023B91C0F770>: Failed to resolve 'c0.3.cn' ( getaddrinfo failed)"))
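
In the traceback above, the failure happens in getaddrinfo, that is, while resolving the hostname c0.3.cn, before any HTTP request is sent. A minimal sketch for checking whether a hostname resolves from the current machine is given here; `can_resolve` is a hypothetical helper name.

```python
import socket


def can_resolve(hostname, port=443):
    """Return True if the hostname resolves to at least one address."""
    try:
        socket.getaddrinfo(hostname, port)
    except socket.gaierror:
        # Same error class as the first frame of the traceback.
        return False
    return True
```

Calling can_resolve("c0.3.cn") before the loop would separate a name-resolution problem from an error in the request code itself.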