鱼C论坛 (FishC Forum)

Views: 338 | Replies: 5

Asking the experts: how can a crawler switch between different delivery addresses to check stock availability?

Posted 2025-11-6 23:47:47
Bounty: 20 fish coins
Experts, how can I use a crawler to request different addresses and scrape whether an item is in stock? Thanks!!! [attachment: cb747a20-ef74-4651-b10d-7a496084a0c6.png]

小甲鱼's latest courses -> https://ilovefishc.com

Posted 2025-11-6 23:48:53

Re: how can a crawler switch between different delivery addresses to check stock availability?

Hello! Glad to help. Checking stock status across different addresses with a crawler mainly involves sending HTTP requests and parsing the returned HTML. Below is a simple Python example.

Basic steps:
  • Use the requests library to send a request to each address.
  • Use the BeautifulSoup library to parse the page content.
  • Search the parsed text for an in-stock keyword (e.g. "有货" or "In Stock").
  • Loop over multiple addresses and print the results.


Implementation:
First, install the required libraries (run on the command line):

pip install requests beautifulsoup4


Then use the following code as a reference. It iterates over a list of URLs and checks whether each page contains the in-stock keyword:

import requests
from bs4 import BeautifulSoup

# URLs to check (replace with real product pages)
urls = [
    "https://example.com/product1",
    "https://example.com/product2"
]

# Check each URL in turn
for url in urls:
    try:
        # Send a GET request
        response = requests.get(url)
        response.raise_for_status()  # Raise if the request failed

        # Parse the HTML
        soup = BeautifulSoup(response.text, 'html.parser')

        # Search the page text for the in-stock keyword ("有货" = "in stock")
        if "有货" in soup.get_text():
            print(f"{url}: in stock")
        else:
            print(f"{url}: out of stock")
    except Exception as e:
        print(f"{url}: request failed - {e}")


Code notes:
- The code defines a URL list; edit urls to the addresses you need.
- It requests each address in a loop and decides stock status by a simple text search.
- If a request fails, the exception is caught and reported.

Caveats:
- Adjust the keyword to the target site's actual wording (some sites use "库存" ("stock") or "In Stock").
- Respect the site's robots.txt rules and avoid rapid-fire requests, or your IP may be banned.
- If the page content is loaded by JavaScript, you may need a tool such as Selenium.
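On the robots.txt point above: Python's standard library includes urllib.robotparser, which can parse a robots.txt and answer whether a given path may be fetched. A minimal offline sketch (the rules below are invented for illustration; a real crawler would load the target site's actual robots.txt with set_url()/read()):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, just for demonstration
robots_lines = """User-agent: *
Disallow: /checkout/
Allow: /""".splitlines()

rp = RobotFileParser()
rp.parse(robots_lines)

# Product pages are allowed, the checkout path is not
print(rp.can_fetch("MyCrawler", "https://example.com/product1"))   # True
print(rp.can_fetch("MyCrawler", "https://example.com/checkout/"))  # False
```

Calling can_fetch() before each request is a cheap way to stay within a site's stated crawling policy.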

Hope this example helps! If you can share more detail (e.g. the specific site), I can refine the code further.

The reply above is from the FishC AI assistant. If it didn't fully answer your question, please follow up.

Posted 2025-11-7 09:06:12

Target page: https://item.jd.com/100051636820.html
Data to scrape: the stock-status text on the page, such as "此商品暂时售完" ("temporarily sold out") or "有货" ("in stock").
Varying parameter: the delivery address (e.g. "天津和平区五大道街道", Wudadao Subdistrict, Heping District, Tianjin).

Approach:

  • Access the interface via requests + session;
  • Call JD's internal stock interface directly (no need to load the actual page);
  • Vary the area request parameter (region code) to check availability under different addresses.

Use the JD stock interface to query stock status for different regions.
Example request:

https://c0.3.cn/stock?skuId=100051636820&area=1_2800_2858_0

In the response, the "StockStateName" field carries the stock status (e.g. "有货" / "无货" / "此商品暂时售完", i.e. in stock / out of stock / temporarily sold out).

Turning that idea into code:

import requests
import time

def check_stock(sku, area):
    url = f"https://c0.3.cn/stock?skuId={sku}&area={area}"
    headers = {"User-Agent": "Mozilla/5.0"}
    data = requests.get(url, headers=headers, timeout=5).json()
    return data["stock"]["StockStateName"]

sku = "100051636820"
areas = {
    "Beijing Chaoyang": "1_72_2819_0",
    "Tianjin Heping": "2_2820_51975_0",
    "Shanghai Pudong": "3_60835_0_0",
}

for name, code in areas.items():
    print(f"{name}: {check_stock(sku, code)}")
    time.sleep(1)  # be polite: pause between requests

That's a quick way to scrape the stock status for different addresses.
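One caveat about the .json() call above: it assumes a clean JSON body with the expected nesting, but an anti-bot page or a changed payload would make the direct indexing data["stock"]["StockStateName"] raise. A hedged sketch of defensive parsing (the sample payloads below are invented; only the StockStateName field name comes from the interface description above):

```python
import json

def parse_stock_state(raw_text):
    """Return the stock state string, or None if the payload is unusable."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    # Walk the nesting with .get() instead of indexing directly
    return data.get("stock", {}).get("StockStateName")

# Invented sample payloads for illustration
ok = '{"stock": {"StockStateName": "有货"}}'
blocked = '<html>verification required</html>'

print(parse_stock_state(ok))       # 有货
print(parse_stock_state(blocked))  # None
```

Returning None on any malformed payload lets the calling loop log the failure and keep going instead of crashing mid-run.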

Posted 2025-11-7 13:24:36

That's borderline criminal, lol ("很刑")

OP | Posted 2025-11-7 23:19:29
Last edited by 13691255250 at 2025-11-7 23:22

Quote: 不二如是, 2025-11-7 09:06
Target page: https://item.jd.com/100051636820.html
Data to scrape: the stock-status text "此商品暂时售完" or "有货" ...

Expert, could you explain how you found this URL?
https://c0.3.cn/stock?skuId=100051636820&area=1_2800_2858_0


So far all I've managed is to find that the response of the request in the screenshot below contains the stock info I want. Please enlighten me.
[attachment: 3f80e3f6-130d-4186-a73d-7f78dcc1e587.png]

https://api.m.jd.com/?appid=pc-item-soa&functionId=pc_detailpage_wareBusiness&client=pc&clientVersion=1.0.0&t=1762528374989&body=%7B%22skuId%22%3A100051636820%2C%22cat%22%3A%2227983%2C27987%2C28021%22%2C%22area%22%3A%228_621_3291_61479%22%2C%22shopId%22%3A%221000463356%22%2C%22venderId%22%3A1000463356%2C%22paramJson%22%3A%22%7B%5C%22platform2%5C%22%3A%5C%221%5C%22%2C%5C%22specialAttrStr%5C%22%3A%5C%22p0ppppppppppppppppppppppppp%5C%22%2C%5C%22skuMarkStr%5C%22%3A%5C%2200%5C%22%7D%22%2C%22num%22%3A1%2C%22bbTraffic%22%3A%22%22%2C%22canvasType%22%3A1%2C%22giftServiceIsSelected%22%3A%22%22%2C%22customInfoId%22%3A%220%22%7D&h5st=20251107231259035%3Bwazm363t6qjtwqh5%3Bfb5df%3Btk03w90f31b6e18nmy1hcF0UGcXK6NJvBMRKXgEfPdy8GVbhEg1T7fp-ETbuUbDPAtGP--SzVsdz01Ayi_LpvvAB3R19%3B470c4b970fb95d2c6ae8c2914a1fad134d5acc1e8f41ba0dc441f24b6e8fad6b%3B5.2%3B1762528375035%3BfZRCXZfTzd_WxZfZnZPVsY7ZBh-f1ZPU5cqEt4LFrRqVrVOGncrE-h-T-h6I-hfZXxfTB5_ZzUrJ-hfZXx-Zr9eIwN7V7UeI7IeUAQ_IuZ_VuJuUpdOJvZ_IqZeIrN7ZB5_ZxIdG6YLIqYfZB5hW-B_VoZLUsRbI9c_I_EOI-MLToB_VqRbVpdeIwheT_MrU-h-T-VKJroLJ_YfZB5hW-h_ZB5_ZtN6J-hfZXx-Zup-VzN_ECMbG4IrKsB7ZB5_ZrYfZB5hW-RrNuN8Ln8_HCYfZnZPGyQ7GAY6ZBh-f1Z-VupLHKYfZnZ-IxYfZB5hWkgfZXVeZnZPVwN6J-hfZBh-f1ROVB5_ZxdOE-YfZBhfZXxfT0h-T-ZOVsY7ZBhfZB5hW-h_VVUOEbEMReQ8ZB5_Z0kbIzc7F-hfZBh-f1heZnZfTsY7ZBhfZB5hWxh-T-FOE-YfZBhfZXx-Vuh-T-JOE-YfZBhfZXxfVB5_ZsN6J-hfZBh-f1heZnZfUsY7ZBhfZB5hWs9eZnZvVsY7ZBhfZB5hW-9_WwpfV-h-T-dOE-YfZBhfZXxfVB5_Z2E6ZBhfZB5hWsh-T-VaG-hfZBh-f1heZnZfG-hfZBh-f1heZnZfIqYfZBhfZX1aZnZfIzMbEpM7ZBh-f1taZB5BZpAbD44qG-h-T-ZeF-hfZBh-fmg-T-haF-hfZXx-ZtJeDB1eUrpLHKgvTxpfVwhfMTgvFqkbIz8rM-h-T-dLEuYfZB5xD%3B4366184d25aada9f36357cc3ed44451a7b0adafc96f63b06821a6472779b5a35%3BgRaW989Gy8bE_oLE7w-Gy8rFvM7MtoLI4wrJ1R6G88bG_wPD9k7J1RLHxgKJ&x-api-eid-token=jdd03UQHSEQQF6MSRYLEUCIV6AJ2QYVF3CHLUYX34VMRMSRKHFAGV2SUNNXQVHHAR7NLYB73BB4B4NQ2LHUYJSDDD6NDVUEAAAAM2L3QAIBAAAAAACVGNCOOX2C4YVAX&loginType=3&scval=100051636820&uuid=181111935.1035289842.1762436140.1762436142.1762528330.2

Posted yesterday, 22:36

Quote: 不二如是, 2025-11-7 09:06
Target page: https://item.jd.com/100051636820.html
Data to scrape: the stock-status text "此商品暂时售完" or "有货" ...

Error:
Traceback (most recent call last):
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python313\Lib\site-packages\urllib3\connection.py", line 198, in _new_conn
    sock = connection.create_connection(
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python313\Lib\site-packages\urllib3\util\connection.py", line 60, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python313\Lib\socket.py", line 977, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 11001] getaddrinfo failed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python313\Lib\site-packages\urllib3\connectionpool.py", line 787, in urlopen
    response = self._make_request(
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python313\Lib\site-packages\urllib3\connectionpool.py", line 488, in _make_request
    raise new_e
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python313\Lib\site-packages\urllib3\connectionpool.py", line 464, in _make_request
    self._validate_conn(conn)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python313\Lib\site-packages\urllib3\connectionpool.py", line 1093, in _validate_conn
    conn.connect()
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python313\Lib\site-packages\urllib3\connection.py", line 753, in connect
    self.sock = sock = self._new_conn()
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python313\Lib\site-packages\urllib3\connection.py", line 205, in _new_conn
    raise NameResolutionError(self.host, self, e) from e
urllib3.exceptions.NameResolutionError: <urllib3.connection.HTTPSConnection object at 0x0000023B91C0F770>: Failed to resolve 'c0.3.cn' ([Errno 11001] getaddrinfo failed)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python313\Lib\site-packages\requests\adapters.py", line 644, in send
    resp = conn.urlopen(
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python313\Lib\site-packages\urllib3\connectionpool.py", line 841, in urlopen
    retries = retries.increment(
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python313\Lib\site-packages\urllib3\util\retry.py", line 519, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='c0.3.cn', port=443): Max retries exceeded with url: /stock?skuId=100051636820&area=1_72_2819_0 (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x0000023B91C0F770>: Failed to resolve 'c0.3.cn' ([Errno 11001] getaddrinfo failed)"))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/Administrator/AppData/Local/Programs/Python/Python313/3.py", line 17, in <module>
    print(f"{name}:{check_stock(sku, code)}")
  File "C:/Users/Administrator/AppData/Local/Programs/Python/Python313/3.py", line 6, in check_stock
    data = requests.get(url, headers=headers, timeout=5).json()
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python313\Lib\site-packages\requests\api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python313\Lib\site-packages\requests\api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python313\Lib\site-packages\requests\sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python313\Lib\site-packages\requests\sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python313\Lib\site-packages\requests\adapters.py", line 677, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='c0.3.cn', port=443): Max retries exceeded with url: /stock?skuId=100051636820&area=1_72_2819_0 (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x0000023B91C0F770>: Failed to resolve 'c0.3.cn' ([Errno 11001] getaddrinfo failed)"))
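The root cause in the traceback above is name resolution, not the scraping code itself: 'c0.3.cn' could not be resolved ([Errno 11001] getaddrinfo failed), which usually means a local DNS/network problem or the domain being blocked on that machine. DNS lookup can be tested in isolation with the standard library (the .invalid domain below is reserved and is guaranteed never to resolve):

```python
import socket

def can_resolve(host):
    """Return True if the hostname resolves to at least one address."""
    try:
        socket.getaddrinfo(host, 443)
        return True
    except socket.gaierror:
        return False

# Reserved TLD per RFC 2606: never resolves, so this prints False
print(can_resolve("example.invalid"))  # False
```

If can_resolve("c0.3.cn") also returns False on the affected machine, the fix lies in the network/DNS configuration (or the host being blocked), not in the requests call.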
