Why can't my spider scrape anything? Any help from the experts here would be greatly appreciated!
Last edited by chen1203 on 2021-9-16 05:35
The log looks like this (I have already turned off the robots protocol):
2021-09-16 05:25:33 INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)

wp231957 (2021-9-16 07:14):
Post the URL and the data you want to extract. If that is not convenient, you will have to work it out yourself.

chen1203 (2021-9-16 16:54):
https://fishc.com.cn/forum-173-1.html
I want to extract the names of the individual sections of the Python discussion board.

chen1203 (2021-9-16 16:55):
response.xpath("//tbody/text()").extract()
['\r\n', '\r\n', '\r\n', '\r\n', '\r\n', '\r\n', ...] (the list goes on like this; every entry is '\r\n')
I checked, and this is what I get. What on earth is that?
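
A note on the output above: in XPath, `//tbody/text()` selects only the text nodes that are direct children of `tbody`. In table markup those are usually just the line breaks between rows, while the visible text sits deeper, inside cells and links. The sketch below illustrates this with a made-up, simplified row (the real page markup differs) and the standard library's ElementTree standing in for Scrapy's selector, since ElementTree's limited XPath has no `text()` step.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for one table row; not the real page.
SAMPLE = """<table>
<tbody>
<tr>
<th><a href="thread-1.html">Example thread title</a></th>
</tr>
</tbody>
</table>"""

root = ET.fromstring(SAMPLE)
tbody = root.find("tbody")

# Text nodes that are direct children of <tbody>: the text before its first
# child element (.text) plus the text following each child element (.tail).
direct_text = [tbody.text] + [child.tail for child in tbody]

# Text nodes anywhere below <tbody>, with whitespace-only ones dropped.
nested_text = [t.strip() for t in tbody.itertext() if t.strip()]

print(direct_text)   # only line breaks
print(nested_text)   # the text that was actually wanted
```

In a Scrapy selector the equivalent move would be `//tbody//text()` (note the double slash) or, better, a path that targets the specific elements holding the names.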

chen1203 (2021-9-16 16:56):
The source code is as follows:
import scrapy
from tutorial.items import TutorialItem


class QiuSpider(scrapy.Spider):
    name = "qiu"
    allowed_domains = ["fishc.com.cn"]
    urls = ("https://fishc.com.cn/forum-173-1.html",)

    def parse(self, response):
        item = TutorialItem()
        item["content"] = response.xpath("//tbody/text()").extract()
        yield item

chen1203 (2021-9-16 16:57):
import scrapy


class TutorialItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    content = scrapy.Field()

from itemadapter import ItemAdapter


class TutorialPipeline:
    def process_item(self, item, spider):
        with open("date.text", "wb", encoding="utf-8") as f:
            f.write(item["content"])
        return item

Reply (quoting chen1203's post of 2021-9-16 16:55, the response.xpath output):
import requests
from lxml import etree

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:89.0) Gecko/20100101 Firefox/89.0',
}
url = "https://fishc.com.cn/forum-173-1.html"
html = requests.get(url, headers=headers)
html.encoding = "gbk"  # decode the page as GBK before reading html.text
obj = etree.HTML(html.text)
data = obj.xpath("//div[@id='subforum_173']/table/tr/td/dl/dt/a/text()")
print(data)
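
On the `html.encoding = "gbk"` line in the reply above: requests decodes `.text` using `Response.encoding`, so setting it before reading `.text` controls how the raw bytes become a string. The sketch below shows, on a sample string rather than on bytes fetched from the site, what goes wrong when GBK bytes are decoded with a different codec.

```python
# Sample bytes as a GBK-encoded page would deliver them (not fetched from the site).
raw = "鱼C论坛".encode("gbk")

# Decoding with the matching codec restores the original text.
decoded_ok = raw.decode("gbk")

# Decoding with the wrong codec does not raise here; it silently produces garbage.
decoded_wrong = raw.decode("latin-1")

print(decoded_ok)
print(repr(decoded_wrong))
```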
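
Reply (quoting chen1203's post of 2021-9-16 16:56, the spider source):
There is no tbody in the XPath.

Reply (quoting wp231957's post of 2021-9-16 17:04):
Do it with a Scrapy project.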