Installing Scrapy on Python 3.4 and getting it to run

SixPy · posted 2016-7-18 14:26:38

Still writing this up~ back shortly~

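SixPy · posted 2016-7-18 16:47:41

Last edited by SixPy on 2016-7-18 16:49

Go to http://www.lfd.uci.edu/~gohlke/pythonlibs/ and download the Scrapy package:
Scrapy-1.1.1-py2.py3-none-any.whl

Press Ctrl+F on that page and search for it to find it quickly.

Press Win+R, type cmd, and press Enter to open a command prompt.
Then type pip install followed by the full path to the Scrapy wheel.

After you press Enter, pip downloads the required dependencies and installs everything automatically.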
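
The install step can also be scripted. What follows is only a minimal sketch: the function names and the D:\Downloads folder are hypothetical, and calling pip through the running interpreter (`-m pip`) is a choice made here so the package lands in the same Python, not something from the original post.

```python
import subprocess
import sys


def pip_install_command(wheel_path):
    """Build the command line that installs a local wheel with pip."""
    # sys.executable is the interpreter running this script, so the
    # package is installed for that same Python.
    return [sys.executable, "-m", "pip", "install", wheel_path]


def install_wheel(wheel_path):
    """Run pip; raises CalledProcessError if pip exits with an error."""
    subprocess.check_call(pip_install_command(wheel_path))


# Hypothetical download location; replace it with your own full path.
# This only prints the command. Call install_wheel(...) to really install.
print(" ".join(pip_install_command(r"D:\Downloads\Scrapy-1.1.1-py2.py3-none-any.whl")))
```

Printing the command first makes it easy to check the path before anything is installed.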

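SixPy · posted 2016-7-18 16:55:21

Install PyWin32
On the same site as in the previous post, find PyWin32 and download the package that matches your operating system.
Install it with pip install, the same way as the Scrapy wheel.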
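
To confirm that both installs worked, one option is to ask Python whether it can find the modules. This is a small sketch with a hypothetical helper name; it assumes `win32api` as the module to look for, since that is one of the modules PyWin32 provides.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that Python cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means everything was found.
print(missing_modules(["scrapy", "twisted", "win32api"]))
```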

SixPy · posted 2016-7-18 17:05:26

Last edited by SixPy on 2016-8-27 10:34

Go to
~~https://github.com/twisted/twisted/tree/trunk/twisted/internet~~

and download these 2 files:
_pollingfile.py
_win32stdio.py

Copy them into
Lib\site-packages\twisted\internet under your Python directory.

Update:
Some forum members may not know how to download .py files from that site.
~~https://raw.githubusercontent.com/twisted/twisted/trunk/twisted/internet/_pollingfile.py~~
~~https://raw.githubusercontent.com/twisted/twisted/trunk/twisted/internet/_win32stdio.py~~

Opening the raw URL of each file directly in the browser gives you the file itself.

Update 2016-08-22
The module's authors updated the files, and the file paths have changed:
https://github.com/twisted/twisted/tree/trunk/src/twisted/internet
https://raw.githubusercontent.com/twisted/twisted/trunk/src/twisted/internet/_pollingfile.py
https://raw.githubusercontent.com/twisted/twisted/trunk/src/twisted/internet/_win32stdio.py
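
The download-and-copy step could be scripted roughly as below. This is a sketch under assumptions: the raw URLs are the ones from the 2016-08-22 update and may have moved since, D:\Python34 is assumed as the Python directory, and the function names are made up for illustration.

```python
import os
import urllib.request

# Raw-file base URL from the 2016-08-22 update; it may have moved since.
RAW_BASE = "https://raw.githubusercontent.com/twisted/twisted/trunk/src/twisted/internet/"
FILE_NAMES = ["_pollingfile.py", "_win32stdio.py"]


def download_plan(site_packages):
    """Pair each raw URL with its destination in the twisted/internet folder."""
    target_dir = os.path.join(site_packages, "twisted", "internet")
    return [(RAW_BASE + name, os.path.join(target_dir, name)) for name in FILE_NAMES]


def fetch_files(site_packages):
    """Download both files; needs network access and write permission."""
    for url, destination in download_plan(site_packages):
        urllib.request.urlretrieve(url, destination)


# Assumed install location; adjust it to your own Python directory.
# This only prints the plan. Call fetch_files(...) to actually download.
for url, destination in download_plan(r"D:\Python34\Lib\site-packages"):
    print(url, "->", destination)
```

Fetching the raw URL avoids the mistake of saving the GitHub web page instead of the .py file.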

SixPy · posted 2016-7-18 17:14:00

Last edited by SixPy on 2016-7-18 17:15

Not done yet~

In an editor, open
lib\site-packages\twisted\python\compat.py under your Python directory.

Go to line 68 and add the missing variable, keeping the line's existing indentation.
Before:

if [x for x in addr if x not in string.hexdigits + ':.']:

After:

x = [x for x in addr if x not in string.hexdigits + ':.']
if x:
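
For anyone wondering what that line checks, here is a standalone sketch of the same test. It is not Twisted's actual function: the name `has_non_ip_characters` and the surrounding code are made up for illustration, and only the list comprehension follows the patched line.

```python
import string


def has_non_ip_characters(addr):
    """Return True if addr contains anything besides hex digits, ':' or '.'."""
    allowed = string.hexdigits + ':.'
    # Same shape as the patched line: bind the list first, then test it.
    x = [ch for ch in addr if ch not in allowed]
    if x:
        return True
    return False


print(has_non_ip_characters("::1"))          # False
print(has_non_ip_characters("example.org"))  # True
```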

SixPy · posted 2016-7-18 17:20:40

That wraps up the installation and the bug fixes~

Next comes a test run~

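SixPy · posted 2016-7-18 17:39:56

Last edited by SixPy on 2016-7-18 17:50

Create a new folder for testing.
Mine is:
D:\Python34\work\scrapy_test

Run cmd and change to the new folder in the command prompt:

D:
cd D:\Python34\work\scrapy_test

Then type:
scrapy startproject tutorial

Scrapy creates a new project called tutorial.
The folder structure inside it is generated automatically~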

SixPy · posted 2016-7-18 17:55:56

In the D:\Python34\work\scrapy_test\tutorial\tutorial folder,
replace the contents of items.py with the following and save.

# -*- coding: utf-8 -*-
# Define here the models for your scraped items
#
# See documentation in:
# http://doc.scrapy.org/en/latest/topics/items.html
import scrapy


class DmozItem(scrapy.Item):
    # One field for each value the spider collects per list entry
    title = scrapy.Field()
    link = scrapy.Field()
    desc = scrapy.Field()

SixPy · posted 2016-7-18 17:58:43

Create a new file, Spider.py,
with the following contents:

import scrapy
from tutorial.items import DmozItem


class DmozSpider(scrapy.Spider):
    # The name used on the command line: scrapy crawl dmoz
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
    ]

    def parse(self, response):
        # Each <li> inside a <ul> becomes one item
        for sel in response.xpath('//ul/li'):
            item = DmozItem()
            # extract() returns a list of every matching string
            item['title'] = sel.xpath('a/text()').extract()
            item['link'] = sel.xpath('a/@href').extract()
            item['desc'] = sel.xpath('text()').extract()
            yield item

Save it in this folder:
D:\Python34\work\scrapy_test\tutorial\tutorial\spiders

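SixPy · posted 2016-7-18 18:05:09

Everything is ready.
Go back to the cmd window, change into the tutorial project folder (cd tutorial),
and type:
scrapy crawl dmoz -o item.json

Press Enter to run the test.

============
Finally, in
D:\Python34\work\scrapy_test\tutorial
you will find the item.json file that Scrapy generated.
Open it with Notepad; it is a plain-text file in JSON format.
The contents look roughly like this:

[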
{"desc": [" ", " "], "title": [" About "], "link": ["/docs/en/about.html"]},
{"desc": [" ", " "], "title": [" Become an Editor "], "link": ["/docs/en/help/become.html"]},
{"desc": [" ", " "], "title": [" Suggest a Site "], "link": ["/docs/en/add.html"]},
{"desc": [" ", " "], "title": [" Help "], "link": ["/docs/en/help/helpmain.html"]},
{"desc": [" ", " "], "title": [" Login "], "link": ["/editors/"]},
{"desc": [" ", " Share via Facebook "], "title": [], "link": []},
{"desc": [" ", " Share via Twitter "], "title": [], "link": []},
{"desc": [" ", " Share via LinkedIn "], "title": [], "link": []},
{"desc": [" ", " Share via e-Mail "], "title": [], "link": []},
{"desc": [" ", " "], "title": [], "link": []},
{"desc": [" ", " "], "title": [], "link": []},
{"desc": [" ", " "], "title": [" About "], "link": ["/docs/en/about.html"]},
{"desc": [" ", " "], "title": [" Become an Editor "], "link": ["/docs/en/help/become.html"]},
{"desc": [" ", " "], "title": [" Suggest a Site "], "link": ["/docs/en/add.html"]},
{"desc": [" ", " "], "title": [" Help "], "link": ["/docs/en/help/helpmain.html"]},
{"desc": [" ", " "], "title": [" Login "], "link": ["/editors/"]},
{"desc": [" ", " Share via Facebook "], "title": [], "link": []},
{"desc": [" ", " Share via Twitter "], "title": [], "link": []},
{"desc": [" ", " Share via LinkedIn "], "title": [], "link": []},
{"desc": [" ", " Share via e-Mail "], "title": [], "link": []},
{"desc": [" ", " "], "title": [], "link": []},
{"desc": [" ", " "], "title": [], "link": []}
]

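
Every value in the output is a list because the spider stores the result of extract() directly, and many rows are page chrome with no title at all. A possible clean-up pass is sketched below; the function name and the choice to keep only the first title and link are assumptions, not part of the original tutorial.

```python
import json


def items_with_titles(items):
    """Keep items that have a title and strip the padding around it."""
    cleaned = []
    for item in items:
        if not item.get("title"):
            continue  # skip rows such as the "Share via ..." entries
        cleaned.append({
            "title": item["title"][0].strip(),
            "link": item["link"][0] if item.get("link") else None,
        })
    return cleaned


# Two rows copied from the output above, inlined so the sketch runs on its own.
sample = json.loads(
    '[{"desc": [" ", " "], "title": [" About "], "link": ["/docs/en/about.html"]},'
    ' {"desc": [" ", " Share via Facebook "], "title": [], "link": []}]'
)
print(items_with_titles(sample))
```

To run it on the real file, load item.json with json.load and pass the resulting list to the function.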

SixPy · posted 2016-7-18 18:06:40

That was exhausting~
Finally done~

So what is Scrapy actually good for?

过河卒子Rover · posted 2016-7-18 18:39:31

Thanks for sharing, I'll try it after work!

305910416 · posted 2016-7-21 12:40:22

SixPy posted 2016-7-18 18:06
That was exhausting~
Finally done~

Web crawling, of course~

305910416 · posted 2016-7-21 12:42:15

Installing Scrapy is this complicated?

mozzielx · posted 2016-7-24 19:51:30

Thumbs up.

baiyixk · posted 2016-8-5 15:17:33

Still didn't work for me.

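SixPy · posted 2016-8-5 16:43:46

baiyixk posted 2016-8-5 15:17
Still didn't work for me.

You downloaded the web page.
You need to download the .py file itself.
Click the Raw button to get the actual file.

SixPy · posted 2016-8-5 16:52:29

baiyixk posted 2016-8-5 15:17
Still didn't work for me.

http://bbs.fishc.com/forum.php?mod=redirect&goto=findpost&ptid=73916&pid=2612102

See post #4; I updated the download instructions~

ssg2006 · posted 2016-8-19 13:51:24

The very first install step fails. Screenshot attached.

SixPy · posted 2016-8-19 13:54:33

ssg2006 posted 2016-8-19 13:51
The very first install step fails. Screenshot attached.

http://bbs.fishc.com/forum.php?mod=redirect&goto=findpost&ptid=74931&pid=2636168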
