This post was last edited by 幽梦三影 on 2018-7-5 18:05
- import random
- import urllib.request
- import re
- from bs4 import BeautifulSoup
- ip = [
- "221.228.17.172:8181",
- "118.31.220.3:8080",
- "61.135.217.7:80"
- ]
- user_pool = [
- "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.221 Safari/537.36 SE 2.X MetaSr 1.0",
- "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; Maxthon/3.0)",
- "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ;  QIHU 360EE)"
- ]
- current_ip = random.choice(ip)
- current_user = random.choice(user_pool)
- # Build one opener that carries both the proxy and the User-Agent.
- # Installing a second opener replaces the first one and drops its headers.
- proxy_handler = urllib.request.ProxyHandler({"http": current_ip})
- opener = urllib.request.build_opener(proxy_handler)
- opener.addheaders = [("User-Agent", current_user)]
- urllib.request.install_opener(opener)
- url = "http://www.taobao.com"
- html = urllib.request.urlopen(url).read()
- # Name the parser explicitly so BeautifulSoup does not have to guess one.
- soup = BeautifulSoup(html, "html.parser")
- links = soup.find_all("a", href=True)
- # print(soup)
- for a in links:
-     print(a.get("href"))
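I'm crawling Taobao's links through a proxy IP, but every link I get back points to Youdao. Also, 小甲鱼 says in his video that crawling 妹子图 can return other, unrelated images. What is actually going on?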
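One way to narrow this down is to check whether the page that comes back through the proxy really belongs to the site that was requested. The sketch below is only an illustration under assumptions: the post does not confirm that the proxy is the cause, but a free public proxy can return or rewrite a plain-HTTP response, so it is worth ruling out. The helper names (`make_opener`, `extract_links`, `foreign_hosts`, `fetch_links`) are hypothetical, and the standard-library `HTMLParser` is used in place of BeautifulSoup so the example has no third-party dependency.

```python
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag (stdlib stand-in for BeautifulSoup)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def make_opener(proxy, user_agent):
    """Return a single opener that carries both the proxy and the User-Agent."""
    handler = urllib.request.ProxyHandler({"http": proxy})
    opener = urllib.request.build_opener(handler)
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


def extract_links(html_text):
    """Return all href values found in <a> tags, in document order."""
    collector = LinkCollector()
    collector.feed(html_text)
    return collector.hrefs


def foreign_hosts(hrefs, expected_domain):
    """Return the sorted hosts of links that do not belong to expected_domain."""
    hosts = set()
    for href in hrefs:
        host = urlparse(href).hostname  # None for relative or javascript: links
        if host and host != expected_domain and not host.endswith("." + expected_domain):
            hosts.add(host)
    return sorted(hosts)


def fetch_links(url, proxy, user_agent, timeout=10):
    """Fetch url through the proxy; return the final URL and the links found."""
    opener = make_opener(proxy, user_agent)
    with opener.open(url, timeout=timeout) as response:
        final_url = response.url  # differs from url if a redirect happened
        body = response.read().decode("utf-8", errors="replace")
    return final_url, extract_links(body)
```

Calling `fetch_links("http://www.taobao.com", current_ip, current_user)` and passing the resulting links to `foreign_hosts(links, "taobao.com")` shows which hosts the returned page links to. If the final URL or most of the hosts are not Taobao's, running the same request without the proxy and comparing the two results would show whether the proxy is responsible.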