FishC Forum


Importing scraped data into a database: every character of each string is inserted as a separate value

Posted on 2015-12-23 18:25:48
Reward: 3 fish coins


import urllib.request
import pymssql  # third-party SQL Server driver


url = "http://www.tj2zy.com/Class/sxbzxrmd/index.htm"

r = urllib.request.urlopen(url).info()

print(r)


def open_url(url):
    # Fetch the page and return the raw response bytes.
    req = urllib.request.Request(url)
    with urllib.request.urlopen(req) as response:
        return response.read()


def get_maxpage(url):
    # Read the page count from the link just before the "末页" (last page) anchor.
    html = open_url(url).decode('utf-8')
    a = html.find('.htm">末页')
    b = html.find('a href=', a - 40, a)
    pages = int(html[b + 14:a])
    return pages


def get_url(url):
    # Slice the path of every /News link out of its anchor tag.
    html = open_url(url).decode('utf-8')
    url_address = []

    a = html.find('a href= /News')
    while a != -1:
        b = html.find("target", a, a + 50)
        if b != -1:
            url_address.append(html[a + 8:b - 5])
        else:
            b = a + 8
        a = html.find('a href= /News', b)
    return url_address



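def download_url():
    url = "http://www.tj2zy.com/Class/sxbzxrmd/index.htm"
    front = "http://www.tj2zy.com"
    back = ".htm"
    paths = get_url(url)
    pages = get_maxpage(url)  # total page count; not used yet

    conn = pymssql.connect(host='192.168.0.123', user='sa', password='glad2015', database='test', charset="utf8")
    cursor = conn.cursor()
    for path in paths:
        page_url = front + path + back
        # (page_url,) is a one-element tuple. The original passed (page_url),
        # which is just the string, so executemany() iterated over it and
        # treated every character as a separate parameter set.
        cursor.execute('insert into test1(nr_url) values(%s)', (page_url,))
        print(page_url)
    # Commit after the inserts; the original committed before any row was written.
    conn.commit()
    cursor.close()
    conn.close()


if __name__ == '__main__':
    download_url()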
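
The symptom in the title, one row per character, can be reproduced without a SQL Server instance. The sketch below is only an illustration under stated assumptions: it uses the standard-library sqlite3 module as a stand-in for pymssql (so the placeholder is "?" instead of "%s"), an in-memory database, and a short made-up value "abc" in place of a real URL. It shows that (page_url) is still a plain string, while (page_url,) is a one-element tuple.

```python
import sqlite3

# Assumption: sqlite3 stands in for pymssql, so placeholders are "?" not "%s".
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table test1 (nr_url text)")

page_url = "abc"  # hypothetical short value so the rows are easy to read

# (page_url) is just the string, so executemany() iterates over it and
# treats each character as its own parameter set.
cur.executemany("insert into test1(nr_url) values (?)", (page_url))
per_char_rows = [row[0] for row in
                 cur.execute("select nr_url from test1 order by rowid")]

cur.execute("delete from test1")

# (page_url,) is a one-element tuple, so exactly one row is inserted.
cur.execute("insert into test1(nr_url) values (?)", (page_url,))
conn.commit()
single_row = [row[0] for row in
              cur.execute("select nr_url from test1 order by rowid")]

print(per_char_rows)
print(single_row)
conn.close()
```

The trailing comma is what makes the parameter a tuple; the parentheses alone do nothing. To insert many URLs in one call, executemany() would need a list of such tuples, for example [(u,) for u in urls].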


Posted on 2016-8-8 11:11:04
Just taking a look~~~