The page elements are as shown in the screenshot. How can I scrape a stock code such as 000001?
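    import requests
    from bs4 import BeautifulSoup

    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36 Edg/106.0.1370.52'}

    def downHtml(url):
        # Send the request with a browser-like User-Agent header
        res = requests.get(url, headers=headers)
        res.encoding = res.apparent_encoding  # decode with the detected charset
        return res.text

    html = downHtml("http://quote.eastmoney.com/stocklist.html#sz")
    # html is already a str, so from_encoding would be ignored here
    soup = BeautifulSoup(html, 'html.parser')
    div = soup.find('div', attrs={'class': 'listview full'})
    # If the list is filled in by JavaScript, the raw HTML has no such div
    lis = div.find_all('td') if div else []
    names = []
    for li in lis:
        a = li.find('a')
        if a is not None:  # skip cells without a link
            names.append(a.text)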
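Once `names` holds the link texts, the code itself still has to be pulled out of each string. A rough sketch of that step, assuming each link text contains the six-digit code somewhere in it (for example `Name(000001)`); that format and the `extract_codes` helper are assumptions for illustration, not something confirmed by the page:

```python
import re

# Assumption: every stock code is exactly six digits, not part of a longer number.
CODE_PATTERN = re.compile(r"(?<!\d)(\d{6})(?!\d)")

def extract_codes(names):
    """Return the six-digit codes found in a list of link texts."""
    codes = []
    for name in names:
        match = CODE_PATTERN.search(name)
        if match:  # ignore entries that carry no code
            codes.append(match.group(1))
    return codes

# Made-up sample strings in the assumed "Name(code)" format
sample = ["平安银行(000001)", "万科A(000002)", "no code here"]
print(extract_codes(sample))  # ['000001', '000002']
```

Keeping the codes as strings matters here, since converting them to `int` would drop the leading zeros.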
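Last edited by cflying on 2022-10-31 23:15
If you'd rather keep it simple, you can load the page through browser automation and then hand the rendered HTML to pandas, which gets you essentially the same result.
This site served a plain static table until a few years ago; the switch to dynamic rendering probably happened within the last year.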
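    from io import StringIO

    import pandas as pd
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        context = browser.new_context()
        page = context.new_page()
        page.goto('http://quote.eastmoney.com/center/gridlist.html')
        page.wait_for_selector('table')  # wait until a table has been rendered
        # Wrap the HTML in StringIO; newer pandas deprecates passing a raw HTML string
        tables = pd.read_html(StringIO(page.content()))
        browser.close()
    print(tables[0])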
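One thing to watch with the pandas route: numeric-looking columns are usually parsed as numbers, so a code like 000001 may come back as the integer 1. A small sketch for restoring the padding, assuming codes are always six digits wide; `normalize_code` is a hypothetical helper, not part of pandas:

```python
def normalize_code(value, width=6):
    """Turn a parsed value such as 1 or "1" back into a zero-padded code like "000001"."""
    # Assumption: value is an int or a digit string, not a float such as 1.0
    return str(value).strip().zfill(width)

# Made-up values, as they might look after pandas has parsed the code column
parsed = [1, 2, 600000, "300750"]
codes = [normalize_code(v) for v in parsed]
print(codes)  # ['000001', '000002', '600000', '300750']
```

On a DataFrame the same idea could be written as `df[col].astype(str).str.zfill(6)`, where `col` is whatever the code column is actually called on the page.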