Last edited by 黑色光亮 on 2019-9-18 16:42
import requests
import bs4

def open_url(url):
    headers = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"}
    res = requests.get(url,headers = headers)
    return res

def find_data(res):
    soup = bs4.BeautifulSoup(res.text,"html.parser")
    content = soup.find(id = "Cnt-Main-Article-QQ")
    target = content.find_all("p",style = "TEXT-INDENT:2em")

    for each in target:
        print(each.text)

def main():
    url = "http://news.house.qq.com/a/20170702/003985.htm"
    res = open_url(url)
    find_data(res)

if __name__ == "__main__":
    main()
import bs4
File "C:\Users\hp\AppData\Local\Programs\Python\Python37\lib\site-packages\bs4\__init__.py", line 30, in <module>
from .builder import builder_registry, ParserRejectedMarkup
File "C:\Users\hp\AppData\Local\Programs\Python\Python37\lib\site-packages\bs4\builder\__init__.py", line 321
'''from . import _html5lib
register_treebuilders_from(_html5lib)
except ImportError:
# They don't have html5lib installed.
pass
try:
from . import _lxml
register_treebuilders_from(_lxml)
except ImportError:
# They don't have lxml installed.
pass
^
SyntaxError: EOF while scanning triple-quoted string literal
That is the error IDLE reports. I searched Baidu but could not find a fix. Any help would be appreciated!
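The traceback points into bs4\builder\__init__.py under site-packages rather than into the script, which suggests that the installed file contains an unterminated ''' string. A minimal sketch for confirming this, assuming only the standard library, parses a source text and reports the first syntax error. The helper name find_syntax_error is hypothetical.

```python
import ast

def find_syntax_error(source, filename="<string>"):
    """Return None if `source` parses as Python, else a short description."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as exc:
        # exc.lineno and exc.msg are standard SyntaxError attributes.
        return f"{filename}, line {exc.lineno}: {exc.msg}"
    return None

# A well-formed snippet parses cleanly.
print(find_syntax_error("x = 1\n"))

# An unterminated triple-quoted string, like the one in the traceback, does not.
print(find_syntax_error("x = '''never closed\n"))
```

To check the real file, read its text from the path shown in the traceback and pass it to the helper. If an error is reported there, the installed package is probably damaged, and reinstalling it (for example with pip install --force-reinstall beautifulsoup4) is a common remedy. Switching the parser argument alone would not help in that case, because the failure happens at import bs4.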
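Last edited by yuweb on 2019-9-18 17:43
If that still does not work, try changing
soup = bs4.BeautifulSoup(res.text,"html.parser")
to
soup = bs4.BeautifulSoup(res.text,"lxml")
Also, the style filter in your target does not match anything because it is missing a space. It should be (note the space before 2em):
target = content.find_all("p",style = "TEXT-INDENT: 2em")
The page source has the space: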
<P style="TEXT-INDENT: 2em" class=text><STRONG>最新房价工资排名出炉!</STRONG></P>
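A plain string passed as style = "..." has to match the attribute value exactly, which is why one missing space returns no results. A minimal sketch of a whitespace-tolerant comparison follows. The helper name style_has is hypothetical and not part of bs4.

```python
def style_has(value, prop, expected):
    """Return True if the inline style string `value` sets `prop` to `expected`.

    Case and the whitespace around ':' and ';' are ignored.
    """
    if not value:  # the tag has no style attribute at all
        return False
    for declaration in value.split(";"):
        name, sep, val = declaration.partition(":")
        if (sep
                and name.strip().lower() == prop.lower()
                and val.strip().lower() == expected.lower()):
            return True
    return False

print(style_has("TEXT-INDENT: 2em", "text-indent", "2em"))  # with the space
print(style_has("TEXT-INDENT:2em", "text-indent", "2em"))   # without the space
```

Because find_all also accepts a function as an attribute filter, a helper like this could be used as content.find_all("p", style=lambda v: style_has(v, "text-indent", "2em")), so the match no longer depends on how the page spaces its inline styles.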