Lesson 55 homework (Hiding): the answer given for this lesson's exercise no longer runs
This post was last edited by chuangyuemx on 2021-7-24 20:46. [Homework] Lesson 055: On the Self-Cultivation of a Web Crawler 3: Hiding | quiz questions and answers
The current answer to this exercise no longer runs. Could the answer be updated, with some comments added?
1. Just printing entry names and links is no real skill. This exercise requires your crawler to let the user type in a search keyword.
The crawler should then visit each entry and check whether it has a subtitle (for example, searching for "猪八戒" gives the subtitle "(中国神话小说《西游记》的角色)"); if it does, print the subtitle along with the title:
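For orientation, the subtitle check the exercise asks for can be sketched offline on a made-up HTML snippet. The tags below are placeholders for illustration, not Baidu Baike's real markup:

```python
import re

# Made-up stand-in for an entry page: title in <h1>, optional subtitle in <h2>.
html = "<h1>猪八戒</h1><h2>(中国神话小说《西游记》的角色)</h2>"

title = re.search(r"<h1>(.*?)</h1>", html).group(1)
m = re.search(r"<h2>(.*?)</h2>", html)  # the subtitle may be absent
subtitle = m.group(1) if m else ""
print(title + subtitle)  # title plus subtitle when one exists
```

The real answer code below does the same "print the subtitle only if it exists" check with BeautifulSoup's `if soup2.h2`.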
Can you post your code?
"On the Self-Cultivation of a Web Crawler"? Nice, hahaha, go cultivate it slowly yourself~
Have a look here; it's roughly the same idea:
[Solved] Lesson 055: a question about crawling the Baidu Baike entry for "网络爬虫"
https://fishc.com.cn/thread-169631-1-1.html
(Source: FishC Forum)
Twilight6 posted on 2021-7-21 09:41:
Have a look here; it's roughly the same idea:
[Solved] Lesson 055: a question about crawling the Baidu Baike entry for "网络爬虫"
That thread solves exercise 0 of the hands-on section, which I can already do. My question is about exercise 1 of the hands-on section; I've thought and searched for ages without working it out, so I'm here asking for help.
阿奇_o posted on 2021-7-20 21:12:
"On the Self-Cultivation of a Web Crawler"? Nice, hahaha, go cultivate it slowly yourself~
Everything up to this point felt fine, but once I reached the crawler lessons, so many websites have changed that self-study started to need pointers from someone more experienced. It gets stressful here.
jxd12345 posted on 2021-7-20 19:41:
Can you post your code?
import urllib.request
import urllib.parse
import re
from bs4 import BeautifulSoup

def main():
    keyword = input("请输入关键词:")
    keyword = urllib.parse.urlencode({"word": keyword})
    response = urllib.request.urlopen("http://baike.baidu.com/search/word?%s" % keyword)
    html = response.read()
    soup = BeautifulSoup(html, "html.parser")

    # collect every link whose href contains "view" (the old entry-link pattern)
    for each in soup.find_all(href=re.compile("view")):
        content = ''.join([each.text])  # start with the entry name
        url2 = ''.join(["http://baike.baidu.com", each["href"]])
        response2 = urllib.request.urlopen(url2)
        html2 = response2.read()
        soup2 = BeautifulSoup(html2, "html.parser")
        if soup2.h2:  # append the subtitle, if the entry has one
            content = ''.join([content, soup2.h2.text])
        content = ''.join([content, " -> ", url2])
        print(content)

if __name__ == "__main__":
    main()
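Since this lesson is specifically about hiding, one thing worth noting: urllib announces itself with a default "Python-urllib/x.y" User-Agent, which sites can easily block, and that may contribute to the "no response" behavior reported below. A sketch of the lesson's hiding technique, wrapping the URL in a `Request` with a browser-like User-Agent (the header string here is just an example value):

```python
import urllib.parse
import urllib.request

def build_request(url):
    # Replace urllib's default "Python-urllib" User-Agent with a
    # browser-like one so the site treats the request as a normal visit.
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
    return urllib.request.Request(url, headers=headers)

keyword = urllib.parse.urlencode({"word": "网络爬虫"})
req = build_request("https://baike.baidu.com/search/word?%s" % keyword)
print(req.get_header("User-agent"))
# the request would then be fetched with urllib.request.urlopen(req)
```

Swapping `urlopen(url)` for `urlopen(build_request(url))` in the code above would apply the same trick to every fetch.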
jxd12345 posted on 2021-7-20 19:41:
Can you post your code?
I just pasted the answer straight over. Because a lot of the Baidu pages have changed, running it as-is gets no response.
Folks, because the original Baidu Baike pages have changed, and after several days of slow exploration and repeated attempts on my own, I finally arrived at code that runs. I modified the original answer as follows:
import urllib.request
import urllib.parse
import re
from bs4 import BeautifulSoup

def main():
    keyword = input("请输入关键词:")
    keyword = urllib.parse.urlencode({"word": keyword})
    # changed the original URL from http to https
    response = urllib.request.urlopen("https://baike.baidu.com/search/word?%s" % keyword)
    html = response.read()
    soup = BeautifulSoup(html, "html.parser")
    print(soup.h2.text)  # added: print the entry title
    print("\n")  # blank line for readability
    for each in soup.find_all(property=re.compile("description")):  # added: print the summary
        print(each['content'])
    print("\n下边打印相关链接\n")  # "related links are printed below"
    # changed the original "view" to "viewPageContent"
    for each in soup.find_all(href=re.compile("viewPageContent")):
        content = ''.join([each.text])
        url2 = ''.join(["https://baike.baidu.com", each["href"]])  # http changed to https
        response2 = urllib.request.urlopen(url2)
        html2 = response2.read()
        soup2 = BeautifulSoup(html2, "html.parser")
        if soup2.h2:
            content = ''.join([content, soup2.h2.text])
        content = ''.join([content, " -> ", url2])
        print(content)

if __name__ == "__main__":
    main()
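For anyone wondering what `urlencode` contributes here: it percent-encodes the non-ASCII keyword so it can travel safely in the query string. A quick standalone check:

```python
import urllib.parse

# A Chinese keyword cannot appear raw in a URL; urlencode turns the
# {"word": keyword} pair into a percent-encoded "word=..." query string.
keyword = urllib.parse.urlencode({"word": "网络爬虫"})
print("https://baike.baidu.com/search/word?" + keyword)
# https://baike.baidu.com/search/word?word=%E7%BD%91%E7%BB%9C%E7%88%AC%E8%99%AB
```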
chuangyuemx posted on 2021-7-24 20:40:
Folks, because the original Baidu Baike pages have changed, and after several days of slow exploration and repeated attempts on my own, I finally arrived at code that runs ...
The title it prints is not correct.
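A plausible cause, sketched offline: `soup.h2` returns the first `<h2>` anywhere on the page, and on a redesigned page that is often a section heading rather than the entry title, so `<h1>` (or a more specific selector) is usually the safer pick. The snippet below is invented markup, not the live Baidu page:

```python
import re

# Invented page skeleton: the first <h2> is a section heading ("目录"),
# not the entry title, which sits in <h1>.
html = "<h1>网络爬虫</h1><h2>目录</h2><h2>发展历史</h2>"

first_h2 = re.search(r"<h2>(.*?)</h2>", html).group(1)
h1_title = re.search(r"<h1>(.*?)</h1>", html).group(1)
print(first_h2)  # what soup.h2-style "first match" logic would see
print(h1_title)  # the actual entry title
```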