Information extraction problem when scraping news comments with Python

yycf · posted on 2019-5-1 09:44:04

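The program is below. I want to scrape all the comments from this Sohu news page, including the user id, user name, and comment content. The response parses into a JSON dictionary, but when I try to pull out everything inside it I get the following error.
Error: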
Traceback (most recent call last):
  File "C:\Users\NO-016\Desktop\搜狐评论.py", line 21, in <module>
    for item in target['jsonObject']['comments']['passport']:
TypeError: list indices must be integers or slices, not str
Program:
import urllib.request
import urllib.parse
import json

data = {}
data['title']= '皇室继承仅剩3人，日本再热议允许“女天皇”'
data['url']= 'http://www.sohu.com/a/311150294_115479'
data = urllib.parse.urlencode(data).encode('utf-8')

url = 'http://apiv2.sohu.com/api/comment/list?callback=jQuery1124043141076505940457_1556620201815&page_size=10&topic_id=14988794&page_no=2&source_id=mp_311150294&_=1556620201852'

req = urllib.request.urlopen(url,data)

html = req.read().decode('utf-8')

target = json.loads(html)

for item in target['jsonObject']['comments']['passport']:
    time = target['jsonObject']['comments']['passport'][item]['user_id']
    content = target['jsonObject']['comments']['passport'][item]['nickname']
    print(user_id,nickname)

_谪仙 · posted on 2019-5-1 15:26:55

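You are indexing it the wrong way: comments is a list, not a dictionary, so it cannot be indexed with the string 'passport'. Each element of the list is a dictionary that has its own 'passport' key.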
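
To make the point above concrete, here is a minimal sketch that runs without any network access. The `target` dictionary is a hypothetical sample that only mimics the shape described in this thread (a list under `comments`, each item carrying a `passport` dictionary); the real API response may contain more fields, and `extract_users` is a helper name invented for this illustration.

```python
# Hypothetical sample data shaped like the response discussed in the thread:
# "comments" is a list, and each element is a dict with a "passport" dict.
target = {
    "jsonObject": {
        "comments": [
            {"passport": {"user_id": 1, "nickname": "alice"}},
            {"passport": {"user_id": 2, "nickname": "bob"}},
        ]
    }
}


def extract_users(payload):
    """Return one (user_id, nickname) tuple per comment."""
    pairs = []
    # Iterate over the list itself; each item is one comment dict.
    for comment in payload["jsonObject"]["comments"]:
        passport = comment["passport"]
        pairs.append((passport["user_id"], passport["nickname"]))
    return pairs


# Indexing the list with a string reproduces the error from the question.
error_message = None
try:
    target["jsonObject"]["comments"]["passport"]
except TypeError as exc:
    error_message = str(exc)

users = extract_users(target)
print(error_message)
print(users)
```

The key change is that the loop variable is a whole comment dictionary, so 'passport' is looked up on each item rather than on the list.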

wp231957 · posted on 2019-5-2 09:03:41

# coding: utf-8
import urllib.request
import urllib.parse
import json
data = {}
data['title']= '皇室继承仅剩3人，日本再热议允许“女天皇”'
data['url']= 'http://www.sohu.com/a/311150294_115479'
data = urllib.parse.urlencode(data).encode('utf-8')
url = 'http://apiv2.sohu.com/api/comment/list?callback=jQuery1124043141076505940457_1556620201815&page_size=10&topic_id=14988794&page_no=2&source_id=mp_311150294&_=1556620201852'
req = urllib.request.urlopen(url,data)
html = req.read().decode('utf-8')
target = json.loads(html)
newlst=[]
for x in target["jsonObject"]["comments"]:
    tmp={}
    for z in x.keys():
        if z=="passport":
            tmp["user_id"]=x[z]["user_id"]
            tmp["nick_name"]=x[z]["nickname"]
    newlst.append(tmp)
print(newlst)

e:\pytest>python ex28.py
[{'user_id': 10116590, 'nick_name': '胖胖的茄子帝国'}, {'user_id': 10116052, 'nick_name': 'ZHO_Jojo-'}, {'user_id': 10117556, 'nick_name': 'linlin悦儿'}, {'user_id': 10115405, 'nick_name': '鱼可以飞么'}, {'user_id': 10115964, 'nick_name': '想个名咋这么难'}, {'user_id': 10117088, 'nick_name': '岛分之一'}, {'user_id': 10116385, 'nick_name': '暴走里脊-Zubin'}, {'user_id': 10116605, 'nick_name': '元气觉醒Vv'}, {'user_id': 10115965, 'nick_name': 'DansonMi'}, {'user_id': 10116565, 'nick_name': '我的女王大人你可安好'}]
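
The loop in the program above can also be written a little more defensively. The following is only a sketch on hypothetical sample data: `flatten_comments` is a made-up helper name, and the `content` key for the comment text is an assumption, since the thread only confirms `user_id` and `nickname` inside `passport`.

```python
def flatten_comments(payload):
    """Turn the nested response into a flat list of dicts.

    Missing keys yield None instead of raising KeyError.
    """
    rows = []
    for comment in payload.get("jsonObject", {}).get("comments", []):
        passport = comment.get("passport") or {}
        rows.append({
            "user_id": passport.get("user_id"),
            "nick_name": passport.get("nickname"),
            # Assumption: the comment text is stored under "content".
            "content": comment.get("content"),
        })
    return rows


# Hypothetical sample; the second comment has no "passport" entry.
sample = {
    "jsonObject": {
        "comments": [
            {"content": "hello", "passport": {"user_id": 1, "nickname": "alice"}},
            {"content": "no passport here"},
        ]
    }
}

rows = flatten_comments(sample)
print(rows)
```

Using `.get` keeps one malformed comment from aborting the whole run, at the cost of having to handle `None` values later.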

You could have just posted this one thread. Why also post https://fishc.com.cn/thread-134991-1-1.html, a thread that contains a mistake?
Just mark both threads as resolved.