import requests,re
#url ='https://maoyan.com/board/4'
def get_one_page(url):
    headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36',
               'cookie' : '__mta=154442631.1593241361941.1593241556380.1593241598233.10; uuid_n_v=v1; uuid=2C6E1440B84411EAB7FDDF490BC946C07143E4D320224FAD8732A16C4F44A135; _csrf=9c3390568dd99073365b26b5ffcd129611ecfabbd07045a23cd8e278637f52a4; __guid=17099173.2087594309519851500.1593241358937.2642; Hm_lvt_703e94591e87be68cc8da0da7cbd0be2=1593241360; _lxsdk_cuid=172f495d53ac8-023df9ee10b5bf-376b4502-1fa400-172f495d53a6d; _lxsdk=2C6E1440B84411EAB7FDDF490BC946C07143E4D320224FAD8732A16C4F44A135; mojo-uuid=3c61b53790f9fbab0971a53489505d9b; mojo-session-id={"id":"2a22304ae494ac0049ab4e1180ef1cf0","time":1593241361961}; __mta=49531289.1593241413123.1593241413123.1593241413123.1; monitor_count=11; mojo-trace-id=16; Hm_lpvt_703e94591e87be68cc8da0da7cbd0be2=1593241598; _lxsdk_s=172f495d53c-f36-b5-fdd%7C%7C23'
    }
    response = requests.get(url,headers=headers)
    if response.status_code == 200:
        return response.text
'''def parse_one_page(html):
    pattern = re.compile('<dd>.*?>(.*?)</i>.*?data-src="(.*?)".*?name.*?a.*?>(.*?)</a>.*?star.*?>(.*?)</p>.*?releasetime.*?>(.*?)</p>.*?integer.*?>(.*?)</i>.*?fraction.*?>(.*?)</i>.*?</dd>',re.S)
    items = re.findall(pattern,html)
    for item in items:
        yield{
            'indes':item[0],
            'image':item[1],
            'title':item[2],
            'actor':item[3].strip()[3:] if len(item[3]) > 3 else '',
            'time' :item[4].strip()[5:] if len(item[4]) > 5 else '',
            'score':item[5].strip() +item[6].strip()
        }
'''
def main():
    url='https://maoyan.com/board/4'
    html = get_one_page(url)
    #print(html)
    pattern = re.compile('<dd>.*?>(.*?)</i>.*?data-src="(.*?)".*?name.*?a.*?>(.*?)</a>.*?star.*?>(.*?)</p>.*?releasetime.*?>(.*?)</p>.*?integer.*?>(.*?)</i>.*?fraction.*?>(.*?)</i>.*?</dd>',re.S)
    items = re.findall(pattern,html)
    for item in items:
        yield{
            'indes':item[0],
            'image':item[1],
            'title':item[2].strip(),
            'actor':item[3].strip(),
            'time' :item[4].strip(),
            'score':item[5]+item[6],
        }
        a = yield
        print(a)
main()
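The code above prints nothing when I run it. How do I print the dicts that yield produces? If I print items directly inside def main(), or write for item in items: print(item), I do get output. Any help would be appreciated.
The dict I try to print comes out blank.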
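Once a function contains yield, calling it returns a generator instead of running the body. You can iterate over it with a for loop and process each item:
for item in main():
    print(item)
Or use result = list(main()) to get a list of all the dicts.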
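To make the behaviour easier to see without hitting the network, here is a minimal sketch. The function make_items and the sample rows are hypothetical stand-ins for main() and the regex matches; they are not part of the original script.

```python
def make_items(rows):
    """Generator function: its body only runs once iteration starts."""
    for index, title in rows:
        yield {"index": index, "title": title}


# Made-up sample data standing in for the tuples returned by re.findall.
rows = [("1", "Film A"), ("2", "Film B")]

gen = make_items(rows)   # no body code has run yet; gen is a generator object
first = next(gen)        # runs the body up to the first yield
rest = list(gen)         # drains whatever is left into a list

# Calling the function again gives a fresh generator.
all_items = list(make_items(rows))

for item in all_items:
    print(item)
```

A bare main() call behaves like the gen = make_items(rows) line: it creates the generator and throws it away, so nothing inside the body ever executes. Note also that a bare yield, as in a = yield, hands None to the consumer, and a is None as well unless the consumer resumes the generator with send().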