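Lecture 53, last exercise
This is my own attempt at the last exercise of Lecture 53. Could someone more experienced help me correct the code?
import urllib.request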
import chardet
import os

def main():
    f = open('urls.txt')
    for each_line in f:
        tuple = (numbers,contents) = each_line.split('.',1)
        number = tuple
        content = tuple
        save(number,content)
    f.close()

def save(number,content):
    f1 = open('url_%s.txt' %number,'w')
    f1.write(get(content))
    f1.close()

def get(content):
    response = urllib.request.urlopen(content)
    html = response.read()
    encode = chardet.detect(content)['encoding']
    if encode == 'GB231':
        ncode = 'GBK'
    return html.decode(encode,'ignore')

if __name__ =='__main__':
    main()

Last edited by suchocolate on 2020-12-7 17:56
chardet.detect does not accept a str; it expects bytes:
>>> chardet.detect('测试')
Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    chardet.detect('测试')
  File "C:\d\Program Files\python3\Lib\site-packages\chardet\__init__.py", line 34, in detect
    '{0}'.format(type(byte_str)))
TypeError: Expected object of type bytes or bytearray, got: <class 'str'>
>>> bst = '测试'.encode('utf-8')
>>> chardet.detect(bst)
{'encoding': 'utf-8', 'confidence': 0.7525, 'language': ''}
>>>
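With that in mind, here is a rough sketch of how the script could look once detect is given the downloaded bytes instead of the URL string. It is not a definitive fix, since the real contents of urls.txt have not been posted: it assumes each line looks like `1.http://example.com`, which is what `split('.', 1)` in the original suggests. The helper names parse_line and normalize_encoding are made up for this sketch, and the fallback to utf-8 when detection fails and the choice to write the output files as utf-8 are assumptions too.

```python
import urllib.request


def parse_line(line):
    """Split a line such as '1.http://example.com' into ('1', 'http://example.com')."""
    # Assumed format: "<number>.<url>". Split only at the first dot,
    # because the URL itself contains dots.
    number, url = line.strip().split('.', 1)
    return number.strip(), url.strip()


def normalize_encoding(name):
    """Turn chardet's guess into a codec name that is safe for bytes.decode()."""
    if name is None:              # chardet reports None when it cannot decide
        return 'utf-8'            # assumption: fall back to utf-8
    if name.upper() == 'GB2312':  # GBK is a superset of GB2312
        return 'GBK'
    return name


def get(url):
    """Download url and return the page as text."""
    import chardet                # third-party; only needed when a page is fetched
    with urllib.request.urlopen(url) as response:
        html = response.read()    # bytes
    # Run detection on the bytes that were downloaded, not on the URL string.
    guess = chardet.detect(html)['encoding']
    return html.decode(normalize_encoding(guess), 'ignore')


def save(number, url):
    """Write the page behind url to url_<number>.txt."""
    with open('url_%s.txt' % number, 'w', encoding='utf-8') as f:
        f.write(get(url))


def main():
    with open('urls.txt') as f:
        for each_line in f:
            if each_line.strip():             # skip blank lines
                save(*parse_line(each_line))  # unpack (number, url)
```

Calling main() in a directory that contains urls.txt should then write one url_N.txt per line. Compared with the original, number and url are unpacked separately instead of both being set to the whole split result, the trailing newline is stripped from the URL, and the misspelled 'GB231' check becomes a GB2312 check whose result is actually used.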
Post the contents of urls.txt as well, and I'll help work out how to fix it.

Thanks a lot!!

Bumping my own thread.