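Lyi. posted on 2020-12-7 16:23:50

Lesson 53, last exercise

I wrote this myself for the last exercise of Lesson 53. Could someone more experienced help me correct the code?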

import urllib.request
import chardet
import os

def main():
    f = open('urls.txt')
    for each_line in f:
        tuple = (numbers, contents) = each_line.split('.', 1)
        number = tuple
        content = tuple
        save(number, content)
    f.close()

def save(number, content):
    f1 = open('url_%s.txt' % number, 'w')
    f1.write(get(content))
    f1.close()

def get(content):
    response = urllib.request.urlopen(content)
    html = response.read()

    encode = chardet.detect(content)['encoding']
    if encode == 'GB231':
        ncode = 'GBK'

    return html.decode(encode, 'ignore')

if __name__ == '__main__':
    main()

suchocolate posted on 2020-12-7 17:55:48

Last edited by suchocolate on 2020-12-7 17:56

chardet.detect does not accept a str; it expects bytes (or a bytearray):
>>> chardet.detect('测试')
Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    chardet.detect('测试')
  File "C:\d\Program Files\python3\Lib\site-packages\chardet\__init__.py", line 34, in detect
    '{0}'.format(type(byte_str)))
TypeError: Expected object of type bytes or bytearray, got: <class 'str'>
>>> bst = '测试'.encode('utf-8')
>>> chardet.detect(bst)
{'encoding': 'utf-8', 'confidence': 0.7525, 'language': ''}
>>>
Please also post the contents of urls.txt, and I can help you work out how to fix the rest.
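
In the meantime, here is a rough sketch of how the fix might look. It is only a guess at the intent: it assumes each line of urls.txt has the form "number.url" (which the split('.', 1) in the posted code suggests, but the file has not been shown). The helper names parse_line, normalize_encoding and decode_html are made up for illustration, and the fallback to UTF-8 when chardet is missing or detects nothing is my own assumption. The key change is that chardet.detect is given the downloaded bytes (html) instead of the URL string.

```python
import urllib.request

try:
    import chardet  # third-party package: pip install chardet
except ImportError:  # assumption: fall back to UTF-8 if chardet is unavailable
    chardet = None


def parse_line(line):
    """Split a line such as '1.http://example.com' into (number, url)."""
    number, url = line.strip().split('.', 1)
    return number, url.strip()


def normalize_encoding(name):
    """Map GB2312 to its superset GBK; use UTF-8 if nothing was detected."""
    if not name:
        return 'utf-8'
    return 'GBK' if name.upper() == 'GB2312' else name


def decode_html(html):
    """Decode raw bytes, detecting the encoding from the bytes themselves."""
    detected = chardet.detect(html)['encoding'] if chardet else None
    return html.decode(normalize_encoding(detected), 'ignore')


def get(url):
    """Download a page and return its text."""
    with urllib.request.urlopen(url) as response:
        return decode_html(response.read())


def save(number, url):
    """Write the page text to url_<number>.txt."""
    with open('url_%s.txt' % number, 'w', encoding='utf-8') as f:
        f.write(get(url))


def main():
    """Process every non-empty line of urls.txt."""
    with open('urls.txt') as f:
        for each_line in f:
            if each_line.strip():
                save(*parse_line(each_line))
```

Once urls.txt is in place, calling main() should write one url_N.txt file per line. Compared with the posted code, this also unpacks the split result directly into two names (instead of passing the whole tuple on), compares against 'GB2312' rather than 'GB231', and actually uses the replaced encoding name.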

Lyi. posted on 2020-12-8 10:38:59

Thanks a lot!

Lyi. posted on 2020-12-8 19:33:02

Bumping my own thread.