FishC Forum

Views: 971 | Replies: 6

[Solved] Count the words in a text and sort them by number of occurrences (requirement: use regular expressions wherever possible)

Posted on 2020-5-21 16:31:27

Last edited by 欧德奈瑞 on 2020-5-21 19:19

I need to count the words in several paragraphs of English text, and also count how many times each word occurs.

Note 1: Words are separated by whitespace (one or more spaces).
Note 2: Ignore blank lines and lines that contain only spaces.
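
Taken together, Notes 1 and 2 mean that a word is any maximal run of non-whitespace characters, and that blank or space-only lines yield no words at all. A minimal sketch, assuming the whole input has already been read into one string (split_words is a hypothetical helper name):

```python
import re

def split_words(text):
    # \S+ matches a maximal run of non-whitespace characters, so runs of
    # several spaces and blank or all-space lines produce no empty "words"
    return re.findall(r'\S+', text)

sample = "failure  is   probably\n\n   \nthe pole"
print(split_words(sample))  # ['failure', 'is', 'probably', 'the', 'pole']
```

Calling str.split() with no arguments returns the same list; re.findall is used here only because the problem asks for regular expressions where possible.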

Basic version:
Count case-sensitively, and do not remove the specified punctuation marks.
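
In the basic version, case and punctuation both stay part of the word, so "the", "The" and "pole!" are all separate keys. A sketch, assuming the text is already in one string without the !!!!! terminator (count_basic is a hypothetical name):

```python
import collections
import re

def count_basic(text):
    # Hypothetical helper for the basic version: split on whitespace only,
    # keeping letter case and punctuation exactly as they appear
    return collections.Counter(re.findall(r'\S+', text))

counts = count_basic("the The the\npole! pole")
print(len(counts))    # 4 distinct words: 'the', 'The', 'pole!', 'pole'
print(counts['the'])  # 2
```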

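Advanced version:

1. Before counting, remove the specified punctuation marks ! . , : * ? from the text. Note: "remove" here means replacing each such character with one space.
2. Ignore letter case when counting words.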
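
The two advanced-version steps map onto one regex substitution followed by lower-casing. A minimal sketch, assuming the text is already in one string (count_advanced and MARKS are hypothetical names); the character class lists exactly the marks from step 1:

```python
import collections
import re

# The marks from step 1; inside [...] '.', '*' and '?' are literal characters
MARKS = re.compile(r'[!.,:*?]')

def count_advanced(text):
    # Hypothetical helper for the advanced version
    text = MARKS.sub(' ', text)    # step 1: each mark becomes one space
    words = text.lower().split()   # step 2: ignore letter case
    return collections.Counter(words)

counts = count_advanced("It is You. yoU are? yOu?, hard-won")
print(counts['you'])       # 3
print(counts['hard-won'])  # 1 ('-' is not in the list, so the word stays whole)
```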
Input
Several lines of English text, terminated by a line containing !!!!!.
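
One way to read this input is to consume standard input line by line and stop at the terminator. A sketch, assuming the terminator is a line equal to !!!!! once trailing whitespace is stripped, and that input may also end without it; read_until_terminator is a hypothetical name:

```python
import io
import sys

def read_until_terminator(stream=None, terminator='!!!!!'):
    # Hypothetical helper: collect lines until the terminator line (or end of input)
    if stream is None:
        stream = sys.stdin
    lines = []
    for line in stream:
        # Compare without the line ending and trailing spaces
        if line.rstrip() == terminator:
            break
        lines.append(line.rstrip('\r\n'))
    # Blank lines are kept; splitting on whitespace later skips them anyway
    return '\n'.join(lines)

# Demo on an in-memory stream instead of real standard input
demo = io.StringIO("failure is\n\nthe pole\n!!!!!\nignored\n")
print(read_until_terminator(demo))
```

Iterating over the stream also stops cleanly at end of input, so a missing terminator line does not raise an error.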

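Output
The number of distinct words.
The 10 most frequent words with their counts, one per line in the form word=count, sorted by count in descending order; words with the same count are sorted alphabetically in ascending order.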
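
The ordering rule is a two-part sort key: count descending, then word ascending. Counter.most_common() alone does not meet it, because it keeps equal counts in first-seen order rather than alphabetical order. A sketch that builds the required output from an existing Counter (format_report is a hypothetical name):

```python
import collections

def format_report(counts, top=10):
    # Hypothetical helper. Key (-count, word): count descending, word ascending
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top]
    lines = [str(len(counts))]                      # first line: distinct words
    lines += [f'{word}={n}' for word, n in ranked]  # no spaces around '='
    return '\n'.join(lines)

counts = collections.Counter("you when you when the the the are".split())
print(format_report(counts, top=3))
# 4
# the=3
# when=2
# you=2
print(counts.most_common(3))
# [('the', 3), ('you', 2), ('when', 2)]  <- ties kept in first-seen order
```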

Sample input 1
```
failure is probably the fortification in your pole

it is like a peek your wallet as the thief when you
are thinking how to spend several hard-won lepta

when you are wondering whether new money it has laid
background because of you then at the heart of the

most lax alert and most low awareness and left it

godsend failed
!!!!!
```

Sample output 1
```
46
the=4
it=3
you=3
and=2
are=2
is=2
most=2
of=2
when=2
your=2
```

Sample input 2
```
Failure is probably The fortification in your pole!

It is like a peek your wallet as the thief when You
are thinking how to. spend several hard-won lepta.

when yoU are? wondering whether new money it has laid
background Because of: yOu?, then at the heart of the
Tom say: Who is the best? No one dare to say yes.
most lax alert and! most low awareness and* left it

godsend failed
!!!!!
```

Sample output 2
```
54
the=5
is=3
it=3
you=3
and=2
are=2
most=2
of=2
say=2
to=2
```
Want to know what 小甲鱼 has been up to lately? Visit -> ilovefishc.com
Posted on 2020-5-21 18:43:33
Basic version
```python
import re
import collections

# Sample input 1, hard-coded for testing
text = '''failure is probably the fortification in your pole

it is like a peek your wallet as the thief when you
are thinking how to spend several hard-won lepta

when you are wondering whether new money it has laid
background because of you then at the heart of the

most lax alert and most low awareness and left it

godsend failed
!!!!!'''

# Drop only the "!!!!!" terminator line; the basic version keeps all other punctuation
text = re.sub(r'^!!!!!\s*$', '', text, flags=re.M)
# Case-sensitive count of whitespace-separated words
frequency = collections.Counter(text.split())
# Count descending, then word ascending; keep the top 10
l = sorted(frequency.items(), key=lambda x: (-x[1], x[0]))[:10]
print(len(frequency))  # number of distinct words
for i in l:
    print(i[0], '=', i[1])  # print() separates its arguments with spaces: "the = 4"
```

Posted on 2020-5-21 18:49:41 (marked as the best answer)
Advanced version
```python
import re
import collections

# Sample input 2, hard-coded for testing
text = '''Failure is probably The fortification in your pole!

It is like a peek your wallet as the thief when You
are thinking how to. spend several hard-won lepta.

when yoU are? wondering whether new money it has laid
background Because of: yOu?, then at the heart of the
Tom say: Who is the best? No one dare to say yes.
most lax alert and! most low awareness and* left it

godsend failed
!!!!!'''

# Replace each specified mark ! . , : * ? with one space (this also blanks out the "!!!!!" line)
text = re.sub(r'[!.,:*?]', ' ', text)
# Lower-case the words so counting ignores case
frequency = collections.Counter(map(str.lower, text.split()))

# Count descending, then word ascending; keep the top 10
l = sorted(frequency.items(), key=lambda x: (-x[1], x[0]))[:10]
print(len(frequency))  # number of distinct words
for i in l:
    print(i[0], '=', i[1])
```
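
The statement's rule that "remove" means "replace with one space" only matters when a mark touches the following word. On both samples every mark is followed by a space or a line end, so deleting the marks outright and replacing them with a space happen to produce the same words there. A comparison on a made-up line (not taken from the samples):

```python
import re

# Made-up line in which the marks touch the next word directly
s = "Tom say:Who is best?No one.yes"

deleted = re.sub(r'[!.,:*?]', '', s).lower().split()    # delete the marks
replaced = re.sub(r'[!.,:*?]', ' ', s).lower().split()  # replace with one space, as specified

print(deleted)   # ['tom', 'saywho', 'is', 'bestno', 'oneyes']
print(replaced)  # ['tom', 'say', 'who', 'is', 'best', 'no', 'one', 'yes']
```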

Original poster | Posted on 2020-5-21 18:52:58

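Do you know how to read in long multi-line input like this problem's samples? Also, your code prints spaces on both sides of the equals sign, which doesn't match the sample output.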

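Posted on 2020-5-21 19:05:13
欧德奈瑞 posted on 2020-5-21 18:52
Do you know how to read in long multi-line input like this problem's samples? Also, your code prints spaces on both sides of the equals sign, which doesn't match the sample output.

Changed as you requested: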
```python
import re
import collections

text = input()  # note: input() reads only one line, so later lines of a multi-line sample are ignored
text = re.sub(r'[!.,:*?]', ' ', text)  # replace each specified mark with one space
frequency = collections.Counter(map(str.lower, text.split()))

l = sorted(frequency.items(), key=lambda x: (-x[1], x[0]))[:10]
print(len(frequency))
for i in l:
    print('%s=%d' % (i[0], i[1]))  # no spaces around '=', e.g. "the=5"
```

Original poster | Posted on 2020-5-21 19:15:48

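Although your code still couldn't get me through this problem, thanks for pointing me in the right direction. Fortunately I've already passed it using loops, so I'll mark this code of yours as the best answer.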

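Original poster | Posted on 2020-5-21 19:21:03
ouyunfu posted on 2020-5-21 19:05
Changed as you requested:

By the way, next time you need to read sample input that ends with a special terminator line like this, you could consider code like this: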
```python
text = ''
while True:
    strs = input()       # read one line
    if strs != '!!!!!':  # not the terminator yet: keep the line
        text += '\n'
        text += strs
    else:                # terminator reached: stop reading
        break
```
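
The loop above can be condensed with the two-argument form of the built-in iter(), which calls a function repeatedly until it returns a sentinel value. A sketch, assuming the !!!!! line is always present: if input ends first, input() raises EOFError, which neither this version nor the loop above handles. read_text is a hypothetical name, and the demo passes it a stand-in for input() so it runs without a keyboard:

```python
def read_text(read_line=input, terminator='!!!!!'):
    # iter(f, sentinel) calls f() repeatedly and stops as soon as f() returns
    # the sentinel; the terminator line itself is not included
    return '\n'.join(iter(read_line, terminator))

# Demo with a stand-in for input(); a real submission would just call read_text()
fake_lines = iter(['failure is', '', 'the pole', '!!!!!', 'not read'])
print(read_text(lambda: next(fake_lines)))
```

Unlike the loop, this joins lines with '\n' instead of prefixing each one, which makes no difference once the text is split on whitespace.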

