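Applications of pandas
1. I want to remove the brackets from the values in the company location column. 2. I want to compute and display the average salary by industry.
I don't know how to do this and hope an expert can help.
This post was last edited by 阿奇_o on 2021-4-3 18:06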
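You again... pandas could be its own series at this point, haha.
OK, here is the code:
# New requirements:
# 1. Remove the square brackets in the 地点 (location) column
# 2. Compute the average salary by industry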
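import pandas as pd

df = pd.read_csv('lagou_r_new.csv')
print(df.head())
print(df.columns)
# 1. Remove the square brackets in the 地点 (location) column
r = df['地点'].apply(lambda s: s.replace('[', '').replace(']', ''))
print(r.head())
# Show the cleaned column next to the remaining columns
print(pd.concat([r, df.drop(columns=['地点'])], axis=1).head())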
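As a side note, the same cleanup can be sketched with the vectorized string methods instead of apply(). This is a minimal sketch under assumptions: the sample rows below are made up (the real lagou_r_new.csv is not shown in the thread), and it assumes the brackets only appear at the start and end of each value.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real CSV, which is not shown here.
df = pd.DataFrame({'地点': ['[北京]', '[上海]', '[深圳]']})

# .str.strip('[]') removes any leading/trailing '[' or ']' characters.
# Unlike chained replace(), it leaves brackets in the middle of a value untouched.
df['地点'] = df['地点'].str.strip('[]')
print(df['地点'].tolist())
```

Assigning back to df['地点'] updates the column in place, so no concat is needed afterwards.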
# 2. Compute the average salary by industry
# Approach:
# The 薪资 (salary) values look like 8k-15k, so replace k with 000 and - with +,
# then sum with eval(), i.e. eval('8000+15000'); dividing by 2 gives the average salary for each posting.
df_rep = df['薪资'].apply(lambda s: eval(s.replace('k','000').replace('K', '000').replace('-', '+')))  # apply returns a Series
print(df_rep.head())
df_rep = pd.DataFrame(df_rep)
# hy = df['公司状况'].apply(lambda s: s.split(' / '))  # industry categories...
# print(hy)
# avg = df_rep.groupby(hy)[['薪资']].mean()
# Use the average salary per city as the example instead (the industry column is still messy and has not been cleaned)
avg = df_rep.groupby(by=df['城市'])['薪资'].mean().round()
print(avg.sort_values(ascending=False))
# Wow, Beijing, Shanghai and Shenzhen come out at 40K+. So that's where "400K a year" comes from, haha
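For the industry average itself, here is one possible sketch. The commented-out attempt splits 公司状况 into lists, and lists are unhashable, so they generally cannot serve as group keys. The sketch below assumes (this is a guess, the real data is not shown) that the industry is the first ' / '-separated field of 公司状况; the sample rows and the avg_salary column are hypothetical.

```python
import pandas as pd

# Hypothetical rows; the field layout of 公司状况 is an assumption.
df = pd.DataFrame({
    '公司状况': ['移动互联网 / A轮 / 150-500人',
                 '金融 / 上市公司 / 2000人以上',
                 '移动互联网 / 不需要融资 / 50-150人'],
    'avg_salary': [11500, 20000, 15000],  # hypothetical per-posting averages
})

# Keep only the first field so each row gets a single, hashable group key.
industry = df['公司状况'].str.split(' / ').str[0].rename('行业')

avg_by_industry = df.groupby(industry)['avg_salary'].mean()
print(avg_by_industry.sort_values(ascending=False))
```

If the real column mixes several industries in one field or uses a different separator, it would need more cleaning before this grouping is meaningful.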
In reply to 阿奇_o's post of 2021-4-3 18:03:
I forgot to divide by 2 after eval()...
It should be df_rep = df['薪资'].apply(lambda s: eval(s.replace('k','000').replace('K', '000').replace('-', '+')) / 2)
With that, it's no longer "400K a year" but 200K, haha.
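eval() executes whatever text is in the cell, so a version that avoids it may be worth considering. The following is a rough sketch, assuming every value has the form "<low>k-<high>k" (upper or lower case k); the sample Series is made up.

```python
import pandas as pd

# Hypothetical salary strings in the "8k-15k" form described in the thread.
salary = pd.Series(['8k-15k', '10K-20K', '15k-30k'], name='薪资')

# Capture the low and high ends as two columns (0 and 1); (?i) makes k/K both match.
bounds = salary.str.extract(r'(?i)(\d+)k-(\d+)k').astype(int)

# Row-wise mean of low and high, converted from "k" to plain units.
avg_salary = bounds.mean(axis=1) * 1000
print(avg_salary.tolist())
```

Rows that do not match the pattern would come out as NaN from str.extract, and astype(int) would then fail, which at least makes bad data visible instead of silently evaluating it.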
In reply to 阿奇_o's post of 2021-4-3 18:03:
OK, thanks. I'll look into it.
In reply to 阿奇_o's post of 2021-4-3 18:03:
I'm looking into which job has the highest salary in each region.
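How should I approach this problem?
First, do you understand how df['col'].apply() and anonymous (lambda) functions work, and how the earlier "split into columns" operation was done?
Then look again at: df_rep = df['薪资'].apply(lambda s: eval(s.replace('k','000').replace('K', '000').replace('-', '+')) / 2)
Then work through: df_rep.groupby(by=df['city ? '])[['salary ? ']].max()  # likewise .min(), .sum(), .count(), .mean(); replace the "?" placeholders with your real column names
PS: Check the official documentation often. It has simple examples you can practice on first, and once you understand them you can apply them to your own data.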
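To find which job carries the highest salary in each region, one common pattern is groupby plus idxmax, sketched below. The column names 职位 (job title) and avg_salary, and all the rows, are hypothetical stand-ins for the real data.

```python
import pandas as pd

# Hypothetical postings; column names are assumptions for illustration.
df = pd.DataFrame({
    '城市': ['北京', '北京', '上海', '上海'],
    '职位': ['数据分析师', '算法工程师', '数据分析师', '数据工程师'],
    'avg_salary': [15000, 30000, 18000, 25000],
})

# idxmax returns, for each city, the row label of the highest avg_salary.
top_idx = df.groupby('城市')['avg_salary'].idxmax()

# Use those row labels to pull the full rows, including the job title.
top_jobs = df.loc[top_idx, ['城市', '职位', 'avg_salary']]
print(top_jobs)
```

Plain .max() on the group would give the top salary only; idxmax keeps the link back to the row, which is what answers "which job".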
In reply to 阿奇_o's post of 2021-4-3 21:29:
I understand most of it now; I know how these are used.
In reply to 阿奇_o's post of 2021-4-3 21:29:
How do I look up the official documentation? I usually just search Baidu.
In reply to 私はり's post of 2021-4-3 22:03:
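For online documentation and API references, just search for the language or library name plus "documentation", e.g. "Python3 documentation".
For offline documentation, I use zeal (zealdocs.org).
That said, editors such as vscode will also show the (condensed) documentation for a method or function on their own.
ipython can show condensed help as well, e.g. pd.DataFrame?
Still, it's better to first work through a classic book such as "Python for Data Analysis",
and then consult the docs; you'll have a much clearer idea of what to look up.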
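Outside ipython, the same condensed help can be read from a plain Python session. A small sketch using only the standard inspect module; which methods you look up is of course up to you.

```python
import inspect
import pandas as pd

# getdoc returns the cleaned-up docstring; its first line is a one-sentence summary.
doc = inspect.getdoc(pd.DataFrame.groupby)
summary = doc.splitlines()[0]
print(summary)

# signature shows the parameters a method accepts.
print(inspect.signature(pd.Series.apply))
```

Calling help(pd.DataFrame.groupby) prints the full docstring interactively, much like pd.DataFrame.groupby? does in ipython.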
In reply to 阿奇_o's post of 2021-4-3 22:29:
OK.
龙舞九天 posted on 2021-4-30 15:13
星台, go easy on the filler posts, or you might get banned.