About using pandas
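I want to split the "basic information" column in this file into three columns. Two of them are education and work experience; the third is the salary, which is already given in an earlier column.
It looks redundant to me, so I want to split it out and delete it.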
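How should I do this?
In Excel this takes a few mouse clicks: one Text to Columns operation and it is done. Do you have to use pandas, or do you just want the practice?
阿奇_o posted on 2021-4-2 21:12:
"In Excel this takes a few mouse clicks: one Text to Columns operation and it is done. Do you have to use pandas, or do you just want the practice?"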
I need to do this with a program.
It is a homework assignment.
私はり posted on 2021-4-2 21:15:
"I need to do this with a program.
It is a homework assignment."
It has been ages since I used pandas.. this took me an hour..
Anyway, here it is:
import pandas as pd
df = pd.read_csv('lagou_recruitment.csv')
# Split the combined column into separate fields
# Assumption: '基本要求' reads "<salary> <experience> <education>", separated by single spaces; adjust the indexes if the file differs
work_year = pd.Series(df['基本要求'].apply(lambda s: s.split(' ')[1])).rename('工作年限')  # years of experience
edu_bgd = pd.Series(df['基本要求'].apply(lambda s: s.split(' ')[2])).rename('学历')  # education
# print(type(work_year))
# print(work_year, edu_bgd)
# Drop the columns that are no longer needed
df_cut = df.drop(columns=['Unnamed: 0', '基本要求'])
# print(df_cut.columns)
# Concatenate the columns we want to keep
s = pd.concat([df_cut.iloc[:, :5], work_year, edu_bgd,
               df_cut[['公司状况', '岗位技能', '公司福利']]], axis=1)
# Rename the columns
print(s.columns)
new_cols_name = list(df_cut.columns)[:5] + ['工作年限', '学历'] + list(df_cut.columns)[5:]
print(new_cols_name)
df_re = s.rename(columns=dict(zip(s.columns, new_cols_name)))  # renaming columns takes a dict
# Write to CSV
df_re.to_csv('lagou_r_new.csv', index=False, encoding='utf-8-sig')  # the -sig variant fixes the garbled-text problem
print(df_re)
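For comparison, the same split can be sketched more compactly with `Series.str.split(..., expand=True)`, which returns one column per piece. This is only a sketch under assumptions: the sample rows are made up, the label '薪资_重复' is a placeholder, and the '基本要求' field is assumed to read "salary experience education" separated by single spaces, which may not match the real file.

```python
import pandas as pd

# Made-up sample rows; the real data would come from lagou_recruitment.csv.
df = pd.DataFrame({
    '岗位名称': ['数据分析师', '算法工程师'],
    '薪资': ['10k-20k', '20k-40k'],
    '基本要求': ['10k-20k 经验1-3年 本科', '20k-40k 经验3-5年 硕士'],
})

# expand=True gives one column per piece; n=2 caps the result at three pieces.
parts = df['基本要求'].str.split(' ', n=2, expand=True)
parts.columns = ['薪资_重复', '工作年限', '学历']  # assumed field order

# Keep the two new fields; discard the duplicated salary and the source column.
result = pd.concat([df.drop(columns=['基本要求']), parts[['工作年限', '学历']]], axis=1)
print(result)
```

Unlike `apply` with a plain `str.split`, the `.str` accessor passes missing values through as missing instead of raising, which may matter if some rows are empty.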
阿奇_o posted on 2021-4-3 00:20:
"It has been ages since I used pandas.. this took me an hour..
Anyway, here it is:"
OK, thanks.
阿奇_o posted on 2021-4-3 00:20:
"It has been ages since I used pandas.. this took me an hour..
Anyway, here it is:"
import pandas as pd
data = pd.read_csv(r'D:\lagou_recruitment.csv')
data.head()
data.columns = ['Unnamed', '岗位名称', '公司名称', '城市', '地点', '薪资', '基本要求', '公司状况', '岗位技能', '公司福利']
data = data.drop(['Unnamed'], axis=1)
data.head()
Why do I get an error here?
The error message shows '基本要求'.
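An error that names '基本要求' is often a `KeyError`, meaning that no column with exactly that label exists at that point. The post does not show the traceback, so this is an assumption. A minimal sketch for checking the labels, using a made-up frame whose header carries a stray trailing space; the helper `require_column` is hypothetical and not part of pandas:

```python
import pandas as pd

# Made-up frame; the second header has a stray trailing space.
data = pd.DataFrame({'薪资': ['10k-20k'], '基本要求 ': ['10k-20k 经验1-3年 本科']})

def require_column(frame, name):
    """Hypothetical helper: fail early and list the labels that do exist."""
    if name not in frame.columns:
        raise KeyError(f"{name!r} not found; columns are {frame.columns.tolist()!r}")
    return frame[name]

# Printing the list form shows the quotes, so stray spaces become visible.
print(data.columns.tolist())

# Normalise the labels, then the lookup succeeds.
data.columns = data.columns.str.strip()
basic = require_column(data, '基本要求')
print(basic)
```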
dups = data.duplicated()
print('Number of duplicate rows = %d' % (dups.sum()))
print('Number of rows before discarding duplicates = %d' % (data.shape[0]))
data2 = data.drop_duplicates()  # remove duplicate rows
print('Number of rows after discarding duplicates = %d' % (data2.shape[0]))
# Assumption: '基本要求' reads "<salary> <experience> <education>", separated by single spaces; adjust the indexes if the file differs
work_year = pd.Series(data2['基本要求'].apply(lambda s: s.split(' ')[1])).rename('工作年限')  # years of experience
edu_bgd = pd.Series(data2['基本要求'].apply(lambda s: s.split(' ')[2])).rename('学历')  # education
print(type(work_year))
print(work_year, edu_bgd)
阿奇_o posted on 2021-4-3 00:20:
"It has been ages since I used pandas.. this took me an hour..
Anyway, here it is:"
I got it working.
I don't know why it sometimes raises an error
and sometimes doesn't.
Thanks.
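An error that appears on some runs but not others is consistent with re-running a step against a DataFrame that an earlier run already changed, for instance in a notebook. That is only a guess; the thread does not say how the code was run. A minimal sketch with made-up data, showing that a second `drop` of the same column fails and that `errors='ignore'` makes the step safe to repeat:

```python
import pandas as pd

# Made-up stand-in for the loaded CSV.
data = pd.DataFrame({
    'Unnamed': [0],
    '薪资': ['10k-20k'],
    '基本要求': ['10k-20k 经验1-3年 本科'],
})

data = data.drop(columns=['基本要求'])      # first run: works
try:
    data = data.drop(columns=['基本要求'])  # second run: the column is already gone
    second_run_failed = False
except KeyError as exc:
    second_run_failed = True
    print('second drop failed:', exc)

# errors='ignore' skips labels that are missing, so the step can be repeated.
data = data.drop(columns=['基本要求', 'Unnamed'], errors='ignore')
data = data.drop(columns=['基本要求', 'Unnamed'], errors='ignore')
print(data.columns.tolist())
```

Another option is to re-read the CSV at the top of the cell, so every run starts from the same unmodified frame.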