python - What is the most efficient & pythonic way to recode a pandas column?


I'd like to 'anonymize' or 'recode' a column in a pandas DataFrame. What's the most efficient way to do so? I wrote the following, but it seems likely there's a built-in function or a better way.

# Reorder the DataFrame randomly (helps anonymization, since row order could have meaning)
# Note: drop=False keeps the old index as a column; drop=True would discard it
dataset = dataset.sample(frac=1).reset_index(drop=False)

# Make a dictionary of old and new values
value_replacer = 1
values_dict = {}
for unique_val in dataset[var].unique():
    values_dict[unique_val] = value_replacer
    value_replacer += 1

# Replace old values with new ones
for k, v in values_dict.items():
    dataset[var].replace(to_replace=k, value=v, inplace=True)

IIUC, you want to factorize the values:

dataset[var] = pd.factorize(dataset[var])[0] + 1 
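
For context, `pd.factorize` returns a tuple `(codes, uniques)`. The `[0]` picks the integer codes, which start at 0, hence the `+ 1`. Here is a minimal sketch that also keeps the old-to-new lookup. It assumes a small made-up frame, and the names `df`, `col` and `mapping` are illustrative only:

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({"col": ["aaa", "aaa", "bbb", "ccc", "ddd", "bbb"]})

# factorize returns (codes, uniques); codes are 0-based,
# assigned in order of first appearance
codes, uniques = pd.factorize(df["col"])

# Shift to 1-based codes, as in the one-liner above
df["col"] = codes + 1

# Optional lookup table from original value to new code
mapping = {value: code for code, value in enumerate(uniques, start=1)}
```

If the goal is anonymization, `mapping` should be stored separately or discarded, since it reverses the recoding.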

Demo:

In [2]: df
Out[2]:
   col
0  aaa
1  aaa
2  bbb
3  ccc
4  ddd
5  bbb

In [3]: df['col'] = pd.factorize(df['col'])[0] + 1

In [4]: df
Out[4]:
   col
0    1
1    1
2    2
3    3
4    4
5    2
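
Because the codes are handed out in order of first appearance, shuffling the rows before factorizing also randomizes which value gets which code. Below is a rough sketch combining the question's shuffle with `pd.factorize`. The helper name `recode_column` and its `seed` parameter are hypothetical and not part of pandas:

```python
import pandas as pd


def recode_column(frame, column, seed=None):
    """Shuffle rows, then replace `column` with 1-based integer codes.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Shuffle the rows and discard the old index so the original
    # order cannot be restored from it
    shuffled = frame.sample(frac=1, random_state=seed).reset_index(drop=True)
    # Codes follow the order of first appearance in the shuffled data
    shuffled[column] = pd.factorize(shuffled[column])[0] + 1
    return shuffled


# Hypothetical sample data, for illustration only
df = pd.DataFrame({"col": ["aaa", "aaa", "bbb", "ccc", "ddd", "bbb"]})
recoded = recode_column(df, "col", seed=0)
```

The helper returns a new frame and leaves the input untouched. Passing a fixed `seed` makes the shuffle reproducible, which may or may not be desirable for anonymization.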
