python - What is the most efficient & Pythonic way to recode a pandas column?
I'd like to 'anonymize' or 'recode' a column in a pandas DataFrame. What's the most efficient way to do so? I wrote the following, but it seems likely there's a built-in function or a better way.
# Reorder the DataFrame randomly (helps anonymization, since the order could have meaning)
dataset = dataset.sample(frac=1).reset_index(drop=False)

# Make a dictionary of old and new values
value_replacer = 1
values_dict = {}
for unique_val in dataset[var].unique():
    values_dict[unique_val] = value_replacer
    value_replacer += 1

# Replace old values with new
for k, v in values_dict.items():
    dataset[var].replace(to_replace=k, value=v, inplace=True)
IIUC, you want to factorize your values:
dataset[var] = pd.factorize(dataset[var])[0] + 1
Demo:

In [2]: df
Out[2]:
   col
0  aaa
1  aaa
2  bbb
3  ccc
4  ddd
5  bbb

In [3]: df['col'] = pd.factorize(df['col'])[0] + 1

In [4]: df
Out[4]:
   col
0    1
1    1
2    2
3    3
4    4
5    2
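
Two details of pd.factorize are worth knowing when recoding. It returns both the integer codes and the unique values, and by default it gives missing values the code -1. The sketch below is only an illustration: the sample data, the col_recoded column, and the key dictionary are made up for this example, not part of the original question.

```python
import pandas as pd

# Hypothetical sample data; the column name "col" follows the demo above,
# with one missing value added to show how it is handled.
df = pd.DataFrame({"col": ["aaa", "aaa", "bbb", None, "ddd", "bbb"]})

# factorize returns (codes, uniques). Codes are 0-based integers assigned in
# order of first appearance; missing values get the sentinel code -1.
codes, uniques = pd.factorize(df["col"])

# Shift by 1 so real values start at 1; missing values then become 0.
df["col_recoded"] = codes + 1

# Lookup from new code back to the original value (hypothetical helper).
# Keep it private, or discard it, if the goal is anonymization.
key = dict(enumerate(uniques, start=1))

print(df)
print(key)
```

Because codes follow the order of first appearance, shuffling the rows first (as in the question) also randomizes which value receives which code.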