Python: loop through a list to find a substring match in a string from a DataFrame column, then return the matched substring in a new column
I am programming in Python 3.6 on Spyder 3.2.3 on macOS Sierra 10.12.6.
I have a column in a DataFrame df containing a list of towns in Australia, along with other information. The column of interest is suburbs:

    df["suburbs"]
    Apollo Bay (Tas.)
    Apollo Bay (Vic.)
    Apoinga
    ACT Remainder - Belconni

I also have a list, states, containing the states in Australia:

    states = ["NSW", "VIC", "TAS", "ACT", "SA", "WA"]

My objective is to see whether each suburb in df["suburbs"] contains a state from the list states and, if yes, to return that state in a new column df["state"].
At the moment my solution is to use a loop and an if statement. However, for some reason the loop and if statement return "not found" for every line, even if it matches. My current loop and if statement are below:

    for suburb in df["suburbs"].str.upper():
        for state in states:
            if state in suburb:
                df["state"] = state
            else:
                df["state"] = "not found"

and it returns:

    not found
    not found
    not found
    not found

Another thing I noticed in the Variable explorer section of Spyder is that the above code creates two variables, suburb and state, with the values ACT REMAINDER - BELCONNI and WA, respectively. It seems to pick up the last values from both the DataFrame column and the list.
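A minimal sketch of what may be going on here, assuming the data looks like the four rows shown above and that the state codes are upper-case (the sample DataFrame below is rebuilt by hand for illustration). Assigning a single value with df["state"] = ... sets that value on every row, so each pass through the inner loop overwrites the whole column and only the very last comparison survives:

```python
import pandas as pd

# Hand-built sample, assumed to mirror the rows shown in the question.
df = pd.DataFrame({"suburbs": ["Apollo Bay (Tas.)", "Apollo Bay (Vic.)",
                               "Apoinga", "ACT Remainder - Belconni"]})
states = ["NSW", "VIC", "TAS", "ACT", "SA", "WA"]

# Assigning a scalar to a column fills every row with that scalar.
df["state"] = "TAS"
all_tas = df["state"].tolist()

# The loop from the question: every assignment replaces the entire column.
for suburb in df["suburbs"].str.upper():
    for state in states:
        if state in suburb:
            df["state"] = state
        else:
            df["state"] = "not found"

final_values = df["state"].tolist()
last_pair = (suburb, state)  # the loop variables keep their last values

print(all_tas)
print(final_values)
print(last_pair)
```

Under these assumptions the last comparison is "WA" against the last suburb, which does not match, and that would explain both the column full of "not found" and the two leftover variables seen in Spyder.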
However, if I do not create the new column state and instead use the print function to see whether the substring matches, it shows that it works. The code is below:

    for suburb in test["suburbs"].str.upper():
        for state in states:
            if state in suburb:
                print(suburb, state)

and the result is:

    APOLLO BAY (TAS.) TAS
    APOLLO BAY (VIC.) VIC
    ACT REMAINDER - BELCONNI ACT

It skips the one that doesn't have a match. If I add an additional else statement to print "not found", the result is "not found". Can someone please help me understand what is wrong here, and why? It's pretty frustrating because it seems to me a simple task.
Thank you very much.
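One possible way to get a per-row result, sketched under the same assumptions about the data (a hand-built sample DataFrame and upper-case state codes). The helper name find_state is hypothetical, introduced only for this illustration. It returns the first state found in a single suburb string, and Series.apply calls it once per row, so each row gets its own value instead of the whole column being overwritten:

```python
import pandas as pd

# Hand-built sample, assumed to mirror the rows shown in the question.
df = pd.DataFrame({"suburbs": ["Apollo Bay (Tas.)", "Apollo Bay (Vic.)",
                               "Apoinga", "ACT Remainder - Belconni"]})
states = ["NSW", "VIC", "TAS", "ACT", "SA", "WA"]


def find_state(suburb, states):
    """Return the first state code contained in suburb, else 'not found'.

    Hypothetical helper for illustration; it uses the same plain
    substring test as the loop in the question.
    """
    upper = suburb.upper()
    for state in states:
        if state in upper:
            return state
    return "not found"


# apply() calls the helper once per row and returns one value per row.
df["state"] = df["suburbs"].apply(lambda s: find_state(s, states))

result = df["state"].tolist()
print(df)
```

Because this keeps the plain substring test, short codes such as SA or WA could also match inside a longer town name; a stricter pattern may be needed for the full data set.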