Python: loop through a list to find a substring match in strings from a DataFrame column, then return the matched substring in a new column
I am programming in Python 3.6 with Spyder 3.2.3 on macOS Sierra 10.12.6.
I have a column in a DataFrame, df, that contains a list of towns in Australia along with other information. The column of interest is suburbs:
    df["suburbs"]
    apollo bay (tas.)
    apollo bay (vic.)
    apoinga
    act remainder - belconni
I have a list, states, containing the states of Australia:
    states = ["NSW", "VIC", "TAS", "ACT", "SA", "WA"]
My objective is to check whether each suburb in df["suburbs"] contains a state from the list states and, if it does, return that state in a new column, df["state"].
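For comparison, the objective can also be expressed without an explicit loop. The sketch below is one possible approach, not the only one: it assumes the sample values shown above, builds a single regular expression from states, and uses pandas' Series.str.extract to pull out the first matching state per row, filling rows with no match with "not found". The \b word boundaries are a deliberate change from the plain substring test in the question; they stop short codes such as "SA" or "WA" from matching inside longer words.

```python
import re

import pandas as pd

# Hypothetical sample data mirroring the values listed above.
df = pd.DataFrame({"suburbs": [
    "apollo bay (tas.)",
    "apollo bay (vic.)",
    "apoinga",
    "act remainder - belconni",
]})
states = ["NSW", "VIC", "TAS", "ACT", "SA", "WA"]

# One alternation pattern with a single capture group: \b(NSW|VIC|...)\b
# \b word boundaries keep "SA" from matching inside a word such as "SALE".
pattern = r"\b(" + "|".join(map(re.escape, states)) + r")\b"

# str.extract returns the leftmost match per row, or NaN where nothing matched.
df["state"] = (
    df["suburbs"].str.upper()
    .str.extract(pattern, expand=False)
    .fillna("not found")
)
print(df)
```

If a suburb happened to contain more than one state code, this version keeps the leftmost one.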
So, at the moment my solution is to use a for loop with an if statement. However, for some reason the loop returns "not found" for every row, even for rows that do match. My current for loop and if statement are below:
    for suburb in df["suburbs"].str.upper():
        for state in states:
            if state in suburb:
                df["state"] = state
            else:
                df["state"] = "not found"
and it returns:
    not found
    not found
    not found
    not found
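The all-"not found" result can be reproduced in isolation. Assigning a single value to df["state"] sets that value in every row, so each pass through the loop overwrites the whole column. The last pass of the question's loop compares "WA" against "ACT REMAINDER - BELCONNI", takes the else branch, and writes "not found" to every row. A minimal sketch with hypothetical two-row data:

```python
import pandas as pd

df = pd.DataFrame({"suburbs": ["apollo bay (tas.)", "apoinga"]})

# Assigning a scalar to a column sets EVERY row to that value.
df["state"] = "TAS"
print(df["state"].tolist())  # ['TAS', 'TAS']

# A later assignment in the same loop overwrites the whole column again,
# so only the value written on the very last iteration survives.
df["state"] = "not found"
print(df["state"].tolist())  # ['not found', 'not found']
```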
Another thing I noticed in the Variable Explorer section of Spyder: the above code creates two variables, suburb and state, with the values ACT REMAINDER - BELCONNI and WA, respectively. It seems to pick up the last values of both the DataFrame column and the list.
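The Variable Explorer observation follows from how Python for loops work: a loop does not create its own scope, so after it finishes, the loop variables stay bound to the last items they were assigned. A minimal sketch, using a shortened, hypothetical version of the data:

```python
states = ["NSW", "VIC", "TAS", "ACT", "SA", "WA"]
suburbs = ["APOLLO BAY (TAS.)", "ACT REMAINDER - BELCONNI"]

for suburb in suburbs:
    for state in states:
        pass  # body omitted; only the loop variable names matter here

# Loop variables remain defined after the loops end, holding the last items.
print(suburb, state)  # ACT REMAINDER - BELCONNI WA
```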
However, if I do not create the new column state and instead use the print function to see whether the substring matches, it shows that the matching works. The code is below:
    for suburb in test["suburbs"].str.upper():
        for state in states:
            if state in suburb:
                print(suburb, state)
and the result is:
    APOLLO BAY (TAS.) TAS
    APOLLO BAY (VIC.) VIC
    ACT REMAINDER - BELCONNI ACT
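The print version behaves as expected because print acts on one suburb at a time, whereas df["state"] = ... acts on the entire column. One way to keep the loop while getting a per-row result, sketched here under the assumption that the first matching state is the one wanted, is to collect one value per suburb in a list and assign the list to the column once, after the loop:

```python
import pandas as pd

# Hypothetical sample data mirroring the values listed above.
df = pd.DataFrame({"suburbs": [
    "apollo bay (tas.)",
    "apollo bay (vic.)",
    "apoinga",
    "act remainder - belconni",
]})
states = ["NSW", "VIC", "TAS", "ACT", "SA", "WA"]

results = []  # one entry per row, in the same order as df
for suburb in df["suburbs"].str.upper():
    found = "not found"  # default when no state matches this suburb
    for state in states:
        if state in suburb:  # plain substring test, as in the question
            found = state
            break  # keep the first matching state and stop searching
    results.append(found)

# Assigning a list of matching length fills the column row by row.
df["state"] = results
print(df)
```

Because results has exactly one entry per row, the assignment lines up with the DataFrame's rows in order.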
It skips the one that doesn't have a match. If I add an additional else statement to print "not found", the result is "not found".
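On the extra else: an else attached to the if runs once for every state that does not match, not once per suburb, so it fires for most suburb/state pairs. Python's for ... else construct is closer to the intended logic, because its else block runs only if the loop completes without hitting break. A small sketch with hypothetical data:

```python
states = ["NSW", "VIC", "TAS", "ACT", "SA", "WA"]
suburbs = ["APOLLO BAY (TAS.)", "APOINGA"]

lines = []
for suburb in suburbs:
    for state in states:
        if state in suburb:
            lines.append(f"{suburb} {state}")
            break  # a match was found, so the loop's else is skipped
    else:
        # Runs only when the inner loop finished without a break.
        lines.append(f"{suburb} not found")

print("\n".join(lines))
```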
Can someone please help me understand what I am doing wrong here, and why? It's pretty frustrating because it seems like a simple task to me.
Thank you very much.