python - Split dataframe column into two columns based on delimiter
I am preprocessing text for classification, and I import my dataset like this:
dataset = pd.read_csv('lyrics.csv', delimiter = '\t', quoting = 2)
Printing dataset on the terminal gives:
  lyrics,classification
0 should have known better girl yo...
1 can shake apple off apple tree\nshak...
2 it's been hard day's night\nand i've been wo...
3 michelle, ma belle\nthese words go to...
However, when I inspect the variable dataset more closely using Spyder, I see that it has only 1 column instead of the desired 2 columns.
Since the lyrics themselves contain commas, using "," as the delimiter does not work.
How can I correct the dataframe above in order to have:
1) 1 column for lyrics
2) 1 column for classification
with the corresponding data in each row?
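If the lyrics did not contain commas (they do), you could use read_csv with delimiter=','.
However, since that is not an option here, use str.rsplit with a limit of one split, so that only the last comma in each row is treated as the separator: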
dataset.iloc[:, 0].str.rsplit(',', n=1, expand=True)
For example, starting from:
df
  lyrics,classification
0 should have known better girl yo...
1 can shake an...,0
2 it's been hard day's night...,0
split the single column on its last comma and name the two resulting columns:
df = df.iloc[:, 0].str.rsplit(',', n=1, expand=True)
df.columns = ['lyrics', 'classification']
df
  lyrics                               classification
0 should have known better girl yo...  0
1 can shake an...                      0
2 it's been hard day's night...        0
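To see why the split limit matters when a lyric itself contains a comma, here is a minimal self-contained sketch. The sample rows and their labels are made up for illustration (they are not the real lyrics.csv), and the extra astype(int) step is an assumption that the classification column holds integer labels.

```python
import pandas as pd

# Hypothetical sample data: one column whose values end in ",<label>".
# The second lyric contains a comma of its own.
raw = pd.DataFrame({
    "lyrics,classification": [
        "should have known better,0",
        "michelle, ma belle,1",
    ]
})

# Split from the right, at most once, so only the last comma is used.
split = raw.iloc[:, 0].str.rsplit(",", n=1, expand=True)
split.columns = ["lyrics", "classification"]

# Assumption: the labels are integers; rsplit leaves them as strings.
split["classification"] = split["classification"].astype(int)

print(split)
```

Without n=1, the second row would be split into three pieces, because the comma inside the lyric would be treated as a separator too.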