python - Parsing a column of JSON strings
I have a tab-separated flat file in which one column holds JSON data stored as a string, e.g.
    col1        col2                     col3
    1491109818  2017-08-02 00:00:09.250  {"type":"tipper"}
    1491110071  2017-08-02 00:00:19.283  {"type":"hgv"}
    1491110798  2017-08-02 00:00:39.283  {"type":"tipper"}
    1491110798  2017-08-02 00:00:39.283  \n
    ...
What I want to do is load the table into a pandas DataFrame and, for col3, replace the JSON string with the value stored under its type key. Where there is no JSON, or the JSON has no type key, I want to return None.
e.g.
    col1        col2                     col3
    1491109818  2017-08-02 00:00:09.250  tipper
    1491110071  2017-08-02 00:00:19.283  hgv
    1491110798  2017-08-02 00:00:39.283  tipper
    1491110798  2017-08-02 00:00:39.283  None
    ...
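For context, loading a file like this might look like the sketch below. It is only an illustration under stated assumptions: the in-memory `sample` text stands in for the real file (whose path is not given here), and rows without JSON are assumed to have an empty col3. `quoting=csv.QUOTE_NONE` is used so the double quotes inside the JSON text are left alone, and `keep_default_na=False` so that empty cells stay as empty strings rather than NaN.

```python
import csv
import io

import pandas as pd

# Hypothetical stand-in for the real tab-separated file.
sample = (
    "col1\tcol2\tcol3\n"
    '1491109818\t2017-08-02 00:00:09.250\t{"type":"tipper"}\n'
    '1491110071\t2017-08-02 00:00:19.283\t{"type":"hgv"}\n'
    "1491110798\t2017-08-02 00:00:39.283\t\n"
)

df = pd.read_csv(
    io.StringIO(sample),     # replace with the real file path
    sep="\t",                # columns are tab separated
    quoting=csv.QUOTE_NONE,  # do not treat the JSON double quotes as CSV quoting
    keep_default_na=False,   # keep empty cells as "" instead of NaN
)
```

With these options col3 arrives as plain strings, so each cell can be passed straight to json.loads.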
The only way I can think of to do this is with iterrows, which is slow when dealing with large files.
    for index, row in df.iterrows():
        try:
            df.loc[index, 'col3'] = json.loads(row['col3'])['type']
        except:
            df.loc[index, 'col3'] = None
Any suggestions on a quicker approach?
Using np.vectorize and json.loads:
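    import json
    import numpy as np

    def foo(x):
        try:
            return json.loads(x)['type']
        except (ValueError, KeyError):
            return None

    # otypes=[object] keeps None as a real None; without it NumPy infers the
    # dtype from the first result, which can turn None into the string 'None'
    v = np.vectorize(foo, otypes=[object])
    df.col3 = v(df.col3)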
Note that it is never recommended to use a bare except, as you can inadvertently catch and drop errors you didn't mean to.
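To make the point about narrow exception handling concrete, here is a minimal sketch of a helper that names each exception it expects. The name `extract_type` and the sample `cells` are hypothetical, and catching TypeError is an addition on my part for cells that are not strings (such as NaN) or JSON that is not an object; it may not be needed for your data.

```python
import json


def extract_type(cell):
    """Return the 'type' value from a JSON string, or None if it is unavailable."""
    try:
        return json.loads(cell)["type"]
    except (ValueError, KeyError, TypeError):
        # ValueError: the text is not valid JSON (json.JSONDecodeError subclasses it)
        # KeyError:   valid JSON object, but no "type" key
        # TypeError:  the cell is not a string (e.g. NaN), or the JSON is not an object
        return None


# Hypothetical cells covering each case handled above.
cells = ['{"type":"tipper"}', '{"kind":"hgv"}', "", float("nan"), "[1, 2]"]
results = [extract_type(c) for c in cells]
```

Anything outside those three exception types, a KeyboardInterrupt for example, still propagates instead of being silently turned into None.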
    df

             col1                     col2    col3
    0  1491109818  2017-08-02 00:00:09.250  tipper
    1  1491110071  2017-08-02 00:00:19.283     hgv
    2  1491110798  2017-08-02 00:00:39.283  tipper
    3  1491110798  2017-08-02 00:00:39.283    None