Convert string with nanoseconds into timestamp in Spark
Is there a way to convert a timestamp value with nanoseconds into a timestamp in Spark? The input is a CSV file where the timestamp values have the format 12-12-2015 14:09:36.992415+01:00. This is the code I tried:
val date_raw_data = List((1, "12-12-2015 14:09:36.992415+01:00"))
val dateraw_df = sc.parallelize(date_raw_data).toDF("id", "timestamp_value")
val ts = unix_timestamp($"timestamp_value", "MM-dd-yyyy HH:mm:ss.ffffffz").cast("double").cast("timestamp")
val date_df = dateraw_df.withColumn("timestamp_conv", ts).show(false)
The output is:
+---+-----------------------+---------------------+
|id |timestamp_value        |timestamp_conv       |
+---+-----------------------+---------------------+
|1  |12-12-2015 14:09:36.992|null                 |
+---+-----------------------+---------------------+
I am able to convert the timestamp to millisecond precision using the format MM-dd-yyyy HH:mm:ss.SSS. The trouble is with the nanosecond and timezone formats.
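(As a point of reference, here is a minimal sketch of that working millisecond case, assuming a spark-shell session where spark.implicits._ is in scope; the sample string and column names are illustrative:)

import org.apache.spark.sql.functions.unix_timestamp

// Illustrative only: a millisecond-precision string without a zone
// offset parses fine with the SSS pattern mentioned above.
Seq((1, "12-12-2015 14:09:36.992")).toDF("id", "timestamp_value")
  .withColumn("timestamp_conv",
    unix_timestamp($"timestamp_value", "MM-dd-yyyy HH:mm:ss.SSS").cast("timestamp"))
  .show(false)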
unix_timestamp won't help here. Even if you could parse the string (AFAIK SimpleDateFormat doesn't provide the required formats), unix_timestamp has only second precision (emphasis mine):
def unix_timestamp(s: Column, p: String): Column
Convert time string with given pattern (see http://docs.oracle.com/javase/tutorial/i18n/format/simpledateformat.html) to Unix time stamp (in seconds), return null if fail.
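To see that limit concretely, here is a small check (a sketch, again assuming a spark-shell session): even when the fractional pattern parses, the value that comes back is whole seconds.

import org.apache.spark.sql.functions.unix_timestamp

// The .992 is consumed during parsing, but unix_timestamp returns
// seconds since the epoch as a long, so the fraction is dropped.
Seq("12-12-2015 14:09:36.992").toDF("t")
  .select(unix_timestamp($"t", "MM-dd-yyyy HH:mm:ss.SSS").alias("epoch_seconds"))
  .show(false)
// epoch_seconds is a whole number of seconds (its exact value depends
// on the session timezone)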
You have to create your own function to parse the data. A rough idea:
import org.apache.spark.sql.functions._
import org.apache.spark.sql.Column

def to_nano(c: Column) = {
  val r = "([0-9]{2}-[0-9]{2}-[0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2})(\\.[0-9]*)(.*)$"
  // Seconds part: date, time and zone offset, without the fraction
  (unix_timestamp(
    concat(
      regexp_extract(c, r, 1),
      regexp_extract(c, r, 3)
    ),
    "MM-dd-yyyy HH:mm:ssXXX"
  ).cast("decimal(38, 9)") +
    // Subsecond part, kept as a decimal fraction and added back
    regexp_extract(c, r, 2).cast("decimal(38, 9)")).alias("value")
}

Seq("12-12-2015 14:09:36.992415+01:00").toDF("timestamp_value")
  .select(to_nano($"timestamp_value").cast("timestamp").alias("value"))
  .show(false)

// +--------------------------+
// |value                     |
// +--------------------------+
// |2015-12-12 14:09:36.992415|
// +--------------------------+
// (rendered in the session timezone; shown here for +01:00,
//  matching the input's offset)
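Note that Spark's TimestampType is itself limited to microsecond precision, so full nanoseconds cannot round-trip through it in any case. If java.time is available (Java 8+), a UDF is another way to parse this format directly; a minimal sketch, assuming a spark-shell session (the names fmt and parse_micros are made up here):

import java.sql.Timestamp
import java.time.OffsetDateTime
import java.time.format.DateTimeFormatter
import org.apache.spark.sql.functions.udf

// DateTimeFormatter handles fractional seconds (S) and ISO offsets (XXX)
// directly, unlike SimpleDateFormat.
val fmt = DateTimeFormatter.ofPattern("MM-dd-yyyy HH:mm:ss.SSSSSSXXX")
val parse_micros = udf { s: String =>
  Timestamp.from(OffsetDateTime.parse(s, fmt).toInstant)
}

Seq("12-12-2015 14:09:36.992415+01:00").toDF("timestamp_value")
  .select(parse_micros($"timestamp_value").alias("value"))
  .show(false)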