hive - Can we use the SparkSession object without explicitly creating it, if we submit a job via spark-submit?
My question is basic and my code works fine, but I am not clear on these two points:
1) When I submit a PySpark job using spark-submit, do I need to create a SparkSession object in the script, like this:
    from pyspark.sql import SparkSession, SQLContext
    from pyspark.conf import SparkConf

    spark = SparkSession \
        .builder \
        .enableHiveSupport() \
        .appName("test") \
        .getOrCreate()
    print(spark)
    # SQLContext expects a SparkContext, not a SparkSession
    sqlContext = SQLContext(spark.sparkContext)
Or can I directly access the SparkSession object in the script without creating it?
    from pyspark.sql import SparkSession, SQLContext
    from pyspark.conf import SparkConf

    print(spark)  # or can I use 'sc'? not sure -- I am using Spark 2
    sqlContext = SQLContext(spark.sparkContext)
And if the SparkSession object is already available, how can I add config properties such as the ones below, or how can I enable Hive support?
    spark = SparkSession \
        .builder \
        .enableHiveSupport() \
        .config(conf=SparkConf().set("spark.driver.maxResultSize", "2g")) \
        .appName("test") \
        .getOrCreate()
2) The approach without using spark-submit: I can write Python code that creates the SparkSession object and uses it, running the script with the plain python interpreter.
My doubt: if I submit the job using spark-submit and also create the SparkSession object as shown above, am I ending up creating two SparkSessions?
It would be helpful if you could explain the added advantage of using spark-submit over the method in step 2, and whether I need to create a SparkSession object at all if I invoke the job via the spark-submit command line.
"When I submit a PySpark job using spark-submit, do I need to create a SparkSession object?"
Yes, you do. It is only in the interactive shells (pyspark, spark-shell) that a session object is created for you automatically.
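For illustration, a minimal sketch of a script you could pass to spark-submit (the file name and the query are assumptions, not from the question):

    # my_job.py (hypothetical name) -- run with: spark-submit my_job.py
    from pyspark.sql import SparkSession

    # No pre-created 'spark' variable exists in a spark-submit script,
    # so the session is built explicitly here.
    spark = SparkSession.builder \
        .appName("test") \
        .enableHiveSupport() \
        .getOrCreate()

    spark.sql("SHOW DATABASES").show()
    spark.stop()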
"My doubt: if I submit the job using spark-submit and also create the SparkSession object as shown above, am I ending up creating two SparkSessions?"
TL;DR: No.
If you check the code you have written:
    spark = SparkSession \
        .builder \
        .enableHiveSupport() \
        .config(conf=SparkConf().set("spark.driver.maxResultSize", "2g")) \
        .appName("test") \
        .getOrCreate()
observe the getOrCreate(): it takes care that at any point in time only one SparkSession object (spark) exists.
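A quick sketch you can run to convince yourself that getOrCreate() reuses the existing session rather than creating a second one:

    from pyspark.sql import SparkSession

    spark1 = SparkSession.builder.appName("test").getOrCreate()
    spark2 = SparkSession.builder.appName("another-app").getOrCreate()

    # Both names refer to the very same session object; the second
    # appName is effectively ignored because the already-running
    # session is returned instead of a new one being created.
    print(spark1 is spark2)  # True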
I recommend creating the context/session locally in the script; it makes the code pure, as it does not depend on objects coming from outside sources.
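One way to follow that advice, as a sketch (the function and table names here are hypothetical):

    from pyspark.sql import SparkSession

    def row_count(spark, table_name):
        # Depends only on its arguments, not on a global session object.
        return spark.table(table_name).count()

    if __name__ == "__main__":
        spark = SparkSession.builder.enableHiveSupport().getOrCreate()
        print(row_count(spark, "default.some_table"))  # hypothetical table
        spark.stop()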