hadoop - Best way to automate getting data from CSV files to a data lake


I need to get data from CSV files (daily extractions from different business databases) into HDFS, then move it to HBase, and finally load aggregations of this data into a datamart (SQL Server).

I would like to know the best way to automate this process (using Java or Hadoop tools).

Little to no coding required? In no particular order:

  • Talend Open Studio
  • StreamSets Data Collector
  • Apache NiFi

Assuming you can set up a Kafka cluster, you can try Kafka Connect.
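As a rough illustration of the Kafka Connect route, a connector is usually registered by POSTing a JSON body with `name` and `config` to the Connect REST API (`/connectors`). The sketch below only builds that body. The connector name, file path, and topic are made-up placeholders, and it uses the demo `FileStreamSourceConnector` that ships with Kafka, which reads a file line by line and may need to be added to the plugin path in recent releases. A real setup would more likely use a dedicated CSV-aware connector.

```python
import json


def build_connector_payload(name, csv_path, topic):
    """Build the JSON body for POST /connectors on the Kafka Connect REST API."""
    return {
        "name": name,
        "config": {
            # Demo connector bundled with Kafka; each file line becomes one record.
            "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
            "tasks.max": "1",
            "file": csv_path,   # hypothetical path to the daily extract
            "topic": topic,     # hypothetical target topic
        },
    }


if __name__ == "__main__":
    payload = build_connector_payload(
        "daily-csv-source", "/data/exports/sales.csv", "daily-sales"
    )
    # This JSON would be sent to the Connect worker, e.g. http://<worker>:8083/connectors
    print(json.dumps(payload, indent=2))
```

A sink connector on the other side of the topic would then be responsible for writing to HDFS or HBase.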

If you want to program something, use Spark. Otherwise, pick your favorite language. Schedule the job via Oozie.
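If you take the "favorite language" route rather than Spark, the aggregation step that feeds the datamart could be sketched roughly as below. This is a minimal example using only the Python standard library. The column names `region` and `amount` are invented for illustration, and the write to SQL Server is left out because it depends on your driver and schema.

```python
import csv
import io
from collections import defaultdict


def aggregate_daily_totals(csv_text, key_field, value_field):
    """Sum value_field per key_field over CSV rows that start with a header line."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[key_field]] += float(row[value_field])
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical daily extract; in practice this would be read from a file.
    sample = "region,amount\nnorth,10.5\nsouth,4.0\nnorth,5.0\n"
    print(aggregate_daily_totals(sample, "region", "amount"))
```

A script like this could then be run once a day by whatever scheduler you choose, Oozie included.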

If you don't need the raw data in HDFS, you can load it directly into HBase.
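For the direct-to-HBase path, each CSV row has to be turned into a row key plus a set of `family:qualifier` cells. The sketch below shows one possible mapping. The row-key layout (key fields joined with `|`) and the column family name `d` are assumptions made for this example, not anything HBase prescribes. The resulting pairs have the shape that a client such as happybase accepts in `Table.put(row, data)`, but the actual write is omitted since it needs a running HBase instance.

```python
import csv
import io


def csv_to_hbase_puts(csv_text, key_fields, column_family="d"):
    """Yield (row_key, cells) pairs, one per CSV row.

    row_key: the key fields joined with '|' (hypothetical key design).
    cells:   {b"family:qualifier": b"value"} for every non-key column.
    """
    for row in csv.DictReader(io.StringIO(csv_text)):
        row_key = "|".join(row[f] for f in key_fields).encode("utf-8")
        cells = {
            f"{column_family}:{name}".encode("utf-8"): value.encode("utf-8")
            for name, value in row.items()
            if name not in key_fields
        }
        yield row_key, cells


if __name__ == "__main__":
    sample = "date,store,amount\n2020-01-01,s1,10\n2020-01-01,s2,7\n"
    for key, cells in csv_to_hbase_puts(sample, ["date", "store"]):
        print(key, cells)
```

The row key matters most here: it should match how the aggregation queries will scan the table.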

