hadoop - Writing lots of online files on HDFS


I have an HDFS cluster and a Spark Streaming job handling logs from thousands of tenants.
I want to write each tenant's logs to a different (parquet) file, so I can access the data based on tenant id (and be able to query a log 5 seconds after it arrives).
Assuming each tenant sends a few logs per second, I'll need to append a small amount of data to each tenant's log file all day long.
Since the HDFS block size is 64 MB, I'm assuming it's inefficient to have thousands of files, each appended with a few bytes every few seconds.
Is there a technique to handle such a scenario efficiently on HDFS?
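
For illustration, here is a minimal sketch of the setup described above, using Spark Structured Streaming in Scala. The Kafka source, topic name, paths, and schema are all assumptions, not taken from the question. Partitioning the output by tenant id gives one directory per tenant (so reads can prune to a single tenant), but every micro-batch still writes a new small parquet file per tenant, which is exactly the small-files concern raised here.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.streaming.Trigger

    object TenantLogWriter {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("TenantLogWriter")
          .getOrCreate()

        // Hypothetical Kafka source; broker address, topic, and the
        // key/value layout (tenant id as key, message as value) are assumed.
        val logs = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "tenant-logs")
          .load()
          .selectExpr(
            "CAST(key AS STRING) AS tenant_id",
            "CAST(value AS STRING) AS message")

        // One directory per tenant via partitionBy; a 5-second trigger
        // matches the desired query latency, but each trigger emits a
        // fresh small parquet file into every active tenant's directory.
        val query = logs.writeStream
          .format("parquet")
          .option("path", "hdfs:///logs/by_tenant")
          .option("checkpointLocation", "hdfs:///checkpoints/tenant-logs")
          .partitionBy("tenant_id")
          .trigger(Trigger.ProcessingTime("5 seconds"))
          .start()

        query.awaitTermination()
      }
    }

Note that parquet files are immutable, so a streaming job cannot append to an existing tenant file in place; each trigger produces new files, which is why the number of small files grows over the day.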

