hadoop - Writing lots of online files on HDFS
I have an HDFS cluster, and Spark Streaming handles logs from thousands of tenants.
I want to write each tenant's logs to a different (Parquet) file so I can access the data by tenant ID (and be able to query a log within 5 seconds of its arrival).
Assuming each tenant sends a few logs per second, I'll need to append a small amount of data to each tenant's log file all day long.
Since the HDFS block size is 64 MB, I assume it is inefficient to have thousands of files, each appended a few bytes every few seconds.
Is there a technique to handle such a scenario efficiently on HDFS?
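To make the concern concrete, one common mitigation for the small-files/small-appends problem is to buffer records per tenant in memory and flush them in batches, so each file sees one append per batch instead of one per record. Below is a minimal, hypothetical sketch of that idea in plain Python (writing local text files rather than Parquet on HDFS; the class name, threshold, and file layout are illustrative, not part of the question):

```python
import os

class BufferedTenantWriter:
    """Buffer log records per tenant and flush them in batches,
    instead of appending a few bytes to a file every few seconds.
    Illustrative sketch only: a real streaming job would write
    Parquet files on HDFS, not local text files."""

    def __init__(self, out_dir, flush_threshold=64):
        self.out_dir = out_dir
        self.flush_threshold = flush_threshold  # records buffered per tenant before flushing
        self.buffers = {}  # tenant_id -> list of pending log lines

    def write(self, tenant_id, record):
        buf = self.buffers.setdefault(tenant_id, [])
        buf.append(record)
        if len(buf) >= self.flush_threshold:
            self.flush(tenant_id)

    def flush(self, tenant_id):
        buf = self.buffers.pop(tenant_id, [])
        if not buf:
            return
        # One append per batch of records, not one per record.
        path = os.path.join(self.out_dir, f"tenant={tenant_id}.log")
        with open(path, "a") as f:
            f.write("\n".join(buf) + "\n")

    def flush_all(self):
        # Flush remaining buffers, e.g. at the end of a micro-batch.
        for tenant_id in list(self.buffers):
            self.flush(tenant_id)
```

With Spark itself, a similar effect is usually achieved by writing each micro-batch partitioned by tenant (e.g. `df.write.partitionBy("tenant_id")`) and periodically compacting the resulting small files, trading a bounded delay for fewer, larger writes.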