h2o.ai Platt Scaling calibration


I noticed a relatively recent addition to the h2o.ai suite: the ability to perform supplementary Platt scaling to improve the calibration of output probabilities (see calibrate_model in the H2O manual). Nevertheless, there is little guidance available in the online docs. In particular, I wonder, when Platt scaling is enabled:

  • How does it affect the models' leaderboard? That is, is Platt scaling applied after the ranking metric is computed, or before?
  • How does it affect computing performance?
  • Can the calibration_frame be the same as the validation_frame, or should it not be (from both a computational and a theoretical point of view)?
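
For reference, here is roughly how I am enabling it via the Python API. This is a minimal sketch: the file path, response column name, and the choice of GBM are placeholders for my actual setup, not a statement about which algorithms support the option.

    import h2o
    from h2o.estimators import H2OGradientBoostingEstimator

    h2o.init()
    df = h2o.import_file("my_dataset.csv")            # placeholder path
    train, calib = df.split_frame(ratios=[0.9], seed=42)

    gbm = H2OGradientBoostingEstimator(
        calibrate_model=True,      # turn on Platt scaling post-processing
        calibration_frame=calib,   # held-out frame used to fit the calibrator
    )
    gbm.train(y="response", training_frame=train)     # "response" is a placeholder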

Thanks in advance.

Calibration is a post-processing step that runs after the model finishes training. It therefore doesn't affect the leaderboard, and it has no effect on training metrics either. It adds two more columns to the scored frame (with the calibrated predictions).
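
For a binary classification model, the scored frame ends up looking something like the sketch below. The cal_p0/cal_p1 column names and the illustrative values are from memory and may differ between versions, so treat them as an assumption:

    preds = gbm.predict(test)     # score any frame as usual
    print(preds.head())
    #   predict     p0      p1   cal_p0   cal_p1
    #         0   0.91    0.09     0.93     0.07
    # p0/p1 are the raw model probabilities; the two extra
    # columns hold the Platt-scaled (calibrated) probabilities.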

This article provides guidance on how to construct a calibration frame:

  1. Split the dataset into test and train.
  2. Split the train set into model training and calibration (see the sketch after the quote below).

It says: "The most important step is to create a separate dataset to perform the calibration with."
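
In h2o that two-step split might look like this (a sketch; the ratios and file path are illustrative assumptions):

    import h2o
    h2o.init()

    df = h2o.import_file("my_dataset.csv")                       # placeholder path
    train_full, test = df.split_frame(ratios=[0.8], seed=1)      # step 1: test vs. train
    train, calib = train_full.split_frame(ratios=[0.9], seed=1)  # step 2: training vs. calibration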

I think the calibration frame should be used only for calibration, and hence should be distinct from the validation frame. The conservative answer is that they should be separate: once you use the validation frame for early stopping or internal model tuning (e.g., lambda search in H2O GLM), the validation frame becomes an extension of the "training data" and is kind of off-limits at that point. You could also try both versions, directly observe what the effect is, and make a decision from there. Here's additional guidance from the article:

"how data use calibration depend on amount of data have available. calibration model fitting small number of parameters (so not need huge volume of data). aim around 10% of training data, @ minimum of @ least 50 examples."

