H2O.ai Platt Scaling calibration
I noticed a relatively recent addition to the H2O.ai suite: the ability to perform supplementary Platt scaling to improve the calibration of the output probabilities (see calibrate_model in the H2O manual). Nevertheless, little guidance is available in the online docs. In particular, I wonder about the following when Platt scaling is enabled:
- How does it affect the models' leaderboard? That is, is Platt scaling applied after the ranking metric is computed, or before?
- How does it affect computing performance?
- Can the calibration_frame be the same as the validation_frame, or should it not be (from both a computational and a theoretical point of view)? For concreteness, a minimal sketch of the setup I mean follows this list.
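Here is roughly how I am enabling it, assuming the Python API; the dataset path and the binary target name "label" are purely illustrative:

```python
import h2o
from h2o.estimators import H2OGradientBoostingEstimator

h2o.init()
df = h2o.import_file("data.csv")            # hypothetical dataset
df["label"] = df["label"].asfactor()        # binary target, as Platt scaling requires
train, valid, calib = df.split_frame(ratios=[0.7, 0.15], seed=42)

gbm = H2OGradientBoostingEstimator(
    calibrate_model=True,                   # enable Platt scaling post-processing
    calibration_frame=calib,                # frame used to fit the calibration model
)
gbm.train(y="label", training_frame=train, validation_frame=valid)
```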
Thanks in advance.
Calibration is a post-processing step that runs after the model finishes training. Therefore it doesn't affect the leaderboard, and it has no effect on training metrics either. It just adds 2 more columns to the scored frame (with the calibrated predictions).
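As an illustration (a sketch assuming a binomial model trained with calibrate_model=True and a calibration_frame; the column names reflect current H2O behavior, but treat them as indicative):

```python
# `model` is a binomial H2O model trained with calibrate_model=True;
# `test` is an H2OFrame to score. Calibration only changes the scored output.
preds = model.predict(test)
print(preds.columns)
# ['predict', 'p0', 'p1', 'cal_p0', 'cal_p1']
```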
This article provides guidance on how to construct a calibration frame:
- Split the dataset into test and train sets.
- Split the train set further into model training and calibration sets (see the sketch after this list).
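A minimal sketch of those two splits using H2O's split_frame (the ratios are illustrative, not prescribed by the article):

```python
import h2o

h2o.init()
df = h2o.import_file("data.csv")                              # hypothetical path

train_full, test = df.split_frame(ratios=[0.8], seed=1)       # step 1: train vs test
train, calib = train_full.split_frame(ratios=[0.9], seed=1)   # step 2: model training vs calibration
```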
It also says: "The important step is to create a separate dataset to perform calibration with."
I think the calibration frame should be used only for calibration, and hence be distinct from the validation frame. The conservative answer is that they should be kept separate: when you use a validation frame for early stopping or internal model tuning (e.g. lambda search in H2O GLM), the validation frame becomes an extension of the "training data", so it's kind of off-limits at that point. You could also try both versions, directly observe what the effect is, and make a decision based on that, as in the sketch below.
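One hedged way to set up that comparison, reusing the frames from the split sketch above plus a validation frame and an illustrative binary target "label":

```python
from h2o.estimators import H2OGradientBoostingEstimator

# Version A: dedicated calibration frame, distinct from the validation frame.
m_separate = H2OGradientBoostingEstimator(calibrate_model=True,
                                          calibration_frame=calib, seed=1)
m_separate.train(y="label", training_frame=train, validation_frame=valid)

# Version B: validation frame reused as the calibration frame.
m_reused = H2OGradientBoostingEstimator(calibrate_model=True,
                                        calibration_frame=valid, seed=1)
m_reused.train(y="label", training_frame=train, validation_frame=valid)

# Compare the calibrated probabilities on the untouched test set, e.g. with
# reliability curves or an externally computed log loss.
p_separate = m_separate.predict(test)["cal_p1"]
p_reused = m_reused.predict(test)["cal_p1"]
```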
"how data use calibration depend on amount of data have available. calibration model fitting small number of parameters (so not need huge volume of data). aim around 10% of training data, @ minimum of @ least 50 examples."