r - How to choose the nrounds using `catboost`? -


If I understand catboost correctly, we need to tune nrounds, just as in xgboost, using CV. I see the following code in the official tutorial (cell [8]):

    params_with_od <- list(iterations = 500,
                           loss_function = 'Logloss',
                           train_dir = 'train_dir',
                           od_type = 'Iter',
                           od_wait = 30)
    model_with_od <- catboost.train(train_pool, test_pool, params_with_od)

which results in best iterations = 211.

My questions are:

  • Is it correct that this command uses test_pool to choose the best number of iterations, instead of using cross-validation?
  • If yes, does catboost provide a command to choose the best number of iterations by CV, or do I need to do it manually?

CatBoost is doing validation on a held-out set (often loosely called cross validation) to determine the optimum number of iterations. Both train_pool and test_pool are datasets that include the target variable. Earlier in the tutorial they write:

    train_path = '../r-package/inst/extdata/adult_train.1000'
    test_path = '../r-package/inst/extdata/adult_test.1000'

    column_description_vector = rep('numeric', 15)
    cat_features <- c(3, 5, 7, 8, 9, 10, 11, 15)
    for (i in cat_features)
        column_description_vector[i] <- 'factor'

    train <- read.table(train_path, head = FALSE, sep = "\t", colClasses = column_description_vector)
    test <- read.table(test_path, head = FALSE, sep = "\t", colClasses = column_description_vector)
    target <- c(1)
    train_pool <- catboost.from_data_frame(data = train[, -target], target = train[, target])
    test_pool <- catboost.from_data_frame(data = test[, -target], target = test[, target])

When you execute catboost.train(train_pool, test_pool, params_with_od), train_pool is used for training and test_pool is used to determine the optimum number of iterations by validating on that held-out set (a single split, not k-fold cross-validation).
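To make the role of the second pool concrete, here is a minimal base-R sketch of the idea behind an 'Iter'-style overfitting detector. It is a simplified illustration, not CatBoost's actual implementation: the helper best_iteration_iter and the toy loss curve are hypothetical, and the minimum is placed at 211 only to echo the number quoted in the question.

```r
# Hypothetical helper, not part of catboost: mimics the idea behind od_type = 'Iter'.
# eval_loss: validation loss measured after each iteration (lower is better).
# od_wait:   how many iterations to continue after the best one seen so far.
best_iteration_iter <- function(eval_loss, od_wait = 30) {
  best_iter <- 1
  best_loss <- eval_loss[1]
  stopped_at <- length(eval_loss)
  for (i in seq_along(eval_loss)) {
    if (eval_loss[i] < best_loss) {
      # New best validation loss: remember where it happened.
      best_loss <- eval_loss[i]
      best_iter <- i
    } else if (i - best_iter >= od_wait) {
      # No improvement for od_wait iterations: stop training here.
      stopped_at <- i
      break
    }
  }
  list(best_iteration = best_iter, stopped_at = stopped_at, best_loss = best_loss)
}

# Toy validation curve (made up): improves until iteration 211, then degrades.
iters <- 1:500
toy_loss <- 0.3 + ((iters - 211) / 500)^2
res <- best_iteration_iter(toy_loss, od_wait = 30)
print(res$best_iteration)
print(res$stopped_at)
```

The point of the sketch is that the "best iteration" is whatever minimises the loss on the data passed as the second pool, so that data has already influenced the model by the time training finishes.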

Now you are right to be confused, since later on in the tutorial they again use test_pool and the fitted model to make a prediction (model_best is similar to model_with_od, but uses a different overfitting detector, IncToDec):

    prediction_best <- catboost.predict(model_best, test_pool, type = 'Probability')

This might be bad practice. You might get away with it with the IncToDec overfitting detector (I am not familiar with the mathematics behind it), but for the Iter type overfitting detector you would need separate train, validation and test data sets (and if you want to be on the safe side, do the same for the IncToDec overfitting detector). However, it is only a tutorial showing the functionality, so I wouldn't be too pedantic about what data they have used and how.
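A three-way split along those lines could be sketched in base R as follows. The toy data frame, the 60/20/20 proportions and the seed are all assumptions made for illustration; they are not taken from the tutorial.

```r
set.seed(42)  # arbitrary seed, only to make the sketch reproducible

# Toy data standing in for the tutorial data; column 1 is the target.
n <- 1000
toy <- data.frame(target = rbinom(n, 1, 0.5), x1 = rnorm(n), x2 = rnorm(n))

# Assumed 60/20/20 split: shuffle a vector of labels, one label per row.
part <- sample(rep(c("train", "validation", "test"), times = c(600, 200, 200)))
splits <- split(toy, part)

train_df      <- splits$train       # used to fit the model
validation_df <- splits$validation  # used to pick the number of iterations
test_df       <- splits$test        # kept untouched for the final estimate

print(sapply(splits, nrow))
```

Each data frame would then be turned into a pool in the same way as in the tutorial, with the validation pool passed as the second argument to catboost.train and the test pool used only in catboost.predict.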

Here is a link with a little more detail on the overfitting detectors: https://tech.yandex.com/catboost/doc/dg/concepts/overfitting-detector-docpage/
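On the second question, choosing the number of iterations by k-fold cross-validation by hand comes down to averaging the per-iteration validation loss over the folds and taking the minimum. The base-R sketch below shows only that selection step. The function fold_loss_curve is a hypothetical placeholder that returns made-up curves; in practice it would train on the rows outside the fold and return the validation loss after each iteration on the rows inside it. Whether your installed version of the R package offers a built-in cross-validation function is worth checking in its documentation.

```r
set.seed(1)  # arbitrary seed for the fold assignment

k <- 5         # number of folds (assumed)
n_iter <- 500  # maximum number of iterations, as in the question
n <- 1000      # number of training rows (assumed)

# Assign every row to exactly one of k equally sized folds.
fold_id <- sample(rep(seq_len(k), length.out = n))

# Hypothetical placeholder: returns a made-up validation loss curve per fold.
# A real version would fit on rows with fold_id != fold and score on the rest.
fold_loss_curve <- function(fold) {
  iters <- seq_len(n_iter)
  0.3 + ((iters - (200 + 5 * fold)) / 500)^2
}

# One column per fold, one row per iteration.
loss_matrix <- sapply(seq_len(k), fold_loss_curve)

# Average over folds and pick the iteration with the lowest mean loss.
mean_loss <- rowMeans(loss_matrix)
best_nrounds <- which.min(mean_loss)
print(best_nrounds)
```

The final model would then be refit on all of the training data with iterations set to best_nrounds, leaving the test set for the final evaluation only.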

