python - FAILED_PRECONDITION: Error: SavedModel directory gs://mybucket1/ is expected to contain exactly one of [saved_model.pb, saved_model.pbtxt]


I'm trying to use Google Cloud Platform to deploy a model to support prediction.

I train the model (locally) with the following instruction:

    ~/$ gcloud ml-engine local train --module-name trainer.task --package-path trainer 

and it works fine (...):

    INFO:tensorflow:Restoring parameters from gs://my-bucket1/test2/model.ckpt-45000
    INFO:tensorflow:Saving checkpoints for 45001 into gs://my-bucket1/test2/model.ckpt.
    INFO:tensorflow:loss = 17471.6, step = 45001
    [...]
    loss: 144278.046875
    average_loss: 1453.68
    global_step: 50000
    loss: 144278.0
    INFO:tensorflow:Restoring parameters from gs://my-bucket1/test2/model.ckpt-50000
    Mean square error of test set = 593.1018482

But when I run the following command to create a version,

    gcloud ml-engine versions create mo1 --model mod1 --origin gs://my-bucket1/test2/ --runtime-version 1.3 

I get the following error:

    ERROR: (gcloud.ml-engine.versions.create) FAILED_PRECONDITION: Field: version.deployment_uri
    Error: SavedModel directory gs://my-bucket1/test2/ is expected to contain exactly one of:
    [saved_model.pb, saved_model.pbtxt].
    - '@type': type.googleapis.com/google.rpc.BadRequest
      fieldViolations:
      - description: 'SavedModel directory gs://my-bucket1/test2/ is expected to contain
          exactly one of: [saved_model.pb, saved_model.pbtxt].'
        field: version.deployment_uri

Here is a screenshot of the bucket. I have saved the model in 'pbtxt' format:

(screenshot: my-bucket-image)

Finally, here is the piece of code I use to save the model in the bucket:

    regressor = tf.estimator.DNNRegressor(feature_columns=feature_columns,
                                          hidden_units=[40, 30, 20],
                                          model_dir="gs://my-bucket1/test2",
                                          optimizer='rmsprop')

You'll notice that the file in your screenshot is graph.pbtxt, whereas saved_model.pb{txt} is needed.

Note that simply renaming the file will not be sufficient. The training process outputs checkpoints periodically in case restarts happen and recovery is needed. However, those checkpoints (and the corresponding graph) make up the training graph. Training graphs tend to contain things like file readers, input queues, dropout layers, etc., which are not appropriate for serving.

Instead, TensorFlow requires you to explicitly export a separate graph for serving. You can do this in one of two ways:

  1. During training (typically, after training is complete)
  2. As a separate process after training.

During/after training

For this, I'll refer you to the Census sample.

First, you'll need a "serving input function", such as:

    def serving_input_fn():
      """Build the serving inputs."""
      inputs = {}
      for feat in INPUT_COLUMNS:
        inputs[feat.name] = tf.placeholder(shape=[None], dtype=feat.dtype)

      features = {
          key: tf.expand_dims(tensor, -1)
          for key, tensor in inputs.iteritems()
      }
      return tf.contrib.learn.InputFnOps(features, None, inputs)

Then you can call:

    regressor.export_savedmodel("path/to/model", serving_input_fn)
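Note that export_savedmodel writes the SavedModel into a new timestamped subdirectory under the base path you pass, so the saved_model.pb ends up somewhere like this (the timestamp here is illustrative):

    path/to/model/1505175711/saved_model.pb

When you deploy, --origin must point at that timestamped directory, not at the base path and not at the model_dir that holds the checkpoints.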

Or, if you're using learn_runner/Experiment, you'll need to pass the following ExportStrategy to the constructor of Experiment:

    export_strategies=[saved_model_export_utils.make_export_strategy(
        serving_input_fn,
        exports_to_keep=1,
        default_output_alternative_key=None,
    )]
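For concreteness, here is a minimal sketch of how that might be wired into learn_runner; train_input_fn and eval_input_fn stand in for your existing input functions, and the rest assumes the feature_columns and serving_input_fn defined above:

    import tensorflow as tf
    from tensorflow.contrib.learn.python.learn import learn_runner
    from tensorflow.contrib.learn.python.learn.utils import saved_model_export_utils

    def experiment_fn(output_dir):
      # Build the same estimator used for training.
      regressor = tf.contrib.learn.DNNRegressor(
          feature_columns=feature_columns,
          hidden_units=[40, 30, 20],
          model_dir=output_dir,
          optimizer='rmsprop')
      return tf.contrib.learn.Experiment(
          regressor,
          train_input_fn=train_input_fn,  # your existing input functions
          eval_input_fn=eval_input_fn,
          export_strategies=[saved_model_export_utils.make_export_strategy(
              serving_input_fn,
              exports_to_keep=1,
              default_output_alternative_key=None,
          )])

    learn_runner.run(experiment_fn, "gs://my-bucket1/test2")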

After training

Almost the same steps as above, but in a separate Python script that you can run after training is over (in your case, this is beneficial because you won't have to retrain). The basic idea is to construct the Estimator with the same model_dir used in training, then call export as above, like so:

    def serving_input_fn():
      """Build the serving inputs."""
      inputs = {}
      for feat in INPUT_COLUMNS:
        inputs[feat.name] = tf.placeholder(shape=[None], dtype=feat.dtype)

      features = {
          key: tf.expand_dims(tensor, -1)
          for key, tensor in inputs.iteritems()
      }
      return tf.contrib.learn.InputFnOps(features, None, inputs)

    regressor = tf.contrib.learn.DNNRegressor(
        feature_columns=feature_columns,
        hidden_units=[40, 30, 20],
        model_dir="gs://my-bucket1/test2",
        optimizer='rmsprop'
    )
    regressor.export_savedmodel("my_model", serving_input_fn)
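Once the export exists on GCS (either by exporting straight to a gs:// path instead of "my_model", or by copying the local export up with gsutil), the versions create command from the question should succeed when --origin points at the timestamped export directory. A sketch, where the export path and timestamp are illustrative:

    gcloud ml-engine versions create mo1 --model mod1 \
        --origin gs://my-bucket1/test2/export/1505175711/ \
        --runtime-version 1.2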

Edit 09/12/2017

One slight change is needed to your training code. You are using tf.estimator.DNNRegressor, which was introduced in TensorFlow 1.3; Cloud ML Engine only officially supports TensorFlow 1.2, so you'll need to use tf.contrib.learn.DNNRegressor instead. The two are very similar, but one notable difference is that you'll need to use the fit method instead of train.
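A minimal sketch of that change, assuming train_input_fn is your existing training input function and 50000 matches the step count from your logs:

    regressor = tf.contrib.learn.DNNRegressor(
        feature_columns=feature_columns,
        hidden_units=[40, 30, 20],
        model_dir="gs://my-bucket1/test2",
        optimizer='rmsprop')
    # contrib.learn estimators train via fit() rather than train().
    regressor.fit(input_fn=train_input_fn, steps=50000)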

