python - Multiple issues with axes while implementing a Seq2Seq with attention in CNTK
I'm trying to implement a seq2seq model with attention in CNTK, very similar to CNTK Tutorial 204. However, several small differences lead to various issues and error messages which I don't understand. There are many questions here, but they are all interconnected and stem from some single thing I don't understand.
Note (in case it's important): my input data comes from a MinibatchSourceFromData created from NumPy arrays that fit in RAM, so I don't store it in CTF format.
The input variables are declared like this:

    ins = C.sequence.input_variable(input_dim, name="in", sequence_axis=inAxis)
    y   = C.sequence.input_variable(label_dim, name="y",  sequence_axis=outAxis)

Thus, the shapes are [#, *](input_dim) and [#, *](label_dim).
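For completeness, the axes inAxis/outAxis and the InputSequence/OutputSequence types used further down are declared roughly as in Tutorial 204 (the dimensions here are just illustrative):

    import cntk as C
    from cntk.layers.typing import Tensor, SequenceOver

    input_dim, label_dim = 50, 30          # illustrative dimensions

    # Separate named dynamic axes, since input and output sequences have different lengths
    inAxis  = C.Axis("inAxis")
    outAxis = C.Axis("outAxis")

    # Sequence types used in the @C.Function type annotations below
    InputSequence  = SequenceOver[inAxis]
    OutputSequence = SequenceOver[outAxis]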
Question 1: When I run the CNTK 204 tutorial and dump its graph into a .dot file using cntk.logging.plot, I see that its input shapes are [#](-2,). How is this possible?
- Where did the sequence axis (*) disappear?
- How can a dimension be negative?
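For reference, the .dot dump mentioned above is produced with something along these lines (model_train stands for the tutorial's training-time root function):

    import cntk as C

    # Walk the graph from the root function and write it out as a GraphViz .dot file
    C.logging.plot(model_train, filename="model_train.dot")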
Question 2: In the same tutorial we have attention_axis = -3. I don't understand this. In my model there are 2 dynamic axes and 1 static one, so the "third to last" axis would be #, the batch axis. But attention surely shouldn't be computed over the batch axis.
I hoped that looking at the actual axes in the tutorial code would help me understand this, but the [#](-2,) issue above made it even more confusing.
Setting attention_axis to -2 gives the following error:

    RuntimeError: Times: The left operand 'Placeholder('stab_result', [#, outAxis], [128])' rank (1) must be >= #axes (2) being reduced over.

during creation of the training-time model:

    def train_model(m):
        @C.Function
        def model(ins: InputSequence[Tensor[input_dim]],
                  labels: OutputSequence[Tensor[label_dim]]):
            past_labels = Delay(initial_state=C.Constant(seq_start_encoding))(labels)
            return m(ins, past_labels)  # <<<<<<<<<<<<<< HERE
        return model

Here stab_result is the Stabilizer right before the final Dense layer in the decoder. I can also see in the dot-file that there are spurious trailing dimensions of size 1 appearing in the middle of the AttentionModel implementation.
Setting attention_axis to -1 gives the following error:

    RuntimeError: Binary elementwise operation ElementTimes: Left operand 'Output('Block346442_Output_0', [#, outAxis], [64])' shape '[64]' is not compatible with the right operand 'Output('attention_weights', [#, outAxis], [200])' shape '[200]'.

where 64 is the attention_dim and 200 is the attention_span. As far as I understand, the elementwise * inside the attention model shouldn't be conflating these two together, therefore -1 is not the right axis here.
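As a sanity check, the same kind of shape clash can be reproduced outside the attention model (the sizes below are just stand-ins for attention_dim and attention_span):

    import cntk as C

    a = C.input_variable(64)     # stand-in for the attention_dim-sized projection
    b = C.input_variable(200)    # stand-in for the attention_span-sized weight vector
    # Element-wise product of incompatible static shapes fails with a similar
    # "Binary elementwise operation ElementTimes ... not compatible" error
    bad = C.element_times(a, b)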
Question 3: Is my understanding above correct? What should be the right axis here, and why is it causing one of the two exceptions above?
Thanks for any explanations!
First, some good news: a couple of things have been fixed in the AttentionModel in the latest master (they will be generally available with CNTK 2.2 in a few days):
- You don't need to specify attention_span or attention_axis. If you don't specify them and leave them at their default values, the attention is computed over the whole sequence. In fact these arguments have been deprecated.
- If you do the above, the 204 notebook runs 2x faster, so the 204 notebook does not use these arguments anymore.
- A bug has been fixed in the AttentionModel, and it now faithfully implements the Bahdanau et al. paper.
Regarding your questions:
The dimension is not negative. We use certain negative numbers in various places to mean certain things: -1 means a dimension that will be inferred once, based on the first minibatch; -2 is, I think, the shape of a placeholder; and -3 means a dimension that will be inferred with each minibatch (such as when you feed variable-sized images to convolutions). I think if you print the graph after the first minibatch, you should see that all shapes are concrete.
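In other words, something like the following should show the difference (trainer, x_batch and y_batch are placeholders for whatever you already use for training):

    import cntk as C

    # Before any data has been seen, inferred dims show up as the negative markers above
    C.logging.plot(model_train, filename="before_first_minibatch.dot")

    # Feeding one minibatch binds the inferred/free dimensions to concrete values
    trainer.train_minibatch({ins: x_batch, y: y_batch})

    # The same dump should now show concrete shapes
    C.logging.plot(model_train, filename="after_first_minibatch.dot")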
attention_axis is an implementation detail that should have been hidden. Basically attention_axis=-3 will create a shape of (1, 1, 200), attention_axis=-4 will create a shape of (1, 1, 1, 200), and so on. In general anything more than -3 is not guaranteed to work, and anything less than -3 just adds more 1s without any clear benefit. The good news, of course, is that you can simply ignore this argument in the latest master.
TL;DR: If you are on master (or starting with CNTK 2.2 in a few days), replace AttentionModel(attention_dim, attention_span=200, attention_axis=-3) with AttentionModel(attention_dim). It is faster and does not contain confusing arguments. Starting from CNTK 2.2 the original API is deprecated.
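In context the change looks roughly like this (h_enc and h_dec stand for your encoder output sequence and the current decoder state):

    from cntk.layers import AttentionModel

    # Old, deprecated form:
    #   attention = AttentionModel(attention_dim, attention_span=200, attention_axis=-3)

    # New form: attends over the entire input sequence by default
    attention = AttentionModel(attention_dim)

    # Inside the decoder recurrence, compute the context vector from the
    # encoder hidden sequence (h_enc) and the current decoder state (h_dec)
    context = attention(h_enc, h_dec)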