Multiple issues with axes while implementing a Seq2Seq with attention in CNTK
I'm trying to implement a seq2seq model with attention in CNTK, very similar to CNTK Tutorial 204. However, several small differences lead to various issues and error messages which I don't understand. There are many questions here, but they are interconnected and all stem from some single thing I don't understand.
Note (in case it's important): my input data comes from a MinibatchSourceFromData, created from NumPy arrays that fit in RAM, so I don't store it in a CTF file.
    ins = C.sequence.input_variable(input_dim, name="in", sequence_axis=inaxis)
    y = C.sequence.input_variable(label_dim, name="y", sequence_axis=outaxis)
Thus, their shapes are [#, *](input_dim) and [#, *](label_dim).
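For concreteness, the in-memory source is built roughly like this (a sketch only; the sizes, array contents, and stream names below are illustrative, not my actual data):

    import numpy as np
    import cntk as C

    input_dim, label_dim = 20, 30   # illustrative sizes
    # One NumPy array per sequence, shaped (sequence_length, feature_dim).
    x_data = [np.random.rand(7, input_dim).astype(np.float32),
              np.random.rand(4, input_dim).astype(np.float32)]
    y_data = [np.random.rand(5, label_dim).astype(np.float32),
              np.random.rand(3, label_dim).astype(np.float32)]

    source = C.io.MinibatchSourceFromData(dict(x=x_data, y=y_data))
    mb = source.next_minibatch(32)  # size is in samples; whole sequences are returned,
                                    # keyed by source.streams.x / source.streams.y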
Question 1: When I run the CNTK 204 tutorial and dump its graph into a .dot file using cntk.logging.plot, I see that its input shapes are [#](-2,). How is this possible?
- Where did the sequence axis (*) disappear?
- How can a dimension be negative?
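For reference, I produce the dump roughly like this (a sketch with a stand-in model, since the tutorial's actual model object isn't shown here):

    import cntk as C
    from cntk.layers import Sequential, Embedding, Recurrence, LSTM, Dense

    # Stand-in model, just so there is a graph to plot.
    z = Sequential([Embedding(50), Recurrence(LSTM(25)), Dense(10)])(C.sequence.input_variable(30))

    # Writes GraphViz text when the filename ends in .dot; rendering to images needs pydot.
    C.logging.plot(z, filename="seq2seq.dot")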
Question 2: In the same tutorial we have attention_axis = -3. I don't understand this. In my model there are two dynamic axes and one static axis, so the "third to last" axis would be #, the batch axis, and attention surely shouldn't be computed over the batch axis.
I hoped that looking at the actual axes in the tutorial code would help me understand this, but the [#](-2,) issue above made it even more confusing.
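For example, printing the dynamic axes of the inputs declared above shows the two dynamic axes plus the static shape (a small sketch; the exact repr may differ):

    # ins and y are the sequence inputs declared earlier; each has two dynamic
    # axes (the batch axis # and its own sequence axis) plus one static axis.
    print(ins.dynamic_axes)   # e.g. (Axis('defaultBatchAxis'), Axis('inaxis'))
    print(ins.shape)          # (input_dim,)
    print(y.dynamic_axes)     # e.g. (Axis('defaultBatchAxis'), Axis('outaxis'))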
Setting attention_axis to -2 gives the following error:

    RuntimeError: Times: The left operand 'Placeholder('stab_result', [#, outaxis], [128])'
    rank (1) must be >= #axes (2) being reduced over.
during the creation of the training-time model:

    def train_model(m):
        @C.Function
        def model(ins: InputSequence[Tensor[input_dim]],
                  labels: OutputSequence[Tensor[label_dim]]):
            past_labels = Delay(initial_state=C.constant(seq_start_encoding))(labels)
            return m(ins, past_labels)  # <<<<<<<<<<<<<< here
        return model
where stab_result is a Stabilizer right before the final Dense layer in the decoder. I can also see in the dot file that spurious trailing dimensions of size 1 appear in the middle of the AttentionModel implementation.
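For context, a rough sketch of the decoder tail being referred to (names and sizes here are assumptions, not the actual code):

    from cntk.layers import Stabilizer, Dense

    label_dim = 30                          # illustrative size
    stab = Stabilizer(name='stab_result')   # stabilizer right before the output projection
    proj = Dense(label_dim)                 # final Dense layer of the decoder
    # Applied to a decoder state h as proj(stab(h)); stab(h) is the rank-1 [128]
    # tensor named in the Times error above.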
Setting attention_axis to -1 gives the following error:

    RuntimeError: Binary elementwise operation ElementTimes: Left operand
    'Output('block346442_output_0', [#, outaxis], [64])' shape '[64]' is not compatible
    with the right operand 'Output('attention_weights', [#, outaxis], [200])' shape '[200]'.
where 64 is the attention_dim and 200 is the attention_span. As I understand it, the elementwise * inside the attention model definitely shouldn't be conflating these two together, so -1 is definitely not the right axis here.
Question 3: Is my understanding above correct? What should be the right axis, and why is it causing one of the two exceptions above?
Thanks for any explanations!
First, some good news: a couple of things have been fixed in the AttentionModel in the latest master (it will be generally available with CNTK 2.2 in a few days):
- You don't need to specify attention_span or attention_axis. If you don't specify them and leave them at their default values, the attention is computed over the whole sequence. In fact these arguments have been deprecated.
- If you do the above, the 204 notebook runs 2x faster, so the 204 notebook no longer uses these arguments.
- A bug has been fixed in the AttentionModel, and it now faithfully implements the Bahdanau et al. paper.
Regarding your questions:
The dimension is not really negative. We use certain negative numbers in various places to mean certain things: -1 is a dimension that will be inferred once, based on the first minibatch; -2 is, I think, the shape of a placeholder; and -3 is a dimension that will be inferred with each minibatch (such as when you feed variable-sized images to convolutions). I think if you print your graph after the first minibatch, you should see that all shapes are concrete.
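For reference, a small sketch, assuming the Python API exposes these sentinels as cntk.InferredDimension and cntk.FreeDimension:

    import cntk as C

    # Sentinel values that show up as negative dimensions in graph dumps:
    print(C.InferredDimension)   # -1: inferred once, from the first minibatch
    print(C.FreeDimension)       # -3: re-inferred for every minibatch (free dimension)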
attention_axis is an implementation detail that should have been hidden. Basically, attention_axis=-3 will create a shape of (1, 1, 200), attention_axis=-4 will create a shape of (1, 1, 1, 200), and so on. In general anything more than -3 is not guaranteed to work, and anything less than -3 just adds more 1s without any clear benefit. The good news of course is that you can just ignore this argument in the latest master.
TL;DR: If you are on master (or starting with CNTK 2.2 in a few days), replace AttentionModel(attention_dim, attention_span=200, attention_axis=-3) with AttentionModel(attention_dim). It is faster and does not contain confusing arguments. Starting from CNTK 2.2, the original API is deprecated.
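A minimal sketch of the change (the attention_dim value and surrounding names are illustrative):

    from cntk.layers import AttentionModel

    attention_dim = 128   # illustrative size

    # Old API (CNTK <= 2.1): windowed attention with the confusing axis argument.
    old_attention = AttentionModel(attention_dim, attention_span=200, attention_axis=-3)

    # New API (master / CNTK 2.2+): attention over the whole sequence.
    new_attention = AttentionModel(attention_dim)

    # Either is applied inside the decoder as, roughly:
    #   context = new_attention(encoder_hidden_states, decoder_state)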