Multiple issues with axes while implementing a Seq2Seq with attention in CNTK
I'm trying to implement a seq2seq model with attention in CNTK, very similar to CNTK Tutorial 204. However, several small differences lead to various issues and error messages which I don't understand. There are many questions here, but they are interconnected and all stem from some single thing I don't understand.
Note (in case it's important): my input data comes from a MinibatchSourceFromData, created from NumPy arrays that fit in RAM, so I don't store it in a CTF file.
    ins = C.sequence.input_variable(input_dim, name="in", sequence_axis=inaxis)
    y = C.sequence.input_variable(label_dim, name="y", sequence_axis=outaxis)
Thus, their shapes are [#, *](input_dim) and [#, *](label_dim).
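For concreteness, the in-memory source is built roughly like this (a sketch only; the sizes, array contents, and stream names below are illustrative, not my actual data):

    import numpy as np
    import cntk as C

    input_dim, label_dim = 20, 30   # illustrative sizes
    # One NumPy array per sequence, shaped (sequence_length, feature_dim).
    x_data = [np.random.rand(7, input_dim).astype(np.float32),
              np.random.rand(4, input_dim).astype(np.float32)]
    y_data = [np.random.rand(5, label_dim).astype(np.float32),
              np.random.rand(3, label_dim).astype(np.float32)]

    source = C.io.MinibatchSourceFromData(dict(x=x_data, y=y_data))
    mb = source.next_minibatch(32)  # size is in samples; whole sequences are returned,
                                    # keyed by source.streams.x / source.streams.y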
Question 1: When I run the CNTK 204 tutorial and dump its graph into a .dot file using cntk.logging.plot, I see that its input shapes are [#](-2,). How is this possible?
- Where did the sequence axis (*) disappear?
- How can a dimension be negative?
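For reference, I produce the dump roughly like this (a sketch with a stand-in model, since the tutorial's actual model object isn't shown here):

    import cntk as C
    from cntk.layers import Sequential, Embedding, Recurrence, LSTM, Dense

    # Stand-in model, just so there is a graph to plot.
    z = Sequential([Embedding(50), Recurrence(LSTM(25)), Dense(10)])(C.sequence.input_variable(30))

    # Writes GraphViz text when the filename ends in .dot; rendering to images needs pydot.
    C.logging.plot(z, filename="seq2seq.dot")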
Question 2: In the same tutorial we have attention_axis = -3. I don't understand this. In my model there are two dynamic axes and one static axis, so the "third to last" axis would be #, the batch axis, and attention surely shouldn't be computed over the batch axis.
I hoped that looking at the actual axes in the tutorial code would help me understand this, but the [#](-2,) issue above made it even more confusing.
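For example, printing the dynamic axes of the inputs declared above shows the two dynamic axes plus the static shape (a small sketch; the exact repr may differ):

    # ins and y are the sequence inputs declared earlier; each has two dynamic
    # axes (the batch axis # and its own sequence axis) plus one static axis.
    print(ins.dynamic_axes)   # e.g. (Axis('defaultBatchAxis'), Axis('inaxis'))
    print(ins.shape)          # (input_dim,)
    print(y.dynamic_axes)     # e.g. (Axis('defaultBatchAxis'), Axis('outaxis'))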
Setting attention_axis to -2 gives the following error:

    RuntimeError: Times: The left operand 'Placeholder('stab_result', [#, outaxis], [128])'
    rank (1) must be >= #axes (2) being reduced over.
during the creation of the training-time model:

    def train_model(m):
        @C.Function
        def model(ins: InputSequence[Tensor[input_dim]],
                  labels: OutputSequence[Tensor[label_dim]]):
            past_labels = Delay(initial_state=C.constant(seq_start_encoding))(labels)
            return m(ins, past_labels)  # <<<<<<<<<<<<<< here
        return model
where stab_result is a Stabilizer right before the final Dense layer in the decoder. I can also see in the dot file that spurious trailing dimensions of size 1 appear in the middle of the AttentionModel implementation.
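For context, a rough sketch of the decoder tail being referred to (names and sizes here are assumptions, not the actual code):

    from cntk.layers import Stabilizer, Dense

    label_dim = 30                          # illustrative size
    stab = Stabilizer(name='stab_result')   # stabilizer right before the output projection
    proj = Dense(label_dim)                 # final Dense layer of the decoder
    # Applied to a decoder state h as proj(stab(h)); stab(h) is the rank-1 [128]
    # tensor named in the Times error above.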
Setting attention_axis to -1 gives the following error:

    RuntimeError: Binary elementwise operation ElementTimes: Left operand
    'Output('block346442_output_0', [#, outaxis], [64])' shape '[64]' is not compatible
    with the right operand 'Output('attention_weights', [#, outaxis], [200])' shape '[200]'.
where 64 is the attention_dim and 200 is the attention_span. As I understand it, the elementwise * inside the attention model definitely shouldn't be conflating these two together, so -1 is definitely not the right axis here.
Question 3: Is my understanding above correct? What should be the right axis, and why is it causing one of the two exceptions above?
Thanks for any explanations!
First, some good news: a couple of things have been fixed in the AttentionModel in the latest master (it will be generally available with CNTK 2.2 in a few days):
- You don't need to specify attention_span or attention_axis. If you don't specify them and leave them at their default values, the attention is computed over the whole sequence. In fact these arguments have been deprecated.
- If you do the above, the 204 notebook runs 2x faster, so the 204 notebook no longer uses these arguments.
- A bug has been fixed in the AttentionModel, and it now faithfully implements the Bahdanau et al. paper.
Regarding your questions:
The dimension is not really negative. We use certain negative numbers in various places to mean certain things: -1 is a dimension that will be inferred once, based on the first minibatch; -2 is, I think, the shape of a placeholder; and -3 is a dimension that will be inferred with each minibatch (such as when you feed variable-sized images to convolutions). I think if you print your graph after the first minibatch, you should see that all shapes are concrete.
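For reference, a small sketch, assuming the Python API exposes these sentinels as cntk.InferredDimension and cntk.FreeDimension:

    import cntk as C

    # Sentinel values that show up as negative dimensions in graph dumps:
    print(C.InferredDimension)   # -1: inferred once, from the first minibatch
    print(C.FreeDimension)       # -3: re-inferred for every minibatch (free dimension)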
attention_axis is an implementation detail that should have been hidden. Basically, attention_axis=-3 will create a shape of (1, 1, 200), attention_axis=-4 will create a shape of (1, 1, 1, 200), and so on. In general anything more than -3 is not guaranteed to work, and anything less than -3 just adds more 1s without any clear benefit. The good news of course is that you can just ignore this argument in the latest master.
TL;DR: If you are on master (or starting with CNTK 2.2 in a few days), replace AttentionModel(attention_dim, attention_span=200, attention_axis=-3) with AttentionModel(attention_dim). It is faster and does not contain confusing arguments. Starting from CNTK 2.2, the original API is deprecated.
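A minimal sketch of the change (the attention_dim value and surrounding names are illustrative):

    from cntk.layers import AttentionModel

    attention_dim = 128   # illustrative size

    # Old API (CNTK <= 2.1): windowed attention with the confusing axis argument.
    old_attention = AttentionModel(attention_dim, attention_span=200, attention_axis=-3)

    # New API (master / CNTK 2.2+): attention over the whole sequence.
    new_attention = AttentionModel(attention_dim)

    # Either is applied inside the decoder as, roughly:
    #   context = new_attention(encoder_hidden_states, decoder_state)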