nlp - Using word n-grams with the Keras Tokenizer
Is it really possible to use word n-grams in Keras?
E.g., a list of sentences is contained in the "sentences" column of the x_train dataframe. I use the Keras Tokenizer in the following manner:
tokenizer = Tokenizer(lower=True, split=' ')
tokenizer.fit_on_texts(x_train.sentences)
x_train_tokenized = tokenizer.texts_to_sequences(x_train.sentences)
and later apply padding:
x_train_sequence = sequence.pad_sequences(x_train_tokenized)
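For reference, pad_sequences brings every integer sequence to a common length by left-padding with zeros (and, by default, truncating from the front). A minimal pure-Python sketch of that default behaviour, with maxlen inferred as the longest sequence when not given:

```python
def pad_sequences(seqs, maxlen=None, value=0):
    """Left-pad integer sequences with `value` to a common length,
    mimicking Keras' default padding='pre' / truncating='pre'."""
    if maxlen is None:
        maxlen = max(len(s) for s in seqs)
    out = []
    for s in seqs:
        s = s[-maxlen:]  # 'pre' truncation: drop tokens from the front
        out.append([value] * (maxlen - len(s)) + list(s))
    return out
```

The real Keras function returns a NumPy array rather than nested lists, but the shape and padding logic are the same.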
I also use a simple LSTM network:
model = Sequential()
model.add(Embedding(max_features, 128))
model.add(LSTM(32, dropout=0.2, recurrent_dropout=0.2, activation='tanh', return_sequences=True))
model.add(LSTM(64, dropout=0.2, recurrent_dropout=0.2, activation='tanh'))
model.add(Dense(number_classes, activation='sigmoid'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
In this setup the Tokenizer works at the word level. In the Keras docs (https://keras.io/preprocessing/text/) I see character-level processing only, which is not appropriate for my case.
My main question: can I use n-grams for NLP tasks (not necessarily sentiment analysis, any abstract NLP task)?
For clarification: I'd like to consider not just single words but combinations of words, and I'd like to try that for my task.
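One common workaround, since the word-level Tokenizer has no n-gram option, is to join adjacent words into single tokens before fitting, so each n-gram is indexed like an ordinary word (this is the same trick as in the Keras FastText IMDb example; the function name add_ngrams and the "_" joiner below are illustrative assumptions, not Keras API):

```python
def add_ngrams(text, n=2, joiner="_"):
    """Append joined word n-grams to the token stream so that a
    word-level tokenizer treats each n-gram as one token."""
    tokens = text.lower().split()
    ngrams = [joiner.join(tokens[i:i + n])
              for i in range(len(tokens) - n + 1)]
    return " ".join(tokens + ngrams)
```

The transformed strings can then be fed to the usual pipeline, e.g. tokenizer.fit_on_texts(x_train.sentences.apply(add_ngrams)), after which texts_to_sequences and pad_sequences work unchanged.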