nlp - Using word n-grams with the Keras Tokenizer -


Is it really possible to use word n-grams in Keras?

For example, my list of sentences is stored in the "sentences" column of the x_train dataframe. I use the Keras Tokenizer in the following manner:

tokenizer = Tokenizer(lower=True, split=' ')
tokenizer.fit_on_texts(x_train.sentences)
x_train_tokenized = tokenizer.texts_to_sequences(x_train.sentences)

and later apply padding:

x_train_sequence = sequence.pad_sequences(x_train_tokenized)

I also use a simple LSTM network:

model = Sequential()
model.add(Embedding(max_features, 128))
model.add(LSTM(32, dropout=0.2, recurrent_dropout=0.2,
               activation='tanh', return_sequences=True))
model.add(LSTM(64, dropout=0.2, recurrent_dropout=0.2, activation='tanh'))
model.add(Dense(number_classes, activation='sigmoid'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop',
              metrics=['accuracy'])

In this case, the Tokenizer splits texts into individual words. In the Keras docs (https://keras.io/preprocessing/text/) I see only character-level processing as an alternative, which is not appropriate for my case.

My main question: can I use n-grams for NLP tasks (not necessarily sentiment analysis, any abstract NLP task)?

For clarification: I'd like to consider not only individual words, but combinations of words; that is what I'd like to try for my task.
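Since the Keras Tokenizer itself has no n-gram option, one common workaround is to generate word n-grams in a preprocessing step and append them to each text as "pseudo-words", so a standard word-level tokenizer assigns each n-gram its own index. Here is a minimal pure-Python sketch of that idea; the helper names word_ngrams and add_ngrams are my own, not part of Keras:

```python
def word_ngrams(text, n=2):
    """Return the list of word n-grams (as space-joined strings) for a text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def add_ngrams(texts, n=2):
    """Append n-gram 'pseudo-words' (joined with '_') to each text so that
    a word-level tokenizer treats every n-gram as a single token."""
    augmented = []
    for text in texts:
        grams = ["_".join(g.split()) for g in word_ngrams(text, n)]
        augmented.append(text + " " + " ".join(grams))
    return augmented

texts = ["the cat sat on the mat"]
print(word_ngrams(texts[0]))  # ['the cat', 'cat sat', 'sat on', 'on the', 'the mat']
print(add_ngrams(texts))      # ['the cat sat on the mat the_cat cat_sat sat_on on_the the_mat']
```

The augmented texts can then be passed to tokenizer.fit_on_texts(...) exactly as above, so both the single words and the bigram pseudo-words end up in the vocabulary.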

