python - CountVectorizer ignores Upper Case

August 15, 2015

What is the reason CountVectorizer ignores words in upper case?

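from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(stop_words=None, analyzer='word', token_pattern='.*', max_features=None)
text = ['this', 'is', 'a', 'Test', '!']
fcv = cv.fit_transform(text)
# look up the vocabulary index of each original word
fcv = [cv.vocabulary_.get(t) for t in text]
print(fcv)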

returns

[5, 3, 2, None, 1]

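This is caused by lowercase being set to True by default in CountVectorizer. The input is lowercased before tokenizing, so 'Test' is stored in the vocabulary as 'test' and looking up 'Test' returns None. To keep the original case, add lowercase=False.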

cv = CountVectorizer(stop_words=None, analyzer='word', token_pattern='.*', max_features=None, lowercase=False)
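
To see the effect side by side, here is a minimal sketch that fits the same words with and without lowercasing. It assumes scikit-learn is installed. The token pattern and the variable names (pattern, cv_default, cv_cased) are choices made for this illustration, not part of the original question; the pattern is used only so that one-character words such as 'a' are kept.

```python
from sklearn.feature_extraction.text import CountVectorizer

text = ['this', 'is', 'a', 'Test']
# Assumed pattern for this sketch: keeps one-character tokens such as 'a'.
pattern = r'(?u)\b\w+\b'

# Default settings: lowercase=True, so 'Test' is stored in the vocabulary as 'test'.
cv_default = CountVectorizer(token_pattern=pattern)
cv_default.fit(text)
default_ids = [cv_default.vocabulary_.get(t) for t in text]

# Alternative to changing the vectorizer: lowercase each word before the lookup.
lowered_ids = [cv_default.vocabulary_.get(t.lower()) for t in text]

# lowercase=False keeps the original casing, so 'Test' can be found as written.
cv_cased = CountVectorizer(token_pattern=pattern, lowercase=False)
cv_cased.fit(text)
cased_ids = [cv_cased.vocabulary_.get(t) for t in text]

print(default_ids)  # the entry for 'Test' is None
print(lowered_ids)  # every word has an index
print(cased_ids)    # every word has an index
```

Which option fits depends on the goal: lowercase=False treats 'Test' and 'test' as two different features, while lowercasing the lookup key keeps them merged into one.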
