Python - CountVectorizer ignores upper case
Why does CountVectorizer ignore a word that contains upper-case letters?
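    from sklearn.feature_extraction.text import CountVectorizer

    cv = CountVectorizer(stop_words=None, analyzer='word', token_pattern='.*', max_features=None)
    text = ['this', 'is', 'a', 'Test', '!']
    # Fit on the documents, then look up each original string in the learned vocabulary
    fcv = cv.fit_transform(text)
    fcv = [cv.vocabulary_.get(t) for t in text]
    print(fcv)

returns

    [5, 3, 2, None, 1]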
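This happens because lowercase is set to True by default in CountVectorizer. The text is lowercased before it is tokenized, so the vocabulary contains 'test' rather than 'Test', and the lookup for 'Test' returns None. Add lowercase=False: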
    cv = CountVectorizer(stop_words=None, analyzer='word', token_pattern='.*', max_features=None, lowercase=False)
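To see why the lookup fails, here is a rough plain-Python sketch of the steps involved: optional lowercasing, tokenizing with the regex, then numbering the distinct tokens in sorted order. The helper build_vocabulary is a hypothetical stand-in written for this illustration, not scikit-learn's actual code, and it assumes the input list from the question with 'Test' capitalized.

```python
import re

def build_vocabulary(docs, token_pattern=".*", lowercase=True):
    """Hypothetical, simplified stand-in for the vocabulary-building step."""
    pattern = re.compile(token_pattern)
    tokens = set()
    for doc in docs:
        if lowercase:
            doc = doc.lower()  # 'Test' becomes 'test' before tokenizing
        tokens.update(pattern.findall(doc))
    # Indices follow the sorted order of the distinct tokens.
    return {token: index for index, token in enumerate(sorted(tokens))}

text = ['this', 'is', 'a', 'Test', '!']

default_vocab = build_vocabulary(text)                 # lowercase=True
cased_vocab = build_vocabulary(text, lowercase=False)  # case preserved

default_lookup = [default_vocab.get(t) for t in text]
cased_lookup = [cased_vocab.get(t) for t in text]

print(default_lookup)  # [5, 3, 2, None, 1]
print(cased_lookup)    # [5, 4, 3, 2, 1]
```

Note that the pattern '.*' also matches the empty string, so '' ends up in the vocabulary at index 0. That is why '!' gets index 1 rather than 0. With case preserved, 'Test' sorts before the lower-case words, which shifts the other indices.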