Python - CountVectorizer ignores upper case
What is the reason CountVectorizer ignores a word in upper case?
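from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(stop_words=None, analyzer='word', token_pattern='.*', max_features=None)
text = ['this', 'is', 'a', 'Test', '!']
fcv = cv.fit_transform(text)
# look up the vocabulary index of each original word
fcv = [cv.vocabulary_.get(t) for t in text]
print(fcv)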
returns
[5, 3, 2, None, 1]
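This is caused by lowercase, which is set to True by default in CountVectorizer: every document is converted to lower case before it is tokenized, so the vocabulary contains 'test' but not 'Test', and the lookup for 'Test' returns None. To keep the original case, add lowercase=False.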
cv = CountVectorizer(stop_words=None, analyzer='word', token_pattern='.*', max_features=None, lowercase=False)
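To see why the lookup fails, here is a rough pure-Python sketch of the behaviour. It is not scikit-learn's actual code: build_vocabulary is a hypothetical helper, and it assumes the vectorizer lowercases each document, tokenizes it with re.findall, and numbers the distinct tokens in sorted order. The word list uses 'Test' with a capital T as the upper-case example.

```python
import re

def build_vocabulary(docs, token_pattern, lowercase=True):
    """Hypothetical helper: map each distinct token to its index in sorted order."""
    pattern = re.compile(token_pattern)
    tokens = set()
    for doc in docs:
        if lowercase:
            doc = doc.lower()  # case folding happens before tokenizing
        tokens.update(pattern.findall(doc))
    return {tok: idx for idx, tok in enumerate(sorted(tokens))}

text = ['this', 'is', 'a', 'Test', '!']

# Default behaviour: only 'test' is stored, so looking up 'Test' gives None.
folded = build_vocabulary(text, '.*', lowercase=True)
print([folded.get(t) for t in text])

# Alternative to lowercase=False: lowercase the lookup key as well.
print([folded.get(t.lower()) for t in text])

# With lowercase=False the token 'Test' is stored as-is.
exact = build_vocabulary(text, '.*', lowercase=False)
print([exact.get(t) for t in text])
```

If you want to keep the default case folding, lowercasing the key before the lookup (the second print) may be enough; lowercase=False is only needed when 'Test' and 'test' should count as different words.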