python - Getting an error on frequency distribution: TypeError: unhashable type: 'list'
from nltk import FreqDist

doc_clean = []
stopwords_corpus = urducorpusreader('./data', ['stopwords-ur.txt'])
stopwords = stopwords_corpus.words()
# print(stopwords)
for infile in wordlists.fileids():
    words = wordlists.words(infile)
    print(infile)
    # print(words)
    finalized_words = remove_urdu_stopwords(stopwords, words)
    print("\n==== without stopwords ===========\n")
    print(finalized_words)
    doc_clean.append(finalized_words)
fdist1 = FreqDist(doc_clean)
print(fdist1)
I am trying to calculate the frequency of each word in the vocabulary. Say I have 10 documents. First, I tokenized them and removed the stop words. I read about frequency distributions in NLTK and tried to use FreqDist to count the frequency of each item in these documents, but I am getting the error TypeError: unhashable type: 'list'
I'm guessing you meant to build a list of words (after cleanup), but this line appends each list as a single element of doc_clean:
doc_clean.append(finalized_words)
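Basically, FreqDist will count the different elements of a list, using each element as a dictionary key. If these elements are themselves lists, you've got a problem, because lists are not hashable. To build a single list of words from all the documents, replace append() with extend():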
doc_clean.extend(finalized_words)
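To see the difference in isolation, here is a minimal sketch. It uses collections.Counter from the standard library in place of FreqDist (FreqDist is a subclass of Counter, so it fails on list elements in the same way), and the docs list is made-up sample data standing in for the cleaned documents, not the asker's corpus:

```python
from collections import Counter  # nltk.FreqDist is a subclass of Counter

# Hypothetical token lists standing in for the cleaned documents.
docs = [["cat", "sat", "mat"], ["cat", "ran"]]

nested = []  # built with append(): a list of lists
flat = []    # built with extend(): one flat list of words
for finalized_words in docs:
    nested.append(finalized_words)  # adds the whole list as one element
    flat.extend(finalized_words)    # adds each word individually

# Counting the nested list fails because each element is a list,
# and lists cannot be used as dictionary keys.
try:
    Counter(nested)
    error = None
except TypeError as exc:
    error = str(exc)

# Counting the flat list works because each element is a string.
counts = Counter(flat)

print(error)
print(counts.most_common(1))  # [('cat', 2)]
```

If you also need per-document counts later, keeping the nested list around and counting each inner list separately is an option, but for a single vocabulary-wide distribution the flat list is what you want.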