python - How to construct clusters based on pairwise linkage (the same or not)


I have a set of images, and I asked on MTurk whether two given images belong to the same category or not (there is more application-specific nuance here than just asking whether they belong to the same category).

My question is how to construct a cluster assignment from such answers, assuming all possible pairs within the set have been answered. Ideally it should be robust to noise (we duplicated the questions and plan to use a majority vote).

One example: assume there are 4 images a, b, c, d, and the answers are the following: a is similar to b, c is similar to d, a is different from c, b is different from c, a is different from d, b is different from d.

The output should be 2 clusters, (a, b) and (c, d). Note that we do not know the number of clusters in advance and have to infer it from the answers.

I found related questions, but they are not the same. For instance, they might be based on distance instead of a boolean answer (yes or no). I might be able to reduce my question to some form of distance, but I suppose my question is easier than the distance setting. The related questions are here:

Clustering given pairwise distances with unknown cluster number?

https://stats.stackexchange.com/questions/2717/clustering-with-a-distance-matrix

It would be ideal if the algorithm has a Python implementation (e.g., sklearn). If not, I don't mind implementing it myself.

Thank you.

It sounds like you want to use hierarchical clustering.

When you use, e.g., average linkage, it merges clusters such that people would consider them "similar".
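As a rough illustration of average linkage on the four-image example from the question, here is a minimal sketch using SciPy. The 0/1 distance encoding (0 for "same", 1 for "different") and the cut threshold of 0.5 are assumptions made for this sketch, not something stated in the question.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

labels = ["a", "b", "c", "d"]

# Assumed encoding: distance 0 for pairs answered "same",
# distance 1 for pairs answered "different".
dist = np.array([
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
])

# linkage() expects a condensed distance vector, so convert the square matrix.
Z = linkage(squareform(dist), method="average")

# Cut the tree at an assumed threshold of 0.5; the number of clusters
# then follows from the data instead of being fixed in advance.
assignment = fcluster(Z, t=0.5, criterion="distance")

# Group the image names by their cluster id.
clusters = {}
for name, cluster_id in zip(labels, assignment):
    clusters.setdefault(cluster_id, []).append(name)

result = sorted(clusters.values())
print(result)
```

With noisy answers the distances would no longer be exactly 0 or 1, and the threshold becomes a tuning choice.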

You need to put some thought into how to deal with missing information, contradicting information, etc. For example, use similarity(x,y) = (0.5 + #positivevotes) / (1 + #positivevotes + #negativevotes) for each pair. If a pair has not been evaluated, this yields 0.5; after 1 positive vote it becomes 0.75, after 1 negative vote 0.25, and additional votes give a more decided similarity (unless they disagree, of course).
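The smoothed similarity can be sketched in a few lines of Python. The function names, the `votes` dictionary layout, and the conversion to a distance as 1 - similarity are assumptions made for illustration only.

```python
import numpy as np


def similarity(positive_votes, negative_votes):
    """Smoothed vote ratio: 0.5 when a pair has no votes at all."""
    return (0.5 + positive_votes) / (1 + positive_votes + negative_votes)


def distance_matrix(n_items, votes):
    """Build a symmetric distance matrix from pairwise votes.

    `votes` (hypothetical layout) maps (i, j) with i < j to a tuple
    (positive_votes, negative_votes). Missing pairs count as no votes.
    """
    dist = np.zeros((n_items, n_items))
    for i in range(n_items):
        for j in range(i + 1, n_items):
            pos, neg = votes.get((i, j), (0, 0))
            # Assumed conversion: distance = 1 - similarity.
            dist[i, j] = dist[j, i] = 1.0 - similarity(pos, neg)
    return dist


# Example votes: pair (0, 1) got 3 "same" votes, pair (2, 3) got 2 "same"
# and 1 "different", pair (0, 2) got 3 "different"; other pairs are unrated.
votes = {(0, 1): (3, 0), (2, 3): (2, 1), (0, 2): (0, 3)}
dist = distance_matrix(4, votes)
print(dist)
```

The resulting matrix can then be passed to a hierarchical clustering routine that accepts precomputed distances.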

