r - How to compute row similarity in a data-frame with non uniform similarity between the categories of an attribute? -

June 15, 2015

In general, I compute row similarity in a data frame with the Gower similarity metric, as follows.

    library(cluster)

    mydf <- data.frame(x1 = 1:10,
                       x2 = c(rep("a", 4), rep("b", 3), rep("c", 3)),
                       x3 = c(rep("a", 2), rep("b", 2), "c", "d", rep("e", 4)),
                       stringsAsFactors = TRUE)  # daisy() expects factors, not character columns

    similarity <- 1 - daisy(mydf, metric = "gower",
                            weights = c(1, 1, 1))

The above assumes that the differences between categories in the categorical attributes (2nd and 3rd columns) are all the same.
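As a quick check of that assumption, here is a minimal sketch using a small toy factor (not the data above): with `metric = "gower"`, any two distinct categories of a nominal column come out with dissimilarity 1, no matter which categories they are.

```r
library(cluster)

# Toy nominal column with three distinct categories (illustrative only)
toy <- data.frame(x3 = factor(c("a", "b", "c")))

# Gower dissimilarity on a single nominal column: 0 if equal, 1 otherwise
d_toy <- as.matrix(daisy(toy, metric = "gower"))
print(d_toy)

# Every pair of different categories is treated as equally far apart
all(d_toy[upper.tri(d_toy)] == 1)
```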

But suppose s is a non-uniform dissimilarity matrix between the 5 categories of the 3rd attribute (a, b, c, d, e):

    s <- matrix(c(0.00, 0.09, 0.12, 0.10, 0.12,
                  0.09, 0.00, 0.05, 0.13, 0.16,
                  0.12, 0.05, 0.00, 0.17, 0.20,
                  0.10, 0.13, 0.17, 0.00, 0.09,
                  0.12, 0.16, 0.20, 0.09, 0.00), 5)
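It may help to label the rows and columns of the matrix with the category names so that entries can be looked up by name. This is only a sketch: the level order (a, b, c, d, e) is taken from the sentence above, and `lev` is a name introduced here for illustration.

```r
# Same values as above; the a..e ordering is assumed from the question text
s <- matrix(c(0.00, 0.09, 0.12, 0.10, 0.12,
              0.09, 0.00, 0.05, 0.13, 0.16,
              0.12, 0.05, 0.00, 0.17, 0.20,
              0.10, 0.13, 0.17, 0.00, 0.09,
              0.12, 0.16, 0.20, 0.09, 0.00), 5)
lev <- c("a", "b", "c", "d", "e")
dimnames(s) <- list(lev, lev)

# A dissimilarity matrix should be symmetric with a zero diagonal
stopifnot(isSymmetric(s), all(diag(s) == 0))

# Dissimilarity between categories "b" and "c"
s["b", "c"]
```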

What is the best way to incorporate this piece of information when computing the row similarity of the data frame via Gower similarity?

Well, it's not Gower's similarity anymore.

But there is nothing wrong with defining your own distance function:

$$d(x,y)=\left(\sum_i d_i(x_i, y_i)^p\right)^{1/p}$$

where $d_i$ is the distance matrix of the categorical values in column $i$.
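The distance defined above could be sketched in R roughly as follows. Several choices here are assumptions rather than part of the answer: `custom_dist` is a hypothetical helper name, the numeric column is range-scaled and the 2nd column uses a plain 0/1 mismatch (both borrowed from how Gower treats those column types), and the a..e ordering of the matrix is taken from the question.

```r
mydf <- data.frame(x1 = 1:10,
                   x2 = c(rep("a", 4), rep("b", 3), rep("c", 3)),
                   x3 = c(rep("a", 2), rep("b", 2), "c", "d", rep("e", 4)),
                   stringsAsFactors = FALSE)

lev <- c("a", "b", "c", "d", "e")
s <- matrix(c(0.00, 0.09, 0.12, 0.10, 0.12,
              0.09, 0.00, 0.05, 0.13, 0.16,
              0.12, 0.05, 0.00, 0.17, 0.20,
              0.10, 0.13, 0.17, 0.00, 0.09,
              0.12, 0.16, 0.20, 0.09, 0.00), 5,
            dimnames = list(lev, lev))

# One n x n distance matrix d_i per column
d_x1 <- abs(outer(mydf$x1, mydf$x1, "-")) / diff(range(mydf$x1))  # assumed: range-scaled
d_x2 <- outer(mydf$x2, mydf$x2, "!=") * 1                         # assumed: uniform 0/1 mismatch
x3   <- as.character(mydf$x3)
d_x3 <- unname(s[x3, x3])                                         # non-uniform lookup in s

# Hypothetical helper: d = (sum_i d_i^p)^(1/p), applied elementwise
custom_dist <- function(d_list, p = 1) {
  Reduce(`+`, lapply(d_list, function(d) d^p))^(1 / p)
}

d <- custom_dist(list(d_x1, d_x2, d_x3), p = 1)

# Rows 4 and 5 differ by 1/9 in x1, mismatch in x2, and are "b" vs "c" in x3
d[4, 5]
```

With `p = 1` and every $d_i$ bounded by 1, dividing by the number of columns gives a Gower-like average, so `1 - d / 3` could serve as a similarity in the same spirit as `1 - daisy(...)`; for other values of `p` a different normalization would be needed.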

I think such measures are common in bioinformatics. It may be difficult to measure the $d_i$ matrix reliably enough for it to be useful.
