r - How to compute row similarity in a data frame with non-uniform similarity between the categories of an attribute?


I generally compute row similarity in a data frame with the Gower similarity metric, as follows:

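library(cluster)
# stringsAsFactors = TRUE: daisy() expects factor (not character) columns
mydf <- data.frame(x1 = 1:10,
                   x2 = c(rep("a", 4), rep("b", 3), rep("c", 3)),
                   x3 = c(rep("a", 2), rep("b", 2), "c", "d", rep("e", 4)),
                   stringsAsFactors = TRUE)
# Gower dissimilarity lies in [0, 1], so 1 - dissimilarity is a similarity
similarity <- 1 - daisy(mydf, metric = "gower",
                        weights = c(1, 1, 1))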

The above assumes that the differences between categories of the categorical attributes (2nd and 3rd columns) are all the same.

But suppose s is a non-uniform dissimilarity matrix between the 5 categories of the 3rd attribute (a, b, c, d, e):

s <- matrix(c(0.00, 0.09, 0.12, 0.10, 0.12,
              0.09, 0.00, 0.05, 0.13, 0.16,
              0.12, 0.05, 0.00, 0.17, 0.20,
              0.10, 0.13, 0.17, 0.00, 0.09,
              0.12, 0.16, 0.20, 0.09, 0.00), 5)

What is the best way to incorporate this piece of information when computing the row similarity of the data frame via Gower similarity?

Well, it's not Gower's similarity anymore.

But there is nothing wrong with defining your own distance function:

$$d(x,y)=\left(\sum_i d_i(x_i, y_i)^p\right)^{1/p}$$

where $d_i$ is the distance matrix of the categorical values in column $i$.
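A minimal sketch of this distance in base R, using the data frame and the matrix s from the question. The per-column choices are assumptions, not something prescribed above: x1 uses a range-scaled absolute difference (as Gower does for numeric columns), x2 uses simple 0/1 matching, and x3 looks up s. The helper name `custom_dist` and the category labels attached to s are hypothetical.

```r
mydf <- data.frame(
  x1 = 1:10,
  x2 = c(rep("a", 4), rep("b", 3), rep("c", 3)),
  x3 = c(rep("a", 2), rep("b", 2), "c", "d", rep("e", 4)),
  stringsAsFactors = TRUE
)

# Category-level dissimilarities for x3, as given in the question.
# Assumption: rows and columns are ordered a, b, c, d, e.
lev <- c("a", "b", "c", "d", "e")
s <- matrix(c(0.00, 0.09, 0.12, 0.10, 0.12,
              0.09, 0.00, 0.05, 0.13, 0.16,
              0.12, 0.05, 0.00, 0.17, 0.20,
              0.10, 0.13, 0.17, 0.00, 0.09,
              0.12, 0.16, 0.20, 0.09, 0.00),
            nrow = 5, dimnames = list(lev, lev))

# d_1: absolute difference scaled by the column range (assumed choice).
d1 <- unname(as.matrix(dist(mydf$x1))) / diff(range(mydf$x1))

# d_2: simple matching, 0 if the categories are equal and 1 otherwise.
x2 <- as.character(mydf$x2)
d2 <- outer(x2, x2, "!=") * 1

# d_3: look up each pair of rows in the category matrix s.
x3 <- as.character(mydf$x3)
d3 <- unname(s[x3, x3])

# Hypothetical helper: Minkowski-style combination of per-column matrices.
custom_dist <- function(d_list, p = 1) {
  Reduce(`+`, lapply(d_list, function(d) d^p))^(1 / p)
}

d <- custom_dist(list(d1, d2, d3), p = 1)
```

With p = 1, dividing d by the number of columns gives an average in the style of Gower's coefficient; if s were replaced by a plain 0/1 matching matrix, that average should agree with the daisy result from the question.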

I think such measures are common in bioinformatics. It may be difficult to measure the $d_i$ matrix reliably enough for it to be useful.

