R - Group columns based on information in an annotation matrix
I am seeking advice on how to do the following task:
I am analyzing a single-cell RNA-seq dataset. I have normalized expression data in a table (each column has a unique cell ID, and each row is a gene).
I also have an annotation matrix that has information on each cell (each row is a cell ID, and each column is a piece of information such as patient ID, site, etc.).
For downstream analyses, I need different groupings based on the information available in the annotation matrix. Do you have any suggestions on how I might be able to do that?
For example, I have:
expression_matrix <- matrix(c(1:4), nrow = 4, ncol = 4,
  dimnames = list(c("gene1", "gene2", "gene3", "gene4"),
                  c("cell1", "cell2", "cell3", "cell4")))

annotation_matrix <- matrix(
  c("1526", "1788", "1526", "1788",
    "controller", "noncontroller", "controller", "noncontroller",
    "ln", "pb", "ln", "pb"),
  nrow = 4, ncol = 3,
  dimnames = list(c("cell1", "cell2", "cell3", "cell4"),
                  c("id", "status", "site")))
I want to group based on "site" so that I can combine cell1 and cell3 in one group, and cell2 and cell4 in another group. How can I match the information in the annotation matrix to expression_matrix?
Say I want to compare controllers and non-controllers: I need to somehow match the cell IDs in the normalized expression table to the patient group information available in the annotation matrix.
expression_matrix <- matrix(c(1:4), nrow = 4, ncol = 4,
  dimnames = list(c("gene1", "gene2", "gene3", "gene4"),
                  c("cell1", "cell2", "cell3", "cell4")))
#       cell1 cell2 cell3 cell4
# gene1     1     1     1     1
# gene2     2     2     2     2
# gene3     3     3     3     3
# gene4     4     4     4     4

annotation_matrix <- matrix(
  c("1526", "1788", "1526", "1788",
    "controller", "noncontroller", "controller", "noncontroller",
    "ln", "pb", "ln", "pb"),
  nrow = 4, ncol = 3,
  dimnames = list(c("cell1", "cell2", "cell3", "cell4"),
                  c("id", "status", "site")))
#       id     status          site
# cell1 "1526" "controller"    "ln"
# cell2 "1788" "noncontroller" "pb"
# cell3 "1526" "controller"    "ln"
# cell4 "1788" "noncontroller" "pb"
Let's harmonize those by converting both matrices to data frames that share a "cell" column:
library(dplyr)
library(tidyr)  # gather() and spread() come from tidyr, not dplyr

# Long format: one row per (gene, cell) pair
expression_df <- expression_matrix %>%
  as.data.frame(stringsAsFactors = FALSE) %>%
  mutate(gene = rownames(.)) %>%
  gather(cell, value, -gene)
#     gene  cell value
# 1  gene1 cell1     1
# 2  gene2 cell1     2
# 3  gene3 cell1     3
# 4  gene4 cell1     4
# 5  gene1 cell2     1
# 6  gene2 cell2     2
# 7  gene3 cell2     3
# 8  gene4 cell2     4
# 9  gene1 cell3     1
# 10 gene2 cell3     2
# 11 gene3 cell3     3
# 12 gene4 cell3     4
# 13 gene1 cell4     1
# 14 gene2 cell4     2
# 15 gene3 cell4     3
# 16 gene4 cell4     4

# Move the cell IDs from the row names into a regular column
annotation_df <- annotation_matrix %>%
  as.data.frame(stringsAsFactors = FALSE) %>%
  mutate(cell = rownames(.))
#     id        status site  cell
# 1 1526    controller   ln cell1
# 2 1788 noncontroller   pb cell2
# 3 1526    controller   ln cell3
# 4 1788 noncontroller   pb cell4
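If you would rather stay in base R, the matching step the question asks about can be sketched with match(). This is only a minimal illustration, assuming the cell IDs are the column names of the expression matrix and the row names of the annotation matrix, as in the toy data; idx and annotation_aligned are names made up for this example.

```r
# Rebuild the toy inputs from the question so the snippet runs on its own.
expression_matrix <- matrix(1:4, nrow = 4, ncol = 4,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:4)))
annotation_matrix <- matrix(
  c("1526", "1788", "1526", "1788",
    "controller", "noncontroller", "controller", "noncontroller",
    "ln", "pb", "ln", "pb"),
  nrow = 4, ncol = 3,
  dimnames = list(paste0("cell", 1:4), c("id", "status", "site")))

# For each expression column, find the position of its cell ID
# among the annotation rows (NA if the cell has no annotation).
idx <- match(colnames(expression_matrix), rownames(annotation_matrix))
stopifnot(!anyNA(idx))  # fail early if any cell is missing from the annotation

# Reorder the annotation so that row i describes expression column i.
annotation_aligned <- annotation_matrix[idx, , drop = FALSE]
```

Once the two objects are aligned like this, any annotation column (for example annotation_aligned[, "site"]) can be used directly as a grouping vector for the columns of expression_matrix.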
And then you can filter, merge, and spread as you wish:
# Keep only the "ln" cells, then attach their expression values
example1 <- annotation_df %>%
  filter(site == "ln") %>%
  inner_join(expression_df, by = "cell")
#     id     status site  cell  gene value
# 1 1526 controller   ln cell1 gene1     1
# 2 1526 controller   ln cell1 gene2     2
# 3 1526 controller   ln cell1 gene3     3
# 4 1526 controller   ln cell1 gene4     4
# 5 1526 controller   ln cell3 gene1     1
# 6 1526 controller   ln cell3 gene2     2
# 7 1526 controller   ln cell3 gene3     3
# 8 1526 controller   ln cell3 gene4     4

# Back to wide format: one row per cell, one column per gene
example2 <- example1 %>%
  spread(gene, value)
#     id     status site  cell gene1 gene2 gene3 gene4
# 1 1526 controller   ln cell1     1     2     3     4
# 2 1526 controller   ln cell3     1     2     3     4
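As an alternative to reshaping, the expression matrix can also be grouped in place by indexing its columns with the cell IDs of each group. The following is a rough base R sketch on the same toy data, not the only way to do it; cells_by_site, expr_by_site, status and group_means are illustrative names, and the per-gene mean is just one possible summary for comparing controllers with non-controllers.

```r
# Rebuild the toy inputs from the question so the snippet runs on its own.
expression_matrix <- matrix(1:4, nrow = 4, ncol = 4,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:4)))
annotation_matrix <- matrix(
  c("1526", "1788", "1526", "1788",
    "controller", "noncontroller", "controller", "noncontroller",
    "ln", "pb", "ln", "pb"),
  nrow = 4, ncol = 3,
  dimnames = list(paste0("cell", 1:4), c("id", "status", "site")))

# Named list mapping each site to the IDs of its cells.
cells_by_site <- split(rownames(annotation_matrix), annotation_matrix[, "site"])

# One expression sub-matrix per site, selected by column name.
expr_by_site <- lapply(cells_by_site, function(cells) {
  expression_matrix[, cells, drop = FALSE]
})

# Status of each expression column, looked up by cell ID.
status <- annotation_matrix[colnames(expression_matrix), "status"]

# Per-gene mean expression within each status group (genes x groups).
group_means <- sapply(split(colnames(expression_matrix), status), function(cells) {
  rowMeans(expression_matrix[, cells, drop = FALSE])
})
```

Here expr_by_site$ln holds the columns for cell1 and cell3, expr_by_site$pb holds cell2 and cell4, and group_means has one column per status. Selecting columns by name rather than by position keeps the grouping correct even if the two matrices list the cells in a different order.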