Error In Data.frame Duplicate Row.names
Question: How To Deal With Duplicate Row Names Error In R

3.7 years ago, Diana (Germany) wrote:

Hi all, I'm facing a very annoying error in R while assigning row names to my data matrix. I have some RNA-seq data that I'm considering clustering in R. I'm using gene names as row names for my expression matrix, but R keeps reporting that there are duplicate names. Some un-annotated genes have been assigned IDs that start with numbers. I don't understand how to deal with this error. Is there a way to work around it? I can't change the gene names.

EDIT:

gene                 sample1    sample2    sample3
Mar-01               4.19504    3.9006     4.15683
Mar-02               3.0554     3.4261     3.76675
un_A_2               1.1515     1.2455     0.563484
un_A_3               98.2504    120.341    101.753
ENSGALG00000008227   39.6383    12.8651    38.2281
ENSGALG00000008242   5.71557    7.79314    9.40917
ENSGALG00000008277   24.6231    28.3207    24.9288
CNN3                 141.708    134.476    144.514
CNNM1                0.840218   0.963683   0.619086
CNNM2                16.0282    12.1301    12.4665

Many thanks.

(modified 3.7 years ago by Michael Dondrup, written 3.7 years ago by Diana)

Comments:

Gene names - "Mar-01, Mar-02" seems like a copy-paste from Excel, watch out! http://nsaunders.wordpress.com/2012/10/22/gene-name-errors-and-excel-lessons-not-learned/ http://www.biomedcentral.com/1471-2105/5/80 (written 3.7 years ago by zx8754)

Post the sample dataset. (written 3.7 years ago by Sukhdeep Singh)

I've posted a sample of my data. (written 3.7 years ago by Diana)

Hi Diana, did you get a solution for your problem? (written 2.2 years ago by Tark)

Did you read the answer?
(written 2.2 years ago by Michael Dondrup)

3.7 years ago, Michael Dondrup (Bergen, Norway) wrote:

One way of dealing with this in R is the function make.names with the option unique=TRUE, see ?make.names.

> nams = c("bl-a","bl-a","bl-a", "foo" )
> df = data.frame(matrix (1:4))
> df
  matrix.1.4.
1           1
2           2
3           3
4           4
> rownames(df) = n
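The make.names approach from the answer above can be sketched as follows. This is a minimal illustration, not the answerer's full example: the gene names and expression values are hypothetical and were chosen only so that one name repeats. It also shows make.unique(), a base R function that adds suffixes to repeats without otherwise rewriting the names, which may suit cases where the original gene names should stay recognizable.

```r
# Hypothetical gene names containing one duplicate (not the poster's data).
genes <- c("CNN3", "CNN3", "un_A_2", "Mar-01")

# Find out which names repeat before changing anything.
dup_names <- unique(genes[duplicated(genes)])

# make.names() converts entries to syntactically valid R names;
# unique = TRUE also appends a numeric suffix to repeated entries.
fixed <- make.names(genes, unique = TRUE)

# make.unique() only appends suffixes to repeats and leaves other
# characters (such as the hyphen in "Mar-01") untouched.
suffixed <- make.unique(genes)

# Hypothetical expression values, one row per gene.
expr <- data.frame(sample1 = c(141.7, 140.2, 1.15, 4.20))
rownames(expr) <- fixed

print(dup_names)
print(rownames(expr))
print(suffixed)
```

Inspecting dup_names first is worthwhile because duplicated gene names often point to a real problem in the input (for example, two transcripts mapped to the same symbol), and renaming them hides that.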
https://www.biostars.org/p/62988/

dataframes in R their characteristic distinction. Row names, on the other hand, are rarely used. Usually, row names appear to be the same as row numbers but this is not the case. This quick blog post demonstrates that row names are not the same as row numbers. This is something most experienced R users are well aware of.

> df <- cars[1:5, ]
> df
  speed dist
1     4    2
2     4   10
3     7    4
4     7   22
5     8   16
> rownames(df)
[1] "1" "2" "3" "4" "5"
> colnames(df)
[1] "speed" "dist"

A Tricky Situation

Let’s say we are working with R in the interactive mode. We have a dataframe that we are inspecting. We want to know various things about the dataframe such as how many rows and columns it has. We can print the dataframe like this:

> cars
  speed dist
1     4    2
2     4   10

http://www.perfectlyrandom.org/2015/06/16/never-trust-the-row-names-of-a-dataframe-in-R/

- but the row names are unique!!!

https://support.bioconductor.org/p/64694/

20 months ago, ccheung (European Union) wrote:

Hi, I'm having problems with the DGEList function in edgeR. Here are the commands that I had input:

library(edgeR)
raw.data <- read.table(file = "Documents/.../myfile.csv", header=TRUE, sep=",")
Data <- raw.data[, 2:45]
rownames( Data ) <- raw.data[ , 1 ]
colnames(Data) <- paste (c("ML1,ML32,ML4,ML29,etc"), sep="")
groups <- c(rep("1",11), rep("2",33))
DGE1 <- DGEList(counts = Data , group = groups )

At this point, it keeps on giving me this error message:

Error in `row.names<-.data.frame`(`*tmp*`, value = c("ML1,ML32,ML4,ML29,etc", :
  duplicate 'row.names' are not allowed
error in non-unique values when setting 'row.names':

But I know for sure that my row names are unique! Any advice would be appreciated. Thanx. carol

Tags: dgelist, row.names, edger

20 months ago by James W. MacDonald (United States), James W.
MacDonald wrote:

The hint here is that the row.names in the error are actually the column names for your data matrix! One of the things that happens when you run DGEList() is that a 'samples' data frame is constructed, and the row.names of that samples data.frame are the column names of your data. If you have duplicate column names (and you do), then this will result in an error. You shouldn't have duplicate column names anyway (you are calling two samples by the same name), so fix that and the error will go away.

20 months ago, ccheung (European Union) wrote:

Hi, Thanx John for your answer! However, I double-checked and I am pretty sure that both the column and the row names are unique. Just to be sure, I even put in the command, rownames(df) = make.names(nams, unique=TRUE) but to no avail..... Any other ideas? Thanx. carol

Comment:

I am not sure how you can be 'pretty sure' that the column names are unique. Either they are or they are not. Something like any(duplicated(colnames(Data))) wil
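The duplicate check suggested in the last reply can be tried on a small example. The sketch below uses a hypothetical three-column count table (the names and values are invented, not the poster's data) built with check.names = FALSE so that the repeated column name survives. The final two lines are an assumption about one possible cause, not something the thread confirms: a single quoted string with commas inside it is one name, not several, so it may be worth checking how many names are actually being assigned.

```r
# Hypothetical count table with a repeated sample name; check.names = FALSE
# keeps data.frame() from silently repairing the duplicate.
Data <- data.frame(ML1 = c(10, 3), ML32 = c(7, 0), ML1 = c(5, 8),
                   check.names = FALSE)

# TRUE if any column name occurs more than once.
has_dups <- any(duplicated(colnames(Data)))

# The offending names themselves.
dup_cols <- unique(colnames(Data)[duplicated(colnames(Data))])

# Assumption for illustration: commas inside one string do not split it.
one_name   <- c("ML1,ML32,ML4")        # a character vector of length 1
many_names <- c("ML1", "ML32", "ML4")  # a character vector of length 3

print(has_dups)
print(dup_cols)
print(length(one_name))
print(length(many_names))
```

Comparing length(colnames(Data)) with ncol(Data), and looking at the names directly, gives a definite answer where "pretty sure" does not.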