Error in complete.cases()
Stack Overflow: R - complete.cases: not all arguments have the same length
Source: http://stackoverflow.com/questions/29581304/r-complete-cases-not-all-arguments-have-the-same-length

I have a problem with R's complete.cases() function. I am using the Electric power consumption data, and I wanted to check whether there are any NAs in my subset using complete.cases(). I expect to get the number of complete cases, but instead I get an error saying "not all arguments have the same length". I give complete.cases() only one argument, a data frame, and all columns in the data frame have the same length. Of course I can check for NAs in every column using sum(is.na()), but I am curious why complete.cases() doesn't work. Moreover, when I generated a data frame with 3 columns filled with random numbers, complete.cases() worked. Here is my code so that you can reproduce the error:

### READING DATA
# reading full file
data <- read.table("household_power_consumption.txt", header=TRUE, sep=";", na.strings="?")
# changing Date and Time columns to R classes
data$Time = strptime(paste(data$Date, data$Time), "%d/%m/%Y %H:%M:%OS")
data$Date = as.Date(data$Date, format="%d/%m/%Y")
# filtering to needed days
data = subset(data, Date == '2007-02-01' | Date == '2007-02-02')
# checking if there are any NAs in data
dim(data)
sum(complete.cases(data))

Tags: r, na. Asked Apr 11 '15 at 18:09 by Adrian GasiĆski.

Comment: Thanks for the reproducible code, but it would be better if you provided a small dataset (instead of the zip file) that reproduces the error.
– akrun, Apr 11 '15 at 18:13

Comment: Convert your POSIXlt (list) column to POSIXct (vector) and it'll work: data$Time <- as.POSIXct(data$Time); sum(complete.cases(data)). See also here: stackoverflow.com/questions/27957819/…
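The fix suggested in the comment can be sketched on a tiny data frame. The data frame df and its Value column are hypothetical stand-ins for the power consumption subset, not taken from the real file. strptime() returns a POSIXlt object, which is internally a list of date-time components, and converting it with as.POSIXct() turns the column back into a plain numeric vector. Whether the POSIXlt version actually raises the error may depend on the R version, so the sketch only captures and prints whatever happens before the fix.

```r
# Hypothetical toy data standing in for the power consumption subset
df <- data.frame(
  Date  = as.Date(c("2007-02-01", "2007-02-01", "2007-02-02")),
  Value = c(1.5, NA, 2.0)
)

# strptime() returns a POSIXlt object, i.e. a list of date-time components
df$Time <- strptime(
  c("01/02/2007 00:00:00", "01/02/2007 00:01:00", "02/02/2007 00:00:00"),
  "%d/%m/%Y %H:%M:%OS", tz = "UTC"
)

# With a POSIXlt column, complete.cases() may stop with
# "not all arguments have the same length"; whether it does depends on the
# R version, so either the result or the error message is captured and printed
before <- tryCatch(sum(complete.cases(df)), error = function(e) conditionMessage(e))
print(before)

# Fix from the comment: convert the column to POSIXct (a plain numeric vector)
df$Time <- as.POSIXct(df$Time)
n_complete <- sum(complete.cases(df))
print(n_complete)  # 2: only the row with the missing Value is incomplete
```

Converting once, right after strptime(), also keeps the column usable by other functions that expect atomic data frame columns.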
Missing Data
Source: http://www.statmethods.net/input/missingdata.html

In R, missing values are represented by the symbol NA (not available). Impossible values (e.g., dividing zero by zero) are represented by the symbol NaN (not a number). Unlike SAS, R uses the same symbol for character and numeric data.

Testing for Missing Values
is.na(x) # returns TRUE if x is missing
y <- c(1,2,3,NA)
is.na(y) # returns a vector (F F F T)

Recoding Values to Missing
# recode 99 to missing for variable v1
# select rows where v1 is 99 and recode column v1
mydata$v1[mydata$v1==99] <- NA

Excluding Missing Values from Analyses
Arithmetic functions on missing values yield missing values.
x <- c(1,2,NA,3)
mean(x) # returns NA
mean(x, na.rm=TRUE) # returns 2

The function complete.cases() returns a logical vector indicating which cases are complete.
# list rows of data that have missing values
mydata[!complete.cases(mydata),]

The function na.omit() returns the object with listwise deletion of missing values.
# create new dataset without missing data
newdata <- na.omit(mydata)

Advanced Handling of Missing Data
Most modeling functions in R offer options for dealing with missing values. You can go beyond pairwise or listwise deletion of missing values through methods such as multiple imputation. Good implementations that can be accessed through R include Amelia II, Mice, and mitools.

Copyright © 2014 Robert I. Kabacoff, Ph.D.
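The snippets above use a data frame mydata that the page never defines. A minimal end-to-end run, assuming a made-up mydata in which 99 is a missing-value code in v1, strings the steps together: recoding, na.rm, complete.cases() and na.omit(). It also shows that NaN is distinct from NA even though is.na() is TRUE for both.

```r
# Hypothetical data: 99 is used as a missing-value code in v1
mydata <- data.frame(v1 = c(5, 99, 7, 8), v2 = c(1, 2, NA, 4))

# Recode 99 to NA in column v1
mydata$v1[mydata$v1 == 99] <- NA

# Arithmetic on missing values yields NA unless na.rm = TRUE
m_all <- mean(mydata$v1)                # NA
m_obs <- mean(mydata$v1, na.rm = TRUE)  # (5 + 7 + 8) / 3

# Rows that contain at least one missing value (rows 2 and 3)
incomplete <- mydata[!complete.cases(mydata), ]

# Listwise deletion: only rows 1 and 4 remain
newdata <- na.omit(mydata)

# NaN (e.g. 0/0) is not the same as NA, although is.na() is TRUE for both
print(c(is.na(0/0), is.nan(NA)))  # TRUE FALSE
```

na.omit() keeps the original row names, so the surviving rows can still be traced back to mydata.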
data frames.

Value
A logical vector specifying which observations/rows have no missing values across the entire sequence.

Note
A current limitation of this function is that it uses low-level functions to determine lengths and missingness, ignoring the class. This will lead to spurious errors when some columns have classes with length or is.na methods, for example "POSIXlt", as described in PR#16648.

See Also
is.na, na.omit, na.fail.

Examples
x <- airquality[, -1] # x is a regression design matrix
y <- airquality[,  1] # y is the corresponding response

stopifnot(complete.cases(y) != is.na(y))
ok <- complete.cases(x, y)
sum(!ok) # how many are not "ok" ?
x <- x[ok,]
y <- y[ok]

[Package stats version 3.3.0]
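The Value description can be checked directly: a row is complete exactly when none of its entries is NA, and with several arguments the check runs across all of them. The small data frame df below is hypothetical. The sketch compares complete.cases() with a by-hand count of NAs per row and with a call that passes the columns as separate arguments.

```r
# Hypothetical data frame with missing values in different columns
df <- data.frame(a = c(1, NA, 3, 4), b = c("x", "y", NA, "z"))

cc <- complete.cases(df)
print(cc)  # TRUE FALSE FALSE TRUE

# Same result computed by hand: a row is complete when it contains zero NAs
by_hand <- rowSums(is.na(df)) == 0

# Passing the columns as separate arguments gives the same answer
by_args <- complete.cases(df$a, df$b)
```

The by-hand version relies on is.na() methods and so respects column classes, which is why sum(is.na()) still worked for the asker while complete.cases() did not.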
Cross Validated: How to do a paired test when some cases have no pair?

I have a question regarding paired and unpaired tests. I know the difference between the two tests. I am using R for a Wilcoxon test. My test is a paired test; however, X and Y do not have the same length and R gives an error. I do not want to use an unpaired test, and I do not think I should.

For example, I have a subject who makes cookies every hour. I use a special kind of treatment to increase his stamina. Before the treatment he can only work 4 hours (after that he gets tired) and produces some cookies. After the treatment he produces more cookies per hour and works 7 hours without getting tired. X contains the number of cookies for each hour before the treatment and Y contains the number of cookies for each hour after the treatment, so X contains 4 values and Y contains 7 values. If I use a paired test, R gives an error. What should I do? Is there any solution or explanation for this kind of situation? Can I just pad X with NA values?

"This is just an example; please do not point out mistakes in the example, it is just to give you an example." Thank you.
EDIT: Here is the basic R script that I use:

someData <- read.csv(file="cookie_data.csv", header=TRUE, sep=",")
wilcox.test(someData$X, someData$Y, paired=TRUE)

Sample data:

X,Y
2,3
3,2
3,3
2,2
,3
,7
,2

When I use this script, R does not give any error. However, when I print someData$X, it prints 4 values followed by NA NA NA. I noticed that R automatically filled the blank values with NA. This script gives me a p-value, but I do not know whether it is correct.

Tags: hypothesis-testing, statistical-significance, wilcoxon, mann-whitney-u-test. Asked Sep 2 '11 at 22:30 by user3900; edited Nov 21 '12 at 1:14 by Glen_b.

Comment: Is there some significance to the division into hours? If you lumped the times into days instead of hours, it seems like you wouldn'
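What R computes here can be checked rather than taken on trust. In the default wilcox.test() method, paired = TRUE first keeps only the pairs where both x and y are non-missing (internally via complete.cases(x, y)) and then tests the differences of those pairs. So with the sample data, the p-value comes from the four complete pairs alone, and the three extra after-treatment hours are ignored. The sketch reads the sample data from a string instead of cookie_data.csv and compares the p-value with a test run only on the complete pairs. The tie and zero-difference warnings that this tiny sample triggers are suppressed.

```r
# Sample data from the question, read from a string instead of cookie_data.csv
csv_text <- "X,Y
2,3
3,2
3,3
2,2
,3
,7
,2"
someData <- read.csv(text = csv_text, header = TRUE)

# Blank numeric fields become NA, so X and Y both have 7 entries
print(someData$X)  # 2 3 3 2 NA NA NA

# Paired test on the full columns (warnings about ties/zeros suppressed)
p_full <- suppressWarnings(
  wilcox.test(someData$X, someData$Y, paired = TRUE)$p.value
)

# Same test restricted to the complete pairs only
ok <- complete.cases(someData$X, someData$Y)
p_pairs <- suppressWarnings(
  wilcox.test(someData$X[ok], someData$Y[ok], paired = TRUE)$p.value
)

print(c(p_full = p_full, p_pairs = p_pairs))
```

Padding X with NA therefore does not make the extra hours count; it only makes the two columns the same length so that R accepts the call.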