Importing Dates
Dates can be imported from character, numeric, POSIXlt, and POSIXct formats using the as.Date function from the base package.
If your data were exported from Excel, they will possibly be in numeric format. Otherwise, they will most likely be stored in character format.
This outputs the dates in the ISO 8601 international standard format %Y-%m-%d. If you would like to use dates in a different format, read "Changing Date Formats" below.
This outputs the dates in the ISO 8601 international standard format %Y-%m-%d. If you would like to use dates in a different format, read the next step:
If your data were exported from Excel, they will possibly be in numeric format. Otherwise, they will most likely be stored in character format.
Importing Dates from Character Format
If your dates are stored as characters, you simply need to provide as.Date with your vector of dates and the format they are currently stored in. The possible date segment formats are listed in a table below.
For example,
"05/27/84" is in the format %m/%d/%y, while "May 27 1984" is in the format %B %d %Y.
To import those dates, you would simply provide your dates and their format (if not specified, it tries %Y-%m-%d and then %Y/%m/%d):
dates <- c("05/27/84", "07/07/05") betterDates <- as.Date(dates, format = "%m/%d/%y")
> betterDates [1] "1984-05-27" "2005-07-07"
Or:
dates <- c("May 27 1984", "July 7 2005") betterDates <- as.Date(dates, format = "%B %d %Y") > betterDates [1] "1984-05-27" "2005-07-07"
This outputs the dates in the ISO 8601 international standard format %Y-%m-%d. If you would like to use dates in a different format, read "Changing Date Formats" below.
Importing Dates from Numeric Format
If you are importing data from Excel, you may have dates that are in a numeric format. We can still use as.Date to import these, we simply need to know the origin date that Excel starts counting from, and provide that to as.Date.
For Excel on Windows, the origin date is December 30, 1899 for dates after 1900. (Excel's designer thought 1900 was a leap year, but it was not.) For Excel on Mac, the origin date is January 1, 1904.
# from Windows Excel: dates <- c(30829, 38540) betterDates <- as.Date(dates, origin = "1899-12-30") > betterDates [1] "1984-05-27" "2005-07-07" # from Mac Excel: dates <- c(29367, 37078) betterDates <- as.Date(dates, origin = "1904-01-01") > betterDates [1] "1984-05-27" "2005-07-07"
This outputs the dates in the ISO 8601 international standard format %Y-%m-%d. If you would like to use dates in a different format, read the next step:
Changing Date Formats
If you would like to use dates in a format other than the standard %Y-%m-%d, you can do that using the format function from the base package.
For example,
For example,
format(betterDates, "%a %b %d") [1] "Sun May 27" "Thu Jul 07"
Correct Centuries
If you are importing data with only two digits for the years, you will find that it assumes that years 69 to 99 are 1969-1999, while years 00 to 68 are 2000--2068 (subject to change in future versions of R).
Often, this is not what you intend to have happen. This page gives a good explanation of several ways to fix this depending on your preference of centuries. One solution it provides is to assume all dates R is placing in the future are really from the previous century. That solution is as follows:
dates <- c("05/27/84", "07/07/05", "08/17/20") betterDates <- as.Date(dates, "%m/%d/%y") > betterDates [1] "1984-05-27" "2005-07-07" "2020-08-17" correctCentury <- as.Date(ifelse(betterDates > Sys.Date(), format(betterDates, "19%y-%m-%d"), format(betterDates))) > correctCentury [1] "1984-05-27" "2005-07-07" "1920-08-17"
Purpose of Proper Formatting
Having your dates in the proper format allows R to know that they are dates, and as such knows what calculations it should and should not perform on them. For one example, see my post on plotting weekly or monthly totals. Here are a few more examples:> mean(betterDates) [1] "1994-12-16" > max(betterDates) [1] "2005-07-07" > min(betterDates) [1] "1984-05-27"
Date Formats
Conversion specification | Description | Example |
%a | Abbreviated weekday | Sun, Thu |
%A | Full weekday | Sunday, Thursday |
%b or %h | Abbreviated month | May, Jul |
%B | Full month | May, July |
%d | Day of the month 01-31 | 27, 07 |
%j | Day of the year 001-366 | 148, 188 |
%m | Month 01-12 | 05, 07 |
%U | Week 01-53 with Sunday as first day of the week | 22, 27 |
%w | Weekday 0-6 Sunday is 0 | 0, 4 |
%W | Week 00-53 with Monday as first day of the week | 21, 27 |
%x | Date, locale-specific | |
%y | Year without century 00-99 | 84, 05 |
%Y | Year with century on input: 00 to 68 prefixed by 20 69 to 99 prefixed by 19 | 1984, 2005 |
%C | Century | 19, 20 |
%D | Date formatted %m/%d/%y | 05/27/84, 07/07/05 |
%u | Weekday 1-7 Monday is 1 | 7, 4 |
%n | Newline on output or Arbitrary whitespace on input | |
%t | Tab on output or Arbitrary whitespace on input |
References
- help(as.Date)
- help(strptime)
- http://stackoverflow.com/questions/9508747/r-adding-century-to-year
as.Date() discards a significant portion of the precision available in some character strings or in Excel dates.
ReplyDelete> as.Date("2013-08-21T13:00",format="%Y-%m-%dT%H:%M")
[1] "2013-08-21"
> strptime("2013-08-21T13:00",format="%Y-%m-%dT%H:%M")
[1] "2013-08-21 13:00:00"
> as.numeric(strptime("2013-08-21T13:00",format="%Y-%m-%dT%H:%M"))
[1] 1377104400
> as.numeric(as.Date("2013-08-21T13:00",format="%Y-%m-%dT%H:%M"))
[1] 15938
> as.numeric(as.Date("2013-08-21T00:00",format="%Y-%m-%dT%H:%M"))
[1] 15938
This comment has been removed by the author.
DeleteThanks, Mollie! I wish I had seen this about a week ago since I was dealing with dates that initially came out of Excel into CSV with the two-digit year, some of which R interpreted as being in the future. Your code above would have been easier than going back to Excel, forcing the format to "mm/dd/yyyy", saving again as a CSV file and re-importing to R.
ReplyDeleteCheers,
Paul
the lubridate package http://cran.r-project.org/web/packages/lubridate/index.html is very helpful too
ReplyDeleteIf I'm reading data from a .csv with a mac, would I use the same origin as the excel for mac origin (origin = "1904-01-01")?
ReplyDeleteIf your dates look something like "37078" and were exported on a Mac, then yes, I think so. If your dates look nothing like "37078", you don't need to worry about origin and can use the date formats instead.
DeleteThanks. I was confused about the origin argument. I kept getting the "need origin" warning. But I think it's because my date column was provided in a really odd format (200907 for July 2009) and R thought it was in a julian type date.
Delete