Title: | Standardize Dates in Different Formats or with Missing Data |
Version: | 1.7.0 |
Description: | There are many different formats dates are commonly represented with: the order of day, month, or year can differ, different separators ("-", "/", or whitespace) can be used, months can be numerical, names, or abbreviations and year given as two digits or four. 'datefixR' takes dates in all these different formats and converts them to R's built-in date class. If 'datefixR' cannot standardize a date, such as because it is too malformed, then the user is told which date cannot be standardized and the corresponding ID for the row. 'datefixR' also allows the imputation of missing days and months with user-controlled behavior. |
License: | GPL (≥ 3) |
URL: | https://docs.ropensci.org/datefixR/, https://github.com/ropensci/datefixR |
BugReports: | https://github.com/ropensci/datefixR/issues |
Depends: | R (≥ 4.1.0) |
Imports: | lifecycle, Rcpp, rlang, stringr |
Suggests: | DT, htmltools, knitr, parsedate, pkgbuild, png, readr, readxl, rmarkdown, shiny, shinytest2, spelling, testthat (≥ 3.0.0), withr |
LinkingTo: | Rcpp |
VignetteBuilder: | knitr |
Config/testthat/edition: | 3 |
Config/testthat/parallel: | true |
Encoding: | UTF-8 |
Language: | en-US |
LazyData: | true |
RoxygenNote: | 7.3.2 |
NeedsCompilation: | yes |
Packaged: | 2024-09-08 14:08:49 UTC; s1961592 |
Author: | Nathan Constantine-Cooke |
Maintainer: | Nathan Constantine-Cooke <nathan.constantine-cooke@ed.ac.uk> |
Repository: | CRAN |
Date/Publication: | 2024-09-08 14:20:02 UTC |
datefixR: Standardize Dates in Different Formats or with Missing Data
Description
There are many different formats dates are commonly represented
with: the order of day, month, or year can differ, different separators
("-", "/", or whitespace) can be used, months can be numerical, names, or
abbreviations and year given as two digits or four. datefixR takes dates
in all these different formats and converts them to R's built-in date
class. If datefixR cannot standardize a date, such as because it is too
malformed, then the user is told which date cannot be standardized and the
corresponding ID for the row. datefixR also allows the imputation of
missing days and months with user-controlled behavior.
Get started by reading vignette("datefixR")
Author(s)
Maintainer: Nathan Constantine-Cooke nathan.constantine-cooke@ed.ac.uk (ORCID)
Other contributors:
Jonathan Kitt jonathan.kitt@protonmail.com [contributor, translator]
Antonio J. Pérez-Luque ajpelu@gmail.com (ORCID) [contributor, translator]
Daniel Possenriede possenriede+r@gmail.com (ORCID) [contributor, translator]
Michal Lauer michal.lauer.25@gmail.com [contributor, translator]
Kaique dos S. Alves kaiquedsalves@gmail.com (ORCID) [reviewer]
Al-Ahmadgaid B. Asaad alahmadgaid@gmail.com (ORCID) [reviewer]
Anatoly Tsyplenkov atsyplenkov@gmail.com (ORCID) [contributor, translator]
Chitra M. Saraswati chitra.m.saraswati@gmail.com (ORCID) [contributor, translator]
See Also
Useful links:
Report bugs at https://github.com/ropensci/datefixR/issues
Example dataset of dates in different formats
Description
A toy dataset to use with datefixR functions.
Usage
exampledates
Format
A data frame with 5 rows and 3 variables:
- id
Row ID (numeric).
- some.dates
Dates in different formats (character).
- some.more.dates
Additional dates in different formats (character).
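As a quick check of the structure described above, the dataset can be loaded and inspected. This is a minimal sketch that assumes datefixR is installed; it only uses the column names listed in the Format section.

```r
library(datefixR)

# Load the bundled toy dataset and look at its structure
data(exampledates)
str(exampledates)

# The two date columns are plain character vectors until they are fixed
class(exampledates$some.dates)
```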
Convert improperly formatted date to R's Date class
Description
Converts a single improperly formatted date to R's Date class.
Supports numerous separators including /, -, or white space.
Supports all-numeric, abbreviation or long-hand month notation. Where
day of the month has not been supplied, the first day of the month is
imputed. Either DMY or YMD is assumed by default. However, the US system of
MDY is supported via the format argument.
Usage
fix_date(date, day.impute = 1, month.impute = 7, format = "dmy")
Arguments
date |
Character to be converted to R's date class. |
day.impute |
Integer. Day of the month to be imputed if not available.
Defaults to 1. If |
month.impute |
Integer. Month to be imputed if not available.
Defaults to 7 (July). If |
format |
Character. The format in which a date is most likely to be given.
Either |
Value
An object belonging to R's built-in Date class.
See Also
fix_dates, which is similar to fix_date() except that it is
applicable to columns of a data frame.
Examples
bad.date <- "02 03 2021"
fixed.date <- fix_date(bad.date)
fixed.date
# ->
fixed.date <- fix_date_char(bad.date)
Shiny application standardizing date data in CSV or Excel files
Description
A shiny application which allows users to standardize dates
using a graphical user interface (GUI). Most features of datefixR
are supported including imputing missing date data. Data can be provided as
CSV (comma-separated value) or XLSX (Excel) files. Processed datasets can
be downloaded as CSV files. Please note, the dependencies for this app
(DT, htmltools, readxl, and shiny) are not installed alongside datefixR.
This allows datefixR to be installed on secure systems where these packages
may not be allowed. If one of these dependencies is not installed on the
system when this function is called, then the user will have the option of
installing them.
Usage
fix_date_app(theme = "datefixR")
Arguments
theme |
Color theme for shiny app. Either |
Value
A shiny app.
See Also
The shiny package.
Examples
## Not run:
fix_date_app()
## End(Not run)
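Because the app's dependencies are optional, it can be useful to check for them before launching, for example on a locked-down system. The following is a minimal sketch using base R's requireNamespace(); the variable names are illustrative and the check is not part of datefixR itself, which offers its own install prompt as described above.

```r
# Packages the app needs, as listed in the Description above
app.deps <- c("DT", "htmltools", "readxl", "shiny")

# requireNamespace() returns FALSE instead of raising an error
# when a package is not installed
installed <- vapply(app.deps, requireNamespace, logical(1), quietly = TRUE)
missing.deps <- app.deps[!installed]

if (length(missing.deps) > 0) {
  message("Not installed: ", paste(missing.deps, collapse = ", "))
} else if (interactive()) {
  # Only launch the GUI from an interactive session
  datefixR::fix_date_app()
}
```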
Convert improperly formatted date to R's Date class
Description
Converts a character vector (or single character object) from improperly
formatted dates to R's Date class. Supports numerous separators including
/, -, or white space. Supports all-numeric, abbreviation or long-hand month
notation. Where day of the month has not been supplied, the first day of the
month is imputed by default. Either DMY or YMD is assumed by default.
However, the US system of MDY is supported via the format argument.
Usage
fix_date_char(
dates,
day.impute = 1,
month.impute = 7,
format = "dmy",
excel = FALSE,
roman.numeral = FALSE
)
Arguments
Value
A vector of elements belonging to R's built-in Date class
with the following format: yyyy-mm-dd.
See Also
fix_date_df, which is similar to fix_date_char()
except that it is applicable to columns of a data frame.
Examples
bad.date <- "02 03 2021"
fixed.date <- fix_date_char(bad.date)
fixed.date
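The effect of the format, day.impute, and month.impute arguments described above can be sketched as follows. This assumes datefixR is installed; the input strings are illustrative, and the expected values in the comments follow from the documented defaults (DMY order, day 1, month 7).

```r
library(datefixR)

# The same string read under two conventions
dmy <- fix_date_char("02 03 2021")                  # day first (default): 2021-03-02
mdy <- fix_date_char("02 03 2021", format = "mdy")  # month first: 2021-02-03

# Missing components are imputed: the defaults are day 1 and month 7 (July)
partial <- fix_date_char(c("2015", "05/1990"))      # 2015-07-01, 1990-05-01

# The imputed values can be changed by the caller
custom <- fix_date_char(
  c("2015", "05/1990"),
  day.impute = 15,
  month.impute = 1
)
custom
```

Choosing month.impute = 1 rather than the mid-year default is a judgment call that depends on how the dates will be analysed.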
Clean up messy date columns
Description
Tidies a data frame object whose date columns were entered via a
free-text box (possibly by different users) and are therefore in a
non-standardized format. Supports numerous separators including /, -, or
white space. Supports all-numeric, abbreviation, or long-hand month notation.
Where day of the month has not been supplied, the first day of the month is
imputed. Either DMY or YMD is assumed by default. However, the US system of
MDY is supported via the format argument.
Usage
fix_date_df(
df,
col.names,
day.impute = 1,
month.impute = 7,
id = NULL,
format = "dmy",
excel = FALSE,
roman.numeral = FALSE
)
Arguments
Value
A data frame or tibble object, dependent on the type of df.
Selected columns are of type Date with the following format: yyyy-mm-dd.
See Also
fix_date_char, which is similar to fix_date_df()
except that it can only be applied to character vectors.
Examples
data(exampledates)
fixed.df <- fix_date_df(exampledates, c("some.dates", "some.more.dates"))
fixed.df
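The id argument matters when the row IDs are not in the first column. A minimal sketch, assuming datefixR is installed; the visits data frame and its column names are hypothetical and exist only for illustration.

```r
library(datefixR)

# Hypothetical data in which the ID column comes last
visits <- data.frame(
  visit.date = c("02/05/92", "2015", "05/1990"),
  patient = c("a", "b", "c")
)

fixed <- fix_date_df(
  visits,
  col.names = "visit.date",
  month.impute = 1,   # impute January instead of the default July
  id = "patient"      # column used to identify rows that cannot be fixed
)

# Only the selected column is converted; other columns are left as they were
fixed$visit.date
```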
Clean up messy date columns
Description
Cleans up a data frame object whose date columns were entered via a
free-text box (possibly by different users) and are therefore in a
non-standardized format. Supports numerous separators including /, -, or
white space. Supports all-numeric, abbreviation, or long-hand month notation.
Where day of the month has not been supplied, the first day of the month is
imputed. Either DMY or YMD is assumed by default. However, the US system of
MDY is supported via the format argument.
Usage
fix_dates(
df,
col.names,
day.impute = 1,
month.impute = 7,
id = NULL,
format = "dmy"
)
Arguments
df |
A |
col.names |
Character vector of names of columns of messy date data |
day.impute |
Integer. Day of the month to be imputed if not available.
Defaults to 1. If |
month.impute |
Integer. Month to be imputed if not available.
Defaults to 7 (July). If |
id |
Name of column containing row IDs. By default, the first column is assumed. |
format |
Character. The format in which a date is most likely to be given.
Either |
Value
A data frame or tibble object, dependent on the type of df.
Selected columns are of type Date.
See Also
fix_date, which is similar to fix_dates() except that it can only
be applied to character objects.
Examples
bad.dates <- data.frame(
id = seq(5),
some.dates = c(
"02/05/92",
"01-04-2020",
"1996/05/01",
"2020-05-01",
"02-04-96"
),
some.more.dates = c(
"2015",
"02/05/00",
"05/1990",
"2012-08",
"jan 2020"
)
)
fixed.df <- fix_dates(bad.dates, c("some.dates", "some.more.dates"))
# ->
fixed.df <- fix_date_df(bad.dates, c("some.dates", "some.more.dates"))