| Type: | Package | 
| Title: | Column Profile for Tables and Datasets | 
| Version: | 0.1.0 | 
| Description: | Profiles datasets (collecting statistics and informative summaries about that data) on data frames and 'ODBC' tables: maximum, minimum, mean, standard deviation, nulls, distinct values, data patterns, data/format frequencies. | 
| License: | GPL-3 | file LICENSE | 
| URL: | https://github.com/avitaliano/datrProfile | 
| BugReports: | https://github.com/avitaliano/datrProfile/issues | 
| Encoding: | UTF-8 | 
| LazyData: | true | 
| Suggests: | testthat | 
| Imports: | odbc, dplyr, RSQLite | 
| RoxygenNote: | 6.1.1 | 
| NeedsCompilation: | no | 
| Packaged: | 2019-07-31 13:07:41 UTC; deinf.arnaldo | 
| Author: | Arnaldo Vitaliano [aut, cre] | 
| Maintainer: | Arnaldo Vitaliano <vitaliano@gmail.com> | 
| Repository: | CRAN | 
| Date/Publication: | 2019-08-02 09:20:05 UTC | 
buildQueryColumnFrequency
Description
buildQueryColumnFrequency
Usage
buildQueryColumnFrequency(conn.info, ...)
Arguments
| conn.info | Connection info created with  | 
| ... | Other parameters | 
Value
query column, count(*) from table
buildQueryColumnMetadata
Description
buildQueryColumnMetadata
Usage
buildQueryColumnMetadata(conn.info, ...)
Arguments
| conn.info | Connection info created with  | 
| ... | Other params | 
Value
query columns' metadata
buildQueryColumnStats
Description
buildQueryColumnStats
Usage
buildQueryColumnStats(conn.info, ...)
Arguments
| conn.info | Connection info created with  | 
| ... | Other parameters | 
Value
query count(distinct column) from table
buildQueryColumnStats.sqlite
Description
buildQueryColumnStats.sqlite
Usage
## S3 method for class 'sqlite'
buildQueryColumnStats(conn.info, schema, table, column,
  query.filter, ...)
Arguments
| conn.info | Connection info created with  | 
| schema | Table Schema | 
| table | Table Name | 
| column | Column profiled | 
| query.filter | Filter applied to the profile | 
| ... | Other parameters | 
Value
query count(distinct column) from table
buildQueryCountNull
Description
buildQueryCountNull
Usage
buildQueryCountNull(conn.info, ...)
Arguments
| conn.info | Connection info created with  | 
| ... | Other parameters | 
Value
query select count(*) where collumn is null
buildQueryCountTotal
Description
Count total rows from table.
Usage
buildQueryCountTotal(conn.info, ...)
Arguments
| conn.info | Connection info created with  | 
| ... | Other params | 
Value
query count(*) from table
buildQueryProfileColumnFormatFrequency
Description
buildQueryProfileColumnFormatFrequency
Usage
buildQueryProfileColumnFormatFrequency(conn.info, ...)
Arguments
| conn.info | Connection info created with  | 
| ... | Other parameters | 
Value
queries column format frequency from table
closeConnection
Description
Disconnects from database using odbc::dbDisconnect
Usage
closeConnection(conn)
Arguments
| conn | Connection created at  | 
Value
TRUE if succeeded at closing connection
connectDB
Description
Connects to database using dbConnect
Usage
connectDB(conn.info, ...)
Arguments
| conn.info | Connection info created at  | 
| ... | Other parameters | 
Value
connection to database
connectDB.default
Description
Connects to database using dbConnect
Usage
## Default S3 method:
connectDB(conn.info, ...)
Arguments
| conn.info | Connection info created at  | 
| ... | Other parameters | 
Value
connection to database
connectDB.sqlite
Description
Connects to database using dbConnect
Usage
## S3 method for class 'sqlite'
connectDB(conn.info, ...)
Arguments
| conn.info | Connection info created at  | 
| ... | Other parameters | 
Value
connection to database
getTableColumns
Description
Issues query against the RDBS to retrieve information about each column of the table. Name, type, length, precision, etc.
Usage
getTableColumns(conn.info, schema, table)
Arguments
| conn.info | Connection info created with  | 
| schema | Table schema | 
| table | Table name | 
Value
data frame containing the columns' metadata
Prepares connection to RDBS via ODBC
Description
prepareConnection list connection details needed to connecto
to a RDBS using ODBC
Usage
prepareConnection(db.vendor, odbc.driver = odbc::odbc(),
  db.host = NULL, db.name = NULL, db.encoding = "", dsn = NULL,
  user = NULL, passwd = NULL)
Arguments
| db.vendor | Database vendor (teradata, sqlserver) | 
| odbc.driver | ODBC driver used to connect to database | 
| db.host | Database hostname | 
| db.name | Database name | 
| db.encoding | Database encoding | 
| dsn | Data source name | 
| user | Username to connect to database | 
| passwd | Password to connect to database | 
Examples
conn.info <- prepareConnection(db.vendor = "teradata",
   dsn = "ODBC_MYDB", user = "myuser", passwd = "mypasswd")
Print method
Description
Print method
Usage
## S3 method for class 'profile'
print(x, ...)
Arguments
| x | profile object | 
| ... | other parameters | 
Value
printed profile
profileColumn
Description
profileColumn
Usage
profileColumn(conn.info, schema, table, column, column.datatype,
  query.filter, limit.freq.values = 30, format.show.percentage)
Arguments
| conn.info | Connection info created with  | 
| schema | Table schema | 
| table | Table name | 
| column | Column being profiled | 
| column.datatype | Column datatype | 
| query.filter | Filter applied before profile the column | 
| limit.freq.values | Distinct values shown in frequency data frame | 
| format.show.percentage | Threshold considered when showing formats' percentages | 
Value
columnProfile <- list(column, count.total, count.distinct, perc.distinct, count.null, perc.null, min.value, max.value, column.freq, format.freq = format.freq)
profileColumnFormat
Description
Profiles column based on its format, using masking strategy. X = char, 9 = digit, S = symbol
Usage
profileColumnFormat(conn.info, column, column.datatype, schema, table,
  count.total, query.filter, format.show.percentage)
Arguments
| conn.info | Connection info created with  | 
| column | Column name that will be profiled | 
| column.datatype | Column datatipe | 
| schema | Table schema | 
| table | Table name | 
| count.total | Number of rows to be profiled | 
| query.filter | Filter applied to the table, when profilling | 
| format.show.percentage | Threshold considered when showing formats' percentages | 
Value
Data Frame containing columns' metadata
Profile all columns from ODBC table or dataframe
Description
Profiles tables and dataframes (collecting statistics and informative summaries about that data): max, min, avg, sd, nulls, distinct values, data patterns, data/format frequencies.
Usage
runProfile(conn.info, schema = NULL, table, is.parallel = TRUE,
  count.nodes, query.filter = NA, format.show.percentage = 0.03)
Arguments
| conn.info | Connection info created with  | 
| schema | Table schema | 
| table | Table name | 
| is.parallel | Boolean that indicates if profile will run in parallel. Default TRUE. | 
| count.nodes | Number of nodes used when is.parallel = TRUE | 
| query.filter | Filter applied to the table, when profilling | 
| format.show.percentage | Threshold considered when showing formats percentages | 
Value
profile results for the table/dataframe
Override summary function
Description
Override summary function
Usage
## S3 method for class 'profile'
summary(object, ...)
Arguments
| object | Profile object | 
| ... | other parameters | 
Value
data.frame with summary information