--- title: "Getting Started with qryflow" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Getting Started with qryflow} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r} library(qryflow) ``` # What is `qryflow`? The `qryflow` package lets you tag sections of SQL, called chunks, so that R knows how to execute each part independently. Tagging your SQL lets you: - Keep multiple SQL statements in the same file. - Control how SQL is executed (e.g., `DBI::dbGetQuery()`, `dbExecute()`) - Attach names to results, so they're returned as named R objects - Pass metadata that can be used later in R workflows In short: You can define and run **multi-step SQL workflows** with one function call, and get your results back as a structured R object. This vignette covers: - How to tag SQL chunks - The different types of tags supported - How `qryflow` defines and handles a chunk # Basic usage The main function is `qryflow`, which accepts SQL tagged with special comments and a connection to DBI-compliant database. Note, the SQL can be a character vector, like in the example below, or a filepath to a file that contains SQL. ```{r example} library(qryflow) # Connection to In-Memory DB with table populated from mtcars con <- example_db_connect(mtcars) sql <- " -- @exec: drop_cyl_6 DROP TABLE IF EXISTS cyl_6; -- @exec: prep_cyl_6 CREATE TABLE cyl_6 AS SELECT * FROM mtcars WHERE cyl = 6; -- @query: df_mtcars SELECT * FROM mtcars; -- @query: df_cyl_6 SELECT * FROM cyl_6; " # Pass tagged SQL to `qryflow` results <- qryflow(sql, con) # Access the results from the chunk named `df_cyl_6` head(results$df_cyl_6) ``` When you run `qryflow()`, here's what happens: 1. Your SQL script is split into chunks using tag lines like `-- @query: df_mtcars`. 2. Each chunk is assigned a type (e.g., `query` or `exec`) 3. Chunks are executed in order, and query results are returned as named objects # Simplify = TRUE By default, `qryflow()` will return a single data.frame (as opposed to a named list of results) if there is only one chunk and the argument `simplify=TRUE`. For example: ```{r} library(qryflow) # Connection to In-Memory DB with table populated from mtcars con <- example_db_connect(mtcars) sql <- " -- @query: df_mtcars SELECT * FROM mtcars; " # Pass tagged SQL to `qryflow` results <- qryflow(sql, con) # Access the results from the chunk named `df_cyl_6` # results$df_cyl_6 head(results) ``` This design choice is to facilitate easy interactive use and is a common use-case. Because `qryflow()` might return a named `list` or a single `data.frame` depending on the input, `qryflow` exports other functions so users can prioritize reliability in return objects. See the help pages for `?qryflow_run()` and `?qryflow_results()`. Or check out the Advanced Usage vignette. # Tagging syntax To tag a specific chunk of SQL, use the following format: `-- @key: value`. For example, to indicate that the following chunk of SQL is a `query`, meaning we expect it to return a `data.frame` by calling `DBI::dbGetQuery`, we precede the statment with a special comment: ```sql -- @query: my_data_frame SELECT * FROM TBL WHERE COLUMN = 'VALUE' ``` Breaking down the tag into it's component parts: 1. The tag begins with two dashes (`--`). This indicates a single line comment in SQL. 2. Next, we use the `@` symbol to indicate the start of a tag, followed by the tag type. Currently, `qryflow` formally supports four tags: `type`, `name`, `query`, and `exec`. We follow the tag type with a colon (`:`). 3. Next comes a value, depending on the type of tag. For the `@type:` tag, this will indicate the "type" of SQL chunk. For the `name`, `query`, and `exec` tags, the value indicates the custom name of the SQL chunk. # Tags and Aliases Each SQL chunk must be tagged so `qryflow` knows how to handle it. Tags use SQL-style comments (`--`) and follow the format: ```sql -- @: ``` ## Common Tags - `@type` Specifies execution type (`-- @type: query`) - `@name` Assigns a name to the chunk’s result (`-- @name: df_users`) - `@query` Executes SQL with `DBI::dbGetQuery()` and names the result (`-- @query: df_users`) - `@exec` Executes SQL with `DBI::dbExecute()` (`-- @exec: drop_table`) `@query` and `@exec` are aliases for setting both `@type` and `@name` in one line. **Aliased form (preferred):** ```sql -- @query: df_mtcars SELECT * FROM mtcars; ``` **Explicit form (equivalent):** ```sql -- @type: query -- @name: df_mtcars SELECT * FROM mtcars; ``` ## Custom Tags You can include additional tags (e.g., `-- @source: users`) to carry metadata into your R workflow. Only one execution-related tag is required per chunk (@query, @exec, or @type). # Defining a Chunk In `qryflow`, a Chunk is a logically grouped section of SQL code, representing a single executable unit within a larger multi-step SQL workflow. The SQL script is split into multiple chunks using specially formatted tag lines (SQL comments beginning with tags like -- @query: or -- @exec:). It scans through the SQL lines and splits the script at these tagged lines, grouping all subsequent SQL lines until the next tag or the end of the script. ## How Splitting Works - Tagged lines act as markers that start a new chunk. - All lines following a tag line belong to that chunk until another tag line starts the next chunk. - If the script starts without any tags, the entire script is treated as one single chunk. - If there’s only one tag somewhere in the script, the script is split into two chunks: - Everything before the tag becomes the first chunk (even if untagged). - The tagged line and everything after it become the second chunk. ## Examples **Example 1 - Script with no tags** ```sql CREATE TABLE cyl_6 AS SELECT * FROM mtcars WHERE cyl = 6; ``` Result - The entire script is one chunk containing all lines. - Why? Without tags, `qryflow` treats the whole script as a single step. **Example 2 - Script with one tag at the start** ```sql -- @query: get_6cyl SELECT * FROM mtcars WHERE cyl = 6; ``` Result - One chunk starting at the tag, containing the rest of the script. Because the tag is at line 1, the chunk starts there and continues to the end. **Example 3 - Script with one tag in the middle** ```sql SELECT * FROM mtcars WHERE cyl = 6; -- @query: df_mtcars SELECT * FROM mtcars; ``` Result - Two chunks: - Chunk 1: lines before the tag (untagged SQL). - Chunk 2: from the tag line to the end. This preserves any pre-tag SQL as a separate chunk. **Example 4 - Script with multiple tags** ```sql -- @exec: drop_cyl_6 DROP TABLE IF EXISTS cyl_6; -- @exec: prep_cyl_6 CREATE TABLE cyl_6 AS SELECT * FROM mtcars WHERE cyl = 6; -- @query: df_mtcars SELECT * FROM mtcars; -- @query: df_cyl_6 SELECT * FROM cyl_6 ``` - Result - Three chunks, each starting at its respective tag line. - Each chunk is parsed and executed independently in sequence. # Type Identification Before a chunk is parsed or executed, `qryflow` determines its type using the following rules: 1. If a chunk includes an explicit `-- @type:` tag, that value is used as the chunk type. 2. If there is no `@type:` tag, `qryflow` checks for any other tag (`@query:`, `@exec:`, etc.) that matches a registered type. The first match found is used as the type. 3. If no recognized tag is found, the type defaults to the value of `getOption("qryflow.default.type", "query")`. Note that formally, "query" is the default type of `qryflow` if the option is not overridden. ```{r, echo=FALSE, include=FALSE} DBI::dbDisconnect(con) ```