Title: | Explore Socrata Data with Ease |
Version: | 0.1.0 |
Description: | Provides an interface to search, read, query, and retrieve metadata for datasets hosted on 'Socrata' open data portals. Supports all 'Socrata' data types, including spatial data returned as 'sf' objects. |
License: | GPL (≥ 3) |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
Config/rextendr/version: | 0.4.0.9000 |
SystemRequirements: | Cargo (Rust's package manager), rustc >= 1.65.0 |
Depends: | R (≥ 4.2) |
Imports: | cli, httr2, rlang (≥ 1.1.0), sf, tibble |
Suggests: | glue, httptest2, rmarkdown, testthat (≥ 3.0.0) |
Config/testthat/edition: | 3 |
URL: | https://ryanzomorrodi.github.io/socratadata/, https://github.com/ryanzomorrodi/socratadata |
BugReports: | https://github.com/ryanzomorrodi/socratadata/issues |
NeedsCompilation: | yes |
Packaged: | 2025-07-22 21:07:53 UTC; user |
Author: | Ryan Zomorrodi [aut, cre] |
Maintainer: | Ryan Zomorrodi <rzomor2@uic.edu> |
Repository: | CRAN |
Date/Publication: | 2025-07-29 10:00:02 UTC |
Discover datasets and public data assets using the Socrata Discovery API
Description
Provides access to the Socrata Discovery API, allowing you to search tens of thousands of government datasets and assets published on the Socrata platform. Governments at all levels publish data on topics including crime, permits, finance, healthcare, research, and performance.
Usage
soc_discover(
attribution = NULL,
categories = NULL,
domain_category = NULL,
domains = NULL,
ids = NULL,
names = NULL,
only = "dataset",
provenance = NULL,
query = NULL,
tags = NULL,
domain_tags = NULL,
location = "us",
limit = 10000
)
Arguments
attribution |
string; Filter by the attribution or publisher. |
categories |
character vector; Filter by categories. |
domain_category |
string; Filter by domain category (requires a specified domain). |
domains |
character vector; Filter to domains. |
ids |
character vector; Filter by asset IDs. |
names |
character vector; Filter by asset names. |
only |
character vector; Filter to specific asset types. Must be one or more of: |
provenance |
string; Filter by provenance: "official" or "community". |
query |
string; Filter using a token matching one from an asset's name, description, category, tags, column names, column field names, column descriptions, or attribution. |
tags |
character vector; Filter by tags associated with the assets. |
domain_tags |
string; Filter by domain tags associated with the assets (requires a specified domain). |
location |
string; Regional API domain: |
limit |
whole number; Maximum number of results (cannot exceed 10,000). |
Value
A tibble containing metadata for each discovered asset. Columns include:
- id
Asset identifier (four-by-four ID).
- name
Asset name.
- attribution
Attribution or publisher of the asset.
- owner_name
Display name of the asset owner.
- provenance
Provenance of asset (official or community).
- description
Textual description of the asset.
- created
Date asset was created.
- data_last_updated
Date asset data was last updated.
- metadata_last_updated
Date asset metadata was last updated.
- categories
Category labels assigned to the asset.
- tags
Tags associated with the asset.
- domain_category
Category label assigned by the domain.
- domain_tags
Tags applied by the domain.
- domain_metadata
Metadata associated with the asset assigned by the domain.
- column_names
Names of asset columns.
- column_labels
Labels of asset columns.
- column_datatypes
Datatypes of asset columns.
- column_descriptions
Description of asset columns.
- permalink
Permanent URL where the asset can be accessed.
- link
Direct asset link.
- license
License associated with the asset.
See Also
https://dev.socrata.com/docs/other/discovery
Examples
# Search for crime-related datasets in the Public Safety category
results <- soc_discover(
query = "crime",
categories = "Public Safety",
only = "dataset"
)
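A further sketch, restricting the search to a single portal with the documented domains argument and keeping a few of the documented result columns. The domain data.cityofchicago.org and the query term are assumptions chosen for illustration; any Socrata-hosted domain should behave the same way. The call requires network access.

```r
library(socratadata)

# Search one portal (assumed example domain) for assets matching "permits".
permits <- soc_discover(
  domains = "data.cityofchicago.org",
  query = "permits",
  limit = 50
)

# Each row describes one asset; select a few documented metadata columns.
permit_summary <- permits[, c("id", "name", "data_last_updated", "permalink")]
print(permit_summary)
```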
List Available Datasets on a Socrata Portal
Description
Retrieves a catalog of available datasets from a Socrata open data portal.
Usage
soc_list(url)
Arguments
url |
A character string specifying the base URL of the Socrata portal (e.g., |
Value
A tibble with one row per dataset and the following columns:
- id
Dataset identifier (four-by-four ID).
- name
Title of the dataset.
- categories
Categories associated with the dataset.
- keywords
Keywords describing the dataset.
- last_updated
The date of the last dataset modification.
- landing_page
The landing page URL of the dataset.
- description
Brief description of the dataset's content.
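This entry has no example of its own. A minimal sketch, assuming soc_list() accepts the bare root URL of the demo portal used in the other examples of this manual (network access required):

```r
library(socratadata)

# Retrieve the catalog of the demo portal (base URL assumed to be accepted as-is).
catalog <- soc_list("https://soda.demo.socrata.com")

# One row per dataset; show the identifier, title, update date and landing page.
print(head(catalog[, c("id", "name", "last_updated", "landing_page")]))
```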
Extract Socrata Dataset Metadata
Description
Retrieves dataset-level information and column-level descriptions, either from the metadata attributes of a tibble returned by soc_read() or directly from a dataset URL.
Usage
soc_metadata(dataset)
Arguments
dataset |
A tibble returned by soc_read(), or a string giving the dataset URL. |
Details
This function pulls out descriptive metadata such as the dataset's ID, title, attribution, category, creation and update timestamps, description, any domain-specific fields, and field descriptions defined by the data provider.
Value
An object of class soc_meta, which includes:
- id
Asset identifier (four-by-four ID).
- name
Asset name.
- attribution
Attribution or publisher of the asset.
- owner_name
Display name of the asset owner.
- provenance
Provenance of asset (official or community).
- description
Textual description of the asset.
- created
Date asset was created.
- data_last_updated
Date asset data was last updated.
- metadata_last_updated
Date asset metadata was last updated.
- domain_category
Category label assigned by the domain.
- domain_tags
Tags applied by the domain.
- domain_metadata
Metadata associated with the asset assigned by the domain.
- columns
A dataframe with the following columns:
- column_name
Names of asset columns.
- column_label
Labels of asset columns.
- column_datatype
Datatypes of asset columns.
- column_description
Description of asset columns.
- permalink
Permanent URL where the asset can be accessed.
- link
Direct asset link.
- license
License associated with the asset.
Examples
url <- "https://soda.demo.socrata.com/dataset/USGS-Earthquakes-2012-11-08/3wfw-mdbc/"
data <- soc_read(url, soc_query(limit = 1000L))
metadata <- soc_metadata(data)
print(metadata)
metadata <- soc_metadata(url)
print(metadata)
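The column-level descriptions are stored in the columns component. A short sketch, assuming a soc_meta object supports list-style $ access (the class and its components are documented above, but its internal layout is an assumption here):

```r
library(socratadata)

url <- "https://soda.demo.socrata.com/dataset/USGS-Earthquakes-2012-11-08/3wfw-mdbc/"
meta <- soc_metadata(url)

# Assumes `$` access; `columns` is documented as a data frame of column metadata.
cols <- meta$columns
print(cols[, c("column_name", "column_label", "column_datatype")])
```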
Build a Socrata Query Object
Description
Constructs a structured representation of a Socrata Query Language (SOQL) query that can be used with Socrata API endpoints. This function does not execute the query; it creates an object that can be passed to request functions or printed for inspection.
Usage
soc_query(
select = NULL,
where = NULL,
group_by = NULL,
having = NULL,
order_by = NULL,
limit = NULL
)
Arguments
select |
string; Columns to retrieve. |
where |
string; Filter conditions. |
group_by |
string; Fields to group by. |
having |
string; Conditions to apply to grouped records. |
order_by |
string; Sort order. |
limit |
whole number; The maximum number of records to return. |
Value
An object of class soc_query, which prints in a readable format and can be used to build query URLs.
See Also
Use this with a function that executes Socrata requests, e.g., soc_read(url, query = soc_query(...))
Examples
query <- soc_query(
select = "region, avg(magnitude) as avg_magnitude, count(*) as count",
group_by = "region",
having = "count >= 5",
order_by = "avg_magnitude DESC"
)
print(query)
earthquakes_by_region <- soc_read(
"https://soda.demo.socrata.com/dataset/USGS-Earthquakes-2012-11-08/3wfw-mdbc/",
query = query
)
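Because soc_query() only builds the query, it can be constructed and inspected without contacting the portal. A sketch using the where and limit arguments with fields from the same earthquake dataset; the magnitude threshold of 4 and the limit of 100 are arbitrary illustrative values:

```r
library(socratadata)

# Build (but do not execute) a filtered, sorted query.
strong_quakes <- soc_query(
  select = "region, magnitude",
  where = "magnitude > 4",
  order_by = "magnitude DESC",
  limit = 100L
)

# Printing shows the SOQL clauses; no request is sent.
print(strong_quakes)
```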
Read a Socrata Dataset into R
Description
Downloads and parses a dataset from a Socrata open data portal URL, returning it as a tibble or sf object.
Metadata is also returned as attributes on the returned object.
Usage
soc_read(url, query = soc_query(), alias = "label", page_size = 10000)
Arguments
url |
string; URL of the Socrata dataset. |
query |
soc_query object; The query to apply, as built by soc_query(). |
alias |
string; Use of field alias values. There are three options:
|
page_size |
whole number; Maximum number of rows returned per request. |
Value
A tibble with additional attributes containing dataset metadata.
If the dataset contains a single non-nested geospatial field, it is returned as an sf object.
The returned object has the following attributes:
- id
Asset identifier (four-by-four ID).
- name
Asset name.
- attribution
Attribution or publisher of the asset.
- owner_name
Display name of the asset owner.
- provenance
Provenance of asset (official or community).
- description
Textual description of the asset.
- created
Date asset was created.
- data_last_updated
Date asset data was last updated.
- metadata_last_updated
Date asset metadata was last updated.
- domain_category
Category label assigned by the domain.
- domain_tags
Tags applied by the domain.
- domain_metadata
Metadata associated with the asset assigned by the domain.
- columns
A dataframe with the following columns:
- column_name
Names of asset columns.
- column_label
Labels of asset columns.
- column_datatype
Datatypes of asset columns.
- column_description
Description of asset columns.
- permalink
Permanent URL where the asset can be accessed.
- link
Direct asset link.
- license
License associated with the asset.
Examples
soc_read(
"https://soda.demo.socrata.com/dataset/USGS-Earthquakes-2012-11-08/3wfw-mdbc/"
)
soc_read(
"https://soda.demo.socrata.com/dataset/USGS-Earthquakes-2012-11-08/3wfw-mdbc/",
soc_query(
select = "region, avg(magnitude) as avg_magnitude, count(*) as count",
group_by = "region",
having = "count >= 5",
order_by = "avg_magnitude DESC"
)
)
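A further sketch combining a filtered query with a smaller page_size, then reading some of the metadata attributes listed under Value. The page size of 250 and the magnitude filter are arbitrary choices, and whether the result is an sf object depends on the dataset's geospatial fields, so the last line only checks it rather than assuming it (network access required):

```r
library(socratadata)

url <- "https://soda.demo.socrata.com/dataset/USGS-Earthquakes-2012-11-08/3wfw-mdbc/"

# Up to 500 rows with magnitude of at least 3, fetched 250 rows per request.
quakes <- soc_read(
  url,
  query = soc_query(where = "magnitude >= 3", limit = 500L),
  page_size = 250
)

# Dataset metadata travels with the data as attributes.
print(attr(quakes, "name"))
print(attr(quakes, "data_last_updated"))

# TRUE only if the dataset had a single non-nested geospatial field.
print(inherits(quakes, "sf"))
```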