Title: Parsing Glycan Structure Text Representations
Version: 0.5.3
Description: Provides functions to parse glycan structure text representations into 'glyrepr' glycan structures. Currently, it supports StrucGP-style, pGlyco-style, IUPAC-condensed, IUPAC-extended, IUPAC-short, WURCS, Linear Code, and GlycoCT format. It also provides an automatic parser to detect the format and parse the structure string.
License: MIT + file LICENSE
Suggests: knitr, rmarkdown, testthat (≥ 3.0.0)
Config/testthat/edition: 3
Encoding: UTF-8
RoxygenNote: 7.3.3
URL: https://glycoverse.github.io/glyparse/, https://github.com/glycoverse/glyparse
Imports: checkmate, cli, dplyr, glyrepr (≥ 0.7.0), igraph, purrr, rlang, rstackdeque, stringr
Depends: R (≥ 4.1)
VignetteBuilder: knitr
BugReports: https://github.com/glycoverse/glyparse/issues
NeedsCompilation: no
Packaged: 2025-11-01 07:09:21 UTC; fubin
Author: Bin Fu ORCID iD [aut, cre, cph]
Maintainer: Bin Fu <23110220018@m.fudan.edu.cn>
Repository: CRAN
Date/Publication: 2025-11-04 19:30:02 UTC

Automatic Structure Parsing

Description

Detect the structure string type and use the appropriate parser to parse automatically. Mixed types are supported.

Supported types:

  1. GlycoCT

  2. IUPAC-condensed

  3. IUPAC-extended

  4. IUPAC-short

  5. WURCS

  6. Linear Code

  7. pGlyco

  8. StrucGP

Usage

auto_parse(x)

Arguments

x

A character vector of structure strings.

Value

A glyrepr::glycan_structure() object.

Examples

# Single structure
x <- "Gal(b1-3)GlcNAc(b1-4)Glc(a1-"  # IUPAC-condensed
auto_parse(x)

# Mixed types
x <- c(
  "Gal(b1-3)GlcNAc(b1-4)Glc(a1-",  # IUPAC-condensed
  "Neu5Aca3Gala3(Fuca6)GlcNAcb-"  # IUPAC-short
)
auto_parse(x)


Parse GlycoCT Structures

Description

This function parses GlycoCT strings into a glyrepr::glycan_structure(). GlycoCT is a format used by databases like GlyTouCan and GlyGen.

Usage

parse_glycoct(x)

Arguments

x

A character vector of GlycoCT strings.

Details

GlycoCT format consists of two parts:

For more information about GlycoCT format, see the glycoct.md documentation.

Value

A glyrepr::glycan_structure() object.

Examples

glycoct <- paste0(
  "RES\n",
  "1b:a-dgal-HEX-1:5\n",
  "2s:n-acetyl\n",
  "3b:b-dgal-HEX-1:5\n",
  "LIN\n",
  "1:1d(2+1)2n\n",
  "2:1o(3+1)3d"
)
parse_glycoct(glycoct)


Parse IUPAC-condensed Structures

Description

This function parses IUPAC-condensed strings into a glyrepr::glycan_structure(). For more information about IUPAC-condensed notation, see doi:10.1351/pac199668101919.

Usage

parse_iupac_condensed(x)

Arguments

x

A character vector of IUPAC-condensed strings.

Details

The IUPAC-condensed notation is a compact form of IUPAC-extended notation. It is used by the GlyConnect database. It contains the following information:

An example of IUPAC-condensed string is "Gal(b1-3)GlcNAc(b1-4)Glc(a1-".

The reducing-end monosaccharide can be with or without anomer information. For example, the two strings below are all valid:

In the first case, the anomer is "a2". In the second case, the anomer is "?2".

Value

A glyrepr::glycan_structure() object.

See Also

parse_iupac_short(), parse_iupac_extended()

Examples

iupac <- "Gal(b1-3)GlcNAc(b1-4)Glc(a1-"
parse_iupac_condensed(iupac)


Parse IUPAC-extended Structures

Description

Parse IUPAC-extended-style structure characters into a glyrepr::glycan_structure(). For more information about IUPAC-extended format, see doi:10.1351/pac199668101919.

Usage

parse_iupac_extended(x)

Arguments

x

A character vector of IUPAC-extended strings.

Value

A glyrepr::glycan_structure() object.

See Also

parse_iupac_condensed(), parse_iupac_short()

Examples

iupac <- "\u03b2-D-Galp-(1\u21923)-\u03b1-D-GalpNAc-(1\u2192"
parse_iupac_extended(iupac)


Parse IUPAC-short Structures

Description

Parse IUPAC-short-style structure characters into a glyrepr::glycan_structure(). For more information about IUPAC-short format, see doi:10.1351/pac199668101919.

Usage

parse_iupac_short(x)

Arguments

x

A character vector of IUPAC-short strings.

Details

The IUPAC-short notation is a compact form of IUPAC-condensed notation. It is rarely used in database, but appears a lot in literature for its conciseness. Compared with IUPAC-condensed notation, IUPAC-short notation ignore the anomer positions, assuming they are known for common monosaccharides. For example, "Neu5Aca3Gala-" assumes the anomer of Neu5Ac is C2 (a2-3 linked). Also, the parentheses around linkages are omitted, and parentheses are used to indicate branching, e.g. "Neu5Aca3Gala3(Fuca3)GlcNAcb-".

In the first case, the anomer is "a2". In the second case, the anomer is "?2".

Value

A glyrepr::glycan_structure() object.

See Also

parse_iupac_condensed(), parse_iupac_extended()

Examples

iupac <- "Neu5Aca3Gala3(Fuca6)GlcNAcb-"
parse_iupac_short(iupac)


Parse Linear Code Structures

Description

Parse Linear Code structures into a glyrepr::glycan_structure(). To know more about Linear Code, see this article.

Usage

parse_linear_code(x)

Arguments

x

A character vector of Linear Code strings.

Value

A glyrepr::glycan_structure() object.

Examples

linear_code <- "Ma3(Ma6)Mb4GNb4GNb"
parse_linear_code(linear_code)


Parse pGlyco Structures

Description

Parse pGlyco-style structure characters into a glyrepr::glycan_structure(). See example below for the structure format.

Usage

parse_pglyco_struc(x)

Arguments

x

A character vector of pGlyco-style structure strings.

Value

A glyrepr::glycan_structure() object.

Examples

glycan <- parse_pglyco_struc("(N(F)(N(H(H(N))(H(N(H))))))")
print(glycan, verbose = TRUE)


Parse StrucGP Structures

Description

Parse StrucGP-style structure characters into a glyrepr::glycan_structure(). See example below for the structure format.

Usage

parse_strucgp_struc(x)

Arguments

x

A character vector of StrucGP-style structure strings.

Value

A glyrepr::glycan_structure() object.

Examples

glycan <- parse_strucgp_struc("A2B2C1D1E2F1fedD1E2edcbB5ba")
print(glycan, verbose = TRUE)


Parse WURCS Structures

Description

This function parses WURCS strings into a glyrepr::glycan_structure(). Currently, only WURCS 2.0 is supported. For more information about WURCS, see WURCS.

Usage

parse_wurcs(x)

Arguments

x

A character vector of WURCS strings.

Value

A glyrepr::glycan_structure() object.

Examples

wurcs <- paste0(
  "WURCS=2.0/3,5,4/",
  "[a2122h-1b_1-5_2*NCC/3=O][a1122h-1b_1-5][a1122h-1a_1-5]/",
  "1-1-2-3-3/a4-b1_b4-c1_c3-d1_c6-e1"
)
parse_wurcs(wurcs)