1 Introduction

A minimum spanning tree (MST) built from Multi-Locus Sequence Typing (MLST) or core genome MLST (cgMLST) allelic profiles is a useful tool for assessing the relatedness between bacterial genomes, for example strains within an outbreak. The chewBBACA pipeline (https://chewbbaca.readthedocs.io/en/latest/) calls allelic profiles based on a gene-by-gene schema and, in the case of cgMLST analysis, determines the set of loci that constitute the core genome. The output of this analysis is a table with the isolates (strains) in rows and the core genome loci in columns.
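To make the table layout concrete, here is a minimal base R sketch. It assumes a hypothetical toy profile table in which every cell is an integer allele ID. Isolate and locus names are made up, and real chewBBACA output may contain non-numeric classification codes that are not handled here. The distance between two isolates is taken as the number of loci at which their allele IDs differ. The helper `allelic_distance()` is hypothetical and not part of MSTree.

```r
# Hypothetical toy allelic profile table: isolates in rows, loci in columns.
# Each cell holds an integer allele ID; all names are made up.
profiles <- data.frame(
  FILE   = c("isolate_A", "isolate_B", "isolate_C", "isolate_D"),
  locus1 = c(1, 1, 2, 1),
  locus2 = c(3, 3, 3, 4),
  locus3 = c(5, 6, 5, 5),
  locus4 = c(7, 7, 8, 7)
)

# Keep only the allele columns and use isolate IDs as row names
allele_matrix <- as.matrix(profiles[, -1])
rownames(allele_matrix) <- profiles$FILE

# Hypothetical helper: pairwise distance = number of loci with differing alleles
allelic_distance <- function(m) {
  n <- nrow(m)
  d <- matrix(0L, n, n, dimnames = list(rownames(m), rownames(m)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum(m[i, ] != m[j, ])
    }
  }
  d
}

dist_mat <- allelic_distance(allele_matrix)
print(dist_mat)
```

The result is a symmetric matrix with zeros on the diagonal. For example, isolate_A and isolate_B differ only at locus3, so their distance is 1.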

MSTree builds a minimum spanning tree from the aforementioned output in two steps. First, a graph object is made by calculating the distances between the isolates. Second, the generated graph is used to draw a customized minimum spanning tree with one of two options chosen by the user: the plotNetwork function from the NetPathMiner Bioconductor package, or ggraph.
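A minimum spanning tree is the subset of edges that connects all isolates with the smallest possible total distance. How MSTree computes it internally is not described here. The sketch below is a generic implementation of Prim's algorithm in base R on a hypothetical four-isolate distance matrix, and `prim_mst()` is a hypothetical helper. In practice, the igraph package (listed in the session information) provides `mst()` for this purpose.

```r
# Hypothetical symmetric allelic-distance matrix for four isolates
iso <- c("A", "B", "C", "D")
d <- matrix(c(0, 1, 2, 1,
              1, 0, 3, 2,
              2, 3, 0, 3,
              1, 2, 3, 0),
            nrow = 4, byrow = TRUE, dimnames = list(iso, iso))

# Prim's algorithm: grow the tree from the first isolate, always adding the
# cheapest edge that links a node outside the tree to a node inside it.
prim_mst <- function(d) {
  n <- nrow(d)
  in_tree <- c(TRUE, rep(FALSE, n - 1))
  best_cost <- d[1, ]        # cheapest known edge from the tree to each node
  best_from <- rep(1L, n)    # tree node that provides that cheapest edge
  from <- character(n - 1)
  to <- character(n - 1)
  weight <- numeric(n - 1)
  for (step in seq_len(n - 1)) {
    candidates <- which(!in_tree)
    nxt <- candidates[which.min(best_cost[candidates])]
    from[step] <- rownames(d)[best_from[nxt]]
    to[step] <- rownames(d)[nxt]
    weight[step] <- best_cost[nxt]
    in_tree[nxt] <- TRUE
    # Nodes outside the tree may now be reached more cheaply via nxt
    closer <- !in_tree & d[nxt, ] < best_cost
    best_cost[closer] <- d[nxt, closer]
    best_from[closer] <- nxt
  }
  data.frame(from = from, to = to, weight = weight)
}

mst_edges <- prim_mst(d)
print(mst_edges)
print(sum(mst_edges$weight))
```

For this matrix the tree keeps the edges A-B (1), A-D (1) and A-C (2), for a total weight of 4. A tree over n isolates always has n - 1 edges.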

An important parameter when constructing the graph object is node clustering. Based on the Complex Type Distance values provided by https://www.cgmlst.org/ncs, a threshold can be set to connect (cluster) nodes whose distance is less than or equal to that value. For instance, the complex type distance for E. coli is 10. When the threshold is set to 10, isolates that differ by 10 or fewer alleles are connected to form a cluster.
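One way to read this clustering rule is as a threshold graph. Isolates are joined whenever their distance is at most the threshold, and each connected component of that graph forms a cluster. The sketch below assumes that interpretation, and MSTree's exact rule may differ. It uses a hypothetical five-isolate distance matrix, a hypothetical helper `cluster_by_threshold()`, and the E. coli threshold of 10.

```r
# Hypothetical symmetric allelic-distance matrix for five isolates
iso <- paste0("isolate_", 1:5)
d <- matrix(c( 0,  4, 30, 12, 40,
               4,  0, 28,  9, 35,
              30, 28,  0, 33, 10,
              12,  9, 33,  0, 38,
              40, 35, 10, 38,  0),
            nrow = 5, byrow = TRUE, dimnames = list(iso, iso))

# Hypothetical helper: connect isolates with distance <= threshold, then
# label connected components with a breadth-first search.
cluster_by_threshold <- function(d, threshold) {
  adj <- d <= threshold
  n <- nrow(d)
  cluster <- rep(NA_integer_, n)
  names(cluster) <- rownames(d)
  current <- 0L
  for (start in seq_len(n)) {
    if (!is.na(cluster[start])) next   # already assigned to a cluster
    current <- current + 1L
    cluster[start] <- current
    queue <- start
    while (length(queue) > 0) {
      node <- queue[1]
      queue <- queue[-1]
      # Unlabelled isolates within the threshold of the current node
      neighbours <- which(adj[node, ] & is.na(cluster))
      cluster[neighbours] <- current
      queue <- c(queue, neighbours)
    }
  }
  cluster
}

clusters <- cluster_by_threshold(d, threshold = 10)
print(clusters)
```

Here isolate_1 and isolate_4 are 12 allelic differences apart yet share a cluster, because both lie within the threshold of isolate_2. This chaining is a property of the connected-components interpretation used in the sketch.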

2 Overview

MSTree has two main functions:

  1. makeGraphFromChewBBACA: constructs a graph object from the output of the chewBBACA pipeline.

  2. PlotMST: takes the graph constructed by the previous function and generates a minimum spanning tree using either the plotNetwork function from the NetPathMiner Bioconductor package or ggraph.


3 Example

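library(MSTree)

# Path to the example chewBBACA cgMLST output shipped with the package
cgmlst_output <- system.file("extdata", "cgMLST95.csv", package = "MSTree")

# Build the graph object; max_allelic_difference sets the clustering threshold
my_graph <- makeGraphFromChewBBACA(cgmlst_output, max_allelic_difference = 9)

# Draw the minimum spanning tree with clusters shown and the legend hidden
mst <- PlotMST(my_graph, show_clustering = TRUE, show_legend = FALSE,
    MST_edges_color = "#b97b29", node_color = "#3b17db",
    node_label_size = 3, title = "MST")

mst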
#> Warning: Removed 46 rows containing non-finite outside the scale range
#> (`stat_edge_link()`).
#> Warning: No shared levels found between `names(values)` of the manual scale and the
#> data's edge_colour values.
#> Warning: No shared levels found between `names(values)` of the manual scale and the
#> data's size values.
#> Warning: Removed 20 rows containing missing values or values outside the scale range
#> (`geom_point()`).
#> Warning: Removed 20 rows containing missing values or values outside the scale range
#> (`geom_text_repel()`).

4 Session information

sessionInfo()
#> R version 4.6.0 RC (2026-04-17 r89917)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] MSTree_0.99.5    BiocStyle_2.40.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] viridis_0.6.5       sass_0.4.10         generics_0.1.4     
#>  [4] tidyr_1.3.2         digest_0.6.39       magrittr_2.0.5     
#>  [7] evaluate_1.0.5      grid_4.6.0          RColorBrewer_1.1-3 
#> [10] bookdown_0.46       fastmap_1.2.0       jsonlite_2.0.0     
#> [13] ggrepel_0.9.8       tinytex_0.59        gridExtra_2.3      
#> [16] BiocManager_1.30.27 purrr_1.2.2         viridisLite_0.4.3  
#> [19] scales_1.4.0        tweenr_2.0.3        jquerylib_0.1.4    
#> [22] cli_3.6.6           rlang_1.2.0         graphlayouts_1.2.3 
#> [25] polyclip_1.10-7     tidygraph_1.3.1     withr_3.0.2        
#> [28] cachem_1.1.0        yaml_2.3.12         otel_0.2.0         
#> [31] tools_4.6.0         memoise_2.0.1       dplyr_1.2.1        
#> [34] ggplot2_4.0.3       vctrs_0.7.3         R6_2.6.1           
#> [37] magick_2.9.1        lifecycle_1.0.5     MASS_7.3-65        
#> [40] ggraph_2.2.2        pkgconfig_2.0.3     pillar_1.11.1      
#> [43] bslib_0.10.0        gtable_0.3.6        glue_1.8.1         
#> [46] Rcpp_1.1.1-1.1      ggforce_0.5.0       xfun_0.57          
#> [49] tibble_3.3.1        tidyselect_1.2.1    knitr_1.51         
#> [52] dichromat_2.0-0.1   farver_2.1.2        htmltools_0.5.9    
#> [55] igraph_2.3.1        NetPathMiner_1.48.0 rmarkdown_2.31     
#> [58] compiler_4.6.0      S7_0.2.2