Command-line Usage
1. Complete Pipeline
Required:
--mode Complete_Pipeline: run the full analysis from raw FASTQ/FASTA files.
-f, --input <dir>: input directory containing .fastq.gz or .fasta.gz files.
-o, --output <dir>: output directory.
Add --nanopore to enable Nanopore mode.
strmie --mode Complete_Pipeline \
-f /path/to/input_dir \
-o /path/to/output_dir \
[other options]
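Complete_Pipeline reads its samples from the directory passed with -f. A quick pre-flight check can confirm that this directory actually contains files in the two formats named above. The Python sketch below lists files whose names end in .fastq.gz or .fasta.gz. list_input_files is a hypothetical helper, not part of STRmie-HD, and the tool's own file discovery may differ (for example in how it treats subdirectories).

```python
from pathlib import Path

# Input formats listed for -f / --input in this documentation.
ACCEPTED_SUFFIXES = (".fastq.gz", ".fasta.gz")


def list_input_files(input_dir):
    """Return the sorted names of regular files in input_dir ending in .fastq.gz or .fasta.gz.

    Hypothetical pre-flight helper (not part of STRmie-HD); subdirectories are not searched.
    """
    root = Path(input_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"input directory not found: {root}")
    return sorted(
        p.name
        for p in root.iterdir()
        if p.is_file() and p.name.endswith(ACCEPTED_SUFFIXES)
    )
```

For example, list_input_files("/path/to/input_dir") returns an empty list when the directory holds no matching files, which usually means -f points at the wrong place.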
2. Index Calculation Only
Required:
--mode Index_Calculation: recalculate indices starting from a user-supplied Excel table with allele definitions.
-f, --input <dir>: input directory containing .fastq.gz or .fasta.gz files.
-o, --output <dir>: output directory.
-p, --path <file.xlsx>: Excel file with predefined alleles (columns Sample, CAG_Allele_1, CAG_Allele_2) used for recalculation.
strmie --mode Index_Calculation \
-f /path/to/input_dir \
-o /path/to/output_dir \
-p /path/to/CAG_data_for_recalculating_indices.xlsx
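Before running Index_Calculation, you may want to check the Excel table passed with -p. The sketch below is a hypothetical helper (check_allele_rows is not part of STRmie-HD). It assumes each spreadsheet row has already been read into a dict keyed by the documented column names (Sample, CAG_Allele_1, CAG_Allele_2). It also assumes that each allele should be a whole, non-negative number of CAG repeats.

```python
import math

REQUIRED_COLUMNS = ("Sample", "CAG_Allele_1", "CAG_Allele_2")
ALLELE_COLUMNS = ("CAG_Allele_1", "CAG_Allele_2")


def _is_repeat_count(value):
    """True if value is a finite, non-negative whole number (e.g. 17 or 17.0)."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return False
    return math.isfinite(number) and number >= 0 and number.is_integer()


def check_allele_rows(rows):
    """Return a list of problems found in rows (one dict per sample); empty means OK."""
    problems = []
    for i, row in enumerate(rows, start=1):
        missing = [col for col in REQUIRED_COLUMNS if col not in row]
        if missing:
            problems.append(f"row {i}: missing {', '.join(missing)}")
            continue
        for col in ALLELE_COLUMNS:
            if not _is_repeat_count(row[col]):
                problems.append(f"row {i}: {col} is not a repeat count: {row[col]!r}")
    return problems
```

One way to obtain the rows is pandas.read_excel(path).to_dict("records"), which needs pandas and an Excel engine such as openpyxl. Floats like 19.0, which pandas produces when a column contains empty cells, are accepted. Empty cells themselves (NaN) are reported.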
All command-line parameters
STRmie-HD provides two main operational modes: Complete_Pipeline and Index_Calculation.
Main modes
--mode Complete_Pipeline: run the full analysis from raw FASTQ/FASTA files.
--mode Index_Calculation: recalculate indices starting from a user-supplied Excel table with allele definitions.
General options (required)
-f, --input <dir>: input directory containing .fastq.gz or .fasta.gz files.
-o, --output <dir>: output directory.
Nanopore arguments (used only with --nanopore)
--np-max-roi <int>: maximum Region Of Interest (ROI) length (default: 300).
--np-max-edits <int>: maximum edits allowed for both flanks (default: 2).
--np-max-edits-left <int>: override the maximum edits for the upstream flank (default: 2).
--np-max-edits-right <int>: override the maximum edits for the downstream flank (default: 3).
--np-seed-len <int>: seed prefilter length; 0 disables it, 5 to 7 is suggested (default: 0).
--np-bestmatch / --np-no-bestmatch: enable or disable regex.BESTMATCH (default: enabled).
--np-min-read-len <int>: minimum read length (default: 50).
--np-min-cag-pct <float>: discard reads whose fraction of in-frame CAG triplets is below this threshold (default: 0.70; set to 0 to disable).
--np-cag-pct-scope {roi,cag_region}: region used for the CAG fraction, where roi is the entire ROI and cag_region is the ROI prefix before the LOI/DOI motif block (default: cag_region).
--np-allow-caa: count CAA as acceptable alongside CAG in the fraction calculation.
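--np-min-cag-pct and --np-allow-caa describe a per-read filter on the fraction of in-frame CAG triplets. The sketch below is a minimal version of that calculation. It assumes triplets are read from the first base of the region and that a trailing partial triplet is ignored. The seq argument stands for whichever region --np-cag-pct-scope selects (the whole ROI or the prefix before the LOI/DOI block); locating that region is outside the sketch. The function names are hypothetical and not STRmie-HD's internals.

```python
def cag_fraction(seq, allow_caa=False):
    """Fraction of in-frame triplets in seq that are CAG (and CAA if allow_caa).

    Triplets start at position 0; a trailing partial triplet is ignored.
    """
    seq = seq.upper()
    n_triplets = len(seq) // 3
    if n_triplets == 0:
        return 0.0
    accepted = {"CAG", "CAA"} if allow_caa else {"CAG"}
    hits = sum(1 for i in range(0, n_triplets * 3, 3) if seq[i:i + 3] in accepted)
    return hits / n_triplets


def passes_cag_filter(seq, min_cag_pct=0.70, allow_caa=False):
    """Keep the read unless its CAG fraction is below min_cag_pct; 0 disables the filter."""
    if min_cag_pct == 0:
        return True
    return cag_fraction(seq, allow_caa) >= min_cag_pct
```

For a region of six CAG, one CAA, one CAG and two CCG triplets, 7 of 10 triplets are CAG, giving 0.7. With allow_caa=True the CAA also counts, raising the fraction to 0.8.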
Peak detection
--cwt: enable wavelet-based peak detection (scipy.signal.find_peaks_cwt) as an alternative to histogram-based detection.
-bc, --cutpoint_based: call peaks by splitting the histogram at the biological cutpoint (default: 27).
-a <list>: list of widths (default: [5,6,7,8,9,10]) used by find_peaks_cwt to match expected peak shapes. The widths set the scale of features treated as peaks: values that are too small miss broad peaks, while values that are too large merge or ignore narrow peaks, so they directly affect the sensitivity and specificity of peak calling.
-i <int>: interval (default: 6) around candidate peaks used for local refinement. It sets how many points on each side are considered when adjusting the peak position: too small a value may miss the true summit, while too large a value may introduce noise, so it balances precision against robustness in peak localization.
-m <int>: minimum number of CAG repeats to consider (default: 7).
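The cutpoint-based option (-bc) and the refinement interval (-i) can be illustrated on a plain repeat-length histogram. The sketch below is only an illustration under stated assumptions and not STRmie-HD's implementation. Lengths at or above the cutpoint are assumed to fall on the upper side. Each side is assumed to contribute its most frequent length. Refinement is assumed to move a candidate to the tallest bin within the interval. cutpoint_peaks and refine_peak are hypothetical names; the defaults mirror -c, -m and -i above.

```python
def cutpoint_peaks(hist, cutpoint=27, min_repeats=7):
    """Return (allele_1, allele_2) from a histogram split at cutpoint.

    hist maps CAG repeat length -> read count. Lengths below min_repeats are
    ignored. Assumption: lengths >= cutpoint fall on the upper side. Each side
    yields its most frequent length, or None if that side has no reads.
    """
    def mode(lengths):
        lengths = [n for n in lengths if hist[n] > 0]
        if not lengths:
            return None
        # Highest count wins; ties go to the shorter repeat length.
        return min(lengths, key=lambda n: (-hist[n], n))

    usable = [n for n in hist if n >= min_repeats]
    allele_1 = mode([n for n in usable if n < cutpoint])
    allele_2 = mode([n for n in usable if n >= cutpoint])
    return allele_1, allele_2


def refine_peak(counts, index, interval=6):
    """Move a candidate peak to the highest bin within +/- interval positions.

    counts is a list of read counts indexed by bin; ties keep the leftmost bin.
    """
    start = max(0, index - interval)
    stop = min(len(counts), index + interval + 1)
    return max(range(start, stop), key=lambda i: counts[i])
```

With hist = {5: 500, 15: 5, 17: 120, 18: 30, 42: 60, 43: 80, 44: 20}, cutpoint_peaks(hist) returns (17, 43). The 5-repeat bin is dropped by min_repeats even though it is the tallest.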
Indices and thresholds
-c <int>: cutpoint (default: 27) separating the "healthy" and "phenotypic" allele ranges; it is also used for the Allele Ratio.
-ti <float>: relative peak-height threshold for the Instability Index (Advanced).
-te <float>: relative peak-height threshold for the Expansion Index (Advanced).
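-ti and -te are relative peak-height thresholds. To show how such a threshold enters an index, the sketch below follows a commonly used peak-height-weighted definition of the instability and expansion indices. It is an illustration only, and STRmie-HD's exact formulas may differ. It assumes the threshold is a fraction of the tallest bin and that hist holds one allele's repeat-length distribution. It also assumes main_allele is that allele's modal length. The function names are hypothetical.

```python
def _retained_peaks(hist, threshold):
    """Keep bins whose height is at least threshold times the tallest bin."""
    tallest = max(hist.values())
    return {n: h for n, h in hist.items() if h >= threshold * tallest}


def instability_index(hist, main_allele, threshold):
    """Sum of normalized peak heights weighted by their repeat change from main_allele."""
    peaks = _retained_peaks(hist, threshold)
    total = sum(peaks.values())
    return sum(h / total * (n - main_allele) for n, h in peaks.items())


def expansion_index(hist, main_allele, threshold):
    """Like instability_index, but only peaks longer than main_allele contribute.

    Assumption: heights are normalized over all retained peaks, not only expansions.
    """
    peaks = _retained_peaks(hist, threshold)
    total = sum(peaks.values())
    return sum(h / total * (n - main_allele) for n, h in peaks.items() if n > main_allele)
```

With hist = {40: 10, 41: 25, 42: 100, 43: 50, 44: 25, 45: 5}, main_allele = 42 and a threshold of 0.2, the bins at 40 and 45 fall below 20% of the tallest peak and are dropped. The instability index is (-25 + 50 + 2 × 25) / 200 = 0.375. The expansion index is (50 + 2 × 25) / 200 = 0.5.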
Graphical outputs
--cag_graph: save histograms of CAG distributions per sample.
--ccg_graph: save histograms of CCG distributions and warning cases.
Index calculation mode (required only with --mode Index_Calculation)
-p, --path <file.xlsx>: Excel file with predefined alleles (columns Sample, CAG_Allele_1, CAG_Allele_2) used for recalculation.