# Command-line Usage

## 🔹 1. Complete Pipeline

**Required:**
- `--mode Complete_Pipeline` → run the full analysis from raw FASTQ/FASTA files.
- `-f, --input` → input directory containing `.fastq.gz` or `.fasta.gz` files.
- `-o, --output` → output directory.

Use `--nanopore` to enable nanopore mode.

```bash
strmie --mode Complete_Pipeline \
  -f /path/to/input_dir \
  -o /path/to/output_dir \
  [other options]
```

## 🔹 2. Index Calculation Only

**Required:**
- `--mode Index_Calculation` → recalculate indices starting from a user-supplied Excel table with allele definitions.
- `-f, --input` → input directory containing `.fastq.gz` or `.fasta.gz` files.
- `-o, --output` → output directory.
- `-p, --path` → Excel file with predefined alleles (`Sample, CAG_Allele_1, CAG_Allele_2`) for recalculation.

```bash
strmie --mode Index_Calculation \
  -f /path/to/input_dir \
  -o /path/to/output_dir \
  -p /path/to/CAG_data_for_recalculating_indices.xlsx
```

---

## Command-line parameters

STRmie-HD provides two main operational modes: `Complete_Pipeline` and `Index_Calculation`.

**Main modes**
- `--mode Complete_Pipeline` → run the full analysis from raw FASTQ/FASTA files.
- `--mode Index_Calculation` → recalculate indices starting from a user-supplied Excel table with allele definitions.

**General options (required)**
- `-f, --input` → input directory containing `.fastq.gz` or `.fasta.gz` files.
- `-o, --output` → output directory.

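
`--mode Index_Calculation` reads predefined alleles from an Excel file with the columns `Sample`, `CAG_Allele_1` and `CAG_Allele_2`. One way to produce such a file is sketched below with pandas; the helper names `build_allele_table` and `save_allele_table` are hypothetical, and the sample names and repeat counts you pass in are your own data.

```python
import pandas as pd

# Columns expected in the Excel file passed with -p/--path.
COLUMNS = ["Sample", "CAG_Allele_1", "CAG_Allele_2"]


def build_allele_table(alleles):
    """Hypothetical helper: turn {sample: (allele_1, allele_2)} into a DataFrame."""
    rows = [(sample, int(a1), int(a2)) for sample, (a1, a2) in alleles.items()]
    return pd.DataFrame(rows, columns=COLUMNS)


def save_allele_table(table, path):
    """Hypothetical helper: write the table as .xlsx (needs an engine such as openpyxl)."""
    table.to_excel(path, index=False)
```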

**Nanopore arguments (used only with `--nanopore`)**
- `--np-max-roi` → maximum Region Of Interest (ROI) length (default: 300).
- `--np-max-edits` → maximum edits allowed in both flanks (default: 2).
- `--np-max-edits-left` → override the maximum edits for the upstream flank (default: 2).
- `--np-max-edits-right` → override the maximum edits for the downstream flank (default: 3).
- `--np-seed-len` → seed prefilter length; `0` disables it, 5–7 is suggested (default: 0).
- `--np-bestmatch` / `--np-no-bestmatch` → enable/disable `regex.BESTMATCH` (default: enabled).
- `--np-min-read-len` → minimum read length (default: 50).
- `--np-min-cag-pct` → discard reads whose fraction of in-frame CAG triplets is below this threshold (default: 0.70; set `0` to disable).
- `--np-cag-pct-scope {roi,cag_region}` → region used to compute the CAG fraction (default: `cag_region`):
  - `roi` = the entire ROI
  - `cag_region` = the ROI prefix before the LOI/DOI motif block
- `--np-allow-caa` → count `CAA` as acceptable alongside `CAG` in the fraction calculation.

**Peak detection**
- `--cwt` → enable wavelet-based peak detection (`scipy.signal.find_peaks_cwt`) as an alternative to histogram-based detection.
- `-bc, --cutpoint_based` → call peaks by splitting the histogram at the biological cutpoint (default: 27).
- `-a` → list of widths (default `[5,6,7,8,9,10]`) used by `find_peaks_cwt` to match expected peak shapes. The widths set the scale of the features treated as peaks: widths that are too small miss broad peaks, while widths that are too large merge or ignore narrow ones, so this setting directly affects the sensitivity and specificity of peak calling.
- `-i` → interval (default `6`) around candidate peaks used for local refinement. It defines how many points on each side are considered when adjusting the peak position: too small a value may miss the true summit, while too large a value may introduce noise, so it balances precision against robustness in peak localization.
- `-m` → minimum number of CAG repeats to consider (default `7`).
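
The nanopore options describe locating the ROI between two flanking sequences, each matched with a bounded number of edits and optionally with `regex.BESTMATCH`. The sketch below shows how such fuzzy flank matching can be written with the third-party `regex` module's `{e<=N}` syntax. The function name `extract_roi`, the flank sequences and the order of the checks are assumptions rather than STRmie-HD's code, and the seed prefilter (`--np-seed-len`) is omitted.

```python
import regex


def extract_roi(read, left_flank, right_flank, max_edits_left=2,
                max_edits_right=3, max_roi=300, min_read_len=50,
                bestmatch=True):
    """Hypothetical sketch: return the sequence between two flanks, or None."""
    if len(read) < min_read_len:
        return None  # analogous to --np-min-read-len
    flags = regex.BESTMATCH if bestmatch else 0
    # {e<=N} tolerates up to N substitutions, insertions or deletions.
    left_pat = regex.compile(
        f"(?:{regex.escape(left_flank)}){{e<={max_edits_left}}}", flags)
    right_pat = regex.compile(
        f"(?:{regex.escape(right_flank)}){{e<={max_edits_right}}}", flags)
    left = left_pat.search(read)
    if left is None:
        return None
    # Look for the downstream flank only after the upstream one.
    right = right_pat.search(read, left.end())
    if right is None:
        return None
    roi = read[left.end():right.start()]
    # Reject implausibly long ROIs (analogous to --np-max-roi).
    return roi if len(roi) <= max_roi else None
```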
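
`--np-min-cag-pct` and `--np-allow-caa` filter reads by the fraction of in-frame CAG triplets. A minimal sketch of that calculation follows, assuming the reading frame starts at the first base of the region and that the caller has already selected the region according to `--np-cag-pct-scope` (the LOI/DOI motif lookup is not shown). Both function names are hypothetical.

```python
def in_frame_cag_fraction(region, allow_caa=False):
    """Fraction of complete in-frame triplets that are CAG (optionally CAA)."""
    accepted = {"CAG", "CAA"} if allow_caa else {"CAG"}
    # Step through complete codons only, starting at position 0.
    triplets = [region[i:i + 3] for i in range(0, len(region) - 2, 3)]
    if not triplets:
        return 0.0
    return sum(t in accepted for t in triplets) / len(triplets)


def passes_cag_filter(region, min_cag_pct=0.70, allow_caa=False):
    """Keep the read unless its CAG fraction falls below min_cag_pct."""
    if min_cag_pct == 0:
        return True  # a threshold of 0 disables the filter
    return in_frame_cag_fraction(region, allow_caa) >= min_cag_pct
```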
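
The `-a`, `-i` and `-m` options can be illustrated with `scipy.signal.find_peaks_cwt` applied to a histogram of CAG repeat counts. In this sketch each candidate is refined to the highest bin within ±`interval` bins, and summits below `min_repeats` are dropped. The function name and the extra `min_rel_height` filter, which discards tiny spurious summits, are hypothetical additions and not documented STRmie-HD behavior.

```python
import numpy as np
from scipy.signal import find_peaks_cwt


def call_peaks_cwt(counts, widths=(5, 6, 7, 8, 9, 10), interval=6,
                   min_repeats=7, min_rel_height=0.1):
    """Hypothetical sketch: counts[k] = number of reads with k CAG repeats."""
    counts = np.asarray(counts, dtype=float)
    # Wavelet widths set the scale of features recognized as peaks (cf. -a).
    candidates = find_peaks_cwt(counts, widths=np.asarray(widths))
    summits = set()
    for p in candidates:
        # Local refinement: highest bin within +/- interval (cf. -i).
        lo = max(int(p) - interval, 0)
        hi = min(int(p) + interval + 1, len(counts))
        summit = lo + int(np.argmax(counts[lo:hi]))
        # Ignore very short repeats (cf. -m) and negligible summits.
        if summit >= min_repeats and counts[summit] >= min_rel_height * counts.max():
            summits.add(summit)
    return sorted(summits)
```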

**Indices and thresholds**
- `-c` → cutpoint (default `27`) that separates the “healthy” and “phenotypic” allele ranges; it is also used to compute the Allele Ratio.
- `-ti` → relative peak height threshold for the **Instability Index** (Advanced).
- `-te` → relative peak height threshold for the **Expansion Index** (Advanced).

**Graphical outputs**
- `--cag_graph` → save histograms of the CAG distributions for each sample.
- `--ccg_graph` → save histograms of the CCG distributions and of warning cases.

**Index calculation mode (required only for `--mode Index_Calculation`)**
- `-p, --path` → Excel file with predefined alleles (`Sample, CAG_Allele_1, CAG_Allele_2`) for recalculation.
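
This section does not give formulas for the indices. The sketch below assumes the peak-height-weighted definition commonly used for repeat instability: peaks shorter than a relative threshold of the main peak are ignored, each retained peak contributes its normalized height times its repeat-length offset from the main allele, and the expansion variant sums only the positive offsets. It also shows a cutpoint split in the spirit of `-bc`. The function names, the `< cutpoint` / `>= cutpoint` boundary and the threshold value are assumptions, and STRmie-HD's actual implementation may differ.

```python
def cutpoint_alleles(counts, cutpoint=27):
    """Split a {repeats: reads} histogram at the cutpoint and return the most
    frequent repeat length on each side (None if a side is empty)."""
    below = {k: c for k, c in counts.items() if k < cutpoint}
    above = {k: c for k, c in counts.items() if k >= cutpoint}
    allele_1 = max(below, key=below.get) if below else None
    allele_2 = max(above, key=above.get) if above else None
    return allele_1, allele_2


def repeat_index(counts, main_allele, rel_threshold, expansion_only=False):
    """Peak-height-weighted repeat change relative to main_allele.

    Peaks lower than rel_threshold * (main peak height) are ignored; heights
    are normalized by the sum of the retained peaks. With expansion_only=True
    only peaks longer than main_allele contribute.
    """
    main_height = counts[main_allele]
    peaks = {k: c for k, c in counts.items() if c >= rel_threshold * main_height}
    total = sum(peaks.values())
    index = 0.0
    for repeats, height in peaks.items():
        change = repeats - main_allele
        if expansion_only and change <= 0:
            continue
        index += (height / total) * change
    return index
```

For example, with a two-allele histogram, `cutpoint_alleles` returns the main allele on each side of the cutpoint. `repeat_index` can then be applied to the bins at or above the cutpoint with an illustrative threshold such as `0.2`, once for an instability-style value and once with `expansion_only=True` for an expansion-style value.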