# Command-line Usage
## 🔹 1. Complete Pipeline
**Required:**
- `--mode Complete_Pipeline`: run the full analysis from raw FASTQ/FASTA files.
- `-f, --input`: input directory containing `.fastq.gz` or `.fasta.gz` files.
- `-o, --output`: output directory.

Add `--nanopore` to enable Nanopore mode.
```bash
strmie --mode Complete_Pipeline \
-f /path/to/input_dir \
-o /path/to/output_dir \
[other options]
```
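
A minimal pre-flight sketch for the invocation above, assuming a Bash shell. It checks only the top level of the input directory, which is an assumption about how STRmie-HD scans its input. The helper names `count_inputs` and `run_pipeline` are hypothetical, and `--np-seed-len 6` is one value from the suggested 5–7 range, shown only to illustrate passing Nanopore options.

```bash
#!/usr/bin/env bash

# Hypothetical helper: count supported input files (top level only).
count_inputs() {
  find "$1" -maxdepth 1 -type f \( -name '*.fastq.gz' -o -name '*.fasta.gz' \) | wc -l
}

# Hypothetical wrapper: refuse to start when no supported files are found,
# otherwise run the Complete_Pipeline in Nanopore mode.
run_pipeline() {
  local input_dir=$1 output_dir=$2 n
  n=$(count_inputs "$input_dir")
  if [ "$n" -eq 0 ]; then
    echo "No .fastq.gz or .fasta.gz files found in $input_dir" >&2
    return 1
  fi
  # --np-seed-len 6 is one value from the suggested 5-7 range (illustrative).
  strmie --mode Complete_Pipeline \
    -f "$input_dir" \
    -o "$output_dir" \
    --nanopore \
    --np-seed-len 6
}
```

For example: `run_pipeline /path/to/input_dir /path/to/output_dir`. Failing early on an empty directory avoids launching a run that cannot produce results.
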
## 🔹 2. Index Calculation Only
**Required:**
- `--mode Index_Calculation`: recalculate the indices starting from a user-supplied Excel table with allele definitions.
- `-f, --input`: input directory containing `.fastq.gz` or `.fasta.gz` files.
- `-o, --output`: output directory.
- `-p, --path`: Excel file with the predefined alleles used for recalculation, with columns `Sample`, `CAG_Allele_1` and `CAG_Allele_2`.
```bash
strmie --mode Index_Calculation \
-f /path/to/input_dir \
-o /path/to/output_dir \
-p /path/to/CAG_data_for_recalculating_indices.xlsx
```
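
One way to prepare the `-p` table, sketched under assumptions: the rows are written as CSV with the three documented column names, then converted to `.xlsx` with LibreOffice's headless converter (`soffice --headless --convert-to xlsx`), which must be installed separately. The helper names, sample IDs and allele sizes are hypothetical; the `Sample` values presumably need to match the sample names STRmie-HD derives from your input files.

```bash
#!/usr/bin/env bash

# Hypothetical helper: write the allele table as CSV.
# Each extra argument is one row in the form "Sample,CAG_Allele_1,CAG_Allele_2".
make_allele_table() {
  local csv=$1
  shift
  printf '%s\n' "Sample,CAG_Allele_1,CAG_Allele_2" > "$csv"
  if [ "$#" -gt 0 ]; then
    printf '%s\n' "$@" >> "$csv"
  fi
}

# Hypothetical helper: convert the CSV to .xlsx with LibreOffice, if available.
convert_to_xlsx() {
  local csv=$1 outdir=$2
  if ! command -v soffice >/dev/null 2>&1; then
    echo "soffice not found; convert $csv to .xlsx with a spreadsheet program" >&2
    return 1
  fi
  soffice --headless --convert-to xlsx --outdir "$outdir" "$csv"
}
```

For example, `make_allele_table alleles.csv "sample_01,17,42" "sample_02,19,45"` followed by `convert_to_xlsx alleles.csv .` should produce `alleles.xlsx`, which can then be passed with `-p`.
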
---
## Command-line parameters
STRmie-HD provides two main operational modes: Complete_Pipeline and Index_Calculation.
**Main modes**
- `--mode Complete_Pipeline`: run the full analysis from raw FASTQ/FASTA files.
- `--mode Index_Calculation`: recalculate the indices starting from a user-supplied Excel table with allele definitions.
**General options (required)**
- `-f, --input`: input directory containing `.fastq.gz` or `.fasta.gz` files.
- `-o, --output`: output directory.
**Nanopore arguments (used only with `--nanopore`)**
- `--np-max-roi`: maximum Region Of Interest (ROI) length (default: 300).
- `--np-max-edits`: maximum number of edits allowed for both flanks (default: 2).
- `--np-max-edits-left`: override the maximum edits for the upstream flank (default: 2).
- `--np-max-edits-right`: override the maximum edits for the downstream flank (default: 3).
- `--np-seed-len`: seed prefilter length; `0` disables it, suggested values are 5–7 (default: 0).
- `--np-bestmatch` / `--np-no-bestmatch`: enable or disable `regex.BESTMATCH` (default: enabled).
- `--np-min-read-len`: minimum read length (default: 50).
- `--np-min-cag-pct`: discard reads whose fraction of in-frame CAG triplets is below this threshold (default: 0.70; set `0` to disable).
- `--np-cag-pct-scope {roi,cag_region}`: region over which the CAG fraction is computed (default: `cag_region`):
  - `roi`: the entire ROI.
  - `cag_region`: the ROI prefix before the LOI/DOI motif block.
- `--np-allow-caa`: count `CAA` as well as `CAG` in the fraction calculation.
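
To make the `--np-min-cag-pct` filter concrete, here is a Bash/awk sketch of an in-frame CAG fraction. It is not STRmie-HD's code: it assumes the reading frame starts at the first base of the supplied sequence, counts only complete triplets, and treats `CAA` as a hit only when the (hypothetical) second argument is `1`, mirroring `--np-allow-caa`. Extracting the ROI or the `cag_region` prefix is outside the sketch, and the function names are hypothetical.

```bash
#!/usr/bin/env bash

# Hypothetical helper: fraction of in-frame triplets that are CAG
# (plus CAA when the second argument is 1). Frame starts at base 1.
cag_fraction() {
  local seq=$1 allow_caa=${2:-0}
  awk -v s="$seq" -v caa="$allow_caa" 'BEGIN {
    s = toupper(s)
    n = int(length(s) / 3)            # number of complete triplets
    if (n == 0) { print "0.00"; exit }
    hits = 0
    for (i = 0; i < n; i++) {
      t = substr(s, 3 * i + 1, 3)
      if (t == "CAG" || (caa == 1 && t == "CAA")) hits++
    }
    printf "%.2f\n", hits / n
  }'
}

# Hypothetical helper: succeed if the read would be kept, i.e. the fraction
# is at or above the threshold (default 0.70; a threshold of 0 keeps everything).
passes_filter() {
  local frac=$1 threshold=${2:-0.70}
  awk -v f="$frac" -v t="$threshold" 'BEGIN { exit !(f >= t) }'
}
```

For instance, `CAGCAGCAGCAA` has three CAG triplets out of four, so the sketch reports `0.75` (kept at the default threshold), or `1.00` when CAA is also counted.
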
**Peak detection**
- `--cwt`: enable wavelet-based peak detection (`scipy.signal.find_peaks_cwt`) as an alternative to histogram-based detection.
- `-bc, --cutpoint_based`: call peaks by splitting the histogram at the biological cutpoint (default: 27).
- `-a`: list of widths (default `[5,6,7,8,9,10]`) that `find_peaks_cwt` uses to match the expected peak shapes. The widths set the scale of the features treated as peaks: widths that are too small miss broad peaks, while widths that are too large merge or ignore narrow ones, so this setting directly affects the sensitivity and specificity of peak calling.
- `-i`: interval (default `6`) around each candidate peak used for local refinement, i.e. how many points on each side are examined when adjusting the peak position. Too small an interval may miss the true summit, while too large an interval may pick up noise, so this value trades precision against robustness in peak localization.
- `-m`: minimum number of CAG repeats to consider (default `7`).
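
The cutpoint-based approach (`-bc`) can be pictured with the following awk sketch, which is an illustration rather than STRmie-HD's implementation. It assumes one CAG repeat count per read on standard input, drops counts below `-m`, puts counts strictly below `-c` on the lower side, and reports the most frequent count on each side (ties go to the smaller count, `NA` when a side is empty). The function name `cutpoint_peaks` is hypothetical.

```bash
#!/usr/bin/env bash

# Hypothetical helper: usage `cutpoint_peaks [cutpoint] [min_cag] < counts.txt`
# Prints "<lower-side peak> <upper-side peak>".
cutpoint_peaks() {
  local cutpoint=${1:-27} min_cag=${2:-7}
  awk -v c="$cutpoint" -v m="$min_cag" '
    $1 >= m { hist[$1]++ }            # histogram of repeat counts >= min_cag
    END {
      lo = -1; hi = -1
      for (k in hist) {
        k += 0                        # compare repeat counts numerically
        if (k < c) {
          if (lo < 0 || hist[k] > hist[lo] || (hist[k] == hist[lo] && k < lo)) lo = k
        } else {
          if (hi < 0 || hist[k] > hist[hi] || (hist[k] == hist[hi] && k < hi)) hi = k
        }
      }
      a = (lo < 0) ? "NA" : lo
      b = (hi < 0) ? "NA" : hi
      print a, b
    }'
}
```

For example, `printf '%s\n' 17 17 18 42 43 43 3 | cutpoint_peaks 27 7` ignores the count of 3 and prints `17 43`. A fixed split like this is simple and deterministic, but it relies on the two alleles falling on opposite sides of the cutpoint.
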
**Indices and thresholds**
- `-c`: cutpoint (default `27`) separating the “healthy” from the “phenotypic” allele range; also used to compute the Allele Ratio.
- `-ti`: relative peak height threshold for the **Instability Index** (rAdvanced).
- `-te`: relative peak height threshold for the **Expansion Index** (Advanced).
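
As a reference point for `-ti` and `-te`, the peak-height-weighted indices commonly used for repeat instability can be written as below. This is an assumption about the definition, not a statement of STRmie-HD's exact formula: $h_i$ is the height of peak $i$ at repeat length $r_i$, $p$ is the main (tallest) peak, and $t$ is the relative threshold ($t_I$ from `-ti`, $t_E$ from `-te`), so only peaks at least $t\,h_p$ tall contribute.

```math
S_t = \{\, i : h_i \ge t\,h_p \,\}, \qquad
\mathrm{II} = \sum_{i \in S_{t_I}} \frac{h_i}{\sum_{j \in S_{t_I}} h_j}\,(r_i - r_p), \qquad
\mathrm{EI} = \sum_{i \in S_{t_E},\; r_i > r_p} \frac{h_i}{\sum_{j \in S_{t_E}} h_j}\,(r_i - r_p)
```

For example, with hypothetical peaks at 41, 42 (main), 43 and 44 CAG repeats of heights 30, 100, 50 and 20 and $t = 0.2$, all four peaks pass the threshold (total height 200), giving $\mathrm{II} = (-30 + 50 + 40)/200 = 0.30$ and $\mathrm{EI} = (50 + 40)/200 = 0.45$.
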
**Graphical outputs**
- `--cag_graph`: save histograms of the CAG distributions per sample.
- `--ccg_graph`: save histograms of the CCG distributions and warning cases.
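
A sketch of a Complete_Pipeline run that switches to wavelet-based peak detection and saves both kinds of histograms. Only documented flags are used, and the numeric values are simply the documented defaults spelled out for clarity; `-a` is left out because the expected command-line format for its list of widths is not specified here. Keeping the arguments in a Bash array preserves paths that contain spaces; the paths are placeholders.

```bash
#!/usr/bin/env bash

args=(
  --mode Complete_Pipeline
  -f /path/to/input_dir
  -o /path/to/output_dir
  --cwt          # wavelet-based peak detection instead of histogram-based
  -i 6           # refinement interval around candidate peaks
  -m 7           # ignore repeat counts below 7
  -c 27          # cutpoint between the "healthy" and "phenotypic" ranges
  --cag_graph    # per-sample CAG histograms
  --ccg_graph    # CCG histograms and warning cases
)

# Run only if strmie is installed; otherwise print the command that would run.
if command -v strmie >/dev/null 2>&1; then
  strmie "${args[@]}"
else
  printf 'strmie not on PATH; would run: strmie %s\n' "${args[*]}"
fi
```
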
**Index calculation mode (required only in `Index_Calculation` mode)**
- `-p, --path`: Excel file with the predefined alleles used for recalculation, with columns `Sample`, `CAG_Allele_1` and `CAG_Allele_2`.