Case Studies

This section provides three case studies demonstrating how to use STRmie-HD, depending on the sequencing design:

  1. Paired-End (PE) reads – Combine R1/R2 into single continuous reads using an external merging tool, then run STRmie-HD.

  2. Single-End (SE) reads – Unpaired reads, such as single-end Illumina reads or long reads from Nanopore/PacBio. Run STRmie-HD directly on gzipped FASTQ/FASTA files.

  3. Oxford Nanopore long reads – Run STRmie-HD in Nanopore mode (--nanopore).

All three workflows converge on the same STRmie-HD pipeline and produce the same interactive HTML report for visual inspection.


Case Study 1 – Paired-End Reads (PE) with PEAR

When working with paired-end data, we recommend merging reads upstream with PEAR to obtain single, high-confidence sequences spanning the CAG repeat region.

Example paired-end test files are provided in the repository under tests/example_file/paired_end_file/.
These can be used to reproduce the workflow immediately after cloning the repository.

Generic PEAR command

pear -f forward_R1.fastq.gz \
     -r reverse_R2.fastq.gz \
     -v 10 \
     -o output_prefix

Parameters

  • -f forward_R1.fastq.gz → path to the forward read (R1).

  • -r reverse_R2.fastq.gz → path to the reverse read (R2).

  • -v MIN_OVERLAP → minimum overlap length (in bp) required to confidently merge a pair (default: 10).

  • -o output_prefix → output filename prefix; PEAR will produce files such as output_prefix.assembled.fastq (merged reads).


Alternative merging tool: FLASH (Fast Length Adjustment of Short reads) can also be used for merging paired-end reads before running STRmie-HD.

After merging, use the merged reads as input for STRmie-HD (see Running the Complete Pipeline below).


Example workflow with test data (Paired-End)

Running PEAR

pear -f tests/example_file/paired_end_file/ID1732-HTT-E10-56-HD17401-A001_S55_L001_R1.fastq.gz \
     -r tests/example_file/paired_end_file/ID1732-HTT-E10-56-HD17401-A001_S55_L001_R2.fastq.gz \
     -v 10 \
     -o tests/example_file/HD17401

The same procedure can be repeated for the other test samples provided in tests/example_file/paired_end_file/ (e.g., HD4501 and HD3903) by replacing the corresponding input filenames and output prefix.
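
Repeating the command by hand for every sample is error-prone, so the loop below sketches one way to merge all pairs in a directory. It assumes that the two mates of a pair differ only by _R1/_R2 before .fastq.gz, as in the test file shown above. The helper names (r2_for_r1, prefix_for_r1, merge_all_pairs) are hypothetical and not part of PEAR or STRmie-HD. The output prefix is derived from the full file name rather than the short sample ID used above, and the output directory must already exist.

```shell
#!/usr/bin/env bash
# Sketch: merge every R1/R2 pair in a directory with PEAR.
# Assumption: mates differ only by "_R1"/"_R2" before ".fastq.gz".

# Derive the R2 path from an R1 path (hypothetical helper).
r2_for_r1() {
    printf '%s\n' "${1%_R1.fastq.gz}_R2.fastq.gz"
}

# Derive an output prefix from an R1 path (hypothetical helper):
# the file name without its directory and without the _R1.fastq.gz suffix.
prefix_for_r1() {
    local name
    name=$(basename "$1")
    printf '%s\n' "${name%_R1.fastq.gz}"
}

# Merge all pairs found in directory $1, writing PEAR output under $2.
merge_all_pairs() {
    local in_dir="$1" out_dir="$2" r1 r2
    for r1 in "$in_dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue      # no match: the glob stays literal
        r2=$(r2_for_r1 "$r1")
        if [ ! -e "$r2" ]; then
            echo "skipping $r1: no matching R2 file" >&2
            continue
        fi
        # Same options as the single-sample command above.
        pear -f "$r1" -r "$r2" -v 10 -o "$out_dir/$(prefix_for_r1 "$r1")"
    done
}

# Usage with the test data:
# merge_all_pairs tests/example_file/paired_end_file tests/example_file
```

Because the merged files still end in .assembled.fastq, the organizing commands in the next step apply unchanged.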

Organize merged reads and prepare output folders

mkdir tests/example_file/assembled_reads

mv tests/example_file/*.assembled.fastq tests/example_file/assembled_reads/

gzip tests/example_file/assembled_reads/*

mkdir tests/example_file/strmie_output
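
Before launching the pipeline, it can be worth confirming that each sample actually yielded merged reads. The sketch below counts the records in every gzipped FASTQ file in a directory, assuming the usual four-lines-per-record layout. The function names (count_fastq_reads, report_read_counts) are hypothetical helpers, not part of STRmie-HD.

```shell
#!/usr/bin/env bash
# Sketch: count reads in gzipped FASTQ files.
# Assumption: each record occupies exactly four lines.

# Print the number of reads in one gzipped FASTQ file.
count_fastq_reads() {
    gzip -cd "$1" | awk 'END { print NR / 4 }'
}

# Print "<file name><TAB><read count>" for every *.fastq.gz file in directory $1.
report_read_counts() {
    local dir="$1" f
    for f in "$dir"/*.fastq.gz; do
        [ -e "$f" ] || continue       # no match: the glob stays literal
        printf '%s\t%s\n' "$(basename "$f")" "$(count_fastq_reads "$f")"
    done
}

# Usage with the test data:
# report_read_counts tests/example_file/assembled_reads
```

A sample with a count of zero, or one far lower than the others, probably deserves a second look at the merging step before it is passed on.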

Running the Complete Pipeline

strmie --mode Complete_Pipeline \
       -f tests/example_file/assembled_reads/ \
       -o tests/example_file/strmie_output

Case Study 2 – Single-End Reads (SE)

For single-end datasets, no merging is required. STRmie-HD can read gzipped FASTQ files (preferred, since they include base qualities) or gzipped FASTA files directly.

Example single-end test files are provided in the repository under tests/input_file/.
These can be used directly after cloning the repository, without additional preprocessing.

Example workflow with test data (Single-End)

Prepare the output directory for STRmie-HD

mkdir tests/example_file/strmie_output_se

Running the Complete Pipeline

strmie --mode Complete_Pipeline \
       -f tests/input_file/ \
       -o tests/example_file/strmie_output_se

Case Study 3 – Oxford Nanopore reads

For noisy long reads, STRmie-HD provides an integrated Nanopore mode.

Example Nanopore test files are provided in the repository under tests/example_file/nanopore_file/.
These can be used directly after cloning the repository, without additional preprocessing.

Example workflow with test data (Nanopore)

Prepare the output directory

mkdir tests/example_file/strmie_output_nanopore

Run STRmie-HD in Nanopore mode

strmie --mode Complete_Pipeline \
       -f tests/example_file/nanopore_file/ \
       -o tests/example_file/strmie_output_nanopore \
       --nanopore 
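
The three case studies differ only in the input directory, the output directory and the optional --nanopore flag. As a convenience, the commands can be wrapped in a small function like the one sketched below. The name run_strmie is a hypothetical helper, and it uses only the strmie options shown in this section.

```shell
#!/usr/bin/env bash
# Sketch: create the output directory, then run the complete pipeline.
# Arguments: <input_dir> <output_dir> [extra strmie options, e.g. --nanopore]
run_strmie() {
    local in_dir="$1" out_dir="$2"
    shift 2
    mkdir -p "$out_dir"               # no error if the directory already exists
    strmie --mode Complete_Pipeline -f "$in_dir" -o "$out_dir" "$@"
}

# Usage with the test data:
# run_strmie tests/example_file/assembled_reads/ tests/example_file/strmie_output
# run_strmie tests/input_file/ tests/example_file/strmie_output_se
# run_strmie tests/example_file/nanopore_file/ tests/example_file/strmie_output_nanopore --nanopore
```

Using mkdir -p makes the helper safe to rerun, since an existing output directory is not treated as an error.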

Running the Index Calculation

ℹ️ Guidance: If you intend to run the Index_Calculation mode, see the "HTML report Step by Step workflow" section. It explains how to use the interactive HTML report to manually correct allele peaks and export a curated table for recalculating indices.