Outputs

Generated Output Files

  • report.html → interactive HTML report for manual inspection and allele peak correction.

  • Final_report.xlsx → consolidated Excel report including definitive allele peaks, CCG counts per allele, instability/expansion indices, LOI/DOI flags, and resolved warnings.

  • raw_counts/ → per-sample CSV files with raw read-level counts and annotations (CAG, CCG, LOI/DOI, sequence). These files are used internally to generate histograms in the HTML report, but can also be inspected manually.

  • report_0.xlsx → draft Excel report generated before CCG assignment and warning resolution. It contains preliminary peak calls, indices, and LOI/DOI values (an intermediate step before the final report is produced).
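For manual inspection of a raw_counts file, the following minimal sketch tallies reads per CAG repeat count using only the standard library. It assumes that each CSV row describes one read and that the repeat count is stored in a column named CAG; both are assumptions, so check the header of your own files before relying on it.

```python
import csv
import io
from collections import Counter


def cag_histogram(csv_file, column="CAG"):
    """Tally reads per CAG repeat count from an open raw_counts CSV.

    `column` is an assumed header name, not confirmed by the documentation.
    """
    reader = csv.DictReader(csv_file)
    counts = Counter()
    for row in reader:
        value = (row.get(column) or "").strip()
        if value.isdigit():  # skip blanks and non-numeric annotations
            counts[int(value)] += 1
    return dict(sorted(counts.items()))


# In-memory stand-in for one raw_counts/<sample>.csv file (made-up rows).
example = io.StringIO("CAG,CCG\n17,7\n17,7\n42,10\n43,10\n42,10\n")
print(cag_histogram(example))  # {17: 2, 42: 2, 43: 1}
```

For a real file, pass an open handle instead, e.g. `open(path, newline="")`.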

If --cag_graph or --ccg_graph is enabled, additional subdirectories are created:

  • CAG_graphs/, CCG_alleles_graphs/ → per-sample histograms.

  • warning_case/, forced_graphs/ → outputs for warning samples and their forced resolution.

Warning samples and forced resolution

If a sample shows a non-canonical histogram (e.g., unclear or noisy peaks), STRmie-HD flags it as a warning sample.

  • By default, the tool retries with a forced peak search (output in forced_graphs/), applying a broader scan for candidate peaks.

  • If the --cwt option is enabled, STRmie-HD additionally applies wavelet-based peak detection (scipy.signal.find_peaks_cwt) to test multiple alternative strategies and recover allele peaks in ambiguous cases.

All forced or re-analyzed cases are saved in:

  • warning_case/ → plots and data for flagged samples.

  • forced_graphs/ → results from the forced peak search.
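The idea of a strict first pass followed by a broader retry can be pictured with the sketch below. It is not STRmie-HD's implementation: the local-maximum rule, the function names, and both thresholds (0.5 and 0.05 of the tallest bar) are hypothetical and chosen only for illustration. The wavelet-based strategy enabled by --cwt is not shown.

```python
def local_peaks(hist, min_fraction):
    """Return repeat lengths that are local maxima and reach at least
    `min_fraction` of the tallest bar (hypothetical rule)."""
    if not hist:
        return []
    tallest = max(hist.values())
    peaks = []
    for length, count in sorted(hist.items()):
        left = hist.get(length - 1, 0)
        right = hist.get(length + 1, 0)
        if count > left and count >= right and count >= min_fraction * tallest:
            peaks.append(length)
    return peaks


def call_alleles(hist, strict=0.5, relaxed=0.05):
    """Strict scan first; if fewer than two peaks are found,
    retry with a broader (relaxed) threshold and mark the call as forced."""
    peaks = local_peaks(hist, strict)
    forced = False
    if len(peaks) < 2:
        peaks = local_peaks(hist, relaxed)
        forced = True
    return peaks, forced


# Made-up histogram: strong peak at 17, weak expanded peak at 42.
hist = {16: 5, 17: 100, 18: 8, 41: 3, 42: 12, 43: 6}
print(call_alleles(hist))  # ([17, 42], True)
```

In this toy histogram the strict pass finds only the peak at 17, so the relaxed pass is what recovers the weaker peak at 42.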

HTML report: step-by-step workflow

The interactive report.html guides users through manual inspection and, if necessary, allele correction (Index_Calculation).

1) Upload the report and the raw_counts folder

Upload the Final_report.xlsx produced by the Complete Pipeline, then select the raw_counts folder containing the .csv files used to build the histograms.

Upload the report and select the raw_counts folder
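Before opening the report, it can help to confirm that both inputs exist. The sketch below is a small pre-flight check; it assumes that Final_report.xlsx and raw_counts/ sit directly inside the pipeline output folder, which is an assumption about the layout, and the function name is hypothetical.

```python
from pathlib import Path


def check_report_inputs(output_dir):
    """Return a list of problems that would prevent loading the report inputs.

    Assumes Final_report.xlsx and raw_counts/ are directly under output_dir.
    """
    out = Path(output_dir)
    problems = []
    if not (out / "Final_report.xlsx").is_file():
        problems.append("Final_report.xlsx not found")
    raw = out / "raw_counts"
    if not raw.is_dir():
        problems.append("raw_counts/ folder not found")
    elif not any(raw.glob("*.csv")):
        problems.append("raw_counts/ contains no .csv files")
    return problems


# An empty list means both inputs are in place.
print(check_report_inputs("tests/example_file/strmie_output/"))
```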

2) View the loaded table (before selecting a sample)

After loading both inputs, the main table displays, for each sample, the metrics computed by STRmie-HD and the detected allele peaks.

Table loaded before selecting a row

3) Select a sample to display the histogram and flags

Click on a row to display the corresponding histogram (bar plot). STRmie-HD automatically highlights LOI/DOI flags if thresholds are exceeded (e.g., LOI CAA > 10).

Histogram after selecting a sample

Generate a table for recalculating indices (Index_Calculation mode)

This mode is designed for cases where automatic allele calling requires manual refinement. While reviewing the interactive HTML report, users may determine that the automatically detected allele peaks are inaccurate. Through the HTML interface, users can:

  • Visually explore CAG and CCG histograms for each sample.

  • Fill in the corrected values for CAG_Allele_1 and CAG_Allele_2 in the table.

  • Export an Excel table containing the revised allele definitions.

The Index_Calculation mode uses this corrected table as input to recompute all instability and expansion indices (II, EI).


Example of the HTML interface

In the HTML report, use the section “Generate a table for recalculating indices…” to adjust allele peaks and export the corrected Excel file.

How to proceed

  1. Inspect the histogram and the reported allele peaks.

  2. If the automatic peaks do not match the histogram, use the “Generate a table for recalculating indices…” section:

    • Select the sample in the dropdown.

    • Click Add Row, then enter the corrected CAG_Allele_1 and CAG_Allele_2 values in the table.

    • Click Save as Excel to export CAG_data_for_recalculating_indices.xlsx.

  3. Recompute indices with the corrected alleles:

strmie --mode Index_Calculation -f tests/example_file/paired_end_file/ -o tests/example_file/strmie_output/ -p /path/to/CAG_data_for_recalculating_indices.xlsx 

Parameters

  • -f /path/to/input_dir/ → same input folder used in the Complete Pipeline (if you followed the Case Studies, this corresponds to tests/example_file/paired_end_file/).

  • -o /path/to/output_dir/ → must be the same output folder that was passed with -o when running the Complete Pipeline (if you followed the Case Studies, this corresponds to tests/example_file/strmie_output/).

  • -p /path/to/CAG_data_for_recalculating_indices.xlsx → replace with the path to the Excel file you exported from the HTML report.
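When the recalculation is scripted, the command above can be assembled from Python. The sketch below uses only the flags listed in this section (--mode, -f, -o, -p); the helper name is hypothetical, and the paths are the placeholders from the example.

```python
import shlex


def index_calculation_command(input_dir, output_dir, corrected_xlsx):
    """Build the argument list for an Index_Calculation run."""
    return [
        "strmie",
        "--mode", "Index_Calculation",
        "-f", str(input_dir),        # same input folder as the Complete Pipeline
        "-o", str(output_dir),       # same output folder as the Complete Pipeline
        "-p", str(corrected_xlsx),   # Excel file exported from the HTML report
    ]


cmd = index_calculation_command(
    "tests/example_file/paired_end_file/",
    "tests/example_file/strmie_output/",
    "/path/to/CAG_data_for_recalculating_indices.xlsx",
)
print(shlex.join(cmd))  # copy-pastable shell line
# To execute it from Python instead: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell-quoting problems when a path contains spaces.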

Case 1 – Biological considerations and manual correction

Generate a table for recalculating indices

Note: In this case, the peaks are technically detected correctly, but biological considerations require adjusting the values used for index calculation.
Specifically:

  • There is no reference peak for the healthy allele.

  • Mosaicism shows two peaks for the expanded allele and none for the healthy one.

As a general guideline, a reasonable range for the healthy allele (Allele 1) is 10–26 CAG repeats.
In this example, a reference value of 17 CAG repeats is manually chosen for the healthy allele, while for the expanded allele (Allele 2) the most expanded peak is selected for recalculation.
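The rule applied in this case can be written down as a small helper. This is only a sketch of the manual decision described above, not part of STRmie-HD: the function name is hypothetical, and the default of 17 is simply the value chosen in this example, not a fixed rule.

```python
HEALTHY_RANGE = range(10, 27)  # 10 to 26 CAG repeats, per the guideline above


def case1_alleles(peaks, healthy_reference=17):
    """Return (CAG_Allele_1, CAG_Allele_2) when no healthy peak was detected.

    Allele 1 is a manually chosen reference inside the healthy range;
    Allele 2 is the most expanded detected peak.
    """
    if healthy_reference not in HEALTHY_RANGE:
        raise ValueError("healthy_reference should lie within 10-26 CAG repeats")
    if not peaks:
        raise ValueError("no detected peaks to choose the expanded allele from")
    if any(p in HEALTHY_RANGE for p in peaks):
        raise ValueError("a healthy-range peak exists; this shortcut does not apply")
    return healthy_reference, max(peaks)


# Two mosaic peaks for the expanded allele, none for the healthy one (made-up values).
print(case1_alleles([44, 47]))  # (17, 47)
```

The returned pair is what would be entered as CAG_Allele_1 and CAG_Allele_2 in the table exported from the HTML report.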


Case 2 – Rare cases where peaks are not detected

Alternative example of recalculating indices

Note: In very rare cases, automatic peak detection fails to find any peak in the histogram.
The user must then enter the allele peaks manually, based on the peaks visible in the histogram.
This manual assignment ensures that the indices (II, EI) can still be computed consistently.


Outcome

  • Index_Calculation recomputes II and EI using the corrected allele peaks, aligning metrics with visual evidence from histograms.