<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/"><channel><title>PathSTR on Gigabase or gigabyte</title><link>https://wdecoster.github.io/gigabaseorgigabyte/tags/pathstr/</link><description>Recent content in PathSTR on Gigabase or gigabyte</description><generator>Hugo -- 0.163.3</generator><language>en-us</language><copyright>© Wouter De Coster</copyright><lastBuildDate>Thu, 25 Jun 2026 09:00:00 +0200</lastBuildDate><atom:link href="https://wdecoster.github.io/gigabaseorgigabyte/tags/pathstr/index.xml" rel="self" type="application/rss+xml"/><item><title>aSTRonaut, now in Rust</title><link>https://wdecoster.github.io/gigabaseorgigabyte/posts/2026-06-25-astronaut-rust-port/</link><pubDate>Thu, 25 Jun 2026 09:00:00 +0200</pubDate><author>Wouter De Coster</author><guid>https://wdecoster.github.io/gigabaseorgigabyte/posts/2026-06-25-astronaut-rust-port/</guid><description>&amp;lt;no value&amp;gt;</description><content type="text/html" mode="escaped"><![CDATA[<p>For tandem repeat expansions, the length of the repeat is only part of the story. Motif composition can be just as important: for some loci, a change in motif composition is what makes a repeat pathogenic. RFC1 is the classic example: a benign <code>AAAAG</code> allele
and a disease-causing <code>AAGGG</code> expansion can be the same size, with dramatically different consequences.</p>
<p>To make that visible, <a href="https://pathstr.bioinf.be/">pathSTR</a> draws repeats as sequence-motif plots, with one row per sample and each nucleotide coloured by the motif it belongs to. It is a simple idea, visually appealing, and remarkably informative. I am of course not the first to come up with such a visualization, which is sometimes also called a <em>waterfall</em> plot. The standalone version of the pathSTR visualization is a Python script called aSTRonaut, which is now also available as a Rust program for speed and simplicity: a single small binary with no dependencies that produces self-contained HTML files.</p>
<p>A note on how it was made: I did not write the Rust by hand. The port was carried out by <a href="https://claude.com/claude-code">Claude</a>, Anthropic&rsquo;s coding agent, working under my supervision. I directed the design, made the calls on what the tool should do and which features to implement, and reviewed the result, while Claude wrote and tested the code. It was a genuinely productive way to work, and I think it is worth being transparent about it.</p>
<h2 id="what-it-looks-like">What it looks like<a href="#what-it-looks-like" class="anchor" aria-hidden="true"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"
      stroke-linecap="round" stroke-linejoin="round">
      <path d="M15 7h3a5 5 0 0 1 5 5 5 5 0 0 1-5 5h-3m-6 0H6a5 5 0 0 1-5-5 5 5 0 0 1 5-5h3"></path>
      <line x1="8" y1="12" x2="16" y2="12"></line>
   </svg></a></h2>
<p>Here is RFC1 across VCFs from <a href="https://pathstr.bioinf.be/">pathSTR</a>, with the rows clustered by motif composition. The benign <code>AAAAG</code> alleles (light blue) sit together at the bottom, the rare <code>AAGGG</code> expansions (dark green) stand out at the top, and a handful of other motif variants fall in between.</p>
<p><img src="/gigabaseorgigabyte/images/2026-06_astronaut_rfc1.png" alt="RFC1 motif composition across a cohort"></p>
<p>Similarly, here is the <em>HTT</em> CAG repeat behind Huntington&rsquo;s disease: the polyglutamine-encoding <code>CAG</code> tract (blue) grows in length from
top to bottom and is followed by the <code>CCG</code>-rich tail (pink), with the occasional interruption breaking up the pattern.</p>
<p><img src="/gigabaseorgigabyte/images/2026-06_astronaut_htt.png" alt="HTT CAG repeat"></p>
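<p>To give a feel for what &ldquo;each nucleotide coloured by the motif it belongs to&rdquo; means, here is a minimal Python sketch. It is not the aSTRonaut source: the function name <code>label_by_motif</code>, the greedy left-to-right matching, and the toy sequence are all assumptions made for illustration, and the real tool may assign motifs differently.</p>

```python
from typing import List, Optional


def label_by_motif(seq: str, motifs: List[str]) -> List[Optional[str]]:
    """Give every base the motif it falls in, or None for an interruption.

    Hypothetical helper for illustration only: a greedy left-to-right scan
    that takes the first listed motif matching at the current position.
    """
    motifs = [m for m in motifs if m]  # ignore empty motifs, they would never advance
    labels: List[Optional[str]] = []
    i = 0
    while i < len(seq):
        hit = next((m for m in motifs if seq.startswith(m, i)), None)
        if hit is None:
            labels.append(None)  # base not covered by any known motif
            i += 1
        else:
            labels.extend([hit] * len(hit))  # one label per base of the motif
            i += len(hit)
    return labels


# Toy allele: two CAG units, a CAA interruption, one CAG, then two CCG units.
row = label_by_motif("CAGCAGCAACAGCCGCCG", ["CAG", "CCG"])
```

<p>Each entry of <code>row</code> would then be mapped to a colour, with <code>None</code> drawn in a neutral shade so that interruptions stand out.</p>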
<h2 id="a-couple-of-new-tricks">A couple of new tricks<a href="#a-couple-of-new-tricks" class="anchor" aria-hidden="true"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"
      stroke-linecap="round" stroke-linejoin="round">
      <path d="M15 7h3a5 5 0 0 1 5 5 5 5 0 0 1-5 5h-3m-6 0H6a5 5 0 0 1-5-5 5 5 0 0 1 5-5h3"></path>
      <line x1="8" y1="12" x2="16" y2="12"></line>
   </svg></a></h2>
<p>The Rust version also comes with a few new features. It can guess the repeat motif length for each locus on its own (<code>-k auto</code>), so you do not have to specify it. And it can <strong>collapse</strong> identical alleles into a haplotype-frequency view, where each unique sequence becomes a single row next to a bar showing how many people carry it.</p>
<p><img src="/gigabaseorgigabyte/images/2026-06_astronaut_rfc1_collapsed.png" alt="Collapsed haplotype-frequency view of RFC1"></p>
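<p>Both ideas are easy to picture in a few lines of Python. This is a sketch under my own assumptions, not how aSTRonaut implements <code>-k auto</code> or the collapsed view: the periodicity score, the <code>max_k</code> limit, and the function names are hypothetical, and the input is assumed to be one sequence string per allele.</p>

```python
from collections import Counter
from typing import List, Tuple


def guess_motif_length(seq: str, max_k: int = 6) -> int:
    """Guess the repeat unit length as the shift k at which seq best matches itself.

    Illustrative heuristic only: score each k by the fraction of positions
    where seq[i] == seq[i + k], and keep the smallest k with the top score.
    """
    best_k, best_score = 1, -1.0
    for k in range(1, min(max_k, len(seq) - 1) + 1):
        pairs = len(seq) - k
        score = sum(seq[i] == seq[i + k] for i in range(pairs)) / pairs
        if score > best_score:  # strict comparison keeps the smallest k on ties
            best_k, best_score = k, score
    return best_k


def collapse_alleles(alleles: List[str]) -> List[Tuple[str, int]]:
    """Collapse identical allele sequences into (sequence, count) rows, most common first."""
    return Counter(alleles).most_common()


benign = "AAAAG" * 3
expanded = "AAGGG" * 3
rows = collapse_alleles([benign, expanded, benign])
k = guess_motif_length("AAGGG" * 10)
```

<p>In this toy input the benign allele appears twice and so ends up as a single row with a count of two, which is what the bar next to each row in the collapsed view represents.</p>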
<p>The plots are rendered with <a href="https://psy-fer.github.io/kuva/">kuva</a>, a lovely new Rust scientific-plotting library. The example
plots above use data from the <a href="https://pathstr.bioinf.be/">pathSTR</a> database.</p>
<p>aSTRonaut is open source and available on <a href="https://github.com/wdecoster/aSTRonaut">GitHub</a>, with documentation. Feedback and feature requests are welcome.</p>
]]></content></item></channel></rss>