Every lab has that folder.

The one full of scripts no one documented, results no one can reproduce, and a pipeline that only works on one person's machine. You inherited it from a grad student who graduated three years ago. Half the dependencies are pinned to versions that no longer exist. The other half aren't pinned at all.

You spend your first month not doing science—just getting the code to run.

GG.Flow is the fix.

What This Means for Your Lab

GG.Flow is a scientific pipeline framework—a way to define, execute, and reproduce multi-step computational workflows. It handles dependency resolution, environment isolation, provenance tracking, and result caching so you can focus on the science instead of the plumbing.

Reproducible by Default

Every run is logged with its exact configuration, dependencies, and inputs. Someone joins your lab next year? They run the same pipeline and get the same results.
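The page doesn't show GG.Flow's log format. The sketch below only illustrates what such a run record could hold, assuming a hypothetical `provenance_record` helper that hashes inputs with SHA-256. It records the configuration, a hash of every input, and basic facts about the runtime. It is not GG.Flow's API.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()


def provenance_record(step: str, config: dict, inputs: dict) -> dict:
    """Build a JSON-serializable record of one step run (hypothetical helper).

    `inputs` maps an input name to its raw bytes; a real tool would hash file contents.
    """
    # sort_keys makes the hash independent of the order the keys were written in.
    canonical_config = json.dumps(config, sort_keys=True)
    return {
        "step": step,
        "config": config,
        "config_sha256": sha256_hex(canonical_config.encode("utf-8")),
        "inputs_sha256": {name: sha256_hex(data) for name, data in sorted(inputs.items())},
        "python": platform.python_version(),
        "platform": platform.platform(),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


record = provenance_record("align", {"threshold": 0.5, "seed": 42}, {"reads.fq": b"ACGT\n"})
print(json.dumps(record, indent=2))
```

If a record like this is stored next to each output, a newcomer can check that a rerun used the same configuration and the same input bytes.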

No More "Works on My Machine"

Environment isolation means your pipeline runs the same way everywhere—your laptop, a cluster, a collaborator's workstation halfway around the world.
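Full isolation usually involves containers, which a short example can't show. The dependency-pinning half can be sketched with Python's standard `importlib.metadata`: compare what is installed against an exact list of pinned versions. The `check_pins` name and the pin format are illustrative assumptions and are not part of GG.Flow.

```python
from importlib.metadata import PackageNotFoundError, version


def check_pins(pins: dict) -> list:
    """Compare installed distributions with exact pinned versions.

    Returns human-readable mismatches; an empty list means the environment matches.
    """
    problems = []
    for name, wanted in sorted(pins.items()):
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed (pinned to {wanted})")
            continue
        # Exact string comparison: "1.0" and "1.0.0" count as different here.
        if installed != wanted:
            problems.append(f"{name}: installed {installed}, pinned to {wanted}")
    return problems


if __name__ == "__main__":
    # Hypothetical pins; in practice these would come from a lock file.
    for problem in check_pins({"numpy": "1.26.4", "pandas": "2.2.2"}):
        print(problem)
```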

Built for Iteration

Change one parameter and re-run. GG.Flow reuses cached results for every step upstream of the change that didn't change itself, so only the affected steps run again and you're not waiting hours for work that already completed.
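One common way to get this behavior is to give each step a cache key that hashes its own code version and parameters together with the keys of the steps it depends on. The sketch below assumes that scheme. The function names and the four-step pipeline are hypothetical, and the page says only that GG.Flow hashes inputs, code, and configuration, not exactly how it derives its keys.

```python
import hashlib
import json


def step_key(name: str, code_version: str, params: dict, upstream_keys: list) -> str:
    """Content-address one step: hash its name, code version, params and upstream keys."""
    payload = json.dumps(
        {"name": name, "code": code_version, "params": params, "upstream": upstream_keys},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def pipeline_keys(steps: list) -> dict:
    """steps: (name, code_version, params, deps) tuples, each listed after its deps."""
    keys = {}
    for name, code_version, params, deps in steps:
        # A step's key folds in its parents' keys, so a change propagates downstream.
        keys[name] = step_key(name, code_version, params, [keys[d] for d in deps])
    return keys


def steps_to_rerun(old_keys: dict, new_keys: dict) -> list:
    """Steps whose key changed; everything else can be served from the cache."""
    return [name for name, key in new_keys.items() if old_keys.get(name) != key]


def example_steps(threshold: float) -> list:
    # Hypothetical pipeline; "summary" depends only on "load".
    return [
        ("load", "v1", {"path": "data.csv"}, []),
        ("filter", "v1", {"threshold": threshold}, ["load"]),
        ("fit", "v1", {"alpha": 1.0}, ["filter"]),
        ("summary", "v1", {}, ["load"]),
    ]


before = pipeline_keys(example_steps(threshold=0.5))
after = pipeline_keys(example_steps(threshold=0.7))
print(steps_to_rerun(before, after))  # ['filter', 'fit']
```

Changing the threshold invalidates `filter` and, through its key, `fit`, while `load` and `summary` keep their keys and can be served from the cache.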

Human-Readable Pipelines

Your pipeline definition is documentation. New lab members can read it and understand what your analysis does without deciphering a folder of numbered scripts.
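The page doesn't show GG.Flow's pipeline syntax, so this only illustrates the idea. A pipeline written as plain data can carry a one-line purpose for each step, and a few lines of Python can turn it into a readable summary. The `PIPELINE` layout and its field names are assumptions.

```python
# Hypothetical pipeline spec (not GG.Flow syntax): each step says what it runs,
# what it needs, and why it exists.
PIPELINE = {
    "download": {"run": "bash fetch.sh", "needs": [], "doc": "Fetch raw reads"},
    "qc": {"run": "python qc.py", "needs": ["download"], "doc": "Drop low-quality reads"},
    "align": {"run": "Rscript align.R", "needs": ["qc"], "doc": "Align reads to the reference"},
    "report": {"run": "julia report.jl", "needs": ["align"], "doc": "Summarize alignment statistics"},
}


def describe(pipeline: dict) -> str:
    """Render one line per step, in the order the steps are written."""
    lines = []
    for name, step in pipeline.items():
        needs = ", ".join(step["needs"]) or "nothing"
        lines.append(f"{name}: {step['doc']} (needs: {needs}; runs: {step['run']})")
    return "\n".join(lines)


print(describe(PIPELINE))
```

Because each step names its dependencies and its purpose, the spec reads as a description of the analysis rather than a folder of scripts whose order lives in their filenames.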

Built for Science, Not Just Neuroscience

We built GG.Flow for our own research on emergent simulation—modeling biological, psychological, and environmental systems at scale. But the framework itself doesn't know or care what you're simulating.

If your work involves batch-processed computational pipelines (genomics, climate modeling, materials science, epidemiology, or any field where you chain computational steps together and need reproducible results), GG.Flow is built for that work.

Neuroscience is where we started. Science is where it goes.

Why Open Source

We built GG.Flow for ourselves. Our own simulation pipelines were getting unwieldy—too many moving parts, too many manual steps, too much time lost to configuration instead of research.

Then we realized the same foundation applies to nearly any batch-processed scientific pipeline. And we think it could do real good.

So we opened it. Not as a marketing strategy or a loss leader for paid products. Because the reproducibility crisis in science is real, and every lab that can reproduce its results reliably is a lab doing better science.

GG.Flow is and will remain free and open-source. The MUSE ecosystem that uses it is proprietary. The infrastructure that makes science more reproducible shouldn't be.

GG.Flow implements a directed acyclic graph (DAG) execution model for scientific workflows. Key technical characteristics:

  • Content-addressable caching with cryptographic hashing of inputs, code, and configuration
  • Deterministic execution ordering via topological sort with configurable parallelism
  • Environment isolation through containerization and dependency pinning
  • Full provenance tracking—every output links to its exact inputs, code version, and runtime environment
  • Language-agnostic pipeline steps (Python, R, Julia, shell, or any executable)
  • Pluggable storage backends for result persistence and sharing
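The page doesn't describe GG.Flow's scheduler internals. Here is a minimal sketch of the general technique named in the list above, built on Python's standard `graphlib.TopologicalSorter` (Python 3.9+). Steps are grouped into batches in dependency order and sorted within each batch for determinism. Each batch then runs on a thread pool whose `max_workers` plays the role of configurable parallelism. The step names and the `execution_batches`/`run_pipeline` helpers are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter


def execution_batches(graph: dict) -> list:
    """Split a DAG into batches; steps within a batch don't depend on each other.

    `graph` maps each step to the set of steps it depends on.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorting makes the order deterministic
        batches.append(ready)
        sorter.done(*ready)
    return batches


def run_pipeline(graph: dict, run_step, max_workers: int = 4) -> None:
    """Run each batch on a thread pool; max_workers is the parallelism knob."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for batch in execution_batches(graph):
            # list() waits for every step in the batch and re-raises any error.
            list(pool.map(run_step, batch))


GRAPH = {
    "align": {"download"},
    "qc": {"download"},
    "call_variants": {"align"},
    "report": {"call_variants", "qc"},
}

print(execution_batches(GRAPH))
# [['download'], ['align', 'qc'], ['call_variants'], ['report']]
run_pipeline(GRAPH, lambda step: print(f"running {step}"), max_workers=2)
```

Waiting for a whole batch is the simplest correct scheme. However, a step can sit idle while an unrelated step in its batch finishes. A fuller scheduler would call `done()` for each step as soon as it completes.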

GG.Flow is designed for computational reproducibility in scientific contexts. It complements, but does not replace, existing tools such as HPC job schedulers and domain-specific workflow managers. Its primary contribution is making the connection between "I ran this analysis" and "here is exactly how to run it again" automatic rather than aspirational.

The framework is released under the MIT license. Contributions are welcome. Integration with institutional HPC environments and cloud platforms is on the roadmap.

Try GG.Flow

Open-source, free forever, and ready for your next pipeline. Questions? We'd love to hear from you.
