
Workflows

ProteinGen organizes its guides into workflows and modules.

Workflows are end-to-end recipes at the level of a paper's method — they compose multiple modules into a complete pipeline. Modules are the reusable building blocks, each implementing a specific mathematical or engineering idea. Modules are organized by the four library design areas: Data, Models, Sampling, and Evaluation.

High-level design strategies that combine modules into complete pipelines.

| Workflow | Description | Key Modules |
|---|---|---|
| ProteinGuide | Guided generation using Bayes' rule to combine a generative model with a property predictor | Fine-tuning, Noisy Predictor Training, Guidance (TAG/DEG), Likelihood Curves |
| Continued Pretraining | Specialize a pretrained model to a protein family using sequence or structural homologs | MSA Acquisition, Fine-tuning, Likelihood Curves |
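The Bayes'-rule idea behind ProteinGuide can be illustrated at a single sequence position. The guided distribution is p(x | y) ∝ p(x) · p(y | x), so in log space the property predictor's log-likelihood is added to the generative model's log-probabilities and the result is renormalized. The sketch below is a toy illustration with made-up numbers, not ProteinGen's API or its TAG/DEG implementations; the helper names and the `guidance_scale` knob (a common way to strengthen or weaken the predictor term) are assumptions.

```python
import math


def log_softmax(scores: dict[str, float]) -> dict[str, float]:
    """Normalize unnormalized log-scores into log-probabilities (numerically stable)."""
    m = max(scores.values())
    log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return {tok: s - log_z for tok, s in scores.items()}


def guided_log_probs(
    prior_log_probs: dict[str, float],
    predictor_log_lik: dict[str, float],
    guidance_scale: float = 1.0,
) -> dict[str, float]:
    """Hypothetical helper: log p(x | y) = log p(x) + scale * log p(y | x) - log Z at one position."""
    scores = {
        tok: prior_log_probs[tok] + guidance_scale * predictor_log_lik[tok]
        for tok in prior_log_probs
    }
    return log_softmax(scores)


# Toy example over three amino acids (illustrative numbers only).
prior = {"A": math.log(0.5), "G": math.log(0.3), "L": math.log(0.2)}
# log p(y = "active" | x) from a hypothetical property predictor.
predictor = {"A": math.log(0.1), "G": math.log(0.5), "L": math.log(0.9)}

posterior = {tok: math.exp(lp) for tok, lp in guided_log_probs(prior, predictor).items()}
print({tok: round(p, 3) for tok, p in posterior.items()})
```

With `guidance_scale=1.0` this is plain Bayes' rule: the predictor moves probability mass away from `A`, which the generative model prefers, toward `L`, which the predictor favors.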

Modules

Reusable building blocks organized by the four library design areas. Workflows reference these modules as sub-steps.

Data

| Module | Description |
|---|---|
| MSA → Dataset | Turn an MSA into a training-ready dataset (gap stripping, AF3 folding, structure encoding) |
| Data Splits | Split assay data by mutational distance, activity range, or position for train/eval |
| MSA Acquisition | Tools for obtaining sequence- and structure-based MSAs |
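A split by mutational distance, one of the strategies listed above, holds out variants that are farther from the wild type than anything seen during training, which tests how well a model extrapolates. The following is a minimal sketch that assumes substitution-only variants of equal length. `Variant`, `mutational_distance`, and `split_by_distance` are hypothetical names, not ProteinGen functions.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    sequence: str
    activity: float


def mutational_distance(seq: str, wild_type: str) -> int:
    """Count substituted positions relative to the wild type (Hamming distance)."""
    if len(seq) != len(wild_type):
        raise ValueError("this sketch assumes substitution-only variants of equal length")
    return sum(a != b for a, b in zip(seq, wild_type))


def split_by_distance(variants, wild_type, max_train_distance):
    """Train on variants within `max_train_distance` mutations; evaluate on the rest."""
    train, evaluation = [], []
    for v in variants:
        if mutational_distance(v.sequence, wild_type) <= max_train_distance:
            train.append(v)
        else:
            evaluation.append(v)
    return train, evaluation


wild_type = "MKTAYIA"
variants = [
    Variant("MKTAYIA", 1.00),  # wild type, distance 0
    Variant("MKSAYIA", 0.80),  # distance 1
    Variant("MKSAYLA", 1.20),  # distance 2
    Variant("AKSAYLG", 0.30),  # distance 4
]
train, evaluation = split_by_distance(variants, wild_type, max_train_distance=1)
```

Splits by activity range or by position follow the same pattern with a different predicate, for example holding out every variant that mutates a chosen set of positions.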

Models

| Module | Description |
|---|---|
| Fine-tuning Generative Models | LoRA fine-tuning of ESM3/ESMC for sequence-only or structure-conditioned prediction |
| Training Predictors | Train oracle and noisy predictive models from assay data |
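An oracle predictor scores clean sequences. A noisy predictor, by contrast, must score the partially masked sequences that appear mid-generation. One way to build its training set is to mask each position at a randomly drawn rate while keeping the assay label. This sketch assumes a masked (absorbing-state) noising process of the kind used by masked models such as ESM3, with a noise level t drawn uniformly. The helpers and the `<mask>` token are hypothetical, not ProteinGen's implementation.

```python
import random

MASK = "<mask>"  # hypothetical mask token


def noise_sequence(tokens, rng):
    """Draw t ~ Uniform(0, 1), then mask each position independently with probability t."""
    t = rng.random()
    noised = [MASK if rng.random() < t else tok for tok in tokens]
    return noised, t


def make_noisy_examples(dataset, copies, seed=0):
    """Expand (sequence, label) pairs into `copies` noised versions that keep the original label."""
    rng = random.Random(seed)
    examples = []
    for sequence, label in dataset:
        for _ in range(copies):
            noised, t = noise_sequence(list(sequence), rng)
            # t is returned in case the predictor is conditioned on the noise level.
            examples.append((noised, t, label))
    return examples


assay = [("MKTAYIA", 1.0), ("MKSAYLA", 1.2)]
examples = make_noisy_examples(assay, copies=3)
```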

Sampling

Sampler selection and configuration are covered in the sampling reference. The key decision: use `sample` (discrete-time ancestral sampling) for DEG guidance and `sample_ctmc_linear_interpolation` for TAG guidance.
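To build intuition for discrete-time ancestral sampling, the toy loop below starts from an all-masked sequence. At each of a fixed number of steps, it queries a model for per-position distributions and unmasks an even share of the remaining positions by sampling from them. `toy_model`, the unmasking schedule, and `ancestral_sample` are assumptions for illustration; this does not reproduce ProteinGen's `sample`.

```python
import math
import random

MASK = "<mask>"
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def toy_model(tokens):
    """Stand-in for a generative model: a uniform distribution at every masked position."""
    return {
        i: {aa: 1.0 / len(ALPHABET) for aa in ALPHABET}
        for i, tok in enumerate(tokens)
        if tok == MASK
    }


def ancestral_sample(length, num_steps, model=toy_model, seed=0):
    rng = random.Random(seed)
    tokens = [MASK] * length
    for step in range(num_steps):
        masked = [i for i, tok in enumerate(tokens) if tok == MASK]
        if not masked:
            break
        # Unmask an even share of the remaining positions; the final step unmasks all of them.
        n_unmask = math.ceil(len(masked) / (num_steps - step))
        probs = model(tokens)  # distributions conditioned on the current partial sequence
        for i in rng.sample(masked, n_unmask):
            choices, weights = zip(*probs[i].items())
            tokens[i] = rng.choices(choices, weights=weights, k=1)[0]
    return "".join(tokens)


print(ancestral_sample(length=12, num_steps=4))
```

In a guided sampler, one natural place to apply guidance is where `probs` is computed: each position's distribution would be reweighted by a predictor before sampling.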

Evaluation

| Module | Description |
|---|---|
| Likelihood Curves | Evaluate model quality by tracking log-probability under progressive unmasking |
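A likelihood curve under progressive unmasking can be sketched as follows. Start from a fully masked sequence and reveal the true residues one at a time in a random order. At each reveal, add the model's log-probability of the true residue given the residues revealed so far. The `log_prob_fn(tokens, i, aa)` interface and the context-free `toy_log_prob` model are hypothetical stand-ins for a real masked model.

```python
import math
import random

MASK = "<mask>"


def likelihood_curve(sequence, log_prob_fn, seed=0):
    """Return cumulative log-probability after each reveal, starting from 0.0 (nothing revealed).

    `log_prob_fn(tokens, i, aa)` is assumed to return log p(x_i = aa | visible tokens).
    """
    rng = random.Random(seed)
    order = list(range(len(sequence)))
    rng.shuffle(order)
    tokens = [MASK] * len(sequence)
    curve, total = [0.0], 0.0
    for i in order:
        total += log_prob_fn(tokens, i, sequence[i])
        tokens[i] = sequence[i]  # reveal the true residue before scoring the next position
        curve.append(total)
    return curve


def toy_log_prob(tokens, i, aa):
    """Context-free toy model: 7 hydrophobic residues share 0.7, the other 13 share 0.3."""
    return math.log(0.1) if aa in "AILMFVW" else math.log(0.3 / 13)


curve = likelihood_curve("MKTAYIA", toy_log_prob)
```

With a real model, the context makes later reveals easier to predict, so the curve flattens as more residues become visible. Averaging curves over several random orders reduces their dependence on any single ordering.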

Skills

Skills are structured instructions that AI coding agents can load on demand to perform specific tasks. They live in .agents/skills/ and follow the Agent Skills standard. When you build your own workflows, feel free to write skills for them and share them with the community by contributing to ProteinGen.
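Under the Agent Skills convention, as we understand it, each skill is a directory containing a `SKILL.md` file. That file begins with YAML frontmatter carrying at least a `name` and a `description`. The sketch below lists the skills under `.agents/skills/` by reading that frontmatter. It handles only flat `key: value` lines (no multi-line YAML), and `read_frontmatter` and `list_skills` are hypothetical helpers, not part of ProteinGen.

```python
from pathlib import Path


def read_frontmatter(skill_md: Path) -> dict[str, str]:
    """Parse simple `key: value` lines between the leading `---` fences of a SKILL.md."""
    lines = skill_md.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def list_skills(root: Path = Path(".agents/skills")) -> dict[str, str]:
    """Map each skill's name to its description, with one SKILL.md per skill directory."""
    skills = {}
    for skill_md in sorted(root.glob("*/SKILL.md")):
        meta = read_frontmatter(skill_md)
        skills[meta.get("name", skill_md.parent.name)] = meta.get("description", "")
    return skills
```

Run from the repository root, `print(list_skills())` would show which skills an agent can discover. Directories without a `SKILL.md` are skipped.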

Setup

To set up skills, follow the setup instructions, adapting them to what your model provider (e.g., Anthropic, OpenAI, Z.ai) expects.

| Skill | Description |
|---|---|
| follow-workflow | Plan and implement a library design pipeline by walking through workflows step-by-step |
| add-generative-model | Integrate a new generative (transition) model into the library |
| add-predictive-model | Integrate a new predictive model into the library |
| likelihood-curves | Evaluate and plot log-likelihood trajectories for generative models |