Workflows
ProteinGen organizes its guides into workflows and modules.
Workflows are end-to-end recipes at the level of a paper's method — they compose multiple modules into a complete pipeline. Modules are the reusable building blocks, each implementing a specific mathematical or engineering idea. Modules are organized by the four library design areas: Data, Models, Sampling, and Evaluation.
High-level design strategies that combine modules into complete pipelines.
| Workflow | Description | Key Modules |
|---|---|---|
| ProteinGuide | Guided generation using Bayes' rule to combine a generative model with a property predictor | Fine-tuning, Noisy Predictor Training, Guidance (TAG/DEG), Likelihood Curves |
| Continued Pretraining | Specialize a pretrained model to a protein family using sequence or structural homologs | MSA Acquisition, Fine-tuning, Likelihood Curves |
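The Bayes' rule combination behind ProteinGuide can be illustrated on a single masked position. The sketch below is a toy, not ProteinGen's API: `guided_distribution` and the numbers are hypothetical. It assumes the generative model supplies a prior p(x) over candidate tokens and the property predictor supplies p(y | x) for each candidate.

```python
def guided_distribution(prior_probs, predictor_likelihoods):
    """Combine p(x) and p(y | x) into p(x | y) with Bayes' rule.

    Hypothetical helper for illustration; not part of ProteinGen.
    """
    if len(prior_probs) != len(predictor_likelihoods):
        raise ValueError("prior and likelihood must cover the same candidates")
    # Numerator of Bayes' rule: p(x) * p(y | x) for every candidate x.
    unnormalized = [p * l for p, l in zip(prior_probs, predictor_likelihoods)]
    # Denominator: p(y) = sum over x of p(x) * p(y | x).
    evidence = sum(unnormalized)
    if evidence <= 0:
        raise ValueError("no candidate has positive posterior mass")
    return [u / evidence for u in unnormalized]


# Toy numbers: three candidate amino acids at one masked position.
prior = [0.5, 0.3, 0.2]        # generative model, p(x)
likelihood = [0.1, 0.6, 0.3]   # property predictor, p(y | x)
posterior = guided_distribution(prior, likelihood)
```

In this toy case the predictor shifts most of the probability mass to the second candidate, even though the generative model alone preferred the first.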
Modules
Reusable building blocks organized by the four library design areas. Workflows reference these modules as sub-steps.
Data
| Module | Description |
|---|---|
| MSA → Dataset | Turn an MSA into a training-ready dataset (gap stripping, AF3 folding, structure encoding) |
| Data Splits | Split assay data by mutational distance, activity range, or position for train/eval |
| MSA Acquisition | Tools for obtaining sequence and structure-based MSAs |
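As an illustration of splitting by mutational distance, here is a minimal sketch that assumes substitution-only variants of the same length as the wild type and measures distance as the Hamming distance to it. The function names, threshold, and records are hypothetical and are not taken from the Data Splits module.

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def split_by_mutational_distance(records, wild_type, max_train_distance):
    """Put variants close to the wild type in train, the rest in eval.

    `records` is a list of (sequence, activity) pairs. Hypothetical helper.
    """
    train, held_out = [], []
    for sequence, activity in records:
        distance = hamming_distance(sequence, wild_type)
        target = train if distance <= max_train_distance else held_out
        target.append((sequence, activity))
    return train, held_out


# Toy assay data: variants with 0, 1, 2, and 3 substitutions.
wild_type = "MKTAY"
records = [("MKTAY", 1.0), ("MKTAF", 0.8), ("AKTAF", 0.3), ("AKSAF", 0.1)]
train, held_out = split_by_mutational_distance(
    records, wild_type, max_train_distance=1
)
```

Holding out the more distant variants tests whether a predictor extrapolates beyond the mutational neighborhood it was trained on.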
Models
| Module | Description |
|---|---|
| Fine-tuning Generative Models | LoRA fine-tuning of ESM3/ESMC for sequence-only or structure-conditioned prediction |
| Training Predictors | Train oracle and noisy predictive models from assay data |
Sampling
Sampler selection and configuration are covered in the sampling reference. Key decision: use `sample` (discrete-time ancestral sampling) for DEG guidance and `sample_ctmc_linear_interpolation` for TAG guidance.
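The sampler decision can be written down as a small lookup. The sketch below records only the two sampler names given above; the helper `sampler_name_for` is hypothetical, and the actual call signatures are documented in the sampling reference, not here.

```python
# Guidance method -> sampler name, as stated in this section.
SAMPLER_FOR_GUIDANCE = {
    "DEG": "sample",                            # discrete-time ancestral
    "TAG": "sample_ctmc_linear_interpolation",
}


def sampler_name_for(guidance):
    """Return the sampler name recommended for a guidance method.

    Hypothetical helper for illustration; not part of ProteinGen.
    """
    key = guidance.upper()
    if key not in SAMPLER_FOR_GUIDANCE:
        raise ValueError(f"unknown guidance method: {guidance!r}")
    return SAMPLER_FOR_GUIDANCE[key]
```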
Evaluation
| Module | Description |
|---|---|
| Likelihood Curves | Evaluate model quality by tracking log-probability under progressive unmasking |
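One plausible reading of a likelihood curve is sketched below. Start from a fully masked sequence, reveal positions one at a time in a random order, and record the log-probability the model assigns to the true token just before it is revealed. Everything here is an assumption made for illustration: the mask symbol, the `log_prob_fn` interface, and the uniform toy model stand in for a real generative model and are not the module's API.

```python
import math
import random

MASK = "_"  # hypothetical mask symbol


def toy_log_probs(partial_sequence, position, alphabet):
    """Stand-in for a masked model: uniform over the alphabet."""
    return {token: -math.log(len(alphabet)) for token in alphabet}


def likelihood_curve(sequence, log_prob_fn, alphabet, seed=0):
    """Log-probability of each true token under progressive unmasking."""
    rng = random.Random(seed)
    order = list(range(len(sequence)))
    rng.shuffle(order)  # random unmasking order

    partial = [MASK] * len(sequence)
    curve = []
    for position in order:
        # Score the true token given everything revealed so far.
        log_probs = log_prob_fn(partial, position, alphabet)
        curve.append(log_probs[sequence[position]])
        # Reveal the token so later steps can condition on it.
        partial[position] = sequence[position]
    return curve


alphabet = "ACDEFGHIKLMNPQRSTVWY"
curve = likelihood_curve("MKTAY", toy_log_probs, alphabet)
```

With the uniform toy model every point on the curve equals -log(20). A model that uses context should instead assign higher log-probabilities as more positions are revealed.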
Skills
Skills are structured instructions that AI coding agents can load on demand to perform specific tasks. They live in `.agents/skills/` and follow the Agent Skills standard. Feel free to write your own skills for your workflows and share them with the community by contributing to ProteinGen.
Setup
To set up skills, follow the setup instructions and adapt them to what your model provider (e.g. Anthropic, OpenAI, Z.ai) expects.
| Skill | Description |
|---|---|
| follow-workflow | Plan and implement a library design pipeline by walking through workflows step-by-step |
| add-generative-model | Integrate a new generative (transition) model into the library |
| add-predictive-model | Integrate a new predictive model into the library |
| likelihood-curves | Evaluate and plot log-likelihood trajectories for generative models |