Examples
Concrete implementations showing how to compose ProteinGen's modules into working pipelines. Each example includes an Architecture Breakdown at the top that maps its components to the four library design areas (Data, Models, Sampling, Evaluation) and links to the relevant modules and reference docs.
All examples live in the examples/ directory. Run one with uv run python examples/path-to-script.py, replacing path-to-script with the script's path inside that directory.
Guided Generation
Full pipelines that combine generative models with predictive models for property-optimized sampling.
- Stability-Guided Generation — ESM3 inverse folding + TAG guidance with a noisy stability predictor (all four modules)
- Guided Sampling (TrpB) — train a fitness probe on ESMC embeddings, then run DEG-guided sampling
Sampling
Generate protein sequences using pretrained models.
- Unconditional Sampling — generate sequences from scratch with ESMC (Models + Sampling only)
- Autoregressive Generation (ProGen3) — unconditional protein generation with ProGen3's left-to-right sampling
- Structure-Conditioned Sampling — inverse folding with ESM3 backbone conditioning
Fine-tuning
Adapt pretrained models to specific protein families.
- Sequence-only MLM (EphB1) — LoRA fine-tune ESM3 on kinase domain homologs (Data + Models)
- Inverse Folding (EphB1) — LoRA fine-tune ESM3 with AF3-predicted structures (Data + Models + Evaluation)
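Both fine-tuning examples rely on LoRA, which freezes the pretrained weights and trains only a low-rank update. The sketch below shows the standard LoRA formulation for a single linear layer in plain NumPy. It is not ProteinGen's or ESM3's implementation: the class name LoRALinear, the layer sizes, and the rank and alpha values are all hypothetical and chosen only for illustration.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update (illustrative only)."""

    def __init__(self, weight, rank, alpha, rng):
        out_dim, in_dim = weight.shape
        self.weight = weight  # pretrained weight, kept frozen
        # A starts small and random, B starts at zero, so the update is zero at init.
        self.a = rng.normal(scale=0.01, size=(rank, in_dim))
        self.b = np.zeros((out_dim, rank))
        self.scale = alpha / rank

    def __call__(self, x):
        base = x @ self.weight.T
        update = (x @ self.a.T) @ self.b.T
        return base + self.scale * update

    def trainable_parameters(self):
        return self.a.size + self.b.size


rng = np.random.default_rng(0)
pretrained = rng.normal(size=(32, 64))  # stand-in for one pretrained weight matrix
layer = LoRALinear(pretrained, rank=4, alpha=8.0, rng=rng)

x = rng.normal(size=(5, 64))
y = layer(x)

print(y.shape)
print(layer.trainable_parameters(), "trainable vs", pretrained.size, "frozen")
```

Because B is initialized to zero, the adapted layer reproduces the pretrained layer exactly before any training step, and only rank * (in_dim + out_dim) parameters need gradients.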
Evaluation
Assess model quality and compare generation strategies.
- Likelihood Curves (TrpB) — diagnostic: log p(true token) vs. fraction unmasked (Evaluation only)
- Benchmark: Model Families — compare 6 models across 3 families at 4 masking levels, with AF3 structural validation
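The likelihood-curve diagnostic can be pictured as follows: reveal a growing fraction of a sequence, ask the model to score the positions that are still masked, and record the mean log-probability of the true token. The sketch below is a minimal illustration of that loop, not the code in the example script. toy_model_log_probs is a hypothetical stand-in for a real masked language model (it only uses smoothed counts of the revealed tokens), and the sequence, vocabulary size, and fractions are made up.

```python
import numpy as np

VOCAB_SIZE = 20  # assumption: the 20 standard amino acids


def toy_model_log_probs(tokens, revealed):
    """Hypothetical stand-in for a masked LM.

    Returns log-probabilities of shape (len(tokens), VOCAB_SIZE), computed from
    add-one-smoothed counts of the revealed tokens (same row at every position).
    """
    counts = np.bincount(tokens[revealed], minlength=VOCAB_SIZE) + 1.0
    log_p = np.log(counts / counts.sum())
    return np.tile(log_p, (len(tokens), 1))


def likelihood_curve(tokens, fractions, rng):
    """Mean log p(true token) over masked positions at each fraction unmasked."""
    curve = []
    for frac in fractions:
        n_revealed = int(round(frac * len(tokens)))
        revealed = np.zeros(len(tokens), dtype=bool)
        revealed[rng.choice(len(tokens), size=n_revealed, replace=False)] = True

        log_probs = toy_model_log_probs(tokens, revealed)
        masked_idx = np.flatnonzero(~revealed)
        true_log_p = log_probs[masked_idx, tokens[masked_idx]]
        curve.append(float(true_log_p.mean()))
    return curve


rng = np.random.default_rng(0)
tokens = rng.integers(0, 5, size=120)  # toy sequence that uses only 5 residue types
fractions = [0.0, 0.25, 0.5, 0.75]     # kept below 1.0 so some positions stay masked
curve = likelihood_curve(tokens, fractions, rng)

for frac, value in zip(fractions, curve):
    print(f"fraction unmasked {frac:.2f}: mean log p(true token) = {value:.3f}")
```

With nothing revealed, the toy model falls back to a uniform distribution, so the first point sits at log(1/20). A real model would be swapped in for toy_model_log_probs, and the shape of the curve shows how much the model benefits from added context.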
Predictive Modeling
Build lightweight predictors on top of pretrained representations.
- PCA Embedding Initialization — compress ESMC embeddings via PCA for small predictors
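As a rough picture of the PCA step, the sketch below compresses a matrix of embeddings to a handful of principal components using an SVD in NumPy. It is an assumption about the general technique, not the example's actual code: the random matrix stands in for real ESMC embeddings, and fit_pca, project, and the dimensions 64 and 8 are hypothetical.

```python
import numpy as np


def fit_pca(embeddings, n_components):
    """Fit PCA via SVD and return (mean, components)."""
    mean = embeddings.mean(axis=0)
    centered = embeddings - mean
    # Rows of vt are orthonormal principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]


def project(embeddings, mean, components):
    """Map embeddings into the low-dimensional PCA space."""
    return (embeddings - mean) @ components.T


rng = np.random.default_rng(0)
# Stand-in for a (num_items, embedding_dim) matrix of pretrained embeddings.
fake_embeddings = rng.normal(size=(200, 64))

mean, components = fit_pca(fake_embeddings, n_components=8)
compressed = project(fake_embeddings, mean, components)

print(compressed.shape)
```

The compressed vectors can then feed a small predictor, or seed its embedding layer, so the predictor starts from pretrained structure at a fraction of the original width.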