# ProGen3
Wrapper around Profluent's autoregressive protein language model, exposed as `proteingen.models.ProGen3`.
Unlike masked models (ESMC, ESM3, DPLM-2), ProGen3 generates left to right, from the N terminus to the C terminus. Key properties:

- Output dim: 134
- Embedding dim: checkpoint-dependent (`EMB_DIM` is read from the weights)
- Primary use cases: autoregressive generation, sequence scoring, embedding extraction
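Left-to-right generation corresponds to the standard autoregressive factorization of a sequence's probability, which is also what makes exact likelihood scoring possible. For a sequence $x = (x_1, \dots, x_L)$ read from the N terminus, each residue is conditioned only on the residues before it. Whether special tokens (for example start or end markers) are counted in $L$ depends on the wrapper and is not specified on this page.

$$
\log p(x) = \sum_{t=1}^{L} \log p\left(x_t \mid x_{<t}\right), \qquad x_{<t} = (x_1, \dots, x_{t-1})
$$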
## Available checkpoints
| Checkpoint | Params |
|---|---|
| Profluent-Bio/progen3-112m | 112M |
| Profluent-Bio/progen3-219m | 219M |
| Profluent-Bio/progen3-339m | 339M |
| Profluent-Bio/progen3-762m | 762M |
| Profluent-Bio/progen3-1b | 1B |
| Profluent-Bio/progen3-3b | 3B |
## Generation APIs
### Fixed-length generation via `sample()`
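```python
from proteingen.sampling import sample

# `model` is assumed to be an already-loaded proteingen.models.ProGen3 instance.
# Four fixed-length starting sequences, each made of 100 <mask> tokens.
init_x = ["<mask>" * 100 for _ in range(4)]
traj = sample(model, init_x, in_order="left_to_right")
print(traj["sequences"])
```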
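The `in_order="left_to_right"` option suggests that masked positions are resolved one at a time from the N terminus onward, each conditioned only on the already-filled prefix. The toy sketch below illustrates that idea in plain Python. It is not proteingen's implementation: `fill_left_to_right` and `uniform_model` are hypothetical names, and the uniform distribution merely stands in for ProGen3's next-token probabilities.

```python
import random
from typing import Callable, Dict, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "<mask>"


def fill_left_to_right(
    tokens: List[str],
    next_token_probs: Callable[[Sequence[str]], Dict[str, float]],
    rng: random.Random,
) -> List[str]:
    """Replace every MASK token, visiting positions strictly left to right.

    Each draw conditions only on the prefix tokens[:i], mirroring how a
    causal (autoregressive) model sees the sequence.
    """
    out = list(tokens)
    for i, tok in enumerate(out):
        if tok != MASK:
            continue  # keep fixed (non-masked) positions as given
        probs = next_token_probs(out[:i])
        residues = list(probs)
        weights = [probs[r] for r in residues]
        out[i] = rng.choices(residues, weights=weights, k=1)[0]
    return out


# Hypothetical stand-in for a causal model: uniform over the 20 standard residues.
def uniform_model(prefix: Sequence[str]) -> Dict[str, float]:
    return {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


rng = random.Random(0)
seqs = ["".join(fill_left_to_right([MASK] * 10, uniform_model, rng)) for _ in range(4)]
print(seqs)
```

Because each step only sees `out[:i]`, a fixed residue placed later in the sequence cannot influence earlier draws, which is the basic constraint of left-to-right decoding.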
### Open-ended generation via `generate()`
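```python
# Sample 4 sequences of up to 256 new tokens each, with temperature and nucleus (top-p) sampling.
result = model.generate(n=4, max_new_tokens=256, temperature=0.8, top_p=0.95)
print(result["sequences"])
```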
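`temperature` and `top_p` are the usual knobs for stochastic decoding. The sketch below shows what they conventionally mean for a single next-token step: logits are divided by the temperature, passed through a softmax, and then restricted to the smallest set of most-likely tokens whose cumulative probability reaches `top_p` (nucleus sampling). It assumes `generate()` follows this standard convention; `sample_top_p` is a hypothetical helper, not part of proteingen.

```python
import math
import random
from typing import Dict


def sample_top_p(logits: Dict[str, float], temperature: float, top_p: float,
                 rng: random.Random) -> str:
    """Draw one token with temperature scaling followed by nucleus (top-p) filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Temperature: divide logits before the softmax; < 1 sharpens, > 1 flattens.
    scaled = {tok: v / temperature for tok, v in logits.items()}
    m = max(scaled.values())  # subtract the max for numerical stability
    exp = {tok: math.exp(v - m) for tok, v in scaled.items()}
    z = sum(exp.values())
    probs = sorted(((p / z, tok) for tok, p in exp.items()), reverse=True)
    # Nucleus: keep the smallest prefix of most-likely tokens whose mass reaches top_p.
    kept, mass = [], 0.0
    for p, tok in probs:
        kept.append((p, tok))
        mass += p
        if mass >= top_p:
            break
    tokens = [tok for _, tok in kept]
    weights = [p for p, _ in kept]  # rng.choices renormalizes these internally
    return rng.choices(tokens, weights=weights, k=1)[0]


logits = {"A": 2.0, "G": 1.0, "W": -3.0}
print(sample_top_p(logits, temperature=0.8, top_p=0.95, rng=random.Random(0)))
```

With these toy logits, "A" and "G" together already exceed 95% of the probability mass at temperature 0.8, so "W" is never drawn.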
## Sequence scoring
```python
# Score one or more sequences under the model.
scores = model.score(["ACDEFGHIKLMNPQRSTVWY"])
print(scores["log_likelihood"], scores["perplexity"])
```
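Assuming `log_likelihood` is the sum of per-token log-probabilities and `perplexity` is the exponentiated negative mean of those log-probabilities (the standard definitions; how the wrapper treats special tokens and length normalization is not documented here), the two numbers relate as in this sketch. `sequence_scores` is a hypothetical helper that works on log-probabilities you already have.

```python
import math
from typing import Dict, Sequence


def sequence_scores(token_log_probs: Sequence[float]) -> Dict[str, float]:
    """Sum per-token log p(x_t | x_<t) and convert the mean to perplexity."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    ll = sum(token_log_probs)
    ppl = math.exp(-ll / len(token_log_probs))
    return {"log_likelihood": ll, "perplexity": ppl}


# Sanity check: a model uniform over the 20 standard residues has perplexity 20.
uniform = [math.log(1 / 20)] * 5
print(sequence_scores(uniform))
```

Lower perplexity means the model finds the sequence more likely per residue; because it is length-normalized, it is easier to compare across sequences of different lengths than the raw log-likelihood.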
**Dependency caveat:** the upstream `progen3` package may require additional runtime dependencies that are not declared in its package metadata, notably `megablocks`/`grouped_gemm` (and often Flash Attention) for the mixture-of-experts (MoE) checkpoints.
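One quick way to see which of these optional packages are importable in the current environment, before loading a checkpoint, is to probe them with `importlib.util.find_spec`, which locates a module without importing it. The import names `megablocks`, `grouped_gemm`, and `flash_attn` are assumptions about how those packages are usually installed, and `check_optional_deps` is a hypothetical helper; the check does not verify versions or CUDA compatibility.

```python
import importlib.util
from typing import Dict, Iterable

# Assumed import names for the optional dependencies mentioned above.
OPTIONAL_MODULES = ("megablocks", "grouped_gemm", "flash_attn")


def check_optional_deps(names: Iterable[str] = OPTIONAL_MODULES) -> Dict[str, bool]:
    """Report whether each top-level module can be found, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in check_optional_deps().items():
        print(f"{name:>13}: {'found' if ok else 'MISSING'}")
```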
See also: Autoregressive Generation (ProGen3)