Unconditional Sampling
Architecture Breakdown
Data: None — sampling from a pretrained model.
Models: ESMC (pretrained generative model, no conditioning) → generative_modeling
Sampling: sample (discrete-time ancestral, random unmasking order) → sampling
Evaluation: None in this example.
Generate protein sequences from scratch using ESMC as a masked language model.
Quick Start
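from proteingen.models import ESMC
from proteingen import sample

# Load the pretrained model and move it to the GPU
model = ESMC().cuda()
# Five fully masked sequences, each 256 tokens long
initial_x = ["<mask>" * 256 for _ in range(5)]
# Unmask every position and collect the generated sequences
sequences = sample(model, initial_x)["sequences"]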
How It Works
Sampling starts from fully masked sequences and iteratively unmasks positions using ESMC's learned distribution. At each step, the model predicts a probability distribution over amino acids for every masked position, a residue is sampled for one of those positions, and the process repeats until no masks remain.
By default the decoding order is random: positions are unmasked following a uniformly random permutation. Because both the order and the sampled residues are random, each run produces different sequences even from the same starting point.
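The loop described above can be sketched in plain Python. This is only an illustration of ancestral sampling with a random unmasking order, not proteingen's actual implementation: `toy_model` is a hypothetical stand-in for ESMC that returns a uniform distribution over the 20 standard amino acids, and `ancestral_unmask` is a name made up for this example.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "<mask>"


def toy_model(tokens):
    """Hypothetical stand-in for ESMC (assumption, not the real model).

    Returns one probability distribution over amino acids per position;
    here every distribution is uniform.
    """
    uniform = [1.0 / len(AMINO_ACIDS)] * len(AMINO_ACIDS)
    return [uniform for _ in tokens]


def ancestral_unmask(tokens, model, rng):
    """Fill masked positions one at a time in a uniformly random order."""
    tokens = list(tokens)
    order = [i for i, tok in enumerate(tokens) if tok == MASK]
    rng.shuffle(order)  # uniformly random permutation of the masked positions
    for pos in order:
        # Re-run the model on the partially unmasked sequence so that each
        # new residue is conditioned on everything sampled so far.
        probs = model(tokens)[pos]
        tokens[pos] = rng.choices(AMINO_ACIDS, weights=probs, k=1)[0]
    return "".join(tokens)


rng = random.Random(0)
sequence = ancestral_unmask([MASK] * 16, toy_model, rng)
partial = ancestral_unmask(["M", MASK, "K"], toy_model, rng)
```

Positions that are already filled in, such as the `M` and `K` in the second call, are left untouched in this sketch. Only masked positions enter the random order.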