
Unconditional Sampling

Architecture Breakdown

Data: None — sampling from a pretrained model.

Models: ESMC (pretrained generative model, no conditioning) → generative_modeling

Sampling: sample (discrete-time ancestral, random unmasking order) → sampling

Evaluation: None in this example.

Generate protein sequences from scratch using ESMC as a masked language model.

Quick Start

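from proteingen.models import ESMC
from proteingen import sample

# Load the pretrained model and move it to the GPU.
model = ESMC().cuda()
# Five fully masked sequences, each 256 mask tokens long.
initial_x = ["<mask>" * 256 for _ in range(5)]
# Unmask every position and keep the generated sequences.
sequences = sample(model, initial_x)["sequences"]

To run the full example script:

uv run python examples/unconditional_sampling.py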
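
The `initial_x` line in the Quick Start relies on plain Python string repetition: `"<mask>" * 256` is a single string containing 256 mask tokens. The sketch below wraps that in a helper so the length and batch size are explicit. The name `fully_masked` is hypothetical and not part of proteingen.

```python
MASK = "<mask>"


def fully_masked(length, count):
    """Return `count` strings, each made of `length` mask tokens.

    Hypothetical helper for illustration; not a proteingen function.
    """
    return [MASK * length for _ in range(count)]


# Same starting point as the Quick Start: 5 sequences of 256 masked positions.
initial_x = fully_masked(length=256, count=5)
print(len(initial_x), initial_x[0].count(MASK))
```

The resulting list can be passed to `sample` exactly as in the Quick Start.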

How It Works

Sampling starts from fully masked sequences and iteratively unmasks positions using ESMC's learned distribution. At each step, the model predicts a probability distribution over amino acids for every masked position; one position is then unmasked by sampling from its distribution, and the process repeats until no masks remain.

The decoding order is random by default: positions are unmasked in a uniformly random permutation. Together with the sampling at each step, this means each run produces different sequences even from the same starting point.
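
The loop described above can be sketched in pure Python. This is a toy illustration, not proteingen's implementation: `toy_model` is a hypothetical stand-in that returns a uniform distribution where ESMC would return learned probabilities, and `ancestral_unmask` is an assumed name for the decoding loop.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "<mask>"


def toy_model(tokens):
    """Hypothetical stand-in for ESMC.

    Returns {position: probabilities over AMINO_ACIDS} for each masked
    position. A real model would condition on the unmasked tokens.
    """
    uniform = [1.0 / len(AMINO_ACIDS)] * len(AMINO_ACIDS)
    return {i: uniform for i, tok in enumerate(tokens) if tok == MASK}


def ancestral_unmask(tokens, model, rng):
    """Unmask one position per step, visiting masked positions in random order."""
    tokens = list(tokens)
    order = [i for i, tok in enumerate(tokens) if tok == MASK]
    rng.shuffle(order)  # uniformly random permutation of the masked positions
    for pos in order:
        # Re-predict at every step so later positions see earlier choices.
        probs = model(tokens)[pos]
        tokens[pos] = rng.choices(AMINO_ACIDS, weights=probs, k=1)[0]
    return "".join(tokens)


rng = random.Random(0)  # fixed seed so this sketch is repeatable
sequence = ancestral_unmask([MASK] * 16, toy_model, rng)
print(sequence)
```

Seeding the generator makes the toy repeatable; with an unseeded generator, both the unmasking order and the sampled residues change from run to run.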

Source: examples/unconditional_sampling.py