StabilityPMPNN

ProteinMPNN-based stability predictor from the ProteinGuide paper. Trained on the Rocklin Megascale stability dataset to predict thermodynamic stability (ΔΔG) from structure + sequence.

  • Encode/decode split: encode_structure() runs once per structure (expensive), decode() runs per sequence sample (cheap). This maps naturally to ProbabilityModel's preprocess_observations / forward pattern.
  • Tokenizer: MPNNTokenizer with 21 tokens (20 standard amino acids + UNK). When used with TAG, include_mask_token=True adds <mask> at index 21.
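
The encode/decode split can be illustrated with a minimal sketch. The class below is a hypothetical stand-in, not the real StabilityPMPNN: only the method names encode_structure() and decode() come from the description above, while their signatures, the toy features, and the scoring rule are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToyStabilityModel:
    """Hypothetical stand-in for the predictor; signatures are assumed."""

    encode_calls: int = 0  # counts how often the expensive step runs

    def encode_structure(self, coords):
        # "Expensive" step: reduce each residue's coordinates to one toy feature.
        self.encode_calls += 1
        return [sum(xyz) for xyz in coords]

    def decode(self, structure_features, tokens):
        # "Cheap" step: score one sequence sample against the cached features.
        return sum(f * (t + 1) for f, t in zip(structure_features, tokens))


model = ToyStabilityModel()
coords = [(0.0, 1.0, 2.0), (1.0, 1.0, 1.0), (2.0, 0.0, 0.5)]

# Encode once per structure, then reuse the result for every sequence sample.
cached = model.encode_structure(coords)
samples = [[0, 1, 2], [3, 3, 3], [20, 0, 5]]
scores = [model.decode(cached, tokens) for tokens in samples]
```

Here encode_structure() is called once no matter how many sequence samples are scored, which is the cost profile the split is meant to provide.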

Cross-tokenizer behavior

The stability predictor overrides token_ohe_basis() so that the <mask> token maps to an all-zero one-hot encoding (OHE) row. This preserves the original PMPNN masking semantics while making mask behavior explicit in the interface, and it is a key integration point with TAG's GuidanceProjection.
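
As a rough illustration of what an all-zero mask row means, the sketch below builds a one-hot basis with one row per token id and zeroes out the <mask> row. The function name toy_token_ohe_basis and the plain-list representation are assumptions for this example; the real token_ohe_basis() presumably returns a tensor and may differ in shape or dtype.

```python
NUM_TOKENS = 21  # 20 standard amino acids + UNK, as stated above
MASK_IDX = 21    # index of <mask> when include_mask_token=True


def toy_token_ohe_basis(num_tokens=NUM_TOKENS, mask_idx=MASK_IDX):
    """Return a (num_tokens + 1) x num_tokens basis as nested lists.

    Rows 0..num_tokens-1 are standard one-hot vectors; the mask row is zeros.
    """
    basis = [
        [1.0 if row == col else 0.0 for col in range(num_tokens)]
        for row in range(num_tokens + 1)
    ]
    basis[mask_idx] = [0.0] * num_tokens  # make the mask row explicit
    return basis


basis = toy_token_ohe_basis()

# Encode a short token sequence; position 1 is masked.
tokens = [3, MASK_IDX, 0]
encoded = [basis[t] for t in tokens]
```

Under this scheme a masked position contributes no amino-acid signal to the input, while every unmasked position keeps its usual one-hot row.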