# Tutorials

Step-by-step guides to mastering Engram-PEFT for efficient LLM knowledge injection.

## Tutorial 1: 5-Minute Quickstart

Learn how to inject Engram conditional memory into a small model such as TinyLlama and train it on a toy dataset.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from engram_peft import EngramConfig, EngramDataCollator, EngramTrainer, get_engram_model

# 1. Setup
model_id = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

# 2. Configure Engram
config = EngramConfig(
    target_layers=[2, 11, 20],
    embedding_dim=1024,
    tokenizer_name_or_path=model_id,
)

# 3. Inject & Freeze
base_model = AutoModelForCausalLM.from_pretrained(model_id, dtype=torch.float16)
model = get_engram_model(
    base_model,
    config,
    tokenizer,
    train_mode="engram_only",
)

# Quick check on overhead
model.print_trainable_parameters()

# 4. Train
collator = EngramDataCollator(tokenizer=tokenizer, config=config)

# EngramTrainer handles the MixedOptimizer and Step Decay scheduler automatically
trainer = EngramTrainer(
    model=model,
    args=TrainingArguments(
        output_dir="engram_out",
        per_device_train_batch_size=4,
        learning_rate=4e-4,  # Automatically passed to MixedOptimizer
    ),
    data_collator=collator,
    train_dataset=my_dataset,  # your tokenized toy dataset
)
trainer.train()

# 5. Save ONLY the knowledge pack
model.save_pretrained("medical_knowledge_pack")
```
## Tutorial 2: Injecting Medical Knowledge into Llama-3

Engram is specifically designed to store vast amounts of static knowledge without interfering with the model's original reasoning capabilities.

**Scenario:** You want to fine-tune Llama-3-8B on a large corpus of medical textbooks (PubMed).
- **Initialize with full capacity:** Increase `engram_vocab_size_per_ngram` to handle millions of specialized medical terms.

  ```python
  config = EngramConfig(
      engram_vocab_size_per_ngram=[2262400, 2262400],  # Large capacity
      target_layers=[2, 8, 16, 24],                    # More layers for deep knowledge
      tokenizer_name_or_path="meta-llama/Meta-Llama-3-8B",
  )
  ```

- **Train on Domain Data:** Use the `EngramDataCollator` to ensure high-throughput training. Engram's sparse updates allow you to train on a much larger corpus than traditional LoRA without catastrophic forgetting of the base model's logic.
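Before any of this, the raw textbook corpus has to be split into model-sized training examples. The sketch below is a minimal, library-agnostic illustration of that preprocessing step; `chunk_corpus`, the whitespace tokenization, and the window sizes are all assumptions for illustration, not part of engram_peft. In practice you would tokenize with the model's tokenizer and hand the results to `EngramDataCollator`.

```python
def chunk_corpus(texts, window=128, stride=128):
    """Split raw documents into fixed-size token windows.

    Whitespace tokenization is used purely for illustration; swap in the
    model's tokenizer for real training data.
    """
    windows = []
    for text in texts:
        tokens = text.split()
        for start in range(0, len(tokens), stride):
            piece = tokens[start:start + window]
            if piece:
                windows.append(" ".join(piece))
    return windows

# Toy "textbook": 150 whitespace tokens -> three 64-token-or-less windows
docs = ["aspirin inhibits cyclooxygenase " * 50]
samples = chunk_corpus(docs, window=64, stride=64)
print(len(samples))  # 3
```

Non-overlapping windows (`stride == window`) keep preprocessing cheap for very large corpora; a smaller stride would give overlapping contexts at the cost of more examples.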
## Tutorial 3: Mixing Engram with LoRA

Engram and LoRA can be used together! LoRA is excellent for task adaptation (e.g., following instructions), while Engram is superior for knowledge storage.

### The "Double Adapter" Strategy

- Apply LoRA to the base model's Attention or MLP layers.
- Apply Engram to the Transformer Blocks using `train_mode="preserve_trainable"` to keep LoRA weights trainable.
- Result: a model that has the reasoning style of LoRA and the factual memory of Engram.
```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM
from engram_peft import EngramConfig, get_engram_model

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")

# 1. Apply LoRA first
lora_config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
model = get_peft_model(base_model, lora_config)

# 2. Inject Engram on top
engram_config = EngramConfig(target_layers=[2, 15])

# IMPORTANT: preserve_trainable keeps existing LoRA parameters trainable.
model = get_engram_model(
    model,
    engram_config,
    train_mode="preserve_trainable",
)

# Now both LoRA and Engram parameters are trainable!
model.print_trainable_parameters()
```
> [!IMPORTANT]
> When stacking adapters, use `train_mode="preserve_trainable"` so Engram keeps the `requires_grad=True` status of existing parameters (like LoRA weights). `wrap_peft=True` is still supported as a backward-compatible alias, but `train_mode` is the recommended API.
## Tutorial 4: Transparent Injection & Custom Models

Engram-PEFT uses a multi-tiered strategy to find transformer layers. You can monitor this process via logs or override it for custom models.

### 1. Enabling Detailed Logs

By default, the library is quiet. To see exactly where and how Engram layers are being injected, enable INFO logging:
```python
import logging

# Only show INFO for engram_peft to avoid noise from other libraries
logging.basicConfig(level=logging.WARNING)
logging.getLogger("engram_peft").setLevel(logging.INFO)
```

Expected log output:

```text
[Engram-PEFT] Starting best-effort architecture discovery...
[Engram-PEFT] Determined layer_container_path='model.layers' (source: Architecture Registry (llama))
[Engram-PEFT] Attaching Engram layers to 32 blocks...
  - [Injected] Layer 2 -> LlamaDecoderLayer (device: cuda:0)
  - [Injected] Layer 15 -> LlamaDecoderLayer (device: cuda:0)
```
### 2. Targeting Custom Architectures

Engram-PEFT includes a built-in registry for common architectures, including:

- Llama-2/3, Mistral, Mixtral, Qwen2
- DeepSeek V2/V3
- Gemma/Gemma 2, Phi/Phi-3
- BERT, RoBERTa, Longformer
- GPT-2, GPT-NeoX
- GLM/ChatGLM

If you are using a non-standard model that isn't in our built-in registry, Engram will fall back to a heuristic (finding the largest `nn.ModuleList`). If this fails, you can specify the path manually:
```python
config = EngramConfig(
    layer_container_path="my_model.transformer.h",  # Explicit path
    target_layers=[0, 5, 10],
)
model = get_engram_model(base_model, config)
```
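To see what the largest-`nn.ModuleList` fallback does conceptually, here is a pure-Python sketch of the same idea: recursively walk the model tree and pick the longest list-like container of blocks. The function name and the use of plain lists (instead of `nn.ModuleList`) are illustrative stand-ins, not engram_peft's actual implementation.

```python
def find_layer_container(model, prefix=""):
    """Conceptual sketch of the fallback heuristic: return the attribute path
    of the longest list-like container of blocks, plus its length."""
    best = (None, 0)
    for name, child in vars(model).items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(child, list):
            if len(child) > best[1]:
                best = (path, len(child))
        elif hasattr(child, "__dict__"):
            sub = find_layer_container(child, path)
            if sub[1] > best[1]:
                best = sub
    return best

# Toy custom model: the heuristic should discover "transformer.h"
class Block: pass

class Transformer:
    def __init__(self):
        self.h = [Block() for _ in range(12)]  # the block stack

class MyModel:
    def __init__(self):
        self.transformer = Transformer()
        self.heads = [Block()]  # smaller list, should be ignored

path, n = find_layer_container(MyModel())
print(path)  # transformer.h
```

In a real model the block stack is almost always the largest `nn.ModuleList`, which is why this heuristic works well enough as a fallback; the explicit `layer_container_path` shown above remains the reliable option for unusual architectures.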
## Tutorial 5: Full Finetuning with Engram

If you want to train the backbone together with Engram, use `train_mode="full_finetune"` and configure separate optimizer groups for the backbone, the Engram dense layers, and the Engram sparse embeddings.
```python
from torch.optim import AdamW
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from engram_peft import EngramConfig, EngramTrainer, get_engram_model, get_optimizer

model_id = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"
tokenizer = AutoTokenizer.from_pretrained(model_id)
base_model = AutoModelForCausalLM.from_pretrained(model_id)

config = EngramConfig(
    target_layers=[2, 11],
    tokenizer_name_or_path=model_id,
)
model = get_engram_model(
    base_model,
    config,
    tokenizer,
    train_mode="full_finetune",
)

# Option 1: build the layered optimizer yourself
optimizer = get_optimizer(
    model,
    backbone_learning_rate=5e-5,
    engram_dense_learning_rate=4e-4,
    engram_sparse_learning_rate=2e-3,
    backbone_optimizer=AdamW,
)

# Option 2: let EngramTrainer build the layered optimizer for you
trainer = EngramTrainer(
    model=model,
    args=TrainingArguments(
        output_dir="engram_full_ft_out",
        per_device_train_batch_size=2,
        learning_rate=4e-4,
    ),
    train_dataset=my_dataset,  # your tokenized dataset
    optimizer_kwargs={
        "backbone_learning_rate": 5e-5,
        "engram_dense_learning_rate": 4e-4,
        "engram_sparse_learning_rate": 2e-3,
        "backbone_optimizer": AdamW,
    },
)
trainer.train()

# Save both parts after training
model.save_pretrained("engram_adapter_only")
model.base_model.save_pretrained("engram_full_model")
```
> [!IMPORTANT]
> In `train_mode="full_finetune"`, `model.save_pretrained(...)` still saves only Engram weights and config. Save `model.base_model` to a separate directory as well if you want a restorable full-finetuned checkpoint.
## Tutorial 6: Managing Multiple Knowledge Packs

Engram-PEFT supports a "Named Adapter" system similar to `peft`. You can load multiple specialized knowledge packs into the same base model and switch between them at runtime.
```python
# Assuming you have an engram model with 'default' knowledge
engram_model.print_trainable_parameters()

# 1. Add a second adapter for a different domain
legal_config = EngramConfig(target_layers=[2, 11, 20], embedding_dim=1024)
engram_model.add_adapter("legal", legal_config)

# 2. Switch to the new adapter for training
engram_model.set_adapter("legal")
# ... run training for legal knowledge ...

# 3. Switch back to the default (medical) knowledge
engram_model.set_adapter("default")
```
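Conceptually, the named-adapter system behaves like a small registry keyed by adapter name with exactly one active entry. The sketch below is a hypothetical, stripped-down illustration of that bookkeeping; the class and its methods are not part of engram_peft's API.

```python
class AdapterRegistry:
    """Illustrative stand-in for named-adapter bookkeeping (not the real API)."""

    def __init__(self):
        self.adapters = {}   # name -> config
        self.active = None   # name of the adapter currently routing memory

    def add(self, name, config):
        if name in self.adapters:
            raise ValueError(f"adapter {name!r} already exists")
        self.adapters[name] = config

    def set_active(self, name):
        if name not in self.adapters:
            raise KeyError(name)
        self.active = name

reg = AdapterRegistry()
reg.add("default", {"target_layers": [2, 11, 20]})
reg.add("legal", {"target_layers": [2, 11, 20]})
reg.set_active("legal")
print(reg.active)  # legal
```

The key property this models is isolation: switching the active name changes which knowledge pack is consulted without touching the others, which is what lets one base model host several domains.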
## Tutorial 7: Flexible Weight Migration

Engram-PEFT allows you to reuse pre-trained knowledge even if your target model has different layers, bucket capacities, or even a different tokenizer seed.

### Case A: Structural Alignment (Different Layers/Buckets)

If you have weights trained on layers [0, 1] but your new model uses layers [5, 6]:
```python
# Map layer 0 -> 5 and layer 1 -> 6
model.load_weights_flexible(
    "path/to/engram_weights.pt",
    layer_mapping={0: 5, 1: 6},
    reuse_structural=False,  # Recommended: re-train gating/conv for the new layer position
)
```
### Case B: Logic Alignment (Different Seeds/Tokenizer)

If the hashing logic differs (e.g., a different seed was used in `EngramConfig`), use a reference corpus to "re-discover" the correct indices via best-effort remapping:

```python
# corpus should be a representative sample of your training data (tokens or strings)
model.remap_from_corpus(corpus, "path/to/engram_weights.pt")
```
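What counts as "representative" is up to you; a deterministic, deduplicated random sample of the domain data is usually sufficient and keeps the remapping reproducible. The helper below is an illustrative sketch for building such a sample — the function name and default size are assumptions, not part of engram_peft.

```python
import random

def sample_corpus(documents, k=1000, seed=0):
    """Build a deterministic, deduplicated reference sample.

    Any list of strings (or token-id sequences) that covers the domain
    vocabulary can serve as the corpus argument to remap_from_corpus.
    """
    rng = random.Random(seed)
    unique = list(dict.fromkeys(documents))  # drop exact duplicates, keep order
    if len(unique) <= k:
        return unique
    return rng.sample(unique, k)

docs = [f"case report {i}" for i in range(10)] + ["case report 0"]  # one duplicate
corpus = sample_corpus(docs, k=5)
print(len(corpus))  # 5
```

Fixing the seed means two runs produce the identical reference corpus, so the best-effort index remapping is repeatable across machines.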