Tinker is a remote control for serious model training: you keep the steering wheel; Thinking Machines supplies the engine room.
You define the data, the loss function, and the loop.
The API sends your exact training work to distributed GPUs and returns the results.
You write a normal Python training loop.
Tinker executes the gradient work on large models.
You inspect losses, sample, evaluate, and repeat.
Tinker removes the distributed-training plumbing, not your judgment. That matters because research code often needs unusual losses, custom rewards, and tight control over the loop.
A Tinker script has a small cast. Learn who does what so you can tell an AI agent where new logic belongs.
# Load the SDK and its type helpers.
import tinker
from tinker import types

# Create the front door into Tinker.
service_client = tinker.ServiceClient()

# Start a LoRA training run for one base model.
# `rank` controls adapter size: a bigger rank can learn more, but costs more.
training_client = service_client.create_lora_training_client(
    base_model="Qwen/Qwen3-8B",
    rank=32,
)
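To see why a bigger `rank` costs more, recall the standard LoRA formulation: the update to a weight matrix of shape d_out × d_in is learned as the product of two factors, B (d_out × rank) and A (rank × d_in), so each adapted matrix adds rank · (d_in + d_out) trainable parameters. The sketch below does that arithmetic. The 4096 × 4096 matrix size is an illustrative assumption, not a claim about Qwen3-8B or about which layers Tinker adapts, and the helper names are hypothetical.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    The learned update is B @ A, with B of shape (d_out, rank)
    and A of shape (rank, d_in).
    """
    return rank * (d_in + d_out)


def full_params(d_in: int, d_out: int) -> int:
    """Parameters in the full weight matrix, for comparison."""
    return d_in * d_out


# Illustrative size only (assumed): one square 4096 x 4096 projection.
d = 4096
for rank in (8, 32, 128):
    added = lora_params(d, d, rank)
    share = added / full_params(d, d)
    print(f"rank={rank:>3}: {added:,} adapter params ({share:.2%} of the full matrix)")
```

Adapter size grows linearly with rank: at `rank=32` this matrix gets 262,144 adapter parameters, exactly 1/64 of its 16,777,216 full parameters, and quadrupling the rank quadruples that count.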
A `Datum` is the envelope that tells Tinker: here are the tokens, here is what to predict, and here is what to ignore.
Prompt positions are covered up with `weights=0`; answer positions are exposed with `weights=1`. The model is graded only on the exposed positions.
If an AI agent trains on the prompt by mistake, the model learns to copy questions instead of answering them, so the loss mask is one of the first things to inspect.
# Make one training example.
datum = types.Datum(
    # Put the tokenized text into the model input.
    model_input=types.ModelInput.from_ints(tokens=input_tokens),
    # Attach the extra fields the loss function needs:
    # `weights` chooses which positions count, and
    # `target_tokens` says what each position should predict.
    loss_fn_inputs={
        "weights": weights,
        "target_tokens": target_tokens,
    },
)
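The `Datum` above expects `input_tokens`, `target_tokens`, and `weights` to already line up for next-token prediction. Here is a minimal sketch of one common way to build them, assuming you already have prompt and completion token IDs from a tokenizer. `build_sft_fields` is a hypothetical helper, not part of the Tinker SDK, and the token IDs are made up.

```python
def build_sft_fields(prompt_tokens: list[int], completion_tokens: list[int]):
    """Return (input_tokens, target_tokens, weights) for next-token training.

    Prompt tokens get weight 0 (not graded); completion tokens get weight 1.
    """
    tokens = prompt_tokens + completion_tokens
    token_weights = [0] * len(prompt_tokens) + [1] * len(completion_tokens)

    # Shift by one: the model reads tokens[:-1] and must predict tokens[1:].
    input_tokens = tokens[:-1]
    target_tokens = tokens[1:]
    # Drop the first weight so each weight lines up with the token it grades.
    weights = token_weights[1:]
    return input_tokens, target_tokens, weights


# Toy token IDs (made up): prompt = [10, 11, 12], completion = [20, 21].
inp, tgt, w = build_sft_fields([10, 11, 12], [20, 21])
print(inp)  # [10, 11, 12, 20]
print(tgt)  # [11, 12, 20, 21]
print(w)    # [0, 0, 1, 1]
```

Note that the position holding the last prompt token (12) has weight 1: it is graded on predicting the first answer token (20), which is exactly the behavior you want to train.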
Your model starts repeating the user's question before answering. Where would you look first?
Training is a rhythm: grade the current model, apply the update, save a snapshot, then test behavior.
Datum objects → `forward_backward` → `optim_step` → evaluate
# These calls use `await`, so run them inside an async function.
# Ask Tinker to compute gradients for this batch.
fwdbwd_future = await training_client.forward_backward_async(
    data=[datum],
    loss_fn="cross_entropy",
)
# Wait for the remote work to finish; the result is where you inspect the loss.
fwdbwd_result = await fwdbwd_future.result_async()

# Ask Tinker to update the weights using the accumulated gradients.
# The learning rate controls update size; too high a value can destabilize training.
optim_future = await training_client.optim_step_async(
    types.AdamParams(learning_rate=1e-4),
)
optim_result = await optim_future.result_async()
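To make "inspect the loss" concrete: with a `cross_entropy` objective, the quantity being minimized is the negative log-probability of the target tokens, counted only at positions with nonzero weight. The sketch below computes a weighted mean of that from plain lists of per-token log-probabilities and weights. How you pull per-token log-probabilities out of the forward/backward result depends on the SDK's result type, so taking plain lists is an assumption; `weighted_mean_nll` is a hypothetical helper, and the loss Tinker itself reports may be normalized differently (for example, summed instead of averaged).

```python
import math


def weighted_mean_nll(logprobs: list[float], weights: list[float]) -> float:
    """Mean negative log-likelihood over positions with nonzero weight.

    logprobs[i] is log p(target_tokens[i]) at position i; weights[i] scales it.
    """
    if len(logprobs) != len(weights):
        raise ValueError("logprobs and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("all weights are zero: nothing is being trained")
    weighted_nll = -sum(lp * w for lp, w in zip(logprobs, weights))
    return weighted_nll / total_weight


# Made-up values: two prompt positions (weight 0) and two answer positions.
logprobs = [math.log(0.9), math.log(0.1), math.log(0.5), math.log(0.25)]
weights = [0, 0, 1, 1]
print(f"{weighted_mean_nll(logprobs, weights):.4f}")  # 1.0397 (= 1.5 * ln 2)
```

The poorly predicted prompt token (probability 0.1) does not affect the result, because its weight is 0.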
Supervised fine-tuning imitates examples. Reinforcement learning explores, scores, and nudges behavior toward rewards.
`cross_entropy`: best for imitation learning from target completions.
Importance-sampling-style losses: useful when training on sampled rollouts together with the log-probs recorded when they were sampled.
Other RL losses: for reward-driven behavior changes.
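Here is a sketch of the two ideas behind "explores, scores, and nudges." First, rewards for several samples of the same prompt become advantages by subtracting the group's mean reward. Second, an importance-sampling objective weights each sampled token by the ratio of its probability under the policy being trained to its probability under the policy that sampled it (the "old log-probs"). This is the generic technique in plain Python, assuming you already have both sets of per-token log-probs. It is not Tinker's loss implementation, and the helper names are hypothetical.

```python
import math


def centered_advantages(rewards: list[float]) -> list[float]:
    """Advantage of each sample: its reward minus the group's mean reward."""
    if not rewards:
        raise ValueError("need at least one reward")
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]


def importance_sampling_loss(
    new_logprobs: list[float],
    old_logprobs: list[float],
    advantages: list[float],
) -> float:
    """Sum over sampled tokens of -(p_new / p_old) * advantage.

    old_logprobs come from the policy that sampled the rollout; new_logprobs
    come from the policy being trained. The ratio corrects for that mismatch.
    Minimizing the loss raises the probability of tokens with positive
    advantage and lowers it for tokens with negative advantage.
    """
    if not (len(new_logprobs) == len(old_logprobs) == len(advantages)):
        raise ValueError("all inputs need one entry per sampled token")
    loss = 0.0
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        ratio = math.exp(new_lp - old_lp)  # p_new / p_old
        loss -= ratio * adv
    return loss


# Made-up group: 4 samples for one prompt, rewarded 1 (correct) or 0 (wrong).
# For brevity, each sample is a single token carrying its sample's advantage.
advs = centered_advantages([1.0, 0.0, 0.0, 1.0])
print(advs)  # [0.5, -0.5, -0.5, 0.5]

old = [math.log(0.5)] * 4
print(importance_sampling_loss(old, old, advs))  # 0.0 (every ratio is 1)

# The trained policy now favors the first (rewarded) token a bit more.
new = [math.log(0.6)] + old[1:]
print(importance_sampling_loss(new, old, advs) < 0)  # True: the loss went down
```

Centering on the group mean means samples that beat their siblings get pushed up and the rest get pushed down, even when every raw reward is positive.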
When training gets weird, the symptom usually points to one layer of the loop.
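If requests fail before training starts: check `TINKER_API_KEY`, your account access, and whether the environment actually exported the key.
If the model learns the wrong behavior: inspect data formatting, loss masks, target-token shifting, and model choice before blaming Tinker.
If the loss is unstable: try a lower learning rate and smaller batches, and verify that the loss inputs match the selected loss function.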
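Several of these checks can run locally before you spend any GPU time. The sketch below checks two of them: whether `TINKER_API_KEY` is visible to the current process, and whether one example's targets are shifted and its prompt positions are masked. It assumes targets are the inputs shifted left by one; it does not contact Tinker or verify account access. `check_environment` and `check_example` are hypothetical helpers, not SDK functions.

```python
import os
from collections.abc import Mapping
from typing import Optional


def check_environment(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return API-key setup problems; an empty list means none were found."""
    env = os.environ if env is None else env
    if not env.get("TINKER_API_KEY"):
        return ["TINKER_API_KEY is not set in this process (was it exported?)"]
    return []


def check_example(
    input_tokens: list[int],
    target_tokens: list[int],
    weights: list[float],
    prompt_len: int,
) -> list[str]:
    """Return problems with one next-token example; empty means none found.

    Assumes target_tokens are input_tokens shifted left by one, so the first
    prompt_len - 1 positions still predict prompt tokens and should weigh 0.
    """
    if not (len(input_tokens) == len(target_tokens) == len(weights)):
        return ["input_tokens, target_tokens, and weights differ in length"]
    problems = []
    if input_tokens[1:] != target_tokens[:-1]:
        problems.append("targets are not the inputs shifted by one position")
    if any(w != 0 for w in weights[: max(prompt_len - 1, 0)]):
        problems.append("prompt positions have nonzero weight (model may copy the prompt)")
    if all(w == 0 for w in weights):
        problems.append("all weights are zero, so nothing is trained")
    return problems


# Toy example: prompt [10, 11, 12] and completion [20, 21], already shifted.
inp, tgt = [10, 11, 12, 20], [11, 12, 20, 21]
print(check_example(inp, tgt, [0, 0, 1, 1], prompt_len=3))  # []
print(check_example(inp, tgt, [1, 1, 1, 1], prompt_len=3))  # flags the prompt weights
print(check_environment())
```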
Your loss is decreasing, but you are not sure the model is actually better. What is the most practical next move?
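A falling loss only shows that the model fits its training targets better. One practical check, in the spirit of "then test behavior," is to sample answers for held-out prompts the model never trained on, score them, and compare the score before and after training. The sketch below assumes a hypothetical `generate` callable (for example, a wrapper you write around a Tinker sampling client) and uses exact match as a stand-in metric; use whatever scoring fits your task.

```python
from typing import Callable


def held_out_accuracy(
    examples: list[tuple[str, str]],
    generate: Callable[[str], str],
) -> float:
    """Fraction of held-out (prompt, reference) pairs answered exactly right.

    `generate` is any function that maps a prompt to the model's answer text.
    """
    if not examples:
        raise ValueError("need at least one held-out example")
    correct = sum(
        generate(prompt).strip() == reference.strip()
        for prompt, reference in examples
    )
    return correct / len(examples)


# Toy stand-in for a model, so the sketch runs without any API calls.
held_out = [("2+2=", "4"), ("3+3=", "6"), ("capital of France?", "Paris")]
fake_model = {"2+2=": "4", "3+3=": "5", "capital of France?": " Paris"}.get
print(held_out_accuracy(held_out, fake_model))  # 2 of 3 answers match
```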
This page gives you the mental model. The official docs give exact method signatures, supported models, pricing, and deeper loss math.