What Tinker Is

Tinker is a remote control for serious model training: you keep the steering wheel, Thinking Machines supplies the engine room.

Your laptop writes the recipe.

You define the data, the loss function, and the loop.

data · algorithm · evaluation

Tinker runs the factory.

The API sends your exact training work to distributed GPUs and gives results back.

large GPUs · LoRA · checkpoints

The split of responsibility

1. You: Write a normal Python training loop.
2. Tinker: Executes gradient work on big models.
3. You: Inspect losses, sample, evaluate, repeat.

Aha

This is not autopilot.

Tinker removes distributed training plumbing, not your judgment. That matters because research code often needs weird losses, custom rewards, and tight control.

02

Meet The Actors

A Tinker script has a small cast. Learn who does what so you can tell an AI agent where new logic belongs.

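ServiceClient: the front door into Tinker; it creates the training client.

TrainingClient: runs the gradient work and weight updates for one base model.

SamplingClient: generates samples from a model so you can inspect its behavior.

Datum: one training example, holding the tokens and the inputs to the loss function.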

The first code you write

Code

import tinker
from tinker import types

service_client = tinker.ServiceClient()

training_client = service_client.create_lora_training_client(
    base_model="Qwen/Qwen3-8B",
    rank=32,
)

Plain English

Load the SDK and its type helpers.

Create the front door into Tinker.

Start a LoRA training run for one base model.

The `rank` controls adapter size: a larger rank can learn more, but costs more.
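To make the cost side of `rank` concrete, here is a minimal sketch of the standard LoRA parameter count for a single weight matrix: a rank-r adapter on a d_out x d_in matrix adds r * (d_in + d_out) trainable parameters. The 4096 x 4096 layer size is a hypothetical example, not a statement about the real shapes in Qwen3-8B or about how Tinker bills a run.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA learns two small matrices: A (rank x d_in) and B (d_out x rank).
    """
    return rank * (d_in + d_out)


# Hypothetical layer size, chosen only for illustration.
D_IN, D_OUT = 4096, 4096

for rank in (8, 32, 128):
    added = lora_param_count(D_IN, D_OUT, rank)
    share = added / (D_IN * D_OUT)
    print(f"rank={rank:>3}: {added:>9,} adapter params ({share:.2%} of the full matrix)")
```

Adapter size grows linearly with rank, so doubling the rank doubles the number of parameters being trained.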

03

Data Becomes Training

A `Datum` is the envelope that tells Tinker: here are the tokens, here is what to predict, and here is what to ignore.

Think of a stencil

The prompt is covered up with `weights=0`; the answer is exposed with `weights=1`. The model only gets graded through the exposed holes.

Prompt: ignored · Completion: trained · Targets: next token

Why this matters

If an AI agent trains on the prompt by mistake, the model learns to copy questions instead of answering them. Loss masks are a good first place to look.
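The stencil idea fits in a few lines. This sketch assumes the loss is a weighted sum of per-token negative log-probabilities; the log-prob values are made up, and the exact reduction Tinker applies is something to confirm in the official docs.

```python
def masked_loss(token_logprobs, weights):
    """Weighted sum of negative log-probabilities, one term per position."""
    if len(token_logprobs) != len(weights):
        raise ValueError("need exactly one weight per position")
    return sum(-w * lp for lp, w in zip(token_logprobs, weights))


# Made-up log-probs of the correct next token at four positions.
token_logprobs = [-2.0, -1.0, -0.5, -0.25]

prompt_masked = [0, 0, 1, 1]  # the first two positions belong to the prompt
everything = [1, 1, 1, 1]     # the mistake: prompt positions are graded too

print(masked_loss(token_logprobs, prompt_masked))  # 0.75
print(masked_loss(token_logprobs, everything))     # 3.75
```

With the mask in place, only the completion positions contribute. Without it, most of the loss here comes from the prompt.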

Code

datum = types.Datum(
    model_input=types.ModelInput.from_ints(tokens=input_tokens),
    loss_fn_inputs={
        "weights": weights,
        "target_tokens": target_tokens,
    },
)

Plain English

Make one training example.

Put the tokenized text into the model input.

Attach extra fields needed by the loss function.

`weights` chooses which positions count; `target_tokens` says what each position should predict.
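Here is one way the three lists passed to the `Datum` above could be built from a prompt and a completion. The helper name `build_loss_inputs` and the token IDs are hypothetical; a real script would get token IDs from the model's tokenizer.

```python
def build_loss_inputs(prompt_tokens, completion_tokens):
    """Return (input_tokens, target_tokens, weights) for next-token training."""
    tokens = prompt_tokens + completion_tokens
    weights = [0] * len(prompt_tokens) + [1] * len(completion_tokens)

    input_tokens = tokens[:-1]   # the model sees every token except the last
    target_tokens = tokens[1:]   # position i should predict token i + 1
    weights = weights[1:]        # keep weights aligned with target_tokens
    return input_tokens, target_tokens, weights


# Hypothetical token IDs, for illustration only.
prompt_tokens = [10, 11, 12]
completion_tokens = [20, 21]

input_tokens, target_tokens, weights = build_loss_inputs(prompt_tokens, completion_tokens)
print(input_tokens)   # [10, 11, 12, 20]
print(target_tokens)  # [11, 12, 20, 21]
print(weights)        # [0, 0, 1, 1]
```

Note the shift: the last prompt position gets `weights=1` because its target is the first completion token.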

Scenario check

Your model starts repeating the user's question before answering. Where would you look first?
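For a symptom like this, a quick check on the mask is cheap. The helper below is a hypothetical debugging aid, not part of the Tinker SDK. It assumes targets are shifted by one position, so the first `prompt_len - 1` positions predict prompt tokens and should carry zero weight.

```python
def trained_prompt_positions(weights, prompt_len):
    """Return indices of prompt-predicting positions that have nonzero weight.

    Assumes targets are shifted by one, so positions 0 .. prompt_len - 2
    predict prompt tokens and should all be masked out.
    """
    last = min(prompt_len - 1, len(weights))
    return [i for i in range(last) if weights[i] != 0]


good_mask = [0, 0, 1, 1]
leaky_mask = [1, 1, 1, 1]

print(trained_prompt_positions(good_mask, prompt_len=3))   # []
print(trained_prompt_positions(leaky_mask, prompt_len=3))  # [0, 1]
```

An empty list means the prompt is fully covered; anything else points at positions where the model is being trained to reproduce the question.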

04

The Training Loop

Training is a rhythm: grade the current model, apply the update, save a snapshot, then test behavior.

Batch (Datum objects) → Gradients (forward_backward) → Weights (optim_step) → Samples (evaluate)

Code

fwdbwd_future = await training_client.forward_backward_async(
    data=[datum],
    loss_fn="cross_entropy",
)
fwdbwd_result = await fwdbwd_future.result_async()

optim_future = await training_client.optim_step_async(
    types.AdamParams(learning_rate=1e-4),
)

Plain English

Ask Tinker to compute gradients for this batch.

Wait for the remote work to finish and inspect the loss.

Ask Tinker to update weights using the accumulated gradients.

The learning rate controls update size; too high can destabilize training.
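The rhythm can be tried locally without any GPUs. In this sketch `ToyTrainer` is a stand-in written for this page, not the Tinker API: it fits a single number, runs synchronously, and uses plain gradient descent rather than Adam. The point is only to show gradients accumulating in `forward_backward`, being consumed by `optim_step`, and what a too-large learning rate does to the loss.

```python
class ToyTrainer:
    """Local stand-in for a remote trainer: fits w so that w * x matches y."""

    def __init__(self, w=0.0):
        self.w = w
        self.grad = 0.0  # gradients accumulate until optim_step is called

    def forward_backward(self, batch):
        # Mean squared error over the batch, plus its gradient w.r.t. w.
        n = len(batch)
        loss = sum((self.w * x - y) ** 2 for x, y in batch) / n
        self.grad += sum(2 * (self.w * x - y) * x for x, y in batch) / n
        return loss

    def optim_step(self, learning_rate):
        # Plain gradient descent (not Adam), then clear the accumulator.
        self.w -= learning_rate * self.grad
        self.grad = 0.0


def train(learning_rate, steps=50):
    batch = [(1.0, 2.0), (2.0, 4.0)]  # made-up data where the true w is 2
    trainer = ToyTrainer()
    losses = []
    for _ in range(steps):
        losses.append(trainer.forward_backward(batch))
        trainer.optim_step(learning_rate)
    return trainer.w, losses


stable_w, stable_losses = train(learning_rate=0.1)
unstable_w, unstable_losses = train(learning_rate=1.0)

print(f"lr=0.1: w={stable_w:.4f}, last loss={stable_losses[-1]:.3g}")
print(f"lr=1.0: w={unstable_w:.3g}, last loss={unstable_losses[-1]:.3g}")
```

With the smaller learning rate the loss shrinks toward zero; with the larger one each step overshoots and the loss grows without bound.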

05

SFT Versus RL

Supervised fine-tuning imitates examples. Reinforcement learning explores, scores, and nudges behavior toward rewards.

Supervised Fine-Tuning

Use this when you have examples of the exact behavior you want. It is like teaching from an answer key.

Examples → Cross-entropy → Imitation

How to choose

cross_entropy: Best for imitation learning from target completions.

importance_sampling: Useful when training from sampled rollouts and old log-probs.

ppo / cispo / dro: RL losses for reward-driven behavior changes.
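To show why the RL losses need old log-probs, here is a sketch of two textbook objectives: an importance-sampled policy gradient loss and the PPO clipped surrogate. These are the standard forms, written in plain Python; the function names, the `clip_eps` default, and the sample numbers are assumptions, and Tinker's built-in losses may differ in details such as reduction or clipping bounds.

```python
import math


def importance_sampling_loss(new_logprobs, old_logprobs, advantages):
    """Negative advantage-weighted sum of probability ratios."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        ratio = math.exp(new_lp - old_lp)  # p_new / p_old for the sampled token
        total += ratio * adv
    return -total


def ppo_clip_loss(new_logprobs, old_logprobs, advantages, clip_eps=0.2):
    """Clipped surrogate: limits how much one update can exploit the ratio."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)
    return -total


# Made-up values for one sampled token with a positive advantage.
new_lp, old_lp, adv = [-0.5], [-1.0], [1.0]
print(importance_sampling_loss(new_lp, old_lp, adv))
print(ppo_clip_loss(new_lp, old_lp, adv))
```

Both losses reward raising the probability of tokens with positive advantage; the clipped version stops giving extra credit once the ratio moves past `1 + clip_eps`.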

06

Debugging Map

When training gets weird, the symptom usually points to one layer of the loop.

Auth fails

Check `TINKER_API_KEY`, account access, and whether the environment actually exported it.
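A small check like the following can rule out the most common cause before any remote call is made. It only inspects the environment of the current Python process; the helper name `api_key_status` is hypothetical, and it does not verify that the key is valid for your account.

```python
import os


def api_key_status(env=os.environ):
    """Report whether TINKER_API_KEY is visible to this process."""
    key = env.get("TINKER_API_KEY")
    if key is None:
        return "missing"  # never exported, or exported in a different shell
    if not key.strip():
        return "empty"    # exported, but blank
    return "set"


print(f"TINKER_API_KEY is {api_key_status()}")
```

If this prints "missing" inside your script but the variable shows up in your terminal, the script is probably running in a different environment (another shell, a notebook kernel, or a scheduler).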

Bad generations

Inspect data formatting, masks, target token shifting, and model choice before blaming Tinker.

Loss explodes

Try a lower learning rate and smaller batches, and verify that the loss inputs match the selected loss.
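Checking that the loss inputs match the loss can be automated with a small local validator. This is a hypothetical helper, not part of the Tinker SDK. Only the `cross_entropy` entry in `REQUIRED_INPUTS` is taken from the example on this page; field lists for other losses would need to come from the official docs.

```python
# Required fields per loss. Only the cross_entropy entry comes from the
# Datum example on this page; add other losses from the official docs.
REQUIRED_INPUTS = {
    "cross_entropy": {"weights", "target_tokens"},
}


def check_loss_inputs(loss_fn, loss_fn_inputs, num_input_tokens):
    """Return a list of human-readable problems (empty means none found)."""
    required = REQUIRED_INPUTS.get(loss_fn)
    if required is None:
        return [f"no known field list for loss '{loss_fn}'"]

    problems = []
    for name in sorted(required - set(loss_fn_inputs)):
        problems.append(f"missing field '{name}'")
    for name, values in loss_fn_inputs.items():
        if len(values) != num_input_tokens:
            problems.append(
                f"'{name}' has {len(values)} entries, expected {num_input_tokens}"
            )
    return problems


# Hypothetical inputs with a length bug in `weights`.
inputs = {"weights": [0, 1, 1], "target_tokens": [5, 6]}
print(check_loss_inputs("cross_entropy", inputs, num_input_tokens=2))
```

Running a check like this on a handful of examples before training is cheaper than diagnosing a diverging loss afterwards.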

Final scenario

Your loss is decreasing, but you are not sure the model is actually better. What is the most practical next move?
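One practical way to act on this scenario is to sample from the model on held-out prompts and score the outputs directly. The sketch below uses exact match as the metric; the function name and the sample answers are made up, and a real evaluation would use whatever metric fits the task.

```python
def exact_match_rate(predictions, references):
    """Fraction of predictions equal to their reference, ignoring outer whitespace."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)


# Hypothetical held-out answers and model samples from two checkpoints.
references = ["4", "Paris", "blue"]
samples_before = ["5", "Paris", "red"]
samples_after = ["4", "Paris ", "red"]

print(f"before: {exact_match_rate(samples_before, references):.2f}")
print(f"after:  {exact_match_rate(samples_after, references):.2f}")
```

Comparing the same held-out set across checkpoints shows whether a falling loss corresponds to better behavior.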

This page gives you the mental model. The official docs give exact method signatures, supported models, pricing, and deeper loss math.
