Local Models
Adding a Model
To add a local model, navigate to Sidekick -> Settings -> Inference -> Models, then click the Manage button to the right of the current model name.
If you have already downloaded a GGUF model, click the Add Model button and select the downloaded GGUF model.
If you are looking for a model, click the Download Model button. This opens a new window where you can select the model you want to download.
Using Speculative Decoding
Speculative decoding is a technique that speeds up inference by using a smaller "draft model" to cheaply propose tokens, which the main model then verifies alongside its own generation.
To enable speculative decoding, flip the toggle in Sidekick -> Settings -> Inference.
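To illustrate the idea, here is a minimal, hypothetical sketch of the draft-and-verify loop behind speculative decoding. The two "models" below are stand-in functions (not Sidekick's actual implementation), deterministic so the accept/reject logic is easy to follow: the draft model proposes a few tokens at a time, the main model checks them, and any agreed prefix is accepted at once.

```python
# Toy sketch of greedy speculative decoding with stand-in "models".
# Tokens are just integers; both models predict the next token from context.

def main_model(context):
    # Hypothetical large model: continues the sequence n, n+1, n+2, ...
    return context[-1] + 1

def draft_model(context):
    # Hypothetical small model: usually agrees, but guesses wrong
    # whenever the true next token is a multiple of 4.
    nxt = context[-1] + 1
    return nxt if nxt % 4 != 0 else nxt + 1

def speculative_decode(context, num_tokens, k=3):
    """Generate num_tokens tokens, drafting k at a time, then verifying."""
    out = list(context)
    while len(out) - len(context) < num_tokens:
        # 1. Draft model cheaply proposes k tokens.
        draft, ctx = [], list(out)
        for _ in range(k):
            t = draft_model(ctx)
            draft.append(t)
            ctx.append(t)
        # 2. Main model verifies; accept the longest agreeing prefix.
        accepted, ctx = 0, list(out)
        for t in draft:
            if main_model(ctx) != t:
                break
            out.append(t)
            ctx.append(t)
            accepted += 1
        # 3. On a mismatch, the main model emits one token itself, so
        #    every iteration makes progress even if all drafts fail.
        if accepted < k:
            out.append(main_model(out))
    return out[len(context):len(context) + num_tokens]

print(speculative_decode([0], 6))  # -> [1, 2, 3, 4, 5, 6]
```

When the draft model agrees with the main model, several tokens are accepted per verification step, which is where the speedup comes from; a poorly matched draft model yields few accepted tokens and little benefit.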
Selecting a Model
You can select three local models: a main model, a worker model, and a draft model for speculative decoding.
To select a model, navigate to Sidekick -> Settings -> Inference -> Models, then click the Manage button to the right of the model's name.
Main Model
This is the primary model that powers most work in Sidekick, including chat and most tools.
In addition to Sidekick Settings, the local model can also be selected from the main window. Click the brain icon on the right hand side of the toolbar, and a menu will appear with a list of local models. Click on a model's name to select it.
Worker Model
The worker model is used for simple tasks that demand speed and responsiveness, but can accept trade-offs in quality. This includes automatic conversation title generation and commands in the Inline Writing Assistant.
Ideally, a worker model should be fast and cheap to run. As a result, reasoning models are not recommended.
Draft Model
The draft model is used for speculative decoding. It should be from the same family as the main model, but with dramatically fewer parameters, and it must share the main model's tokenizer.