I've been looking for a decent local TTS solution for a while. Cloud APIs work fine, but they add up cost-wise and I'd rather keep voice data on my own hardware. Fish Speech caught my attention because it actually sounds natural and runs well on consumer GPUs.
This guide covers getting it running on TrueNAS SCALE with GPU acceleration. I went through a few false starts with Incus containers before landing on the Docker approach, which turned out to be way simpler.

The Container Situation
My first attempt was setting this up in an Incus container. TrueNAS 25.10 has Incus built in, so it seemed like the obvious choice. But GPU passthrough was a pain: the devices would show up in the container, then I'd hit driver version mismatches, then CUDA errors. I spent a couple of hours on it before giving up.
Turns out TrueNAS 25.10 has much better NVIDIA support through Docker. The Apps system handles GPU passthrough automatically. You just check a box and it works. Should've started there.
What You Need
- TrueNAS SCALE 25.10 or newer
- An NVIDIA GPU (I tested with RTX 3090 and RTX A2000)
- About 20GB of free storage (the model itself is only about 3.5GB, but the container image and generated audio add up)
- A Hugging Face account for model access
Setting Up NVIDIA Drivers
First thing: enable NVIDIA drivers in TrueNAS. Go to Apps → Configuration → Settings and check Install NVIDIA Drivers. Save it and let TrueNAS do its thing.
You can verify it worked by running nvidia-smi in the shell. Should show your GPU with driver version and memory info.
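If you want just the relevant fields instead of the full table, nvidia-smi can be queried directly (these are standard nvidia-smi flags, nothing TrueNAS-specific):

```
# print card name, driver version, and total VRAM as CSV
nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv
```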
Getting the Model
Fish Speech uses a model called openaudio-s1-mini from Hugging Face. It's gated, meaning you need to request access first. Head to huggingface.co, make an account if you don't have one, and request access to fishaudio/openaudio-s1-mini. Usually gets approved pretty quick.
Create a dataset on TrueNAS for the Fish Speech data. I put mine at pool/fish-speech with checkpoints and references folders inside.
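If you'd rather do this from the shell than the web UI, the equivalent is roughly this (assuming your pool is literally named pool; substitute your own):

```
# parent dataset, plus the two folders the app will mount
zfs create pool/fish-speech
mkdir -p /mnt/pool/fish-speech/checkpoints /mnt/pool/fish-speech/references
```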
Download the model using a throwaway container:
```
docker run --rm \
  -e HF_TOKEN=your_token_here \
  -v /mnt/pool/fish-speech/checkpoints:/checkpoints \
  python:3.11-slim bash -c '
pip install -q huggingface_hub &&
python3 -c "
import os
from huggingface_hub import snapshot_download
snapshot_download(
    \"fishaudio/openaudio-s1-mini\",
    local_dir=\"/checkpoints/openaudio-s1-mini\",
    token=os.environ[\"HF_TOKEN\"],
)"'
```

Takes a few minutes. About 3.5GB to download.

Deploying the App
Here's where TrueNAS makes things easy. Go to Apps → Discover → Custom App.
Basic Settings
- Application name: fish-speech
- Image repository: fishaudio/fish-speech
- Tag: latest
The Important Part: Container Settings
This tripped me up at first. The default image tries to run a web UI, but we want the API server. You need to override the entrypoint.
Set Container Entrypoint to:

```
/app/.venv/bin/python
```

Then in Container Command, enter each argument on its own line:

```
-m
tools.api_server
--listen
0.0.0.0:8080
--llama-checkpoint-path
/app/checkpoints/openaudio-s1-mini
--decoder-checkpoint-path
/app/checkpoints/openaudio-s1-mini/codec.pth
```

One argument per line. TrueNAS is picky about this.
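If you want to sanity-check the entrypoint and arguments before clicking through the Apps UI, the rough docker run equivalent looks like this (a sketch, assuming the NVIDIA container toolkit is wired up, which the GPU checkbox normally handles for you):

```
docker run --rm --gpus all -p 8080:8080 \
  -v /mnt/pool/fish-speech/checkpoints:/app/checkpoints \
  --entrypoint /app/.venv/bin/python \
  fishaudio/fish-speech:latest \
  -m tools.api_server --listen 0.0.0.0:8080 \
  --llama-checkpoint-path /app/checkpoints/openaudio-s1-mini \
  --decoder-checkpoint-path /app/checkpoints/openaudio-s1-mini/codec.pth
```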
Networking and Storage
Add a port mapping: 8080 → 8080 TCP.
For storage, add a host path mount:
- Host path: /mnt/pool/fish-speech/checkpoints
- Mount path: /app/checkpoints
GPU Selection
Under Resources, pick your NVIDIA GPU from the dropdown. That's it. TrueNAS handles the passthrough automatically.
Hit Save and watch the logs. You should see it load the model and eventually show:
```
INFO | Startup done, listening server at http://0.0.0.0:8080
```
Testing It Out
Quick test to make sure it's working:
```
curl -X POST "http://your-truenas-ip:8080/v1/tts" \
  -H "Content-Type: application/json" \
  -d '{"text": "Testing one two three."}' \
  --output test.wav
```

If you get a WAV file back, you're good. The API docs are at the root URL if you want to explore the options.
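One thing worth checking: if the server errors out, curl will happily save the JSON error body as test.wav. A quick sanity check:

```
file test.wav
# a real response looks something like: RIFF (little-endian) data, WAVE audio
```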
How Fast Is It?
I ran the same test on two different GPUs to see the difference:
| GPU | VRAM | Time | Speed |
|---|---|---|---|
| RTX 3090 | 24 GB | 4.85s | ~22 tok/s |
| RTX A2000 | 6 GB | 9.35s | ~8 tok/s |
| CPU only | - | ~45s | painful |
The 3090 is about twice as fast as the A2000, which makes sense given the specs. Both are perfectly usable. CPU-only is technically possible but you'll be waiting a while.
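For a rough comparison on your own hardware, timing the same request from the shell is close enough (-s just hides the progress meter):

```
time curl -s -X POST "http://your-truenas-ip:8080/v1/tts" \
  -H "Content-Type: application/json" \
  -d '{"text": "Testing one two three."}' \
  --output /dev/null
```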
Voice Cloning
The neat thing about Fish Speech is voice cloning. Give it a short audio sample and it can generate speech in that voice.
```
mkdir /mnt/pool/fish-speech/references/custom_voice
cp recording.wav /mnt/pool/fish-speech/references/custom_voice/sample.wav
```

Then use it in your API call:
```
curl -X POST "http://your-truenas-ip:8080/v1/tts" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello from a cloned voice.", "reference_id": "custom_voice"}' \
  --output cloned.wav
```

Works surprisingly well with just 10-30 seconds of sample audio.
Things That Might Go Wrong
If the container keeps restarting, check the logs. Usually it's one of these:
- Model files not found - double check your storage mount path
- GPU not available - make sure NVIDIA drivers are installed
- Wrong entrypoint - needs to be /app/.venv/bin/python exactly
If you get CUDA out of memory errors, make sure you're only running one instance. The model needs about 5GB of VRAM.
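Since the Apps system runs on Docker under the hood, you can also poke at the container directly from the TrueNAS shell; the container name is generated by the Apps system, so look it up first (the grep pattern here is just a guess):

```
# find the app's container, then follow its logs
docker ps --format '{{.Names}}' | grep -i fish
docker logs -f <container-name>

# confirm the GPU is visible from inside the container
docker exec <container-name> nvidia-smi
```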
Worth It?
For a home setup, absolutely. No API costs, no sending voice data to the cloud, and the quality is surprisingly good. The TrueNAS Custom App approach means it shows up in your dashboard with proper logging and restart controls.
If you're doing high-volume TTS, a beefy GPU makes a real difference. But even a mid-range card like the A2000 is totally usable for occasional generation.
Links
- Fish Speech on GitHub: github.com/fishaudio/fish-speech
- Models on Hugging Face: huggingface.co/fishaudio