Running Fish Speech TTS on TrueNAS with GPU Acceleration

Setting up Fish Speech text-to-speech on TrueNAS SCALE with Docker and NVIDIA GPU support. Covers the gotchas I hit along the way.


I've been looking for a decent local TTS solution for a while. Cloud APIs work fine, but they add up cost-wise and I'd rather keep voice data on my own hardware. Fish Speech caught my attention because it actually sounds natural and runs well on consumer GPUs.

This guide covers getting it running on TrueNAS SCALE with GPU acceleration. I went through a few false starts with Incus containers before landing on the Docker approach, which turned out to be way simpler.

[Audio samples: excerpt voice EN (0:07), excerpt voice FR (0:08)]
[Image: TrueNAS server with GPU]

The Container Situation

My first attempt was setting this up in an Incus container. TrueNAS 25.10 has Incus built in, so it seemed like the obvious choice. But GPU passthrough was a pain: the devices would show up in the container, then came driver version mismatches, then CUDA errors. I spent a couple of hours on it before giving up.

Turns out TrueNAS 25.10 has much better NVIDIA support through Docker. The Apps system handles GPU passthrough automatically. You just check a box and it works. Should've started there.

What You Need

  • TrueNAS SCALE 25.10 or newer
  • An NVIDIA GPU (I tested with RTX 3090 and RTX A2000)
  • About 20GB of storage for the model
  • A Hugging Face account for model access

Setting Up NVIDIA Drivers

First thing: enable NVIDIA drivers in TrueNAS. Go to Apps → Configuration → Settings and check Install NVIDIA Drivers. Save it and let TrueNAS do its thing.

You can verify it worked by running nvidia-smi in the shell. Should show your GPU with driver version and memory info.
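A slightly more targeted check pulls out just the fields you care about (the query flags below are standard nvidia-smi options):

```shell
# Print GPU name, driver version, and total VRAM; falls back to a
# message if the driver install hasn't finished yet
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader
else
  echo "nvidia-smi not found - drivers not installed yet"
fi
```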

Getting the Model

Fish Speech uses a model called openaudio-s1-mini from Hugging Face. It's gated, meaning you need to request access first. Head to huggingface.co, make an account if you don't have one, and request access to fishaudio/openaudio-s1-mini. Usually gets approved pretty quick.

Create a dataset on TrueNAS for the Fish Speech data. I put mine at pool/fish-speech with checkpoints and references folders inside.
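From the shell that amounts to (pool name pool here; substitute your own):

```shell
# Create the working directories on the dataset; -p keeps this idempotent
mkdir -p /mnt/pool/fish-speech/checkpoints /mnt/pool/fish-speech/references
```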

Download the model using a throwaway container:

docker run --rm \
  -e HF_TOKEN=your_token_here \
  -v /mnt/pool/fish-speech/checkpoints:/checkpoints \
  python:3.11-slim bash -c '
    pip install -q huggingface_hub &&
    python3 <<PY
import os
from huggingface_hub import snapshot_download

# The token comes from the -e HF_TOKEN env var, read inside the container
snapshot_download(
    "fishaudio/openaudio-s1-mini",
    local_dir="/checkpoints/openaudio-s1-mini",
    token=os.environ["HF_TOKEN"],
)
PY'

Takes a few minutes. About 3.5GB to download.
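Worth a quick check that everything landed where the container will look for it; codec.pth in particular is the file the decoder flag points at later:

```shell
# Confirm the checkpoint directory exists and the decoder weights are present
ls -lh /mnt/pool/fish-speech/checkpoints/openaudio-s1-mini
test -f /mnt/pool/fish-speech/checkpoints/openaudio-s1-mini/codec.pth && echo "codec.pth present"
```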

[Image: audio waveform visualization]

Deploying the App

Here's where TrueNAS makes things easy. Go to Apps → Discover → Custom App.

Basic Settings

Application name: fish-speech

Image repository: fishaudio/fish-speech

Tag: latest

The Important Part: Container Settings

This tripped me up at first. The default image tries to run a web UI, but we want the API server. You need to override the entrypoint.

Set Container Entrypoint to:

/app/.venv/bin/python

Then in Container Command, enter each argument on its own line:

-m
tools.api_server
--listen
0.0.0.0:8080
--llama-checkpoint-path
/app/checkpoints/openaudio-s1-mini
--decoder-checkpoint-path
/app/checkpoints/openaudio-s1-mini/codec.pth

One argument per line. TrueNAS is picky about this.
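If you want to sanity-check the invocation before wiring it into the Apps UI, the same settings translate to a one-off docker run. This is a sketch: the --gpus flag assumes the NVIDIA container toolkit is active on the host, and the mount and port mirror the Custom App settings covered in this guide.

```shell
# One-off equivalent of the Custom App config, for testing from the shell
docker run --rm --gpus all -p 8080:8080 \
  -v /mnt/pool/fish-speech/checkpoints:/app/checkpoints \
  --entrypoint /app/.venv/bin/python \
  fishaudio/fish-speech:latest \
  -m tools.api_server \
  --listen 0.0.0.0:8080 \
  --llama-checkpoint-path /app/checkpoints/openaudio-s1-mini \
  --decoder-checkpoint-path /app/checkpoints/openaudio-s1-mini/codec.pth
```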

Networking and Storage

Add a port mapping: 8080 → 8080 TCP.

For storage, add a host path mount:

  • Host path: /mnt/pool/fish-speech/checkpoints
  • Mount path: /app/checkpoints

GPU Selection

Under Resources, pick your NVIDIA GPU from the dropdown. That's it. TrueNAS handles the passthrough automatically.

Hit Save and watch the logs. You should see it load the model and eventually show:

INFO | Startup done, listening server at http://0.0.0.0:8080
[Screenshot: API terminal output]

Testing It Out

Quick test to make sure it's working:

curl -X POST "http://your-truenas-ip:8080/v1/tts" \
  -H "Content-Type: application/json" \
  -d '{"text": "Testing one two three."}' \
  --output test.wav

If you get a WAV file back, you're good. The API docs are at the root URL if you want to explore the options.
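If you're not sure whether you got audio or an error payload back, the file header tells you: every WAV starts with the ASCII bytes RIFF.

```shell
# A valid WAV begins with "RIFF"; a JSON error body won't
if [ "$(head -c 4 test.wav)" = "RIFF" ]; then
  echo "looks like a WAV"
else
  echo "not a WAV - inspect test.wav for an error message"
fi
```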

How Fast Is It?

I ran the same test on two different GPUs to see the difference:

GPU         VRAM    Time     Speed
RTX 3090    24 GB   4.85 s   ~22 tok/s
RTX A2000   6 GB    9.35 s   ~8 tok/s
CPU only    -       ~45 s    painful

The 3090 is about twice as fast as the A2000, which makes sense given the specs. Both are perfectly usable. CPU-only is technically possible but you'll be waiting a while.

Voice Cloning

The neat thing about Fish Speech is voice cloning. Give it a short audio sample and it can generate speech in that voice.

mkdir /mnt/pool/fish-speech/references/custom_voice
cp recording.wav /mnt/pool/fish-speech/references/custom_voice/sample.wav

Then use it in your API call:

curl -X POST "http://your-truenas-ip:8080/v1/tts" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello from a cloned voice.", "reference_id": "custom_voice"}' \
  --output cloned.wav

Works surprisingly well with just 10-30 seconds of sample audio.
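If your recording came out of a phone as m4a or similar, ffmpeg can turn it into a plain WAV reference. ffmpeg isn't part of this setup, so run this on whatever machine holds the recording; the 20-second cut, mono channel, and 44.1 kHz rate are reasonable defaults on my part, not Fish Speech requirements.

```shell
# Trim and convert a recording to a 20-second mono 16-bit WAV reference
ffmpeg -i recording.m4a -t 20 -ac 1 -ar 44100 -sample_fmt s16 \
  /mnt/pool/fish-speech/references/custom_voice/sample.wav
```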

Things That Might Go Wrong

If the container keeps restarting, check the logs. Usually it's one of these:

  • Model files not found - double check your storage mount path
  • GPU not available - make sure NVIDIA drivers are installed
  • Wrong entrypoint - needs to be /app/.venv/bin/python exactly

If you get CUDA out of memory errors, make sure you're only running one instance. The model needs about 5GB of VRAM.
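To see whether something else is already occupying the card, nvidia-smi can break VRAM usage down per process:

```shell
# Overall VRAM usage, then a per-process breakdown
nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader
```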

Worth It?

For a home setup, absolutely. No API costs, no sending voice data to the cloud, and the quality is surprisingly good. The TrueNAS Custom App approach means it shows up in your dashboard with proper logging and restart controls.

If you're doing high-volume TTS, a beefy GPU makes a real difference. But even a mid-range card like the A2000 is totally usable for occasional generation.