I host my own music server on Navidrome, carrying roughly 28,000 tracks spread over 2,987 albums and 1,078 artists. Almost all of them are metal—from headliners to the deep-cut underground. The files sit on a TrueNAS NFS share under a tidy Artist/Album/Track layout, but the actual ID3/FLAC tags were a nightmare: titles missing, track numbers scrambled, and no release year at all.
Navidrome doesn’t look at folder names; it consumes the tags themselves. As a result, my library’s display was a disaster. I decided one Sunday morning to straighten it all out with Python.

The Plan
```text
Artist/Album/
│
├─ Search MusicBrainz (artist + album)
│        ↓
│  Confidence ≥ 90%?
│  ┌── Yes ──┐      ┌── No ──┐
│  MB tags          Tags from
│  (exact title,    filename
│  year, label)     (fallback)
│        ↓
└─ For each track: match by track# + fuzzy title
   │
   └─ Still unmatched? AcoustID fingerprint → MusicBrainz
```

I ended up writing four scripts that run as a pipeline:
- tag_music_mb.py — tags files using MusicBrainz text search with fuzzy matching
- acoustid_tagger.py — fingerprints unmatched files and identifies them via AcoustID
- fetch_covers.py — downloads album artwork from Cover Art Archive + Deezer
- embed_covers.py — embeds the artwork directly into each audio file
Step 1: MusicBrainz Text Search
The idea is straightforward: for every Artist/Album folder I hit MusicBrainz. If the match confidence is 90% or higher, I grab the official metadata. If not, I fall back to whatever I can extract from the filenames.
MusicBrainz Search
```python
import musicbrainzngs as mb
from rapidfuzz import fuzz

mb.set_useragent("NavidromeAutoTagger", "1.0", "https://example.com")
mb.set_rate_limit(limit_or_interval=1.0)  # 1 req/s

def search_release(artist: str, album: str):
    query = f'artist:"{artist}" AND release:"{album}"'
    result = mb.search_releases(query=query, limit=5)
    releases = result.get("release-list", [])
    best, best_score = None, 0.0
    for rel in releases:
        artist_score = fuzz.token_sort_ratio(
            artist.lower(),
            rel.get("artist-credit-phrase", "").lower()
        )
        album_score = fuzz.token_sort_ratio(
            album.lower(), rel.get("title", "").lower()
        )
        # Confidence is the mean of the artist and album similarities
        sc = (artist_score + album_score) / 2
        if sc > best_score:
            best, best_score = rel, sc
    return best, best_score
```

Rapidfuzz’s token_sort_ratio is the hero here, turning strings like “The Black Dahlia Murder” and “Black Dahlia Murder, The” into comparable tokens, sorting them, and then comparing. The 90% cutoff weeds out spurious matches while still catching minor variations.
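To make the token-sort idea concrete, here is a minimal standard-library sketch. It is not rapidfuzz's implementation: `difflib.SequenceMatcher` stands in for its edit-distance ratio, the punctuation stripping is an added assumption, and `token_sort_similarity` is a hypothetical helper name, so exact scores will differ from what the tagger sees.

```python
import re
from difflib import SequenceMatcher


def token_sort_similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity that ignores word order.

    Stand-in for a token-sort ratio: difflib replaces the edit-distance
    based score, so the numbers are only indicative.
    """
    def normalize(s: str) -> str:
        # Lowercase, replace punctuation with spaces, then sort the tokens.
        tokens = re.sub(r"[^\w\s]", " ", s.lower()).split()
        return " ".join(sorted(tokens))

    return 100.0 * SequenceMatcher(None, normalize(a), normalize(b)).ratio()


# Same tokens in a different order normalize to the same string.
print(token_sort_similarity("The Black Dahlia Murder", "Black Dahlia Murder, The"))
# A one-letter difference lowers the score instead.
print(token_sort_similarity("Kreator", "Creator"))
```

The first call scores 100.0 because both names reduce to the same sorted token string; the second lands below the 90 cutoff, which is the kind of near miss the threshold is meant to reject.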
Track Matching
When a release is identified, I pull its full tracklist from MusicBrainz and then try to map each local file to a track entry:
- By track number: if the filename starts with 01 - Title.flac, match to position 1
- By fuzzy title: if numbers don’t match, fall back to title similarity
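The two-stage lookup can be sketched in a few lines. This is a simplified illustration rather than the script's actual function: `difflib` stands in for the fuzzy matcher, `match_track_sketch` is a hypothetical name, and the tracklist is placeholder data, not a real release.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """0-100 similarity; difflib stands in for the fuzzy matcher."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_track_sketch(local_num, local_title, tracklist):
    """Return the tracklist entry for a local file, or None.

    tracklist is a list of {"position": str, "title": str} dicts.
    """
    # 1. Prefer the track number parsed from the filename, when there is one.
    if local_num is not None:
        for track in tracklist:
            if track["position"] == str(local_num):
                return track
    # 2. Otherwise fall back to the most similar title.
    return max(tracklist, key=lambda t: similarity(local_title, t["title"]), default=None)


# Illustrative placeholder tracklist.
tracklist = [
    {"position": "1", "title": "Intro"},
    {"position": "2", "title": "Into the Storm"},
]
print(match_track_sketch("2", "into the storm", tracklist))
print(match_track_sketch(None, "Into The Storm (Remastered)", tracklist))
```

Both calls resolve to position 2: the first through the track number, the second through title similarity alone.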
Filename Fallback
If MusicBrainz turns up empty—typical for obscure releases with only a couple hundred listeners—I extract whatever clues I can from the filenames and surrounding metadata:
```python
import re
from pathlib import Path

def parse_track_filename(filename: str):
    """'01 - Song Title.mp3' -> ('1', 'Song Title')"""
    stem = Path(filename).stem
    # Leading 1-3 digit track number, one separator, then the title
    m = re.match(r'^(\d{1,3})\s*[-._\s]\s*(.+)$', stem)
    if m:
        return m.group(1).lstrip("0") or "1", m.group(2).strip()
    return None, stem.strip()
```

Step 2: AcoustID Fingerprinting
For the roughly 12,000 files that MusicBrainz text search couldn’t match, I brought in the heavy artillery: AcoustID. Instead of searching by name, it analyzes the actual audio waveform and generates a unique fingerprint using Chromaprint. That fingerprint gets looked up against a database of 40+ million tracks.
This catches albums that MusicBrainz missed due to spelling differences, regional editions, or simply obscure names—because the audio itself doesn’t lie.
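```python
import acoustid
import requests

ACOUSTID_KEY = "your-api-key"  # placeholder: use your own AcoustID application key

def lookup(filepath):
    # Compute the Chromaprint fingerprint and the track duration
    duration, fp = acoustid.fingerprint_file(str(filepath))
    params = {
        "client": ACOUSTID_KEY,
        "duration": str(int(duration)),
        "fingerprint": fp,
        "meta": "recordings releases"
    }
    r = requests.post(
        "https://api.acoustid.org/v2/lookup",
        data=params, timeout=15
    )
    results = r.json().get("results", [])
    if results and results[0].get("score", 0) >= 0.7:
        recordings = results[0].get("recordings") or []
        if not recordings:
            return None  # fingerprint is known but has no linked recording
        rec = recordings[0]
        return {
            "title": rec.get("title"),
            "artist": rec["artists"][0]["name"],
            "album": rec["releases"][0]["title"],
            # ... (remaining fields elided in the original)
        }
    return None  # no result above the 0.7 score cutoff
```

Even for underground black metal demos from 1996, AcoustID pulled matches at 97-100% confidence. The metal community on MusicBrainz is nothing if not thorough.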
Tagging Results

| Method | Files | % |
|---|---|---|
| MusicBrainz (text search) | 16,033 | 57% |
| AcoustID (fingerprint) | 253 | 1% |
| Local fallback (filename) | 11,625 | 41% |
| Errors | 384 | 1% |
A 58% verified match rate across both methods is solid for a library this underground. The remaining 41% still got tagged from filenames—not perfect metadata, but way better than blank tags.
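As a quick sanity check, the percentages can be recomputed from the file counts in the table. The sketch below uses only those four numbers; the variable names are mine.

```python
# File counts copied from the results table.
counts = {
    "MusicBrainz (text search)": 16_033,
    "AcoustID (fingerprint)": 253,
    "Local fallback (filename)": 11_625,
    "Errors": 384,
}

total = sum(counts.values())
shares = {method: 100 * n / total for method, n in counts.items()}
# "Verified" means matched against MusicBrainz by text search or by fingerprint.
verified = shares["MusicBrainz (text search)"] + shares["AcoustID (fingerprint)"]

for method, pct in shares.items():
    print(f"{method:28s} {counts[method]:>6,}  {pct:5.1f}%")
print(f"total files: {total:,}; verified share: {verified:.1f}%")
```

The four rows add up to 28,295 files, and the two verified methods together come to just under 58%.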
Step 3: Album Covers
With the tags in the right place, the next missing piece was album art. Navidrome automatically picks up any cover.jpg that lives in an album’s folder.
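Before downloading anything, it helps to know which albums still lack a cover file. A minimal sketch, assuming the Artist/Album layout described earlier and an exact, case-sensitive `cover.jpg` filename; `albums_missing_cover` is a hypothetical helper, not part of fetch_covers.py.

```python
from pathlib import Path


def albums_missing_cover(root: Path, cover_name: str = "cover.jpg"):
    """Yield Artist/Album directories that do not contain a cover file yet."""
    for artist_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for album_dir in sorted(p for p in artist_dir.iterdir() if p.is_dir()):
            if not (album_dir / cover_name).is_file():
                yield album_dir
```

Calling `list(albums_missing_cover(Path("/mnt/music")))` would give the work list for the download step.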
Cover Art Archive
The Cover Art Archive lives alongside MusicBrainz, offering free, community-curated album covers. Once a release is matched, we already have its release_id:
```python
import requests

def fetch_cover_caa(release_id: str):
    url = f"https://coverartarchive.org/release/{release_id}/front-500"
    r = requests.get(url, timeout=15, allow_redirects=True)
    # Treat tiny responses as missing art rather than a real image
    if r.status_code == 200 and len(r.content) > 1000:
        return r.content
    return None
```

Deezer Fallback
For releases that escape MusicBrainz or lack CAA art, the Deezer API steps in as a reliable fallback—no API key needed:
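```python
import requests
from rapidfuzz import fuzz

def fetch_cover_deezer(artist: str, album: str):
    r = requests.get(
        "https://api.deezer.com/search/album",
        params={"q": f'artist:"{artist}" album:"{album}"', "limit": 5},
        timeout=10,
    )
    data = r.json().get("data", [])
    # Find best fuzzy match, download cover_big
    # (one possible selection rule; the original leaves this part out)
    best = max(data, default=None,
               key=lambda a: fuzz.token_sort_ratio(album.lower(), a.get("title", "").lower()))
    if best and best.get("cover_big"):
        return requests.get(best["cover_big"], timeout=15).content
    return None
```

This pulls in a surprising number of albums that MusicBrainz misses. Deezer’s catalog is built around commercial releases, so its coverage pattern is a little different.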
| Source | Covers |
|---|---|
| Cover Art Archive | 611 |
| Deezer (fallback) | 685 |
| Already present | 558 |
| Not found | 861 albums (29%) |
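The two sources form a simple fallback chain: try Cover Art Archive first, then Deezer. Here is a minimal sketch of that control flow, assuming each source is wrapped in a zero-argument callable that returns image bytes or None. `first_cover` and the `Fetcher` alias are hypothetical names, not taken from fetch_covers.py.

```python
from typing import Callable, Iterable, Optional

Fetcher = Callable[[], Optional[bytes]]


def first_cover(
    fetchers: Iterable[tuple[str, Fetcher]],
) -> tuple[Optional[str], Optional[bytes]]:
    """Try each (source_name, fetcher) in order and return the first hit."""
    for name, fetch in fetchers:
        try:
            data = fetch()
        except Exception:
            # A failing source (timeout, bad JSON) should not stop the chain.
            data = None
        if data:
            return name, data
    return None, None


# Possible wiring with the two functions from this post
# (release_id may be None when MusicBrainz found nothing):
# source, data = first_cover([
#     ("caa", lambda: fetch_cover_caa(release_id) if release_id else None),
#     ("deezer", lambda: fetch_cover_deezer(artist, album)),
# ])
```

Returning the source name alongside the bytes makes it easy to keep per-source counts like the ones in the table.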
Step 4: Embedding Artwork
After getting tags and cover files sorted, the last step was embedding the artwork directly into the audio files. Some players—especially mobile ones—don’t look for cover.jpg in the folder. They want the art inside the file itself.
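One detail the embedding functions rely on is the image's MIME type, passed in as `mime`. The post doesn't show where that value comes from; a minimal sketch, assuming covers are only ever JPEG or PNG, could sniff it from the file's leading bytes. `sniff_image_mime` is a hypothetical helper, not part of the original scripts.

```python
from typing import Optional


def sniff_image_mime(data: bytes) -> Optional[str]:
    """Guess the MIME type of cover art from its leading magic bytes."""
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    # Unknown format: safer to skip the file than to embed a wrong type.
    return None
```

The returned string would be handed to embed_mp3 or embed_flac as the `mime` argument: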
```python
from mutagen.mp3 import MP3
from mutagen.id3 import APIC
from mutagen.flac import FLAC, Picture

def embed_mp3(filepath, cover_data, mime):
    audio = MP3(filepath)
    if audio.tags is None:
        audio.add_tags()
    audio.tags.delall("APIC")  # drop any previously embedded pictures
    audio.tags.add(APIC(
        encoding=3, mime=mime, type=3,  # UTF-8 text, picture type 3 = front cover
        desc="Front cover", data=cover_data
    ))
    audio.save()

def embed_flac(filepath, cover_data, mime):
    audio = FLAC(filepath)
    audio.clear_pictures()
    pic = Picture()
    pic.type = 3  # Front cover
    pic.mime = mime
    pic.desc = "Front cover"
    pic.data = cover_data
    audio.add_picture(pic)
    audio.save()
```

Final State
After running the full pipeline, here’s what the library looks like:
| Metric | Value |
|---|---|
| Artists | 1,078 |
| Albums | 2,987 |
| Audio files | 28,295 (25,753 MP3 + 2,542 FLAC) |
| Total size | 401 GB |
| Title tag coverage | 99% |
| Artist tag coverage | 99% |
| Album tag coverage | 99% |
| Year tag coverage | 99% |
| Embedded artwork | 91% |
| Albums with cover file | 2,167 (73%) |
Top artists by album count: Shining (28), Motörhead (27), Watain (26), Overkill (26), Napalm Death (26), Satyricon (25), Kreator (25), Testament (24), Kataklysm (24), Heaven Shall Burn (23).
The Full Pipeline
All four scripts live in a GitLab repo. Run them in order—the whole pipeline takes about three hours for a 28,000-track library and requires zero manual intervention.
```bash
pip install mutagen musicbrainzngs rapidfuzz requests pyacoustid
apt install libchromaprint-tools  # for fpcalc

# Step 1: Tag via MusicBrainz text search
python3 tag_music_mb.py /mnt/music

# Step 2: Fingerprint unmatched files via AcoustID
python3 acoustid_tagger.py /mnt/music --log tag_music_mb.log

# Step 3: Download missing artwork
python3 fetch_covers.py /mnt/music

# Step 4: Embed artwork into audio files
python3 embed_covers.py /mnt/music
```

For a big library, run them inside screen or tmux. They’re slow but completely safe, with no risk of corrupting your data.
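If you'd rather not type the four commands by hand, a small wrapper could chain them. This is a sketch, not something from the repo: `build_pipeline` and `run_pipeline` are hypothetical names, and the only things taken from the post are the script names and arguments shown in the commands.

```python
import subprocess
import sys


def build_pipeline(music_root: str, log_file: str = "tag_music_mb.log"):
    """Return the four commands in the order they must run."""
    py = sys.executable  # reuse the interpreter running this wrapper
    return [
        [py, "tag_music_mb.py", music_root],
        [py, "acoustid_tagger.py", music_root, "--log", log_file],
        [py, "fetch_covers.py", music_root],
        [py, "embed_covers.py", music_root],
    ]


def run_pipeline(music_root: str) -> None:
    """Run each step in order, stopping at the first failure."""
    for cmd in build_pipeline(music_root):
        print("running:", " ".join(cmd[1:]))
        # check=True raises CalledProcessError if a step exits non-zero.
        subprocess.run(cmd, check=True)
```

Stopping on the first non-zero exit matters here because each step depends on the previous one: the fingerprinting step reads the log written by the tagger.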
The Full Tagger Script
Here’s the full tag_music_mb.py script—drop it on any machine with Python 3.10+ and its dependencies installed:
```python
#!/usr/bin/env python3
"""
tag_music_mb.py: MP3/FLAC tagger using MusicBrainz, for Navidrome
Expected layout: /music_root/Artist/Album/track.mp3|flac
Dependencies:
    pip install mutagen musicbrainzngs rapidfuzz
"""
import re
import sys
import time
import logging
import argparse
from pathlib import Path

# ── External dependencies ─────────────────────────────────────────────────────
try:
    import musicbrainzngs as mb
except ImportError:
    print("❌ pip install musicbrainzngs")
    sys.exit(1)
try:
    from rapidfuzz import fuzz
except ImportError:
    print("❌ pip install rapidfuzz")
    sys.exit(1)
try:
    from mutagen.id3 import (
        ID3, ID3NoHeaderError,
        TIT2, TPE1, TALB, TRCK, TDRC, TPE2, TPOS, TCON, TPUB
    )
    from mutagen.flac import FLAC
except ImportError:
    print("❌ pip install mutagen")
    sys.exit(1)

# ── MusicBrainz user-agent (required) ─────────────────────────────────────────
mb.set_useragent("NavidromeAutoTagger", "1.0", "https://github.com/local/tagger")
mb.set_rate_limit(limit_or_interval=1.0)  # 1 req/s max (MusicBrainz rule)

# ── Logger ────────────────────────────────────────────────────────────────────
# Note: acoustid_tagger.py is pointed at this log via --log, so the (French)
# message text below is left exactly as-is in case it is parsed.
logging.basicConfig(
    level=logging.INFO,
    format="%(levelname)s %(message)s",
    handlers=[
        logging.StreamHandler(),
        logging.FileHandler("tag_music_mb.log", encoding="utf-8"),
    ],
)
log = logging.getLogger(__name__)

# ── Confidence threshold ──────────────────────────────────────────────────────
CONFIDENCE_THRESHOLD = 90  # %


# ═══════════════════════════════════════════════════════════════════════════════
# Local parsing (fallback)
# ═══════════════════════════════════════════════════════════════════════════════
def parse_track_filename(filename: str):
    """'01 - Track title.mp3' → ('1', 'Track title')"""
    stem = Path(filename).stem
    m = re.match(r'^(\d{1,3})\s*[-._\s]\s*(.+)$', stem)
    if m:
        return m.group(1).lstrip("0") or "1", m.group(2).strip()
    return None, stem.strip()


def parse_year(text: str):
    m = re.search(r'\b(19|20)\d{2}\b', text)
    return m.group(0) if m else None


def parse_disc(folder_name: str):
    """'CD1' → '1', 'Disc 2' → '2' (or None if the name is not a disc folder)"""
    m = re.match(r'^(?:CD|Disc|Disk)\s*(\d+)$', folder_name, re.IGNORECASE)
    return m.group(1) if m else None


# ═══════════════════════════════════════════════════════════════════════════════
# MusicBrainz helpers
# ═══════════════════════════════════════════════════════════════════════════════
def _score(a: str, b: str) -> float:
    """Case-insensitive token_sort similarity between two strings."""
    return fuzz.token_sort_ratio(a.lower(), b.lower())


def search_release(artist: str, album: str):
    """
    Search MusicBrainz for a release.
    Returns (release_dict, confidence_float) or (None, 0).
    """
    query = f'artist:"{artist}" AND release:"{album}"'
    try:
        result = mb.search_releases(query=query, limit=5)
    except mb.WebServiceError as e:
        # WebServiceError covers both NetworkError and ResponseError
        log.warning(f"MusicBrainz network error: {e}")
        return None, 0
    releases = result.get("release-list", [])
    if not releases:
        return None, 0
    best, best_score = None, 0.0
    for rel in releases:
        artist_credit = rel.get("artist-credit-phrase", "")
        rel_title = rel.get("title", "")
        sc = (_score(artist, artist_credit) + _score(album, rel_title)) / 2
        if sc > best_score:
            best, best_score = rel, sc
    return best, best_score


def fetch_tracks(release_id: str):
    """
    Return the track list of a MusicBrainz release.
    [{'disc': '1', 'position': '1', 'title': '…'}, …]
    """
    try:
        data = mb.get_release_by_id(release_id, includes=["recordings", "media"])
    except mb.WebServiceError as e:
        log.warning(f"MusicBrainz fetch error: {e}")
        return []
    tracks = []
    for media in data["release"].get("medium-list", []):
        disc_pos = media.get("position", "1")
        for t in media.get("track-list", []):
            tracks.append({
                "disc": str(disc_pos),
                "position": t.get("position", ""),
                "title": t.get("recording", {}).get("title", ""),
            })
    return tracks


def match_track(local_title: str, local_num, mb_tracks: list):
    """
    Find the matching MusicBrainz track.
    Priority: track number → fuzzy title.
    Returns (mb_track_dict, confidence) or (None, 0).
    """
    # 1. Match by track number
    if local_num:
        for t in mb_tracks:
            if t["position"] == str(local_num):
                sc = _score(local_title, t["title"])
                # A number match gets a floor of 70; that is still below the
                # 90 threshold unless the title also agrees.
                return t, max(sc, 70)
    # 2. Fuzzy title only
    best, best_sc = None, 0.0
    for t in mb_tracks:
        sc = _score(local_title, t["title"])
        if sc > best_sc:
            best, best_sc = t, sc
    return best, best_sc


# ═══════════════════════════════════════════════════════════════════════════════
# mutagen taggers
# ═══════════════════════════════════════════════════════════════════════════════
def apply_mp3(filepath: Path, tags: dict):
    try:
        audio = ID3(filepath)
    except ID3NoHeaderError:
        audio = ID3()

    def s(v): return v or ""

    audio.add(TIT2(encoding=3, text=s(tags.get("title"))))
    audio.add(TPE1(encoding=3, text=s(tags.get("artist"))))
    audio.add(TPE2(encoding=3, text=s(tags.get("albumartist"))))
    audio.add(TALB(encoding=3, text=s(tags.get("album"))))
    if tags.get("tracknumber"):
        audio.add(TRCK(encoding=3, text=str(tags["tracknumber"])))
    if tags.get("year"):
        audio.add(TDRC(encoding=3, text=str(tags["year"])))
    if tags.get("discnumber"):
        audio.add(TPOS(encoding=3, text=str(tags["discnumber"])))
    if tags.get("genre"):
        audio.add(TCON(encoding=3, text=s(tags["genre"])))
    if tags.get("label"):
        audio.add(TPUB(encoding=3, text=s(tags["label"])))
    audio.save(filepath, v2_version=3)


def apply_flac(filepath: Path, tags: dict):
    audio = FLAC(filepath)
    audio["title"] = tags.get("title", "")
    audio["artist"] = tags.get("artist", "")
    audio["albumartist"] = tags.get("albumartist", "")
    audio["album"] = tags.get("album", "")
    if tags.get("tracknumber"):
        audio["tracknumber"] = str(tags["tracknumber"])
    if tags.get("year"):
        audio["date"] = str(tags["year"])
    if tags.get("discnumber"):
        audio["discnumber"] = str(tags["discnumber"])
    if tags.get("genre"):
        audio["genre"] = tags["genre"]
    if tags.get("label"):
        audio["organization"] = tags["label"]
    audio.save()


TAGGERS = {".mp3": apply_mp3, ".flac": apply_flac}

# ═══════════════════════════════════════════════════════════════════════════════
# Album cache (avoids repeated MusicBrainz calls)
# ═══════════════════════════════════════════════════════════════════════════════
_album_cache: dict = {}  # (artist, album) → (release, mb_tracks, confidence, extra)


def resolve_album(artist: str, album: str):
    key = (artist.lower(), album.lower())
    if key in _album_cache:
        return _album_cache[key]
    release, confidence = search_release(artist, album)
    mb_tracks = []
    extra = {}
    if release and confidence >= CONFIDENCE_THRESHOLD:
        rid = release["id"]
        mb_tracks = fetch_tracks(rid)
        # Additional metadata
        extra["year"] = (release.get("date") or "")[:4] or parse_year(album)
        extra["label"] = (release.get("label-info-list") or [{}])[0] \
            .get("label", {}).get("name", "")
        # Genre: left empty (MusicBrainz tags are often absent without an include)
        extra["genre"] = ""
        log.info(
            f" ✔ MusicBrainz [{confidence:.0f}%] "
            f"{release.get('artist-credit-phrase')} — {release.get('title')} "
            f"({extra.get('year','')})"
        )
    else:
        if release:
            log.warning(
                f" ⚠ MusicBrainz confiance insuffisante [{confidence:.0f}%] "
                f"pour « {artist} / {album} » → tags locaux"
            )
        else:
            log.warning(f" ⚠ Aucun résultat MusicBrainz pour « {artist} / {album} » → tags locaux")
        release = None
    _album_cache[key] = (release, mb_tracks, confidence, extra)
    time.sleep(0.2)  # extra politeness on top of the rate limit
    return _album_cache[key]


# ═══════════════════════════════════════════════════════════════════════════════
# Library walk
# ═══════════════════════════════════════════════════════════════════════════════
def process_library(root: Path, dry_run: bool):
    stats = {"mb": 0, "local": 0, "error": 0, "skip": 0}
    for artist_dir in sorted(root.iterdir()):
        if not artist_dir.is_dir():
            continue
        artist_name = artist_dir.name
        for album_dir in sorted(artist_dir.iterdir()):
            if not album_dir.is_dir():
                continue
            album_raw = album_dir.name
            local_year = parse_year(album_raw)
            # Strip the year from the album name before searching
            album_clean = re.sub(r'[\(\[]\s*(19|20)\d{2}\s*[\)\]]', '', album_raw).strip()
            album_clean = re.sub(r'\s*[-–]\s*(19|20)\d{2}$', '', album_clean).strip()
            log.info(f"\n{'─'*60}")
            log.info(f"🎵 {artist_name} / {album_clean}")
            release, mb_tracks, confidence, extra = resolve_album(artist_name, album_clean)
            use_mb = release is not None

            # Collect files (current folder + CD/Disc subfolders)
            def gather_files(directory: Path, disc_num=None):
                disc = disc_num or parse_disc(directory.name)
                for fp in sorted(directory.iterdir()):
                    if fp.is_file() and fp.suffix.lower() in TAGGERS:
                        yield fp, disc
                    elif fp.is_dir():
                        d = parse_disc(fp.name)
                        if d:
                            yield from gather_files(fp, disc_num=d)

            for filepath, disc_num in gather_files(album_dir):
                ext = filepath.suffix.lower()
                local_num, local_title = parse_track_filename(filepath.name)
                rel = filepath.relative_to(root)
                # ── Base tags (local fallback) ─────────────────────────────
                tags = {
                    "artist": artist_name,
                    "albumartist": artist_name,
                    "album": album_clean,
                    "title": local_title,
                    "tracknumber": local_num,
                    "discnumber": disc_num,
                    "year": local_year,
                    "genre": "",
                    "label": "",
                }
                source = "local"
                # ── MusicBrainz enrichment ─────────────────────────────────
                if use_mb:
                    mb_track, track_sc = match_track(local_title, local_num, mb_tracks)
                    if mb_track and track_sc >= CONFIDENCE_THRESHOLD:
                        tags["title"] = mb_track["title"]
                        tags["tracknumber"] = mb_track["position"]
                        tags["discnumber"] = mb_track.get("disc", disc_num)
                        tags["year"] = extra.get("year") or local_year
                        tags["genre"] = extra.get("genre", "")
                        tags["label"] = extra.get("label", "")
                        source = f"MB[{track_sc:.0f}%]"
                    else:
                        sc_str = f"{track_sc:.0f}%" if mb_track else "—"
                        log.warning(
                            f" ⚠ Piste non matchée [{sc_str}] : {filepath.name} → tags locaux"
                        )
                # ── Write ──────────────────────────────────────────────────
                if dry_run:
                    log.info(f" [DRY] {rel}")
                    log.info(f" [{source}] {tags['tracknumber']}. {tags['title']} "
                             f"| {tags['year']} | disc:{tags['discnumber']}")
                    stats["mb" if source != "local" else "local"] += 1
                    continue
                try:
                    TAGGERS[ext](filepath, tags)
                    log.info(
                        f" ✔ [{source}] {rel.name} → "
                        f"{tags['tracknumber']}. {tags['title']}"
                    )
                    stats["mb" if source != "local" else "local"] += 1
                except Exception as exc:
                    log.error(f" ✘ {rel} → {exc}")
                    stats["error"] += 1
    return stats


# ═══════════════════════════════════════════════════════════════════════════════
# Main
# ═══════════════════════════════════════════════════════════════════════════════
def main():
    parser = argparse.ArgumentParser(
        description="Tag MP3/FLAC files via MusicBrainz (≥90% confidence) for Navidrome."
    )
    parser.add_argument(
        "root",
        help="Library root (e.g. /mnt/music)"
    )
    parser.add_argument(
        "--dry-run", "-n",
        action="store_true",
        help="Dry run: no files are modified"
    )
    args = parser.parse_args()
    root = Path(args.root)
    if not root.exists():
        log.error(f"Chemin introuvable : {root}")
        sys.exit(1)
    mode = "DRY-RUN 🔍" if args.dry_run else "ÉCRITURE ✏️"
    log.info(f"{'═'*60}")
    log.info(f" tag_music_mb.py — {mode}")
    log.info(f" Racine : {root}")
    log.info(f" Seuil MB : {CONFIDENCE_THRESHOLD}%")
    log.info(f"{'═'*60}")
    stats = process_library(root, dry_run=args.dry_run)
    log.info(f"\n{'═'*60}")
    log.info(f" ✔ MB : {stats['mb']} fichiers taggés via MusicBrainz")
    log.info(f" ✔ Local : {stats['local']} fichiers taggés en local (fallback)")
    log.info(f" ✘ Erreurs : {stats['error']}")
    log.info(f" Log complet → tag_music_mb.log")
    log.info(f"{'═'*60}")


if __name__ == "__main__":
    main()
```
What’s Next
- Incremental mode: Currently these are full-scan scripts. A --newer-than flag or inotify watcher would make them useful for ongoing library maintenance.
- Submit fingerprints back: The ~12,000 unmatched files could be submitted to AcoustID to help the next person with obscure taste.
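The incremental idea could start as a simple modification-time filter. A minimal sketch, assuming a Unix timestamp cutoff is good enough; `files_newer_than` and `AUDIO_SUFFIXES` are hypothetical names, and none of this exists in the current scripts.

```python
from pathlib import Path

AUDIO_SUFFIXES = {".mp3", ".flac"}


def files_newer_than(root: Path, cutoff: float):
    """Yield audio files under root modified after the Unix timestamp cutoff."""
    for path in sorted(root.rglob("*")):
        if (path.is_file()
                and path.suffix.lower() in AUDIO_SUFFIXES
                and path.stat().st_mtime > cutoff):
            yield path
```

One design caveat: tagging rewrites files and therefore bumps their modification time, so the cutoff should be the time the previous run finished, not the time it started.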
Now my Navidrome feels a lot more polished: accurate artist names, tracks in the right order, release years on point, and album art everywhere. Pretty good bang for the buck for a Sunday project.
Tools I used: MusicBrainz, AcoustID, Cover Art Archive, Deezer API, Navidrome, mutagen, rapidfuzz. All scripts: navidrome-tools