I asked Copilot: can a speech codec below 1 kbps be built with the FARGAN Codec?

February 19, 2026

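Out of idle curiosity: if RADE V2 uses QPSK in an 800 Hz bandwidth, I guessed its rate would end up somewhere around 800 bps, so I asked Copilot how well a codec could perform below 1 kbps. I was surprised to get an answer, complete with analysis, in under 10 seconds. Its conclusion: the speech would be intelligible, but it would not sound natural.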

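That said, when I listen to the V2 samples in David's post "August 2025, BBFM write up, RADE V2 OTA, Fine timing", V2 clearly sounds better than 700D, and, as Copilot says, I expect it to sound better than MELP at 600 bps. Compared with the V1 samples in that same post, you can hear the difference if you listen carefully, but I wouldn't say V2 is definitively better.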

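I would expect codec evaluation to always be a subjective comparison against the original audio (MOS testing). But if the yardstick is the current 1.6 kbps V1, the conclusion may well be: "Even at half of 1.6 kbps, intelligibility holds up and sensitivity improves, so V2 will do." Personally, I think JA and European users would welcome a 3–4 kbps mode as well, developed with more emphasis on audio quality and aimed at domestic (NVIS) contacts, one that fully surpasses SSB.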

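The other day I was listening to a local ragchew on 3.5 MHz and heard someone say, "I'd at least like to be able to tell from my end whether the other station is using a 1,000-yen PC headset or a 10,000-yen microphone." I thought that was a perfectly fair request. I have long felt that codecs are a surprisingly artisanal, gritty craft, and that it is safer not to second-guess them. Still, with 3–4 kbps I think we would reach territory where discerning ragchewers could enjoy the finer points of voice tuning, made possible by reallocating bits among the features used in Copilot's analysis.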

Copy-and-paste worked, so I assumed it was fine to share and pressed the publish button.

🧩 What the page explains

The page breaks down whether you can build a sub‑1 kbps speech codec using FARGAN, a neural vocoder.

⭐ Core conclusion

Yes, sub‑1 kbps is possible, but only with extremely aggressive feature quantization.
Speech will remain intelligible, but quality, naturalness, and speaker identity degrade noticeably.

🔍 Why sub‑1 kbps is achievable

FARGAN itself doesn’t impose a bitrate; it just reconstructs audio from:
- spectral envelope
- pitch (F0)
- voicing
- energy
If you compress these features to ≈10 bits per 10 ms frame, you get ~1 kbps.
Even 8 bits per frame (~0.8 kbps) is technically feasible.
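The arithmetic behind these figures is simply bits per frame times frames per second. A minimal sketch, assuming the 10 ms frame hop stated above (100 frames per second); the function name `bitrate_bps` is hypothetical:

```python
FRAME_MS = 10  # assumed frame hop from the text: one feature frame every 10 ms


def bitrate_bps(bits_per_frame: int, frame_ms: float = FRAME_MS) -> float:
    """Return the codec bit rate in bits per second."""
    frames_per_second = 1000.0 / frame_ms
    return bits_per_frame * frames_per_second


for bits in (10, 8):
    print(f"{bits} bits/frame -> {bitrate_bps(bits):.0f} bps")
```

For the same bit budget, a 20 ms frame hop would halve the rate, at the cost of coarser time resolution.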

📉 The trade‑offs

To reach <1 kbps, you must use very low‑dimensional, heavily quantized features, such as:

- 1–2 bits for voicing
- 4–5 bits for pitch
- 2–3 bits for spectral envelope
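To get a feel for how coarse 4 bits of pitch is, here is a hypothetical uniform quantizer on a log-F0 scale. The 60–400 Hz range and the function names are assumptions for illustration only, not FARGAN's actual pitch encoding.

```python
import math

F0_MIN_HZ = 60.0   # assumed lower bound of the pitch range
F0_MAX_HZ = 400.0  # assumed upper bound of the pitch range


def quantize_f0(f0_hz: float, bits: int = 4) -> int:
    """Map F0 to the nearest of 2**bits levels spaced uniformly in log-frequency."""
    levels = 2 ** bits
    f0 = min(max(f0_hz, F0_MIN_HZ), F0_MAX_HZ)  # clamp to the assumed range
    # position of f0 within the range, 0.0 .. 1.0, measured on a log scale
    x = math.log(f0 / F0_MIN_HZ) / math.log(F0_MAX_HZ / F0_MIN_HZ)
    return round(x * (levels - 1))


def dequantize_f0(index: int, bits: int = 4) -> float:
    """Return the reconstructed F0 (Hz) for quantizer level `index`."""
    levels = 2 ** bits
    x = index / (levels - 1)
    return F0_MIN_HZ * (F0_MAX_HZ / F0_MIN_HZ) ** x


bits = 4
# spacing between adjacent levels, in cents (1200 cents = one octave)
step_cents = 1200 * math.log2(F0_MAX_HZ / F0_MIN_HZ) / (2 ** bits - 1)
print(f"{bits}-bit pitch step: {step_cents:.0f} cents")
```

With these assumed bounds, 4 bits gives a step of about 219 cents, a little over two semitones, so the worst-case rounding error is roughly a semitone. That is consistent with the "very coarse" label in the table below.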

This leads to:

- robotic timbre
- limited speaker identity
- artifacts in transitions
- reduced naturalness

However, FARGAN’s GAN-based structure may still outperform traditional ultra‑low‑rate codecs like MELP 600 bps at similar bitrates.

🧪 Example “realistic” feature budget

| Feature | Bits/frame | Notes |
|---|---|---|
| Pitch (F0) | 4 | Very coarse |
| Voicing | 1 | Voiced/unvoiced |
| Energy | 2 | Rough loudness |
| Spectral envelope | 3 | Very low resolution |
| Total | 10 | → ~1 kbps at 10 ms frames |
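The budget in the table can be made concrete by packing one frame into a single 10-bit word. A minimal sketch: the field order and the names `FIELDS`, `pack_frame` and `unpack_frame` are hypothetical, and each field is assumed to already be a quantizer index.

```python
# (name, width in bits) for each field, following the budget in the table above
FIELDS = [("pitch", 4), ("voicing", 1), ("energy", 2), ("envelope", 3)]
FRAME_BITS = sum(width for _, width in FIELDS)  # 10 bits per frame


def pack_frame(indices: dict) -> int:
    """Pack per-field quantizer indices into one FRAME_BITS-wide integer."""
    word = 0
    for name, width in FIELDS:
        value = indices[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value  # append this field, first field ends up in the MSBs
    return word


def unpack_frame(word: int) -> dict:
    """Inverse of pack_frame: recover the per-field indices."""
    indices = {}
    for name, width in reversed(FIELDS):
        indices[name] = word & ((1 << width) - 1)  # take the lowest `width` bits
        word >>= width
    return indices


frame = {"pitch": 9, "voicing": 1, "energy": 2, "envelope": 5}
word = pack_frame(frame)
print(f"{word:0{FRAME_BITS}b}")  # prints 1001110101, the 10 bits sent for this frame
assert unpack_frame(word) == frame
```

Sender and receiver must agree on the field order and widths; the sketch carries no framing or error protection.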

🎤 Overall assessment

Goal: intelligible speech at <1 kbps → Achievable.
Goal: natural, pleasant speech at <1 kbps → Not achievable due to fundamental information limits.

🛠️ What the page offers next

The page suggests help with designing:

- feature sets for 0.8–1.0 kbps
- quantization strategies
- pitch encoding
- spectral envelope compression
- a training pipeline for a FARGAN‑based codec
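Of these items, spectral envelope compression is where the bits are tightest: 3 bits means the encoder can only pick one of 8 stored envelopes per frame. A minimal sketch of that nearest-codeword (vector quantization) search follows. The 4-dimensional random codebook and all names are hypothetical stand-ins; a real codebook would be trained, for example with k-means, on actual spectral features.

```python
import random

DIM = 4   # hypothetical envelope dimension (real systems use more coefficients)
BITS = 3  # envelope budget from the table: 3 bits -> 8 codewords

rng = random.Random(0)
# Placeholder codebook of 2**BITS vectors; a real one would be trained, not random
CODEBOOK = [[rng.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in range(2 ** BITS)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode_envelope(env):
    """Return the index (0..2**BITS - 1) of the codeword closest to `env`."""
    return min(range(len(CODEBOOK)), key=lambda i: sq_dist(env, CODEBOOK[i]))


def decode_envelope(index):
    """Look up the stored envelope for a received index."""
    return CODEBOOK[index]


env = [0.2, -0.5, 0.1, 0.7]
idx = encode_envelope(env)
print(idx, decode_envelope(idx))
```

Only the index is transmitted; the decoder's envelope can be no more detailed than the 8 codewords, which is the "very low resolution" trade-off in concrete form.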

If you want, I can distill this further into design guidelines for your own codec experiments.

Playing with redpitaya