This demo runs a neural network specialised for fast pitch detection on human voices. It is based on the FCPE model, with the audio extractor rewritten as browser-compatible WASM and the ONNX model quantised to FP16 to minimise latency.
Paper: https://arxiv.org/abs/2509.15140
Model: web/static/fcpe.single.fp16.onnx
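
For readers unfamiliar with FP16: JavaScript has no widely available native half-precision number type, so FP16 values are usually handled in the browser as raw 16-bit patterns in a Uint16Array. The sketch below is not taken from the demo's code. It only illustrates how 32-bit samples could be converted to IEEE 754 half-precision bits, rounding to nearest even. The names floatToHalfBits and toHalfArray are hypothetical, and whether this particular model expects FP16 inputs (rather than only storing FP16 weights) is an assumption.

```typescript
// Scratch buffers used to reinterpret a float32 value as its raw 32 bits.
const f32 = new Float32Array(1);
const u32 = new Uint32Array(f32.buffer);

// Hypothetical helper: convert one number to IEEE 754 binary16 bits.
function floatToHalfBits(value: number): number {
  f32[0] = value;
  const x = u32[0];
  const sign = (x >>> 16) & 0x8000;
  const exp = (x >>> 23) & 0xff;
  const mant = x & 0x7fffff;

  // Infinity or NaN: keep the class, set a mantissa bit for NaN.
  if (exp === 0xff) {
    return sign | 0x7c00 | (mant !== 0 ? 0x200 : 0);
  }

  // Re-bias the exponent from float32 (127) to float16 (15).
  const e = exp - 127 + 15;

  // Too large for FP16: becomes signed infinity.
  if (e >= 0x1f) return sign | 0x7c00;

  if (e <= 0) {
    // Too small even for an FP16 subnormal: becomes signed zero.
    if (e < -10) return sign;
    // Subnormal result: shift the mantissa (with its implicit 1) into place.
    const m = mant | 0x800000;
    const shift = 14 - e;
    let sub = m >>> shift;
    const rem = m & ((1 << shift) - 1);
    const halfway = 1 << (shift - 1);
    if (rem > halfway || (rem === halfway && (sub & 1) === 1)) sub++;
    return sign | sub;
  }

  // Normal result: drop 13 mantissa bits, rounding to nearest even.
  // A carry out of the mantissa correctly bumps the exponent.
  let half = (e << 10) | (mant >>> 13);
  const rem = mant & 0x1fff;
  if (rem > 0x1000 || (rem === 0x1000 && (half & 1) === 1)) half++;
  return sign | half;
}

// Hypothetical helper: convert a block of audio samples to FP16 bit patterns.
function toHalfArray(samples: Float32Array): Uint16Array {
  const out = new Uint16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    out[i] = floatToHalfBits(samples[i]);
  }
  return out;
}
```

FP16 keeps only 10 mantissa bits, so values lose precision compared with float32. That is the trade-off accepted in exchange for a smaller model file and less data to move.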