LegalKit Retrieval, a binary Search with Scalar (int8) Rescoring through French legal codes

This space showcases the tsdae-lemone-mbert-base model by Louis Brulé Naudet, a sentence embedding model based on BERT fitted using Transformer-based Sequential Denoising Auto-Encoder for unsupervised sentence embedding learning with one objective : french legal domain adaptation.

This process is designed to be memory efficient and fast, with the binary index being small enough to fit in memory and the int8 index being loaded as a view to save memory. Additionally, the binary index is much faster (up to 32x) to search than the float32 index, while the rescoring is also extremely efficient.

1 100
1 10