Lingvanex translates quickly but literally

All Languages
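| Model | Overall Score | Coherence Mean | Coherence IQR | Coherence P90 | Idiomaticity Mean | Idiomaticity IQR | Idiomaticity P90 | Accuracy Mean | Accuracy IQR | Accuracy P90 | Latency Median (ms) | Latency P90 (ms) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| gpt-4o-2024-08-06 | 8.92 | 8.75 | 1 | 7 | 8.07 | 2 | 6 | 8.90 | 1 | 8 | 803 | 1288 |
| deepl | 8.81 | 8.59 | 1 | 7 | 8.09 | 2 | 6 | 8.67 | 1 | 7 | 226 | 322 |
| claude-3-5-sonnet-20241022 | 8.77 | 8.53 | 1 | 7 | 8.39 | 1 | 7 | 8.92 | 1 | 8 | 1387 | 3112 |
| gemma-3-27b-it | 8.59 | 8.48 | 1 | 7 | 8.31 | 1 | 7 | 8.92 | 1 | 8 | 1115 | 1605 |
| llama-3.3-70b-versatile | 8.54 | 8.36 | 1 | 7 | 7.72 | 2 | 6 | 8.62 | 1 | 7 | 334 | 1687 |
| lingvanex | 8.48 | 8.69 | 2 | 7 | 7.42 | 3 | 4 | 8.54 | 1 | 6 | 206 | 270 |
| gemini-2.0-flash-exp | 8.26 | 8.66 | 1 | 7 | 8.24 | 1 | 6 | 9.00 | 1 | 8 | 514 | 702 |
| gemma2-9b-it | 7.90 | 7.87 | 2 | 6 | 7.70 | 2 | 5 | 8.43 | 2 | 7 | 407 | 489 |
| llama-3.1-8b-instant | 6.18 | 6.03 | 6 | 2 | 6.89 | 3 | 4 | 7.59 | 2 | 4 | 266 | 818 |
| mistral-small-latest | 5.25 | 6.54 | 5 | 2 | 7.18 | 3 | 4 | 7.45 | 2 | 3 | 603 | 1659 |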

Nuenki needs to do a lot of translation quickly (to avoid noticeable latency when browsing) and at high quality, since learning from mistranslations can do more harm than good. In previous blog posts (1, 2) I compared general LLMs, but I'd recently heard about Lingvanex, a specialised translation service, and wanted to give it a try. I also added DeepL for context while I was there.

Lingvanex

Lingvanex's website promises "advanced natural processing solutions", or in other words, on-premise unlimited-usage ML translation. They also offer an API with reasonable pricing.

Excellent latency

The latency numbers should be taken with a grain of salt, as they were all measured from a single geographic location (the UK). Even so, Lingvanex's latency is clearly comparable to DeepL's: a median of 206 ms against 226 ms. It's also very consistent, with a P90 (the latency that only the slowest 10% of requests exceed) of 270 ms.
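For readers who want to reproduce this kind of summary, here is a minimal sketch of how a median and P90 can be computed from raw request timings. It is not the benchmark code behind this post: the function names are made up for illustration, the sample values are hypothetical, and the nearest-rank percentile definition is just one common choice.

```python
import statistics


def percentile_nearest_rank(samples, pct):
    """Smallest sample such that at least pct% of samples are <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Integer ceiling of pct/100 * n, giving a 1-based rank.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def summarise_latency(samples_ms):
    """Return the two latency statistics reported in the table."""
    return {
        "median": statistics.median(samples_ms),
        "p90": percentile_nearest_rank(samples_ms, 90),
    }


# Hypothetical timings in milliseconds, NOT measurements from this post.
samples = [198, 201, 204, 205, 206, 207, 210, 215, 270, 640]
summary = summarise_latency(samples)
print(summary)
```

The single 640 ms outlier barely moves either statistic, which is why a median plus P90 describes "typical" and "bad but not freak" latency better than a mean would.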

Coherent, but not as idiomatic

Across all languages, Lingvanex's performance is middling. It gets more interesting when you look at individual languages: for some, like French, it beats everything but DeepL. Yet in every language there is a consistent pattern of high coherence and low idiomaticity. I think this is because Lingvanex (and DeepL, which shows the same pattern to a lesser degree) uses small models finetuned for translation. Larger models may be slower, costlier, and unnecessarily general, but they're able to translate idiomatically rather than literally.
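One rough way to put a number on that pattern is to subtract each model's mean idiomaticity from its mean coherence. The sketch below does this with the all-languages means from the table above; the "gap" metric is my own illustration rather than part of the benchmark, and the per-language tables may look different.

```python
# (coherence mean, idiomaticity mean) copied from the all-languages table.
scores = {
    "gpt-4o-2024-08-06": (8.75, 8.07),
    "deepl": (8.59, 8.09),
    "claude-3-5-sonnet-20241022": (8.53, 8.39),
    "gemma-3-27b-it": (8.48, 8.31),
    "llama-3.3-70b-versatile": (8.36, 7.72),
    "lingvanex": (8.69, 7.42),
    "gemini-2.0-flash-exp": (8.66, 8.24),
    "gemma2-9b-it": (7.87, 7.70),
    "llama-3.1-8b-instant": (6.03, 6.89),
    "mistral-small-latest": (6.54, 7.18),
}

# Positive gap: the model reads as more coherent than idiomatic.
gaps = {
    model: round(coherence - idiomaticity, 2)
    for model, (coherence, idiomaticity) in scores.items()
}

# Print models from the widest gap to the narrowest.
for model, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model:28s} {gap:+.2f}")
```

On these aggregate numbers Lingvanex has the widest gap, at 1.27 points, while DeepL's is 0.50.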

Conclusion

Lingvanex is OK, and its low latency is impressive, but its idiomaticity is too low for me to replace DeepL with it yet.

You might also be interested in my testing of Quasar Alpha, a mysterious new model on OpenRouter. It performs impressively well.