Model Catalogue

Definitions of Modification States

| State | Description |
| --- | --- |
| Fine-tuned In-House | Further trained internally on task-specific data. |
| Externally Fine-tuned | Loaded from a third-party fine-tuned checkpoint. |
| Trained from Scratch | Trained entirely from randomly initialised parameters, with no pretraining. |
| None | Pretrained model loaded directly from the original creator, unmodified. |
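
The four modification states could be encoded as an enumeration so that catalogue rows can be validated or filtered programmatically. The following is a minimal sketch, not part of the catalogue itself: the `CatalogueEntry` class, its field names, and the `is_modified` helper are hypothetical names introduced for illustration. It allows more than one state per model because some entries (for example ViT) are in different states in different projects.

```python
from dataclasses import dataclass
from enum import Enum


class ModificationState(Enum):
    """The four modification states, keyed by the labels used in the catalogue."""
    FINE_TUNED_IN_HOUSE = "Fine-tuned In-House"
    EXTERNALLY_FINE_TUNED = "Externally Fine-tuned"
    TRAINED_FROM_SCRATCH = "Trained from Scratch"
    NONE = "None"


@dataclass(frozen=True)
class CatalogueEntry:
    """Hypothetical record for one catalogue row (field names are assumptions)."""
    model: str
    hosting: str
    license: str
    # One model may be in different states in different projects.
    modifications: tuple[ModificationState, ...] = (ModificationState.NONE,)

    def is_modified(self) -> bool:
        """Return True unless every recorded state is NONE (used as released)."""
        return any(s is not ModificationState.NONE for s in self.modifications)


# Two rows taken from the tables in this catalogue.
vit = CatalogueEntry(
    "ViT",
    "Self Hosted",
    "Apache-2.0",
    (ModificationState.EXTERNALLY_FINE_TUNED, ModificationState.FINE_TUNED_IN_HOUSE),
)
clip = CatalogueEntry("CLIP-ViT", "Self Hosted", "MIT")

print(vit.is_modified(), clip.is_modified())
```

Looking a state up by its catalogue label, as in `ModificationState("Fine-tuned In-House")`, raises `ValueError` for any label outside the four defined states, which gives a cheap consistency check on the Modifications column.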

Embedding Models

Text Embedding Models

| Model | Inputs | Usage | Hosting | License | Modifications | Paper/Info | Repository |
| --- | --- | --- | --- | --- | --- | --- | --- |
| all-MiniLM-L6-v2 | Text | General-purpose embeddings (prototyping, multiple projects) | Self Hosted | Apache-2.0 | None | HuggingFace | GitHub |
| GIST-small-Embedding-v0 | Text | SetFit embeddings (RoS) | Self Hosted | MIT | None | HuggingFace | Paper |
| text-embedding-3-large | Text | High-quality embeddings (Sidekick) | OpenAI API | Proprietary (OpenAI) | None | OpenAI | N/A |
| e5-base-v2 | Text | Embeddings for SetFit classification (RoS) | Self Hosted | MIT | Fine-tuned In-House (RoS) | Paper | HuggingFace |

Visual and Multimodal Embeddings

| Model | Inputs | Usage | Hosting | License | Modifications | Paper/Info | Repository |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Swin | Image | Visual embeddings for MM/VCD | Self Hosted | MIT | Externally Fine-tuned | Paper | GitHub |
| ViT | Image | Visual embeddings for MM/VCD and MM/VFR | Self Hosted | Apache-2.0 | Externally Fine-tuned (VCD), Fine-tuned In-House (VFR) | Paper | GitHub |
| CLIP-ViT | Image/Text | Zero-shot image-text matching (MM/semantic delta) | Self Hosted | MIT | None | Paper | HuggingFace |
| CLAP | Audio + Text | Audio embeddings (use to be confirmed) (MM/semantic delta) | Self Hosted | MIT | None | CLAP Overview | GitHub |

Image Analysis Models

| Model | Inputs | Usage | Hosting | License | Modifications | Paper/Info | Repository |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Florence-2 | Image/Text | Image captioning, object detection (MM) | Self Hosted | MIT | None | Paper | HuggingFace |
| GroundingDINO | Image/Text queries | Open-set object detection (MM) | Self Hosted | Apache-2.0 | None | Paper | GitHub |
| SAM 2 | Image + Prompts (points/bbox) | Image segmentation (MM + MM/demos) | Self Hosted | Apache-2.0 | None | Paper | GitHub |
| HRNet (hrnet1) | Image | Segment alignment for MM/VCD | Self Hosted | Apache-2.0 | Externally Fine-tuned | Paper | GitHub |
| DETR | Image | Object detection for brand compliance (MM) | Self Hosted | Apache-2.0 | Fine-tuned In-House | Paper | GitHub |
| YOLO Models | Image | Real-time object detection (e.g. face detection in MM/demos) | Self Hosted | GPL-3.0/AGPL-3.0 | None/Externally Fine-tuned | Overview | GitHub |
| RAM | Image | Zero-shot image tagging (MM/demos) | Self Hosted | Apache-2.0 | None | Overview | GitHub |
| GPT-4o | Image/Text | Image QA (MM) | OpenAI API | Proprietary | None | OpenAI | N/A |

Text Analysis Models

| Model | Usage | Hosting | License | Modifications | Paper/Info | Repository |
| --- | --- | --- | --- | --- | --- | --- |
| RoBERTa | Classification, paraphrase detection (MM); tested for text QA for semantic extractors (RoS) | Self Hosted | MIT | None | Paper | GitHub, SQuAD |
| SetFit | Few-shot classification (MM, RoS) | Self Hosted | Apache-2.0 | Fine-tuned In-House (RoS/MM) | Overview | GitHub |
| Route0x | Intent classification, routing | Self Hosted | MIT | None | Overview | GitHub |
| BERTopic | Topic modelling (CNS) | Self Hosted | MIT | Fine-tuned In-House (CNS) | Paper | GitHub |

Audio Analysis Models

| Model Name | Type | Used for | Hosting | License | Modifications | Paper/Info Link | Repo Link |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Whisper | Audio | Speech-to-text transcription (MM + CNS) | Self Hosted | MIT | None | Whisper Paper | GitHub |
| Pyannote | Audio + Metadata | Speaker diarisation (MM/demos) | Self Hosted | MIT | None | Pyannote Overview | GitHub |

Document Ingestion Tasks

| Model Name | Type | Used for | Hosting | License | Modifications | Paper/Info Link | Repo Link |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Azure Doc Intelligence | Doc OCR (MM) | Document parsing and layout analysis | Azure Cloud Service | Proprietary (MS) | None | Azure Overview | N/A |
| Docling | Document Parsing (MM) | Document parsing and layout analysis | Self Hosted | MIT | None | Docling Overview | GitHub |

LLM APIs and Models

| Model | Usage | Hosting | License | Paper/Info | Repository |
| --- | --- | --- | --- | --- | --- |
| GPT-4o | SPARQL generation, structured outputs (Sidekick, MM) | OpenAI API/Azure | Proprietary (OpenAI) | OpenAI | N/A |
| GPT-4o-mini | Summarisation, SPARQL queries (Sidekick, MM) | OpenAI API/Azure | Proprietary (OpenAI) | N/A | N/A |
| GPT-4.1 | Structured outputs (MM) | OpenAI API/Azure | Proprietary (OpenAI) | N/A | N/A |
| Ollama/llama3.1 | SPARQL queries (Sidekick, MM) | Self Hosted | Llama 3.1 Community License (Meta) | Paper | GitHub |