Model Catalogue

Definitions of Modification States

State	Description
Fine-tuned In-House	Further trained internally on task-specific data.
Externally Fine-tuned	Loaded from a third-party fine-tuned checkpoint.
Trained from Scratch	Entirely trained from random parameters, no pretraining.
None	Directly loaded pretrained model from original creator.
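
The states above also appear in compound form in the catalogue's Modifications column, for example "Externally Fine-tuned (VCD), Fine-tuned In-House (VFR)" or "None/Externally Fine-tuned". As a minimal sketch of how those cells could be read programmatically, the Python below maps the four states to an enum and splits a cell into (state, project) pairs. The names `ModificationState` and `parse_modifications` are hypothetical and not part of any existing tooling. The sketch assumes that commas and slashes outside parentheses separate states and that a parenthesised suffix names the project.

```python
import re
from enum import Enum
from typing import List, Optional, Tuple


class ModificationState(Enum):
    """The four modification states defined in the table above."""
    FINE_TUNED_IN_HOUSE = "Fine-tuned In-House"
    EXTERNALLY_FINE_TUNED = "Externally Fine-tuned"
    TRAINED_FROM_SCRATCH = "Trained from Scratch"
    NONE = "None"


def parse_modifications(cell: str) -> List[Tuple[ModificationState, Optional[str]]]:
    """Split a Modifications cell into (state, project) pairs.

    Assumption: ',' and '/' separate states unless they sit inside
    parentheses, and an optional '(...)' suffix names the project.
    """
    pairs = []
    # Split on ',' or '/' only when not followed by a closing parenthesis
    # without an intervening opening one (i.e. not inside parentheses).
    for part in re.split(r"[,/](?![^()]*\))", cell):
        part = part.strip()
        if not part:
            continue
        match = re.fullmatch(r"(.+?)\s*(?:\((.+)\))?", part)
        label, project = match.group(1), match.group(2)
        # Raises ValueError if the label is not one of the defined states.
        pairs.append((ModificationState(label), project))
    return pairs
```

Calling `parse_modifications("Fine-tuned In-House (RoS/MM)")` keeps "RoS/MM" together as the project, because the slash is inside parentheses. An unrecognised label raises `ValueError`, which makes typos in the catalogue easy to spot.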

Model	Inputs	Usage	Hosting	License	Modifications	Paper/Info	Repository
all-MiniLM-L6-v2	Text	General-purpose embeddings (prototyping, multiple projects)	Self Hosted	Apache-2.0	None	HuggingFace	GitHub
GIST-small-Embedding-v0	Text	SetFit embeddings (RoS)	Self Hosted	MIT	None	HuggingFace	Paper
text-embedding-3-large	Text	High-quality embeddings (Sidekick)	OpenAI API	Proprietary (OpenAI)	None	OpenAI	N/A
e5-base-v2	Text	Embeddings for SetFit classification (RoS)	Self Hosted	MIT	Fine-tuned In-House (RoS)	Paper	HuggingFace

Model	Inputs	Usage	Hosting	License	Modifications	Paper/Info	Repository
Swin	Image	Visual embeddings for MM/VCD	Self Hosted	MIT	Externally Fine-tuned	Paper	GitHub
ViT	Image	Visual embeddings for MM/VCD and MM/VFR	Self Hosted	Apache-2.0	Externally Fine-tuned (VCD), Fine-tuned In-House (VFR)	Paper	GitHub
CLIP-ViT	Image/Text	Zero-shot image-text matching (MM/semantic delta)	Self Hosted	MIT	None	Paper	HuggingFace
CLAP	Audio + Text	Audio embeddings (use to be confirmed) (MM/semantic delta)	Self Hosted	MIT	None	CLAP Overview	GitHub

Model	Inputs	Usage	Hosting	License	Modifications	Paper/Info	Repository
Florence-2	Image/Text	Image captioning, object detection (MM)	Self Hosted	MIT	None	Paper	HuggingFace
GroundingDINO	Image/Text queries	Open-set object detection (MM)	Self Hosted	Apache-2.0	None	Paper	GitHub
SAM 2	Image + Prompts (points/bbox)	Image segmentation (MM + MM/demos)	Self Hosted	Apache-2.0	None	Paper	GitHub
HRNet (hrnet1)	Image	Segment alignment for MM/VCD	Self Hosted	Apache-2.0	Externally Fine-tuned	Paper	GitHub
DETR	Image	Object detection for brand compliance (MM)	Self Hosted	Apache-2.0	Fine-tuned In-House	Paper	GitHub
YOLO Models	Image	Real-time object detection (e.g. face detection in MM/demos)	Self Hosted	GPL-3.0/AGPL-3.0	None/Externally Fine-tuned	Overview	GitHub
RAM	Image	Zero-shot image tagging (MM/demos)	Self Hosted	Apache-2.0	None	Overview	GitHub
GPT-4o	Image/Text	Image QA (MM)	OpenAI API	Proprietary (OpenAI)	None	OpenAI	N/A

Model	Usage	Hosting	License	Modifications	Paper/Info	Repository
RoBERTa	Classification and paraphrase detection (MM); tested for text QA in semantic extractors (RoS)	Self Hosted	MIT	None	Paper	GitHub, SQuAD
SetFit	Few-shot classification (MM, RoS)	Self Hosted	Apache-2.0	Fine-tuned In-House (RoS/MM)	Overview	GitHub
Route0x	Intent classification, routing	Self Hosted	MIT	None	Overview	GitHub
BERTopic	Topic modelling (CNS)	Self Hosted	MIT	Fine-tuned In-House (CNS)	Paper	GitHub

Model	Inputs	Usage	Hosting	License	Modifications	Paper/Info	Repository
Whisper	Audio	Speech-to-text transcription (MM + CNS)	Self Hosted	MIT	None	Whisper Paper	GitHub
Pyannote	Audio + Metadata	Speaker diarisation (MM/demos)	Self Hosted	MIT	None	Pyannote Overview	GitHub

Model Name	Type	Used for	Hosting	License	Modifications	Paper/Info Link	Repo Link
Azure Doc Intelligence	Doc OCR (MM)	Document parsing and layout analysis	Azure Cloud Service	Proprietary (MS)	None	Azure Overview	N/A
Docling	Document Parsing (MM)	Document parsing and layout analysis	Self Hosted	MIT	None	Docling Overview	GitHub

Model	Usage	Hosting	License	Paper/Info	Repository
GPT-4o	SPARQL generation, structured outputs (Sidekick, MM)	OpenAI API/Azure	Proprietary (OpenAI)	OpenAI	N/A
GPT-4o-mini	Summarisation, SPARQL queries (Sidekick, MM)	OpenAI API/Azure	Proprietary (OpenAI)	N/A	N/A
GPT-4.1	Structured outputs (MM)	OpenAI API/Azure	Proprietary (OpenAI)	N/A	N/A
Ollama/llama3.1	SPARQL queries (Sidekick, MM)	Self Hosted	Llama 3.1 Community License (Meta)	Paper	GitHub