Video‑to‑Video Comparison – Technical Specification (Card‑based)
The Video‑to‑Video comparison feature detects, describes, and localises differences between two versions of the same video, while preserving existing review context where possible. It should be considered a specific case of the more general Card‑to‑Card comparison, and is likely the most difficult of the Specialisms to compare.
Process overview
- Universal Asset Processing: generates modality‑specific embeddings (frames, audio windows) for each video.
- StructuralChangeDetector: performs a coarse check. If no change is detected, the new video is either rejected as a duplicate, or all prior AnchorableEntities and Comments are copied to the new video.
- CorrespondenceMapper: aligns identical / re‑ordered segments of the video.
- FineChangeDetector: isolates regions within segments which have changed structurally. Similar in concept to the StructuralChangeDetector.
- SemanticDeltaService: produces a concise natural‑language description for each differing segment.
- LocaliseSemanticDeltaService (optional): localises the delta within the region identified as changed.
- Comment & Anchor Migration: clones anchors in unchanged regions (and possibly marks those intersecting changes as stale), preserving the full comment history.
0 Glossary
| Term | Meaning |
|---|---|
| Universal Asset Processing | A service that extracts multi‑modal embeddings (frames, audio windows, subtitles …). |
| AnchorableEntity | A SpatialEntity, TemporalEntity or SpatioTemporalEntity anchor to which Comment threads attach. |
| Specialism | The polymorphic payload stored on a Card that owns one or more media objects (e.g., a Video specialism). |
| Segment | A contiguous region of the video. Note that this is distinct from scenes or shots. |
1 Processing Pipeline
| # | Phase (Service) | Purpose | Built | Inputs | Outputs | Early‑Exit / Side‑Effects |
|---|---|---|---|---|---|---|
| 0 | Universal Asset Processing | Generate embeddings for every modality present in the asset. | Yes – see Takeda‑POC + mm-video-to-video | New version (v₁) of an existing video asset (v₀). | FrameEmbedding[], AudioEmbedding[], … | – |
| 1 | StructuralChangeDetector | Decide whether any change exists by, for example, inspecting the diagonal of the cosine‑similarity matrix built from embeddings for each modality. | No | modal embeddings from v₀ and v₁ | StructuralChangeResult | If has_change = False → either reject asset or clone all AnchorableEntities & Comments to new version. |
| 2 | CorrespondenceMapper | Align unchanged / re‑ordered chunks (HNET over denoised matrix). | Yes – POC see: mm-video-to-video | modal embeddings from v₀ and v₁ | CorrespondenceMap | If map empty → treat c₁ as a new Card (no migration). |
| 3 | FineChangeDetector | Pin‑point differing sub‑regions inside each alignment. | Yes – POC see: mm-video-to-video | CorrespondenceMap | ChangeSegment[] | If list empty → only order changed; copy Comments wholesale. |
| 4 | SemanticDeltaService | Produce natural‑language explanations for every ChangeSegment. | Yes – POC see: mm-video-to-video | ChangeSegment[] | SemanticDelta[] | – |
| 5 | LocaliseSemanticDeltaService (optional) | Localise the delta on c₁ (box or span). | Yes | SemanticDelta[], ChangeSegment[] | new / updated AnchorableEntity[] | – |
| 6 | Comment & Anchor Migration | Copy or invalidate existing anchors & Comments in line with the mapping and change segments. | No | CorrespondenceMap, ChangeSegment[], existing anchors/comments | migrated / stale AnchorableEntity[], updated Comment[] | – |
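The early exits in the table can be read as a short decision chain. The sketch below is illustrative only: `Outcome` and `decide_outcome` are hypothetical names that do not appear in the POC, and the function takes the results of phases 1 to 3 as plain arguments, whereas a real pipeline would presumably stop before running the later phases.

```python
from enum import Enum
from typing import Sequence


class Outcome(str, Enum):
    """Hypothetical labels for the early exits listed in the pipeline table."""

    NO_CHANGE = "no_change"        # phase 1: reject as duplicate or clone everything
    NEW_CARD = "new_card"          # phase 2: empty map, no migration
    REORDER_ONLY = "reorder_only"  # phase 3: no change segments, copy Comments wholesale
    FULL_DIFF = "full_diff"        # continue with phases 4 to 6


def decide_outcome(
    has_change: bool,
    map_items: Sequence[object],
    change_segments: Sequence[object],
) -> Outcome:
    """Apply the early-exit rules in phase order."""
    if not has_change:
        return Outcome.NO_CHANGE
    if not map_items:
        return Outcome.NEW_CARD
    if not change_segments:
        return Outcome.REORDER_ONLY
    return Outcome.FULL_DIFF
```

For example, `decide_outcome(True, ["item"], [])` yields `Outcome.REORDER_ONLY`, which corresponds to the phase 3 exit where only the order of segments changed.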
2 Phase‑Flow Diagram
3 Embeddings

    from enum import Enum
    from typing import List, Literal, Optional
    from uuid import UUID

    from pydantic import BaseModel


    # ---------- Shared enum ----------
    class Modality(str, Enum):
        VIDEO_FRAME = "video_frame"
        AUDIO = "audio"
        SUBTITLE = "subtitle"


    # ---------- Base embedding ----------
    class BaseEmbedding(BaseModel):
        """
        A single embedding vector, representing a segment of a specific modality.
        """

        card_id: UUID  # ID of the Card this embedding belongs to
        modality: Modality
        vector: List[float]


    # ---------- Modality‑specific wrappers ----------
    # Note: additional fields here should be used to localise the embedding within the asset
    class FrameEmbedding(BaseEmbedding):
        # Literal pins the modality; Field(const=True) is only available in pydantic v1
        modality: Literal[Modality.VIDEO_FRAME] = Modality.VIDEO_FRAME
        frame_number: Optional[int] = None  # localisation information


    class AudioEmbedding(BaseEmbedding):
        # Details here are unclear, given that the segmentation strategy is undefined.
        # See https://huggingface.co/papers/2506.10274 as a starter
        modality: Literal[Modality.AUDIO] = Modality.AUDIO
        sample_rate: Optional[int] = None
        channel_count: Optional[int] = None
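These vectors feed the phase 1 check described in the pipeline table, which inspects the diagonal of the cosine‑similarity matrix. A minimal sketch of that idea follows, under several assumptions: plain lists of floats stand in for `FrameEmbedding.vector`, frames of v₀ and v₁ are compared index by index, and the `threshold` of 0.98 is an arbitrary placeholder rather than a tuned value. The function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def has_structural_change(
    base: Sequence[Sequence[float]],
    new: Sequence[Sequence[float]],
    threshold: float = 0.98,  # assumed value, not taken from the POC
) -> bool:
    """Coarse check: compare embedding i of v0 with embedding i of v1.

    Comparing pairs at the same index is equivalent to reading the diagonal
    of the full cosine-similarity matrix without building the matrix.
    """
    if len(base) != len(new):
        return True  # a different number of embeddings is itself a change
    return any(cosine_similarity(b, n) < threshold for b, n in zip(base, new))
```

A diagonal‑only check is cheap but will also report a change when frames are merely shifted or re‑ordered; the alignment in phase 2 is what distinguishes those cases.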
4 Video‑to‑Video Artefacts

    from enum import Enum
    from typing import List, Tuple
    from uuid import UUID, uuid4

    from pydantic import BaseModel, Field

    # Modality is the shared enum defined in section 3 (Embeddings).


    # ---------- Enum for chunk relations ----------
    class RelationType(str, Enum):
        IDENTICAL = "identical"
        REORDERED = "reordered"
        INSERTED = "inserted"
        DELETED = "deleted"
        PARTIAL = "partial"


    # ---------- Phase‑1 output ----------
    class StructuralChangeResult(BaseModel):
        card_base: UUID
        card_new: UUID
        modality: Modality
        has_change: bool


    # ---------- Phase‑2 output ----------
    class MapItem(BaseModel):
        # Added so that ChangeSegment.map_item_id has an identifier to reference
        map_item_id: UUID = Field(default_factory=uuid4)
        base_chunk_id: UUID
        new_chunk_id: UUID
        relation_type: RelationType
        base_offsets: Tuple[float, float]  # (start, end) sec in video 0
        new_offsets: Tuple[float, float]  # (start, end) sec in video 1


    class CorrespondenceMap(BaseModel):
        map_id: UUID = Field(default_factory=uuid4)
        base_card_id: UUID
        new_card_id: UUID
        items: List[MapItem] = Field(default_factory=list)


    # ---------- Phase‑3 output ----------
    class ChangeSegment(BaseModel):
        segment_id: UUID = Field(default_factory=uuid4)
        modality: Modality
        map_item_id: UUID
        base_offsets: Tuple[float, float]
        new_offsets: Tuple[float, float]
        similarity_score: float  # 0 → big change, 1 → identical


    # ---------- Phase‑4 output ----------
    class SemanticDelta(BaseModel):
        delta_id: UUID = Field(default_factory=uuid4)
        change_segment_id: UUID
        description: str
        confidence: float  # 0‑1
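To illustrate how the offset pairs on a `ChangeSegment` might be derived, the sketch below groups consecutive low‑similarity frames inside one aligned item into (start, end) ranges in seconds. This is an assumption about how the FineChangeDetector could work, not a description of the POC: `low_similarity_ranges`, the per‑frame `scores` input, and the `threshold` of 0.9 are all hypothetical.

```python
from typing import List, Optional, Tuple


def low_similarity_ranges(
    scores: List[float],
    fps: float,
    threshold: float = 0.9,  # assumed value, not taken from the POC
) -> List[Tuple[float, float]]:
    """Group consecutive frames whose similarity is below threshold.

    Returns half-open (start, end) ranges in seconds, measured from the
    start of the aligned item rather than from the start of the video.
    """
    ranges: List[Tuple[float, float]] = []
    start: Optional[int] = None
    for index, score in enumerate(scores):
        if score < threshold:
            if start is None:
                start = index  # a new run of changed frames begins here
        elif start is not None:
            ranges.append((start / fps, index / fps))
            start = None
    if start is not None:
        # The run of changed frames extends to the end of the item
        ranges.append((start / fps, len(scores) / fps))
    return ranges
```

Adding the start of the owning `MapItem` offsets to each range would give absolute values suitable for `base_offsets` and `new_offsets`.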
5 Service‑to‑Service Contract
| Service | Reads | Writes |
|---|---|---|
| Universal Asset Processing | Video | FrameEmbedding, AudioEmbedding, … |
| StructuralChangeDetector | embeddings | StructuralChangeResult |
| CorrespondenceMapper | embeddings, StructuralChangeResult | CorrespondenceMap |
| FineChangeDetector | CorrespondenceMap | ChangeSegment[] |
| SemanticDeltaService | ChangeSegment[] | SemanticDelta[] |
| AnchoringService | SemanticDelta[], ChangeSegment[] | new / updated AnchorableEntity |
| Comment & Anchor Migration | existing anchors/comments, CorrespondenceMap, ChangeSegment[] | migrated / stale AnchorableEntity, updated Comment |
6 Comment & Anchor Migration — Behavioural Summary
- Unchanged regions
  For every MapItem labelled IDENTICAL or REORDERED with no overlapping ChangeSegment:
  - Clone each AnchorableEntity (and its Comment thread) from c₀ to the corresponding offsets in c₁.
- Changed regions (open for debate)
  If an anchor intersects at least one ChangeSegment we could:
  - Copy the anchor, mark it stale = True, keep comments intact. The UI can prompt reviewers to adjust or delete these stale anchors.
  - Or delete the anchors and re‑process the part(s)/whole asset.
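The copy‑and‑mark‑stale option can be sketched as follows. This is a minimal illustration for temporal anchors only, and every name in it is hypothetical: `Anchor` stands in for a TemporalEntity, `Mapping` for a MapItem reduced to its offsets, and `changed_base` for the base offsets of the ChangeSegments. It also assumes that an anchor not fully contained in any mapped region is simply not migrated, which the specification leaves open.

```python
from dataclasses import dataclass, replace
from typing import List, Sequence, Tuple

Span = Tuple[float, float]  # (start, end) in seconds


@dataclass(frozen=True)
class Anchor:
    """Hypothetical stand-in for a temporal AnchorableEntity."""

    start: float
    end: float
    stale: bool = False


@dataclass(frozen=True)
class Mapping:
    """Hypothetical stand-in for a MapItem, reduced to its offsets."""

    base: Span
    new: Span


def overlaps(a: Span, b: Span) -> bool:
    """True if two half-open spans share any time."""
    return a[0] < b[1] and b[0] < a[1]


def migrate(
    anchors: Sequence[Anchor],
    mappings: Sequence[Mapping],
    changed_base: Sequence[Span],
) -> List[Anchor]:
    """Shift each anchor into the new video; flag it stale if it touches a change."""
    migrated: List[Anchor] = []
    for anchor in anchors:
        span = (anchor.start, anchor.end)
        for mapping in mappings:
            if mapping.base[0] <= anchor.start and anchor.end <= mapping.base[1]:
                shift = mapping.new[0] - mapping.base[0]
                stale = any(overlaps(span, changed) for changed in changed_base)
                migrated.append(
                    replace(
                        anchor,
                        start=anchor.start + shift,
                        end=anchor.end + shift,
                        stale=stale,
                    )
                )
                break  # use the first mapping that contains the anchor
    return migrated
```

Comment threads would follow their anchor unchanged in this scheme, so the full history is preserved whether or not the anchor is flagged stale.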
Questions
- How do we approach pre-processing the asset? Do we only process segments which have not been seen before or which have changed, or do we re-process the entire asset for compliance again?