Video‑to‑Video Comparison – Technical Specification (Card‑based)

The Video‑to‑Video comparison feature detects, describes, and localises differences between two versions of the same video, while preserving existing review context where possible. It is a specific case of the more general Card‑to‑Card comparison, and is likely the most difficult of the Specialisms to compare.

Process overview

Universal Asset Processing: generates modality‑specific embeddings (frames, audio windows) for each video.
StructuralChangeDetector: performs a coarse check for change. If no change is detected, the new video is either rejected as a duplicate, or all prior AnchorableEntities and Comments are copied to it.
CorrespondenceMapper: aligns identical and re‑ordered segments between the two videos.
FineChangeDetector: isolates the regions within aligned segments that have changed structurally. It is similar in concept to the StructuralChangeDetector, but works inside each segment rather than across the whole video.
SemanticDeltaService: produces a concise natural‑language description for each differing segment.
LocaliseSemanticDeltaService (optional): localises the delta within the region identified as changed.
Comment & Anchor Migration: clones anchors in unchanged regions (and possibly marks those intersecting changes as stale), preserving the full comment history.
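A minimal sketch of the StructuralChangeDetector's coarse check for a single modality, using the "diagonal of the cosine‑similarity matrix" test suggested for Phase 1. It assumes the embeddings of each video are index‑aligned lists of floats (e.g. one vector per sampled frame); the function names and the 0.98 threshold are hypothetical, not values taken from the POC.

```python
from math import sqrt
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def diagonal_similarities(base: List[List[float]], new: List[List[float]]) -> List[float]:
    """Diagonal of the v0 x v1 cosine-similarity matrix: embedding i of v0 against embedding i of v1."""
    return [cosine(u, v) for u, v in zip(base, new)]


def has_structural_change(base: List[List[float]], new: List[List[float]],
                          threshold: float = 0.98) -> bool:
    """Coarse, per-modality check: a length mismatch or any diagonal entry below threshold is a change."""
    if len(base) != len(new):
        return True
    return any(s < threshold for s in diagonal_similarities(base, new))
```

Only the diagonal is computed, so the cost is linear in the number of embeddings. The check is deliberately conservative: a single inserted frame shifts every later pair off the diagonal and reports a change, which is acceptable because the later phases work out what actually moved.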

0 Glossary

Term	Meaning
Universal Asset Processing	A service that extracts multi‑modal embeddings (frames, audio windows, subtitles …).
AnchorableEntity	A `SpatialEntity`, `TemporalEntity` or `SpatioTemporalEntity` anchor to which `Comment` threads attach.
Specialism	The polymorphic payload stored on a Card that owns one or more media objects (e.g., a `Video` specialism).
Segment	A contiguous region of the video. Note that this is distinct from scenes or shots.
v₀ / v₁, c₀ / c₁	The existing (base) and new versions of the video, and the Cards that hold them.

1 Processing Pipeline

#	Phase (Service)	Purpose	Built	Inputs	Outputs	Early‑Exit / Side‑Effects
0	Universal Asset Processing	Generate embeddings for every modality present in the asset.	Yes – see Takeda‑POC + mm-video-to-video	New version (`v₁`) of an existing video asset (`v₀`).	`FrameEmbedding[]`, `AudioEmbedding[]`, …	—
1	StructuralChangeDetector	Decide whether any change exists, for example by inspecting the diagonal of the cosine‑similarity matrix built from the embeddings of each modality.	No	Per‑modality embeddings from `v₀` and `v₁`	`StructuralChangeResult`	If `has_change = False` → either reject the asset or clone all AnchorableEntities & Comments to the new version.
2	CorrespondenceMapper	Align unchanged / re‑ordered chunks (HNET over denoised matrix).	Yes – POC see: mm-video-to-video	Per‑modality embeddings from `v₀` and `v₁`	`CorrespondenceMap`	If map empty → treat c₁ as a new Card (no migration).
3	FineChangeDetector	Pinpoint the differing sub‑regions inside each aligned pair of chunks.	Yes – POC see: mm-video-to-video	`CorrespondenceMap`	`ChangeSegment[]`	If list empty → only the order changed; copy Comments wholesale.
4	SemanticDeltaService	Produce natural‑language explanations for every `ChangeSegment`.	Yes – POC see: mm-video-to-video	`ChangeSegment[]`	`SemanticDelta[]`	—
5	LocaliseSemanticDeltaService (optional)	Localise the delta on c₁ (box or span).	Yes	`SemanticDelta[]`, `ChangeSegment[]`	new / updated `AnchorableEntity[]`	—
6	Comment & Anchor Migration	Copy or invalidate existing anchors & Comments in line with the mapping and change segments.	No	`CorrespondenceMap`, `ChangeSegment[]`, existing anchors/comments	migrated / stale `AnchorableEntity[]`, updated `Comment[]`	—
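The CorrespondenceMapper POC uses HNET over a denoised matrix; the sketch below is not that method. It is a simpler stand‑in that shows the shape of the output: each new chunk is greedily matched to its most similar unused base chunk, and a longest increasing subsequence (LIS) over the matched base indices separates chunks that kept their relative order (IDENTICAL) from chunks that moved (REORDERED). It assumes one embedding per chunk; the thresholds and function names are hypothetical, and the relation labels reuse the `RelationType` string values.

```python
from math import sqrt
from typing import List, Optional, Sequence, Set, Tuple

# (base_index, new_index, relation); None marks the missing side of an insert/delete.
Item = Tuple[Optional[int], Optional[int], str]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def lis_positions(seq: List[int]) -> Set[int]:
    """Positions in seq forming one longest strictly increasing subsequence (O(n^2), for clarity)."""
    if not seq:
        return set()
    length = [1] * len(seq)
    prev = [-1] * len(seq)
    for i in range(len(seq)):
        for j in range(i):
            if seq[j] < seq[i] and length[j] + 1 > length[i]:
                length[i], prev[i] = length[j] + 1, j
    pos = max(range(len(seq)), key=lambda k: length[k])
    keep = set()
    while pos != -1:  # walk back through the predecessor chain
        keep.add(pos)
        pos = prev[pos]
    return keep


def map_chunks(base: List[List[float]], new: List[List[float]],
               match_thr: float = 0.95, partial_thr: float = 0.80) -> List[Item]:
    """Greedily match each new chunk to its most similar unused base chunk, then label the result."""
    used: Set[int] = set()
    matches: List[Tuple[int, int, float]] = []  # (new_idx, base_idx, similarity)
    for j, v in enumerate(new):
        candidates = [(cosine(u, v), i) for i, u in enumerate(base) if i not in used]
        if candidates:
            sim, i = max(candidates)
            if sim >= partial_thr:
                used.add(i)
                matches.append((j, i, sim))
    # Matches whose base order survives in v1 (the LIS) are IDENTICAL; the rest have moved.
    in_order = lis_positions([i for _, i, _ in matches])
    items: List[Item] = []
    for k, (j, i, sim) in enumerate(matches):
        relation = "partial" if sim < match_thr else ("identical" if k in in_order else "reordered")
        items.append((i, j, relation))
    matched_new = {j for j, _, _ in matches}
    items += [(None, j, "inserted") for j in range(len(new)) if j not in matched_new]
    items += [(i, None, "deleted") for i in range(len(base)) if i not in used]
    return items
```

Greedy matching is order‑dependent and can mis‑assign near‑duplicate chunks (e.g. a repeated title card); a global assignment, or the HNET approach, avoids that at a higher cost.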

2 Phase‑Flow Diagram

[Phase‑flow diagram (PlantUML)]

3 Embeddings

from enum import Enum
from typing import List, Literal, Optional
from uuid import UUID
from pydantic import BaseModel

# ----------  Shared enum ----------

class Modality(str, Enum):
    VIDEO_FRAME = "video_frame"
    AUDIO       = "audio"
    SUBTITLE    = "subtitle"

# ----------  Base embedding ----------

class BaseEmbedding(BaseModel):
    """
    A single embedding vector, representing a segment of a specific modality.
    """
    card_id:  UUID            # ID of the Card this embedding belongs to
    modality: Modality
    vector:   List[float]

# ----------  Modality‑specific wrappers ----------
# Note: additional fields here should be used to localise the embedding within the asset.
# `Literal` pins each wrapper to its modality (`Field(const=True)` was removed in Pydantic v2).

class FrameEmbedding(BaseEmbedding):
    modality:     Literal[Modality.VIDEO_FRAME] = Modality.VIDEO_FRAME
    frame_number: Optional[int] = None      # localisation information

class AudioEmbedding(BaseEmbedding):
    # These fields are provisional because the audio segmentation strategy is still undefined.
    # See https://huggingface.co/papers/2506.10274 as a starter.
    modality:      Literal[Modality.AUDIO] = Modality.AUDIO
    sample_rate:   Optional[int] = None
    channel_count: Optional[int] = None
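The audio segmentation strategy is still undefined. One possible strategy, shown here purely as an assumption, is fixed‑length overlapping windows: each window's (start, end) offset in seconds would then localise an `AudioEmbedding` within the asset, much as `frame_number` does for frames. The function name and the 1 s window / 0.5 s hop defaults are hypothetical.

```python
from typing import List, Tuple


def audio_windows(num_samples: int, sample_rate: int,
                  window_s: float = 1.0, hop_s: float = 0.5) -> List[Tuple[float, float]]:
    """Split a track of num_samples samples into fixed-length windows advanced by hop_s.

    Returns (start, end) offsets in seconds. The last window is truncated at the end
    of the track; if hop_s > window_s, the samples between windows are skipped.
    """
    if num_samples <= 0 or sample_rate <= 0:
        return []
    win = int(round(window_s * sample_rate))
    hop = int(round(hop_s * sample_rate))
    if win <= 0 or hop <= 0:
        raise ValueError("window and hop must each span at least one sample")
    windows = []
    start = 0
    while start < num_samples:
        end = min(start + win, num_samples)
        windows.append((start / sample_rate, end / sample_rate))
        if end == num_samples:  # this window reached the end of the track
            break
        start += hop
    return windows
```

With 16 kHz audio, a 3 s track (48 000 samples) yields five windows starting at 0, 0.5, 1.0, 1.5 and 2.0 s.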

4 Video‑to‑Video Artefacts

from enum import Enum
from typing import List, Optional, Tuple
from uuid import UUID, uuid4
from pydantic import BaseModel, Field

# `Modality` is the shared enum defined in section 3 (Embeddings).

# ----------  Enum for chunk relations ----------

class RelationType(str, Enum):
    IDENTICAL = "identical"
    REORDERED = "reordered"
    INSERTED  = "inserted"
    DELETED   = "deleted"
    PARTIAL   = "partial"

# ----------  Phase‑1 output ----------

class StructuralChangeResult(BaseModel):
    base_card_id: UUID
    new_card_id:  UUID
    modality:     Modality
    has_change:   bool

# ----------  Phase‑2 output ----------

class MapItem(BaseModel):
    map_item_id:   UUID = Field(default_factory=uuid4)  # referenced by ChangeSegment.map_item_id
    relation_type: RelationType
    # INSERTED items have no base chunk; DELETED items have no new chunk.
    base_chunk_id: Optional[UUID] = None
    new_chunk_id:  Optional[UUID] = None
    base_offsets:  Optional[Tuple[float, float]] = None  # (start, end) in seconds in v₀
    new_offsets:   Optional[Tuple[float, float]] = None  # (start, end) in seconds in v₁

class CorrespondenceMap(BaseModel):
    map_id:        UUID = Field(default_factory=uuid4)
    base_card_id:  UUID
    new_card_id:   UUID
    items:         List[MapItem] = Field(default_factory=list)

# ----------  Phase‑3 output ----------

class ChangeSegment(BaseModel):
    segment_id:       UUID = Field(default_factory=uuid4)
    modality:         Modality
    map_item_id:      UUID                 # MapItem this segment lies within
    base_offsets:     Tuple[float, float]  # (start, end) in seconds in v₀
    new_offsets:      Tuple[float, float]  # (start, end) in seconds in v₁
    similarity_score: float                # 0 → big change, 1 → identical

# ----------  Phase‑4 output ----------

class SemanticDelta(BaseModel):
    delta_id:          UUID = Field(default_factory=uuid4)
    change_segment_id: UUID
    description:       str
    confidence:        float  # 0‑1
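A minimal sketch of how FineChangeDetector could derive `ChangeSegment` offsets and a `similarity_score` for one aligned pair of chunks. It assumes both chunks carry per‑frame embeddings at the same frame rate, so frame k of the base chunk is compared with frame k of the new chunk (extra frames in the longer chunk are ignored by `zip`). Maximal runs of frames below a hypothetical threshold of 0.9 become change segments, scored by their mean similarity clamped to [0, 1]; the function name and parameters are illustrative only.

```python
from math import sqrt
from typing import List, Sequence, Tuple

Span = Tuple[float, float]  # (start, end) in seconds


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_change_runs(base_frames: List[List[float]], new_frames: List[List[float]],
                     base_start: float, new_start: float, fps: float,
                     threshold: float = 0.9) -> List[Tuple[Span, Span, float]]:
    """Return (base_offsets, new_offsets, similarity_score) for every maximal run of
    index-aligned frame pairs whose cosine similarity is below threshold.

    base_start / new_start are the chunk's start times in v0 / v1; frame k of a
    chunk is taken to cover [start + k/fps, start + (k+1)/fps).
    """
    sims = [cosine(u, v) for u, v in zip(base_frames, new_frames)]
    runs = []
    i = 0
    while i < len(sims):
        if sims[i] >= threshold:
            i += 1
            continue
        j = i
        while j < len(sims) and sims[j] < threshold:  # extend the run of changed frames
            j += 1
        # Mean similarity over the run, clamped so 0 means a big change and 1 identical.
        score = max(0.0, sum(sims[i:j]) / (j - i))
        runs.append(((base_start + i / fps, base_start + j / fps),
                     (new_start + i / fps, new_start + j / fps),
                     score))
        i = j
    return runs
```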

5 Service‑to‑Service Contract

Service	Reads	Writes
Universal Asset Processing	Video	`FrameEmbedding`, `AudioEmbedding`, …
StructuralChangeDetector	embeddings	`StructuralChangeResult`
CorrespondenceMapper	embeddings, `StructuralChangeResult`	`CorrespondenceMap`
FineChangeDetector	`CorrespondenceMap`	`ChangeSegment[]`
SemanticDeltaService	`ChangeSegment[]`	`SemanticDelta[]`
LocaliseSemanticDeltaService (optional)	`SemanticDelta[]`, `ChangeSegment[]`	new / updated `AnchorableEntity`
Comment & Anchor Migration	existing anchors/comments, `CorrespondenceMap`, `ChangeSegment[]`	migrated / stale `AnchorableEntity`, updated `Comment`
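A minimal sketch of how the services could be chained according to this contract, stopping at the early exits listed in the pipeline table. Each phase is passed in as a callable so the control flow can be read on its own; the function name, its parameters and the returned `action` strings are hypothetical. Localisation (Phase 5) and Comment & Anchor Migration (Phase 6) are left to the caller.

```python
from typing import Any, Callable, Dict, List


def run_pipeline(base_emb: Any, new_emb: Any,
                 detect_change: Callable[[Any, Any], bool],
                 map_chunks: Callable[[Any, Any], List[Any]],
                 find_changes: Callable[[List[Any]], List[Any]],
                 describe: Callable[[List[Any]], List[Any]],
                 reject_duplicates: bool = False) -> Dict[str, Any]:
    """Chain Phases 1-4, returning early where the pipeline table says to."""
    # Phase 1: no structural change, so the new version duplicates the old one.
    if not detect_change(base_emb, new_emb):
        return {"action": "reject" if reject_duplicates else "clone_all"}
    # Phase 2: nothing aligns, so treat c1 as a new Card (no migration).
    correspondence = map_chunks(base_emb, new_emb)
    if not correspondence:
        return {"action": "new_card"}
    # Phase 3: no changed sub-regions, so only the order changed; copy Comments wholesale.
    segments = find_changes(correspondence)
    if not segments:
        return {"action": "copy_comments", "map": correspondence}
    # Phase 4: describe each change; Phases 5-6 consume this result.
    deltas = describe(segments)
    return {"action": "migrate", "map": correspondence,
            "segments": segments, "deltas": deltas}
```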

6 Comment & Anchor Migration — Behavioural Summary

Unchanged regions
For every MapItem labelled IDENTICAL or REORDERED with no overlapping ChangeSegment:
Clone each AnchorableEntity (and its Comment thread) from c₀ to the corresponding offsets in c₁.
Changed regions (open for debate)
If an anchor intersects at least one ChangeSegment, there are two options:
Option A: copy the anchor, mark it `stale = True`, and keep its comments intact. The UI can then prompt reviewers to adjust or delete these stale anchors.
Option B: delete the anchors and re‑process the affected part(s) of the asset, or the whole asset.
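A minimal sketch of temporal anchor migration under Option A. An anchor's (start, end) in v₀ is moved into v₁ by the shift between the `base_offsets` and `new_offsets` of the MapItem that contains it, and it is flagged stale if it overlaps any `ChangeSegment.base_offsets`. It assumes each chunk plays at the same speed in both versions and, as an assumption beyond the text above, also maps anchors inside PARTIAL items. Anchors that straddle a chunk boundary or sit in a DELETED chunk return `None` and are left to whichever policy is chosen. Spatial anchors are not handled, and all names are hypothetical.

```python
from typing import List, Optional, Tuple

Span = Tuple[float, float]  # (start, end) in seconds

# Relations whose chunks exist in both versions, so offsets can be shifted across.
MAPPABLE = {"identical", "reordered", "partial"}


def overlaps(a: Span, b: Span) -> bool:
    """True if half-open spans [a0, a1) and [b0, b1) intersect."""
    return a[0] < b[1] and b[0] < a[1]


def migrate_temporal_anchor(anchor: Span,
                            map_items: List[Tuple[str, Optional[Span], Optional[Span]]],
                            changed: List[Span]) -> Optional[Tuple[Span, bool]]:
    """Return ((start, end) in v1, stale) for an anchor given in v0 offsets,
    or None if no mappable chunk fully contains it.

    map_items: (relation, base_offsets, new_offsets) per MapItem.
    changed:   base_offsets of every ChangeSegment.
    """
    for relation, base, new in map_items:
        if relation not in MAPPABLE or base is None or new is None:
            continue
        if base[0] <= anchor[0] and anchor[1] <= base[1]:
            shift = new[0] - base[0]  # same playback speed assumed
            stale = any(overlaps(anchor, c) for c in changed)
            return (anchor[0] + shift, anchor[1] + shift), stale
    return None
```

Under Option B, the caller would drop anchors whose `stale` flag is `True` instead of copying them.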

Questions

How should we approach pre‑processing the new asset? Do we process only the segments that are new or have changed, or do we re‑process the entire asset for compliance?
