
Video‑to‑Video Comparison – Technical Specification (Card‑based)


The Video‑to‑Video comparison feature detects, describes, and localises differences between two versions of the same video, while preserving existing review context where possible. It is a specific case of the more general Card‑to‑Card comparison, and is likely the most difficult of the Specialisms to compare.


Process overview

  1. Universal Asset Processing: generates modality‑specific embeddings (frames, audio windows) for each video.
  2. StructuralChangeDetector: performs a coarse check. If no change is detected, the new video is either rejected as a duplicate, or all prior AnchorableEntities and Comments are copied to it.
  3. CorrespondenceMapper: aligns identical and re‑ordered segments between the two versions.
  4. FineChangeDetector: isolates the regions within aligned segments that have changed. Similar in concept to the StructuralChangeDetector, but at a finer granularity.
  5. SemanticDeltaService: produces a concise natural‑language description for each differing segment.
  6. LocaliseSemanticDeltaService (optional): localises the delta within the region identified as changed.
  7. Comment & Anchor Migration: clones anchors in unchanged regions (and possibly marks those intersecting changes as stale), preserving the full comment history.
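Step 3 is the least intuitive of these. The POC aligns chunks with HNET over a denoised matrix; the sketch below deliberately swaps in a much simpler greedy rule, purely to illustrate how relation labels (using the RelationType values from section 4) can fall out of a chunk‑level similarity matrix. The function name, the 0.9 threshold and the running‑maximum re‑order test are all assumptions, and PARTIAL matches are not handled.

```python
from typing import Dict, List, Optional, Set, Tuple

Match = Tuple[Optional[int], Optional[int], str]  # (base_idx, new_idx, relation)

def greedy_correspondence(sim: List[List[float]], threshold: float = 0.9) -> List[Match]:
    """Hypothetical stand-in for CorrespondenceMapper (NOT the HNET approach).

    sim[i][j] is the similarity between chunk i of v₀ and chunk j of v₁.
    """
    n_base = len(sim)
    n_new = len(sim[0]) if sim else 0
    used: Set[int] = set()
    matches: Dict[int, int] = {}  # new_idx -> base_idx
    # Match each v₁ chunk to its most similar unused v₀ chunk above the threshold.
    for j in range(n_new):
        candidates = [i for i in range(n_base) if i not in used and sim[i][j] >= threshold]
        if candidates:
            best = max(candidates, key=lambda i: sim[i][j])
            used.add(best)
            matches[j] = best

    result: List[Match] = []
    running_max = -1
    for j in range(n_new):
        if j not in matches:
            result.append((None, j, "inserted"))  # no counterpart in v₀
            continue
        i = matches[j]
        # A match that jumps backwards in v₀ relative to earlier matches is re-ordered.
        result.append((i, j, "identical" if i > running_max else "reordered"))
        running_max = max(running_max, i)
    # v₀ chunks left unmatched were removed in v₁.
    result.extend((i, None, "deleted") for i in range(n_base) if i not in used)
    return result
```

With a running maximum, whichever moved chunk is encountered first keeps IDENTICAL; a longest‑increasing‑subsequence pass over the matched v₀ indices would instead minimise the number of chunks labelled REORDERED.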

0 Glossary

| Term | Meaning |
|---|---|
| Universal Asset Processing | A service that extracts multi‑modal embeddings (frames, audio windows, subtitles …). |
| AnchorableEntity | A SpatialEntity, TemporalEntity or SpatioTemporalEntity anchor to which Comment threads attach. |
| Specialism | The polymorphic payload stored on a Card that owns one or more media objects (e.g., a Video specialism). |
| Segment | A contiguous region of the video. Note that this is distinct from scenes or shots. |
| v₀ / v₁, c₀ / c₁ | The base and new versions of the video, and the Cards that hold them. |

1 Processing Pipeline

| # | Phase (Service) | Purpose | Built | Inputs | Outputs | Early‑Exit / Side‑Effects |
|---|---|---|---|---|---|---|
| 0 | Universal Asset Processing | Generate embeddings for every modality present in the asset. | Yes – see Takeda‑POC + mm-video-to-video | New version (v₁) of an existing video asset (v₀) | FrameEmbedding[], AudioEmbedding[], … | – |
| 1 | StructuralChangeDetector | Decide whether any change exists, for example by inspecting the diagonal of the cosine‑similarity matrix built from the embeddings of each modality. | No | Modal embeddings from v₀ and v₁ | StructuralChangeResult | If has_change = False → either reject the asset or clone all AnchorableEntities & Comments to the new version. |
| 2 | CorrespondenceMapper | Align unchanged / re‑ordered chunks (HNET over denoised matrix). | Yes – POC (see mm-video-to-video) | Modal embeddings from v₀ and v₁ | CorrespondenceMap | If the map is empty → treat c₁ as a new Card (no migration). |
| 3 | FineChangeDetector | Pin‑point differing sub‑regions inside each alignment. | Yes – POC (see mm-video-to-video) | CorrespondenceMap | ChangeSegment[] | If the list is empty → only the order changed; copy Comments wholesale. |
| 4 | SemanticDeltaService | Produce natural‑language explanations for every ChangeSegment. | Yes – POC (see mm-video-to-video) | ChangeSegment[] | SemanticDelta[] | – |
| 5 | LocaliseSemanticDeltaService (optional) | Localise the delta on c₁ (box or span). | Yes | SemanticDelta[], ChangeSegment[] | New / updated AnchorableEntity[] | – |
| 6 | Comment & Anchor Migration | Copy or invalidate existing anchors & Comments in line with the mapping and change segments. | No | CorrespondenceMap, ChangeSegment[], existing anchors/comments | Migrated / stale AnchorableEntity[], updated Comment[] | – |
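The diagonal check mentioned for phase 1 can be sketched as follows for a single modality. It assumes both versions were embedded at the same sampling rate, so that window i of v₀ should match window i of v₁ when nothing has changed. The 0.99 threshold and the function names are hypothetical, and any difference in length is treated as a change.

```python
import math
from typing import List, Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def diagonal_similarity(base: List[List[float]], new: List[List[float]]) -> List[float]:
    """Diagonal of the cosine-similarity matrix: cosine(base[i], new[i]) for each i."""
    return [cosine(b, n) for b, n in zip(base, new)]

def has_structural_change(
    base: List[List[float]], new: List[List[float]], threshold: float = 0.99
) -> bool:
    """Coarse phase-1 check for one modality (hypothetical rule and threshold)."""
    if len(base) != len(new):
        return True  # different number of windows: something was added or removed
    # A threshold below 1.0 tolerates small numeric noise from re-encoding.
    return any(s < threshold for s in diagonal_similarity(base, new))
```

Only the diagonal is computed (n similarities rather than the full n × m matrix), which keeps this coarse check cheap. With several modalities, has_change would presumably be true if any modality reports a change.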

2 Phase‑Flow Diagram

Video‑to‑Video Processing Flow (Card‑based)
New video version uploaded → Universal Asset Processing (compute embeddings) → StructuralChangeDetector
  hasChange == false (identical) → clone anchors + comments
  otherwise → CorrespondenceMapper
    correspondence map empty (completely new) → no migration required
    otherwise → FineChangeDetector
      ChangeSegment list empty (re‑order only) → copy comments without change
      otherwise → SemanticDeltaService → LocaliseSemanticDeltaService (optional) → Comment & Anchor Migration
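The branching in the processing flow reduces to a small routing function. In this sketch the names are hypothetical; the later phases are passed in as callables so that, as in the flow, they only run when the earlier checks have not already exited.

```python
from enum import Enum
from typing import Callable, Sequence

class Outcome(str, Enum):
    # Hypothetical labels, one per exit in the processing flow.
    IDENTICAL = "identical"            # reject as duplicate, or clone anchors + comments
    COMPLETELY_NEW = "completely_new"  # treat c₁ as a new Card; no migration
    REORDER_ONLY = "reorder_only"      # copy comments without change
    CHANGED = "changed"                # continue to SemanticDeltaService and migration

def route(
    has_change: bool,
    build_map: Callable[[], Sequence],
    find_changes: Callable[[Sequence], Sequence],
) -> Outcome:
    """Apply the early exits in flow order; later phases run only when needed."""
    if not has_change:                          # StructuralChangeDetector result
        return Outcome.IDENTICAL
    map_items = build_map()                     # CorrespondenceMapper
    if not map_items:
        return Outcome.COMPLETELY_NEW
    change_segments = find_changes(map_items)   # FineChangeDetector
    if not change_segments:
        return Outcome.REORDER_ONLY
    return Outcome.CHANGED
```

Keeping the later phases lazy means an identical upload never pays for correspondence mapping or fine change detection.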

3 Embeddings

from enum import Enum
from typing import List, Literal, Optional
from uuid import UUID
from pydantic import BaseModel

# ----------  Shared enum ----------

class Modality(str, Enum):
    VIDEO_FRAME = "video_frame"
    AUDIO       = "audio"
    SUBTITLE    = "subtitle"

# ----------  Base embedding ----------

class BaseEmbedding(BaseModel):
    """
    A single embedding vector representing one segment of a specific modality.
    """
    card_id:  UUID           # ID of the Card this embedding belongs to
    modality: Modality
    vector:   List[float]

# ----------  Modality‑specific wrappers ----------
# Note: additional fields on each wrapper should localise the embedding within the asset.
# The modality is pinned with Literal, because Field(const=True) was removed in Pydantic v2.

class FrameEmbedding(BaseEmbedding):
    modality:     Literal[Modality.VIDEO_FRAME] = Modality.VIDEO_FRAME
    frame_number: Optional[int] = None      # localisation information

class AudioEmbedding(BaseEmbedding):
    # Details are unclear because the audio segmentation strategy is undefined.
    # See https://huggingface.co/papers/2506.10274 as a starting point.
    modality:      Literal[Modality.AUDIO] = Modality.AUDIO
    sample_rate:   Optional[int] = None
    channel_count: Optional[int] = None
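Phases 1 to 3 compare the ordered embeddings of the two Cards. A minimal sketch of that preparation step, using a plain dataclass as a stand‑in for FrameEmbedding so it runs without Pydantic; `Frame`, `ordered_vectors` and `similarity_matrix` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Frame:
    """Hypothetical stand-in for FrameEmbedding (only the fields used here)."""
    frame_number: Optional[int]
    vector: List[float]

def ordered_vectors(frames: List[Frame]) -> List[List[float]]:
    """Return vectors in frame order; frames without a frame_number are rejected."""
    if any(f.frame_number is None for f in frames):
        raise ValueError("every frame embedding needs a frame_number to be aligned")
    return [f.vector for f in sorted(frames, key=lambda f: f.frame_number)]

def _unit(v: List[float]) -> List[float]:
    """Scale v to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)

def similarity_matrix(base: List[List[float]], new: List[List[float]]) -> List[List[float]]:
    """S[i][j] = cosine(base[i], new[j]); rows index v₀, columns index v₁."""
    b = [_unit(v) for v in base]
    n = [_unit(v) for v in new]
    return [[sum(x * y for x, y in zip(bi, nj)) for nj in n] for bi in b]
```

The diagonal of this matrix is what the StructuralChangeDetector inspects, while the full matrix is what correspondence mapping works over. Because frame_number is Optional on FrameEmbedding, the sketch rejects unlocalised frames rather than guessing their position.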

4 Video‑to‑Video Artefacts

from enum import Enum
from typing import List, Optional, Tuple
from uuid import UUID, uuid4
from pydantic import BaseModel, Field

# Modality is the shared enum from section 3 (Embeddings) and is assumed to be imported here.

# ----------  Enum for chunk relations ----------

class RelationType(str, Enum):
    IDENTICAL = "identical"
    REORDERED = "reordered"
    INSERTED  = "inserted"
    DELETED   = "deleted"
    PARTIAL   = "partial"

# ----------  Phase‑1 output ----------

class StructuralChangeResult(BaseModel):
    card_base: UUID
    card_new:  UUID
    modality:  Modality
    has_change: bool

# ----------  Phase‑2 output ----------

class MapItem(BaseModel):
    map_item_id:   UUID = Field(default_factory=uuid4)   # referenced by ChangeSegment.map_item_id
    # INSERTED items have no base side and DELETED items have no new side, hence Optional.
    base_chunk_id: Optional[UUID] = None
    new_chunk_id:  Optional[UUID] = None
    relation_type: RelationType
    base_offsets:  Optional[Tuple[float, float]] = None   # (start, end) sec in video 0
    new_offsets:   Optional[Tuple[float, float]] = None   # (start, end) sec in video 1

class CorrespondenceMap(BaseModel):
    map_id:        UUID = Field(default_factory=uuid4)
    base_card_id:  UUID
    new_card_id:   UUID
    items:         List[MapItem] = Field(default_factory=list)

# ----------  Phase‑3 output ----------

class ChangeSegment(BaseModel):
    segment_id:       UUID = Field(default_factory=uuid4)
    modality:         Modality
    map_item_id:      UUID                              # MapItem that contains this segment
    base_offsets:     Tuple[float, float]
    new_offsets:      Tuple[float, float]
    similarity_score: float = Field(ge=0.0, le=1.0)     # 0 → big change, 1 → identical

# ----------  Phase‑4 output ----------

class SemanticDelta(BaseModel):
    delta_id:          UUID = Field(default_factory=uuid4)
    change_segment_id: UUID
    description:       str
    confidence:        float = Field(ge=0.0, le=1.0)
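Phase 3 fills in the ChangeSegment offsets and similarity_score. One way to do this inside a single aligned MapItem is sketched below. It assumes the aligned chunks are compared in fixed‑length windows of `window_s` seconds that line up one‑to‑one; the threshold, the function name and the use of the mean similarity as the score are all assumptions.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (start, end) in seconds

def find_change_runs(
    sims: List[float],
    base_start: float,
    new_start: float,
    window_s: float,
    threshold: float = 0.95,
) -> List[Tuple[Span, Span, float]]:
    """Group consecutive low-similarity windows into (base_offsets, new_offsets, score).

    sims[k] compares window k of the base chunk with window k of the new chunk;
    score is the mean similarity over the run (0 → big change, 1 → identical).
    """
    runs = []
    k = 0
    while k < len(sims):
        if sims[k] >= threshold:
            k += 1
            continue
        start = k
        while k < len(sims) and sims[k] < threshold:
            k += 1  # extend the run while windows stay below the threshold
        run = sims[start:k]
        base = (base_start + start * window_s, base_start + k * window_s)
        new = (new_start + start * window_s, new_start + k * window_s)
        runs.append((base, new, sum(run) / len(run)))
    return runs
```

Each returned tuple maps onto the base_offsets, new_offsets and similarity_score fields of a ChangeSegment. Using the minimum rather than the mean would make short, severe changes stand out more.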

5 Service‑to‑Service Contract

| Service | Reads | Writes |
|---|---|---|
| Universal Asset Processing | Video | FrameEmbedding, AudioEmbedding, … |
| StructuralChangeDetector | Embeddings | StructuralChangeResult |
| CorrespondenceMapper | Embeddings, StructuralChangeResult | CorrespondenceMap |
| FineChangeDetector | CorrespondenceMap | ChangeSegment[] |
| SemanticDeltaService | ChangeSegment[] | SemanticDelta[] |
| AnchoringService (LocaliseSemanticDeltaService, optional) | SemanticDelta[], ChangeSegment[] | New / updated AnchorableEntity |
| Comment & Anchor Migration | Existing anchors/comments, CorrespondenceMap, ChangeSegment[] | Migrated / stale AnchorableEntity, updated Comment |
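The service contract can be kept honest in code by checking that every input a service reads is either supplied externally or written by an earlier service. In this sketch the per‑modality embedding outputs (FrameEmbedding, AudioEmbedding, …) are collapsed into a single hypothetical "embeddings" entry, and `CONTRACT` and `unmet_reads` are illustrative names.

```python
from typing import Dict, List, Set, Tuple

# Service -> (reads, writes), transcribed from the contract table.
CONTRACT: Dict[str, Tuple[Set[str], Set[str]]] = {
    "Universal Asset Processing": ({"Video"}, {"embeddings"}),
    "StructuralChangeDetector": ({"embeddings"}, {"StructuralChangeResult"}),
    "CorrespondenceMapper": ({"embeddings", "StructuralChangeResult"}, {"CorrespondenceMap"}),
    "FineChangeDetector": ({"CorrespondenceMap"}, {"ChangeSegment[]"}),
    "SemanticDeltaService": ({"ChangeSegment[]"}, {"SemanticDelta[]"}),
    "AnchoringService": ({"SemanticDelta[]", "ChangeSegment[]"}, {"AnchorableEntity"}),
    "Comment & Anchor Migration": (
        {"existing anchors/comments", "CorrespondenceMap", "ChangeSegment[]"},
        {"AnchorableEntity", "Comment"},
    ),
}

def unmet_reads(order: List[str], external: Set[str]) -> Dict[str, Set[str]]:
    """Map each service to the inputs that nothing upstream (or outside) provides."""
    available = set(external)
    missing: Dict[str, Set[str]] = {}
    for service in order:
        reads, writes = CONTRACT[service]
        gap = reads - available
        if gap:
            missing[service] = gap
        available |= writes  # later services may consume this service's outputs
    return missing
```

Run over the pipeline order with the video and existing anchors/comments as external inputs, this returns an empty dict; dropping FineChangeDetector from the order would report ChangeSegment[] as missing for the three services that read it.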

6 Comment & Anchor Migration — Behavioural Summary

  1. Unchanged regions
    For every MapItem labelled IDENTICAL or REORDERED with no overlapping ChangeSegment, clone each AnchorableEntity (and its Comment thread) from c₀ to the corresponding offsets in c₁.

  2. Changed regions (open for debate)
    If an anchor intersects at least one ChangeSegment, we could either:
      a. Copy the anchor, mark it stale = True and keep its comments intact; the UI can then prompt reviewers to adjust or delete these stale anchors; or
      b. Delete the anchors and re‑process the affected part(s) or the whole asset.
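For temporal anchors, the behaviour above (taking the stale‑marking option for changed regions) can be sketched as an offset shift. The dataclasses are hypothetical stand‑ins for MapItem and the migrated anchor, relation types are plain strings matching RelationType values, and each ChangeSegment is represented only by its base_offsets.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Span = Tuple[float, float]  # (start, end) in seconds

@dataclass
class Item:
    """Hypothetical stand-in for MapItem (only the fields used here)."""
    relation_type: str  # RelationType value, e.g. "identical"
    base_offsets: Span
    new_offsets: Span

@dataclass
class MigratedAnchor:
    new_span: Span
    stale: bool  # True if the anchor overlaps a ChangeSegment

def overlaps(a: Span, b: Span) -> bool:
    """Half-open overlap test: spans that only touch do not overlap."""
    return a[0] < b[1] and b[0] < a[1]

def migrate_anchor(anchor: Span, items: List[Item], changes: List[Span]) -> Optional[MigratedAnchor]:
    """Shift a temporal anchor from c₀ into c₁; None if no aligned item contains it."""
    for item in items:
        if item.relation_type in ("inserted", "deleted"):
            continue  # only items with both a base and a new side can carry an anchor
        b0, b1 = item.base_offsets
        if b0 <= anchor[0] and anchor[1] <= b1:
            shift = item.new_offsets[0] - b0
            stale = any(overlaps(anchor, c) for c in changes)
            return MigratedAnchor((anchor[0] + shift, anchor[1] + shift), stale)
    return None
```

An anchor that straddles two MapItems returns None here; the sketch does not decide whether such anchors should be split, stretched or flagged.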

Questions

  • How should we approach pre‑processing the asset? Do we process only segments that are new or have changed, or do we re‑process the entire asset again for compliance?