
Business Motivation: Video-to-Video Comparison for Compliance-Driven Content Production

Objective

To accelerate content production and improve consistency and compliance by introducing an automated, segment-aware video comparison feature that removes manual effort, improves feedback precision, and ensures repeatable validation of creative changes.


Business Context

In regulated and brand-sensitive environments—such as retail marketing, pharmaceuticals, or legally regulated advertising—teams routinely manage multiple versions of video content. Verifying what has changed between versions is often a manual, time-consuming, and error-prone process, typically involving side-by-side playback and human interpretation.

With compliance teams under pressure to ensure that no unauthorised changes have occurred—particularly with claims, visual assets, or timing constraints—manual checks become a bottleneck, delaying approvals and increasing the risk of missed errors.


Motivation and Benefits

1. Remove Manual Comparison Tasks

  • Eliminate the need for frame-by-frame human review.
  • Automate detection of changes such as reordering, insertions, deletions, or modifications at the segment level.
  • Enable users to trust the tool to highlight only the relevant differences, saving hours per review cycle.

2. Provide Precision Feedback

  • Deliver exact timestamps, similarity scores, and change classifications (e.g. Added, Removed, Modified).
  • Allow reviewers to comment or tag only on affected segments instead of reviewing entire assets.
  • Generate auditable and exportable reports for cross-functional validation and compliance logs.

3. Ensure Versioning Consistency

  • Reinforce proper version control across media assets.
  • Prevent subtle or unapproved changes from being overlooked by enabling deterministic comparisons.
  • Improve traceability of changes and maintain a single source of truth for asset evolution.

4. Accelerate Compliance Workflows

  • Shorten approval timelines for localisations, adaptations, and re-edits.
  • Empower legal and brand teams to confidently approve content without requiring full creative reviews.
  • Integrate results directly into project management and digital asset management systems for seamless traceability.

Strategic Impact

This feature directly supports content scalability and compliance assurance, while enabling the broader organisation to:

  • Reduce risk of non-compliant or off-brand releases.
  • Improve speed-to-market for campaigns with high volumes of variants.
  • Build a foundation for further automation in creative QA and brand governance.

MVP Scenarios

Side-by-Side Playback (With Locking)

The comparison tool must support side-by-side playback of the old and new video versions, with frame-synchronised locking. This is required for both visual review and debugging of detected segment changes. The UI must highlight the current segment type during playback (e.g. Common, Modified, Added, Removed, Moved).
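The locking maths can be sketched as a proportional mapping between matched ranges. This is a minimal illustration, assuming matched segments are available as hypothetical (old_start, old_end, new_start, new_end) tuples; it is not the actual player implementation.

```python
from typing import List, Optional, Tuple

# Hypothetical representation of a matched segment pair:
# (old_start, old_end, new_start, new_end), all in seconds.
SegmentRange = Tuple[float, float, float, float]

def map_old_to_new(t: float, segments: List[SegmentRange]) -> Optional[float]:
    """Map a playhead position in the old video to the locked position
    in the new video, proportionally within the matching segment."""
    for old_start, old_end, new_start, new_end in segments:
        if old_start <= t <= old_end:
            if old_end == old_start:
                return new_start
            fraction = (t - old_start) / (old_end - old_start)
            return new_start + fraction * (new_end - new_start)
    return None  # no matching segment: the content was removed

# Example: a segment moved from 0-8s (old) to 9-17s (new)
print(map_old_to_new(4.0, [(0.0, 8.0, 9.0, 17.0)]))  # 13.0
```

Added or removed segments have no counterpart range, which is why the function returns None when the playhead falls outside every matched segment.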

Tabular Difference View

All detected segments should be shown in a table view. For each segment, display:

  • Segment Type: Common, Modified, Added, Removed, Moved
  • Timestamps in old and new videos (TemporalAnchor)
  • Duration
  • Similarity score (if applicable)
  • Hypothesis (for Modified or Added segments)
  • Navigation link to jump to the segment in the video player
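The table rows above can be derived directly from segment DTOs. A minimal sketch, assuming segments arrive as plain dicts using the field names of the pseudo-code models later in this document; segment_to_row is a hypothetical helper, not part of the design itself.

```python
def segment_to_row(segment: dict) -> dict:
    """Flatten one segment DTO into a row for the tabular difference view.
    Duration and the player jump target are derived from the anchors."""
    old_range = segment.get("old_range")      # absent for AddedSegment
    new_range = segment.get("new_range")      # absent for RemovedSegment
    anchor = new_range or old_range
    return {
        "type": segment["type"],
        "old_timestamps": (old_range["start"], old_range["end"]) if old_range else None,
        "new_timestamps": (new_range["start"], new_range["end"]) if new_range else None,
        "duration": anchor["end"] - anchor["start"],
        "similarity": segment.get("similarity"),  # None for Added/Removed
        "hypothesis": segment.get("hypothesis"),  # None for Common/Moved
        "jump_to": anchor["start"],               # seek target for the navigation link
    }

row = segment_to_row({
    "type": "ModifiedSegment",
    "old_range": {"start": 11, "end": 20},
    "new_range": {"start": 11, "end": 20},
    "similarity": 0.93,
})
print(row["duration"], row["similarity"])  # 9 0.93
```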

Segment Types (Implemented as)

  • CommonSegment: Identical content in both videos
  • ModifiedSegment: Slight variation between matched content (e.g. edit, lighting, object added)
  • AddedSegment: New content appears only in the new video
  • RemovedSegment: Content appears only in the old video
  • MovedSegment: Same content exists but appears in a different position

Use Case 1: Complete Matching Video

Description: The videos are identical. Segments:

  • 1 CommonSegment: 100% similarity, spans 0–10s

Outcome: Match confirmed, no additional insight needed.

Use Case 2: Partial Fine-Grained Modification

Description: Half the video is unchanged, the other half slightly altered. Segments:

  • 1 CommonSegment: 0–10s, similarity 100%
  • 1 ModifiedSegment: 11–20s, similarity ~95%, e.g. colour change or object shift

Outcome: Indicates visual consistency with a specific minor change.

Use Case 3: Segment Replaced by New

Description: A segment in the middle has been removed and a new one inserted. Segments:

  • 1 CommonSegment: 0–10s, similarity 100%
  • 1 RemovedSegment: 11–20s (from old video)
  • 1 AddedSegment: 11–20s (in new video)

Outcome: Clearly shows content substitution.

Use Case 4: Inserted Segment Leads to Longer Video

Description: A new segment was added in the middle; total duration is longer. Segments:

  • 1 CommonSegment: 0–10s
  • 1 AddedSegment: 11–14s (only in new video)
  • 1 CommonSegment: 15–25s

Outcome: Timeline misalignment due to new content.

Use Case 5: Middle Section Changed, Duration Unchanged

Description: One segment replaced by another of the same length. Segments:

  • 1 CommonSegment: 0–8s
  • 1 RemovedSegment: 9–12s (old only)
  • 1 AddedSegment: 9–12s (new only)
  • 1 CommonSegment: 13–20s

Outcome: Content changed but video remains same length.

Use Case 6: Reordered Content

Description: Same segments appear but in different order. Length unchanged. Segments:

  • 1 MovedSegment: Old 0–8s → New 9–16s
  • 1 MovedSegment: Old 9–16s → New 0–8s

Outcome: Reordering is recognised via diagonal segment match patterns.
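The "diagonal match pattern" idea can be sketched as a crossing check over matched start times: if a later piece of old content matches an earlier position in the new video, the pair is off-diagonal and the content has moved. find_moved is a hypothetical helper, not part of the design above.

```python
from typing import List, Tuple

def find_moved(matches: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Given matched segment start times as (old_start, new_start) pairs,
    flag pairs that break monotonic order; a crossing means the content
    appears in a different position in the new video."""
    moved: List[Tuple[float, float]] = []
    ordered = sorted(matches)  # sort by old_start
    for i, (old_a, new_a) in enumerate(ordered):
        for old_b, new_b in ordered[i + 1:]:
            if new_a > new_b:  # later old content appears earlier in new
                for pair in ((old_a, new_a), (old_b, new_b)):
                    if pair not in moved:
                        moved.append(pair)
    return moved

# Use Case 6: old 0s -> new 9s and old 9s -> new 0s cross each other
print(find_moved([(0.0, 9.0), (9.0, 0.0)]))  # [(0.0, 9.0), (9.0, 0.0)]
```

In the identical-order case every match stays on the diagonal and the function returns an empty list.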

Technical Differences

Definition

Technical differences are metadata-level mismatches that do not involve content, e.g.:

  • Duration
  • Resolution
  • Framerate
  • Codec
  • Audio encoding
  • Colour profile

Representation

Stored as TechDifference objects with:

  • type
  • old_value
  • new_value
  • optional description

Example

{
  "type": "resolution",
  "old_value": "1920x1080",
  "new_value": "1280x720"
}
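One plausible way to produce such objects is to compare already-extracted container metadata (e.g. the output of an ffprobe run parsed into flat dicts). tech_differences below is a hypothetical sketch under that assumption, not the actual implementation.

```python
def tech_differences(old_meta: dict, new_meta: dict) -> list:
    """Compare flat metadata dicts for the two videos and emit
    TechDifference-shaped dicts for every mismatched tracked key."""
    tracked = ["duration", "resolution", "frame_rate", "codec", "audio"]
    diffs = []
    for key in tracked:
        old_value, new_value = old_meta.get(key), new_meta.get(key)
        if old_value != new_value:
            diffs.append({"type": key, "old_value": old_value, "new_value": new_value})
    return diffs

print(tech_differences(
    {"resolution": "1920x1080", "codec": "h264"},
    {"resolution": "1280x720", "codec": "h264"},
))  # [{'type': 'resolution', 'old_value': '1920x1080', 'new_value': '1280x720'}]
```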

Audio Differences

Voiceover / Dialogue Changes

  • Detected as speech transcription mismatch
  • Shown as side-by-side transcript + metadata (speaker, language, tone)
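A side-by-side transcript mismatch can be sketched with the standard library's difflib; transcript_diff is a hypothetical helper and assumes the transcripts are already segmented into lines.

```python
import difflib

def transcript_diff(old_lines, new_lines):
    """Align two transcripts and return (tag, old_chunk, new_chunk) tuples;
    'replace' chunks mark changed dialogue, 'equal' chunks are unchanged."""
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines)
    return [
        (tag, old_lines[i1:i2], new_lines[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    ]

old = ["Welcome to our store.", "Prices start at ten pounds."]
new = ["Welcome to our store.", "Prices start at nine pounds."]
for tag, a, b in transcript_diff(old, new):
    print(tag, a, b)
# equal ['Welcome to our store.'] ['Welcome to our store.']
# replace ['Prices start at ten pounds.'] ['Prices start at nine pounds.']
```

The replace chunks are exactly what a claims-focused compliance reviewer would need to inspect.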

Background Music Differences

  • Detected via fingerprint mismatch or waveform analysis
  • Can be visualised as audio timeline diff
  • Marked as TechDifference if relevant

Frame-Aligned Audio Diff

  • Differences in tone, pitch, or volume per segment
  • May contribute to segment classification (e.g. ModifiedSegment)
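A minimal sketch of a per-segment volume check, assuming decoded audio samples are available as plain lists; the tolerance value and the promotion rule are illustrative assumptions, not specified behaviour.

```python
import math

def rms(samples):
    """Root-mean-square level of a list of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def volume_changed(old_samples, new_samples, tolerance=0.1):
    """Flag a segment whose loudness shifted by more than `tolerance`
    (relative); such a segment could be promoted to ModifiedSegment."""
    old_level, new_level = rms(old_samples), rms(new_samples)
    return abs(new_level - old_level) > tolerance * max(old_level, 1e-9)

quiet = [0.1, -0.1, 0.1, -0.1]
loud = [0.5, -0.5, 0.5, -0.5]
print(volume_changed(quiet, loud))   # True
print(volume_changed(quiet, quiet))  # False
```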


Process View

Sequence diagram (participants: User, UI, Postgres, EventQueue, SidekickProcessor, API):

  • Submission Triggered Automatically (Version N vs N-1): the user uploads a new video asset to a Card; the UI creates a comparison submission in Postgres (JSONB DTO, status=pending) and emits a "ComparisonSubmissionCreated" event.
  • Manual Comparison (User-Initiated): the user selects Version A and Version B and triggers the Compare action; the UI creates a comparison submission (JSONB DTO, status=pending) and emits a "ComparisonSubmissionCreated" event.
  • Queue-based Asynchronous Processing: the EventQueue delivers the JSONB payload to the SidekickProcessor, which runs the comparison logic and submits results with status=complete via the API (update by submission ID); the comparison row is updated with status=complete and the result payload.
  • UI Polls or Listens: the UI polls or subscribes for submission status, receives status + results, and displays result segments, scores, technical diffs and time taken.

Data Model Views

Domain Model

Class diagram:

  • ComparisonSession: UUID id, Video oldVideo, Video newVideo, ComparisonResult comparisonResult
  • Video: str id, float duration_seconds
  • ComparisonResult: str oldVideoId, str newVideoId, float similarityScore, List~Segment~ segments, List~TechDifference~ technicalDiffs
  • TemporalAnchor: float start, float end, float? duration, Literal type ("time", "frame", "character")
  • TechDifference: str old_value, str new_value, str? description, Literal type ("duration", "resolution", "frame_rate", "codec", "audio")
  • Segment (union): CommonSegment (old_range, new_range, similarity = 1.0), ModifiedSegment (old_range, new_range, similarity), AddedSegment (new_range, str? hypothesis), RemovedSegment (old_range, str? hypothesis), MovedSegment (old_range, new_range, similarity)
Pseudo Code Implementation


from typing import List, Optional, Union, Literal
from uuid import UUID
from pydantic import BaseModel, Field

# --- Pseudo Code to Capture the Design Session ---
# -- Domain Primitives --

class Video(BaseModel):
    """
    Represents a video asset in the system.
    """
    id: str  # unique identifier, could be path or database ID
    duration_seconds: float

# TODO: Reference from module, also use Union not instance
class TemporalAnchor(BaseModel):
    """
    Represents a temporal anchor within a media object.

    Can be based on time (seconds), frame number, or character offset.
    """
    type: Literal["time", "frame", "character"] = "time"
    start: float = Field(..., description="Start time in seconds, frame number or character offset")
    end: float = Field(..., description="End time in seconds, frame number or character offset")
    duration: Optional[float] = Field(default=None, description="Duration in seconds, frame count or character count")


# -- Segment Match Types --

class CommonSegment(BaseModel):
    """
    A segment that appears unchanged in both videos.

    Indicates 100% similarity for the given temporal anchor ranges.
    """
    old_range: TemporalAnchor
    new_range: TemporalAnchor
    similarity: Literal[1.0] = 1.0  # fixed at 1.0; Field(const=True) was removed in pydantic v2


class ModifiedSegment(BaseModel):
    """
    A segment that exists in both videos but has been slightly altered.

    Example: lighting change, subtitle variation, content trimmed.
    """
    old_range: TemporalAnchor
    new_range: TemporalAnchor
    similarity: float  # < 1.0 and > threshold


class AddedSegment(BaseModel):
    """
    A new segment that is present only in the new video.
    """
    new_range: TemporalAnchor
    hypothesis: Optional[str] = None  # human-interpretable guess


class RemovedSegment(BaseModel):
    """
    A segment that was removed from the old video.
    """
    old_range: TemporalAnchor
    hypothesis: Optional[str] = None


class MovedSegment(BaseModel):
    """
    A segment that is present in both videos but reordered.

    Useful for detecting structural edits (e.g., reordering intro).
    """
    old_range: TemporalAnchor
    new_range: TemporalAnchor
    similarity: float


# -- Segment Union Type --

Segment = Union[
    CommonSegment,
    ModifiedSegment,
    AddedSegment,
    RemovedSegment,
    MovedSegment,
]


# -- Technical Differences --

class TechDifference(BaseModel):
    """
    Captures technical-level differences between the videos
    (e.g., resolution, duration, encoding).

    Does not involve semantic or visual content diff.
    """
    type: Literal['duration', 'resolution', 'frame_rate', 'codec', 'audio']
    old_value: str
    new_value: str
    description: Optional[str] = None


# -- Comparison Result -- These are effectively DTOs to move the results across the wire

class ComparisonResult(BaseModel):
    """
    The outcome of comparing two video assets.

    Includes similarity score, semantic segment differences, and technical differences.
    """
    oldVideoId: str # refer to the object by its full reference Asset::Video
    newVideoId: str # refer to the object by its full reference Asset::Video
    similarityScore: float # quant summary of findings
    segments: List[Segment]
    technicalDiffs: List[TechDifference]
    hypothesis: Optional[str] = None  # human-interpretable guess


# -- Comparison Session --

class ComparisonSession(BaseModel):
    """
    A single instance of comparing two video assets.

    Holds input metadata and the result.
    """
    id: UUID
    oldVideo: Video
    newVideo: Video
    comparisonResult: ComparisonResult

Data Persistence View

The base schema design for the video_comparison_submissions table covers:

  • Persisting JSONB payloads (DTOs) using the Entity-Attribute-Value (EAV) model
  • Supporting both automatic (version n vs n-1) and manual submission modes
  • Tracking submission and completion times
  • Recording status and results
  • Supporting asynchronous processing (via event queue and sidekick processor)

Table: video_comparison_submissions

Column Name       Type         Description
id                UUID (PK)    Primary key – unique ID for the comparison job
card_id           UUID         ID of the card to which the assets belong
submitted_by      UUID (FK)    User who submitted the job (system for automatic)
submission_type   TEXT         auto (version n vs n-1) or manual (user-selected assets)
old_video_id      TEXT         Asset ID of the "before" video
new_video_id      TEXT         Asset ID of the "after" video
status            TEXT         One of: pending, processing, complete, error
submitted_at      TIMESTAMP    Time of submission
started_at        TIMESTAMP    Time processing began (nullable)
completed_at      TIMESTAMP    Time processing completed (nullable)
result_json       JSONB        Full result object (ComparisonResult DTO) in EAV-style JSON
error_message     TEXT         Optional error message if processing fails
metadata          JSONB        Optional metadata (e.g. client ID, project, tags, trace ID)

Key Features

  • Flexibility via JSONB: Allows you to store domain model DTOs such as ComparisonResult including nested segments, technicalDiffs, etc.
  • EAV Ready: Since this is JSONB, it naturally supports nested structures, dynamic keys, and sparse attributes – all consistent with the EAV philosophy.
  • Supports Retry/Diagnostics: Via error_message and timestamps
  • Tracks submission origin: Through submission_type and submitted_by
  • Optimised for eventing: status + submitted_at are enough for triggering processing logic

Example Entry (Simplified)

{
  "oldVideoId": "asset_123",
  "newVideoId": "asset_124",
  "similarityScore": 0.94,
  "segments": [
    {
      "type": "CommonSegment",
      "old_range": {"type": "time", "start": 0, "end": 10},
      "new_range": {"type": "time", "start": 0, "end": 10},
      "similarity": 1.0
    },
    {
      "type": "ModifiedSegment",
      "old_range": {"type": "time", "start": 11, "end": 20},
      "new_range": {"type": "time", "start": 11, "end": 20},
      "similarity": 0.93
    }
  ],
  "technicalDiffs": [
    {
      "type": "duration",
      "old_value": "00:25:00",
      "new_value": "00:27:00"
    }
  ]
}
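A persisted result_json payload like the one above can be summarised for the UI without deserialising it into the full domain model. summarise_result is a hypothetical helper sketch operating on the raw JSON string.

```python
import json

def summarise_result(result_json: str) -> dict:
    """Summarise a stored ComparisonResult payload for quick display:
    counts per segment type, the lowest per-segment similarity seen,
    and the number of technical differences."""
    result = json.loads(result_json)
    counts: dict = {}
    similarities = []
    for segment in result.get("segments", []):
        counts[segment["type"]] = counts.get(segment["type"], 0) + 1
        if "similarity" in segment:
            similarities.append(segment["similarity"])
    return {
        "segment_counts": counts,
        "min_similarity": min(similarities) if similarities else None,
        "tech_diff_count": len(result.get("technicalDiffs", [])),
    }

payload = '{"segments": [{"type": "CommonSegment", "similarity": 1.0}, {"type": "ModifiedSegment", "similarity": 0.93}], "technicalDiffs": [{"type": "duration"}]}'
print(summarise_result(payload))
# {'segment_counts': {'CommonSegment': 1, 'ModifiedSegment': 1}, 'min_similarity': 0.93, 'tech_diff_count': 1}
```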

Suggested Indexes

CREATE INDEX idx_comparison_status ON video_comparison_submissions(status);
CREATE INDEX idx_comparison_card_id ON video_comparison_submissions(card_id);
CREATE INDEX idx_comparison_submitted_at ON video_comparison_submissions(submitted_at);