We are seeking a ComfyUI specialist to build a high-precision, sequential video generation workflow. The objective is to create a 15-second video that is generated in three distinct 5-second segments.
1. The Vision
Unlike standard AI video generators that "guess" the motion, this workflow must let me provide a Start Frame and an End Frame for every 5-second block. This ensures the video doesn't just wander; it follows a precise path from Point A to Point B.
The aesthetic must mimic the "Zack D. Films" style: clean 3D character models, clinical/educational lighting, and smooth, snappy animations. By generating the keyframes (T0, T5, T10, T15) before the video, we ensure total character consistency and professional-grade storytelling across the full 15 seconds.
2. Core Logic & Workflow Structure
The developer must build the pipeline to follow these four specific stages:
● Stage 1: Keyframe Storyboarding
○ A module to generate 4 primary images: 0s, 5s, 10s, and 15s.
○ Must use IP-Adapter or Wan-StandIn logic to ensure the character, clothing, and environment are identical in all 4 images.
● Stage 2: Sequential Rendering (The "Sandwich" Method)
○ Segment 1 (0-5s): Uses Image 0 as the Start and Image 5 as the End.
○ Segment 2 (5-10s): Uses Image 5 as the Start and Image 10 as the End.
○ Segment 3 (10-15s): Uses Image 10 as the Start and Image 15 as the End.
● Stage 3: Seamless Transitions & Smoothing
○ Implement Color Match nodes to prevent "flicker" between segments.
○ Use VFI (Video Frame Interpolation) to bring the native 16fps output up to a "snappy" 60fps.
● Stage 4: Automated Assembly
○ Automatically stitch the three clips into a single high-bitrate .mp4 file.
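As a sketch of the "sandwich" pairing rule in Stage 2, the segment schedule can be derived mechanically from the keyframe timestamps. The filenames below are hypothetical placeholders, not the actual ComfyUI node inputs:

```python
def sandwich_segments(keyframe_times):
    """Pair consecutive keyframes into (start, end) segments.

    keyframe_times -- sorted timestamps in seconds, e.g. [0, 5, 10, 15]
    Returns one dict per 5-second segment.
    """
    segments = []
    for start, end in zip(keyframe_times, keyframe_times[1:]):
        segments.append({
            "start_s": start,
            "end_s": end,
            # Hypothetical filenames for the Stage 1 keyframe images.
            "start_image": f"keyframe_{start:02d}s.png",
            "end_image": f"keyframe_{end:02d}s.png",
        })
    return segments

segments = sandwich_segments([0, 5, 10, 15])
# Each segment's start frame is the previous segment's end frame,
# which is what makes the stitched transitions seamless.
assert segments[1]["start_image"] == segments[0]["end_image"]
```

The same invariant (segment N's start image == segment N-1's end image) is what the workflow's wiring must guarantee before Stage 4 concatenates the clips.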
3. Required Technical Stack (Models & Nodes)
Primary Video Model:
● Wan2.1 / Wan2.2 (FLF2V Version): Specifically the First-Last Frame 14B or 1.3B models. This is non-negotiable, as it is one of the few open-source models capable of dual-image conditioning (Start and End frames).
Essential Custom Nodes:
● ComfyUI-WanVideoStartEndFrames: For the WanVideoStartEndFramesSampler.
● ComfyUI-WanVideoWrapper (Kijai): For model loading and VRAM optimization.
● ComfyUI-VideoHelperSuite (VHS): For video concatenation and saving.
● ComfyUI-KJNodes: For ColorMatch and frame interpolation.
● IP-Adapter-Plus: To lock character identity across the segments.
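For context on what the ColorMatch step is doing at each segment seam: the underlying idea is per-channel statistics transfer. A minimal NumPy sketch of that idea follows; the actual KJNodes implementation differs in detail, so treat this as illustrative only:

```python
import numpy as np

def color_match(frame, reference):
    """Shift a frame's per-channel mean/std to match a reference frame.

    frame, reference -- float arrays of shape (H, W, 3), values in [0, 1].
    This is the classic statistics-transfer idea behind color-match
    nodes, used here to suppress color flicker at segment boundaries.
    """
    matched = np.empty_like(frame)
    for c in range(3):
        f, r = frame[..., c], reference[..., c]
        scale = r.std() / (f.std() + 1e-8)  # avoid division by zero
        matched[..., c] = (f - f.mean()) * scale + r.mean()
    return np.clip(matched, 0.0, 1.0)
```

In the workflow, the reference would be the last frame of segment N and the corrected frames the first frames of segment N+1, so the cut point carries no visible color jump.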
4. Technical Requirements & Performance
● VRAM Efficiency: The workflow should be optimized for a 24GB VRAM environment (using FP8 or GGUF quantization where necessary).
● Zack D. Aesthetic: The workflow must include a prompt-engineering block (or LoRA loader) pre-configured for the "3D Medical Animation / Octane Render" look.
● Modularity: It must be possible to "freeze" or mute each 5-second segment individually, so I can re-roll one segment without re-rendering the whole 15 seconds.
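On the Stage 3 interpolation target: 16 fps to 60 fps is a 3.75x time remap, not an integer multiple. A minimal sketch of the timing math, assuming exact integer frame counts (real VFI nodes such as RIFE synthesize genuine in-between frames rather than cross-fading, and Wan clips are often 81 frames rather than an even 80, so this is illustrative only):

```python
def vfi_schedule(src_fps, dst_fps, duration_s):
    """Map each output frame to a pair of source frames plus a blend weight.

    Pure linear time remapping: output frame i at time i/dst_fps falls
    between source frames lo and hi, with `w` giving its fractional
    position toward `hi`. A VFI model uses this timing to place its
    synthesized in-between frames.
    """
    n_src = int(src_fps * duration_s)
    n_dst = int(dst_fps * duration_s)
    schedule = []
    for i in range(n_dst):
        t = i / dst_fps * src_fps          # position in source-frame units
        lo = min(int(t), n_src - 1)
        hi = min(lo + 1, n_src - 1)
        schedule.append((lo, hi, t - lo))  # weight toward the `hi` frame
    return schedule

sched = vfi_schedule(16, 60, 5)  # 80 source frames -> 300 output frames
```

This also shows why the deliverable should expose the interpolation factor as a parameter: a 2x or 4x VFI pass followed by a resample to 60 fps is a common way to sidestep the non-integer ratio.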
5. Deliverables
1. A .json or .png workflow file that is color-coded and organized into clear groups.
2. A simple "Setup Guide" listing the specific models and LoRAs to download.
3. A Test Render: A 15-second demonstration video showing a character moving through the three segments with zero identity drift.
Apply Now