
RSV file format (technical)

Sony .rsv layout: GOP blocks, rtmd, codecs, and how recovery maps data into MP4.

RSV files are incomplete video files that Sony cameras (e.g., Sony FX3) leave behind when recording is interrupted unexpectedly (power loss, card removal, etc.). The .rsv extension marks a reserved or temporary recording file: it holds the raw video and audio data written so far, but lacks the finished container structure (like MP4 or MXF), in particular the MP4 moov atom, that players need. The video and audio data itself is valid; only the container is missing.

Unlike standard MP4 files, which interleave video and audio chunks, RSV files use a GOP-based block structure in which all components of a GOP (Group of Pictures) are stored contiguously.

[GOP 0][GOP 1][GOP 2]...[GOP N][incomplete data]

Each GOP contains three sections in order:

  1. rtmd block - Timed metadata (variable packet count, typically 12 packets)
  2. Video essence - H.264/AVC or H.265/HEVC video frames (frame count detected from first GOP, assumed constant)
  3. Audio essence - PCM audio samples (size calculated once from first GOP’s frame count, reused for all GOPs)

┌─────────────────────────────────────────────────────────────┐
│                            GOP N                            │
├─────────────────┬──────────────────────┬────────────────────┤
│   rtmd Block    │     Video Frames     │    Audio Chunk     │
│  Variable size  │    Variable size     │   Variable size    │
│  (N × packet)   │   (variable count)   │    (calculated)    │
└─────────────────┴──────────────────────┴────────────────────┘

Observed variations:

| Setting | Typical | Observed range |
|---|---|---|
| rtmd packet size | 19,456 bytes | 11,264 - 29,696 bytes |
| rtmd packets per GOP | 12 | 12 - 48 |
| Frames per GOP | 12 | 12 - 48 |
| Frame rate | 25 fps | 23.98, 25, 50 fps |
| GOP data size | ~6 MB | 6 - 27 MB |

Example (12-frame GOP at 25fps):

  • rtmd block: ~233,472 bytes (12 packets × 19,456 bytes)
  • Video frames: ~5-6 MB (12 frames, variable size)
  • Audio chunk: ~92,160 bytes (stereo 16-bit) or ~276,480 bytes (4-channel 24-bit)

The rtmd block's total size is variable, since it depends on both the packet count and the packet size:

  • Packet count: typically matches frames per GOP (observed: 12, 24, 48)
  • Packet size: auto-detected from the file (observed: 11,264, 19,456, and 29,696 bytes)

Each rtmd packet has a 12-byte header structure:

Offset:  0  1  2  3   4  5  6  7   8  9  10 11
Bytes:   00 1c 01 00  ?? ?? ?? ??  f0 01 00 10
         ├─────────┤  ├─────────┤  ├─────────┤
         Prefix       Variable     Sony tag
         (constant)   metadata     (constant)

| Offset | Bytes | Description |
|---|---|---|
| 0-3 | 00 1c 01 00 | Constant prefix (identifies the rtmd packet type) |
| 4-7 | variable | Camera metadata (timecode, frame counter, etc.) |
| 8-11 | f0 01 00 10 | Sony-specific tag (constant across all packets) |

Important: The 4-byte prefix 00 1c 01 00 can also occur by chance inside video or audio data, producing false positives. To reliably identify real rtmd packets, verify both:

  1. Bytes 0-3 match 00 1c 01 00 (prefix)
  2. Bytes 8-11 match f0 01 00 10 (Sony tag)
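A minimal Python sketch of this two-field check follows. The names `RTMD_PREFIX`, `SONY_TAG`, and `is_rtmd_header` are hypothetical (not taken from untrunc); the function compares only bytes 0-3 and 8-11 and ignores the variable metadata in bytes 4-7.

```python
# Hypothetical helper (not untrunc's code): validate a 12-byte rtmd packet header.
RTMD_PREFIX = bytes.fromhex("001c0100")  # bytes 0-3, constant prefix
SONY_TAG = bytes.fromhex("f0010010")     # bytes 8-11, constant Sony tag


def is_rtmd_header(buf: bytes, offset: int = 0) -> bool:
    """Return True if buf[offset:offset + 12] looks like a real rtmd header.

    Bytes 4-7 carry variable camera metadata, so they are not checked.
    """
    if offset < 0 or offset + 12 > len(buf):
        return False
    return (buf[offset:offset + 4] == RTMD_PREFIX
            and buf[offset + 8:offset + 12] == SONY_TAG)


real = bytes.fromhex("001c0100230328e1f0010010")  # most common variant
fake = bytes.fromhex("001c0100deadbeef00000000")  # prefix only: a false positive
print(is_rtmd_header(real), is_rtmd_header(fake))  # True False
```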

Observed header variants (bytes 4-7 vary):

001c0100 230328e1 f0010010... (most common)
001c0100 22f728ed f0010010... (variant)
001c0100 230f28d5 f0010010... (variant)

The rtmd data contains camera metadata such as:

  • Timecode information
  • Recording parameters
  • Camera settings
  • GPS data (if enabled)

Video essence:

  • Codec: H.264/AVC High Profile (codec name avc1, most common) or H.265/HEVC (codec name hvc1)
  • NAL unit format: length-prefixed (4-byte big-endian length followed by the NAL data)
  • Frames per GOP: auto-detected from the first GOP (typically 12) and assumed constant throughout the file

H.264/AVC frames begin with an Access Unit Delimiter (AUD) NAL:

00 00 00 02 09 XX

Where:

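  • 00 00 00 02 = NAL length (2 bytes: 1-byte NAL header + 1-byte payload)
  • 09 = NAL header byte (nal_unit_type 9 = AUD)
  • XX = AUD payload: primary_pic_type plus RBSP trailing bits (typically 10 or 30)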

H.265/HEVC frames begin with an Access Unit Delimiter (AUD) NAL:

00 00 00 03 46 01 XX

Where:

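  • 00 00 00 03 = NAL length (3 bytes: 2-byte NAL header + 1-byte payload)
  • 46 01 = two-byte HEVC NAL header; nal_unit_type = (0x46 >> 1) & 0x3F = 35 (AUD), layer ID 0, temporal ID 0
  • XX = AUD payload (pic_type plus RBSP trailing bits)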

| Type (H.264) | Name | Description |
|---|---|---|
| 9 | AUD | Access Unit Delimiter (frame start marker) |
| 6 | SEI | Supplemental Enhancement Information |
| 5 | IDR | Instantaneous Decoder Refresh (keyframe slice) |
| 1 | Non-IDR | Regular slice (P/B frame) |
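The type values above come from the NAL header that follows each 4-byte length prefix. A small sketch of how the type is extracted in each codec, based on the header layouts in the H.264 and H.265 specifications; the function names are hypothetical:

```python
def h264_nal_type(header_byte: int) -> int:
    """H.264: 1-byte header; nal_unit_type is the low 5 bits."""
    return header_byte & 0x1F


def hevc_nal_type(first_header_byte: int) -> int:
    """H.265: 2-byte header; nal_unit_type is bits 1-6 of the first byte."""
    return (first_header_byte >> 1) & 0x3F


H264_NAMES = {1: "Non-IDR", 5: "IDR", 6: "SEI", 9: "AUD"}

print(H264_NAMES[h264_nal_type(0x09)])  # AUD  (the byte after 00 00 00 02)
print(hevc_nal_type(0x46))              # 35   (HEVC AUD)
```

In the H.264 header byte the upper bits hold nal_ref_idc, which is why an IDR slice typically appears as 0x65 rather than 0x05.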

A typical frame contains:

  1. AUD NAL (NAL length 2 in H.264, 3 in HEVC)
  2. SEI NAL(s) (metadata)
  3. Slice NAL(s) (actual video data)

Keyframe (IDR) example:

[AUD len=2][SEI len=19][SEI len=26][SEI len=29][SEI len=14][SEI len=5][IDR slice ×8]

Inter-frame (P/B) example:

[AUD len=2][SEI len=14][Non-IDR slice ×9]

The compressed slice data within NAL units can contain any byte pattern, including patterns that look like NAL length prefixes. Do NOT attempt to parse NAL units by reading length fields through slice data. Instead, search for AUD patterns to find frame boundaries.

The implementation auto-detects frames_per_gop by counting AUD patterns in the first GOP (typically 12 frames). It then assumes this frame count is consistent throughout the entire file.
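A sketch of AUD-based frame splitting, assuming the caller already knows where a GOP's video essence starts and ends (after the rtmd block, before the audio chunk). The names are hypothetical; the approach is the pattern search described above rather than a walk over NAL length fields, and `len(find_aud_offsets(...))` on the first GOP gives frames_per_gop.

```python
from typing import List, Optional, Tuple

H264_AUD = bytes.fromhex("0000000209")  # length 2, NAL header 0x09 (type 9)
HEVC_AUD = bytes.fromhex("0000000346")  # length 3, first header byte 0x46 (type 35)


def find_aud_offsets(data: bytes, pattern: bytes, start: int = 0,
                     end: Optional[int] = None) -> List[int]:
    """Offsets of every AUD pattern lying entirely inside data[start:end]."""
    if end is None:
        end = len(data)
    offsets = []
    pos = data.find(pattern, start, end)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(pattern, pos + 1, end)
    return offsets


def frame_spans(aud_offsets: List[int], video_end: int) -> List[Tuple[int, int]]:
    """Turn AUD offsets into (offset, size) pairs; the last frame ends at video_end."""
    bounds = aud_offsets + [video_end]
    return [(a, b - a) for a, b in zip(bounds, bounds[1:])]


# Two synthetic H.264 frames: AUD NAL (6 bytes incl. length) + filler standing in for slices.
video = H264_AUD + b"\x10" + b"\xaa" * 5 + H264_AUD + b"\x30" + b"\xbb" * 3
auds = find_aud_offsets(video, H264_AUD)
print(len(auds), frame_spans(auds, len(video)))  # 2 [(0, 11), (11, 9)]
```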

Implementation behavior:

  • frames_per_gop is detected from the first GOP only
  • audio_chunk_size is calculated once from this detected value and reused for all GOPs
  • Frames are counted individually in each GOP, so a varying frame count would be visible, but the audio chunk size is not recalculated from it
  • The implementation assumes GOP sizes are constant; if they varied, the audio chunk size calculation would be incorrect

Note: Analysis of one 50 GB RSV file (8,655 GOPs) showed that every GOP had exactly 12 frames, but the implementation makes no guarantee that this holds in general. If GOP sizes varied, the code would need to calculate audio_chunk_size per GOP from the actual frame count.

| Format | Codec | Bits | Channels | Bytes/sample |
|---|---|---|---|---|
| Stereo 16-bit | twos/sowt | 16 | 2 | 4 |
| Multi-track 24-bit | ipcm | 24 | 4 × 1 (mono) | 3 per track |

Stereo 16-bit (twos/sowt) - Most common:

  • Sample rate: 48,000 Hz
  • Channels: 2 (stereo)
  • Bits per sample: 16
  • Bytes per sample: 4 (2 channels × 2 bytes)

Multi-track 24-bit (ipcm) - Professional cameras:

  • Sample rate: 48,000 Hz
  • Tracks: 4 separate mono tracks
  • Bits per sample: 24
  • Bytes per sample per track: 3
  • Total audio per GOP: 4 × per-track size

For both formats:

  • Chunk size: calculated from the GOP duration and the audio format
  • Samples per chunk: GOP_duration × sample_rate
  • Total audio size: samples × bytes_per_sample × num_tracks

Audio chunk size is calculated based on GOP duration:

GOP_duration_sec = frames_per_gop / fps
samples_per_chunk = GOP_duration_sec × audio_sample_rate
chunk_size_per_track = samples_per_chunk × bytes_per_sample
total_audio_size = chunk_size_per_track × num_audio_tracks

Example 1 (12 frames at 25fps, 48kHz stereo 16-bit):

GOP_duration = 12 frames ÷ 25 fps = 0.480 seconds
samples_per_chunk = 0.480s × 48000 Hz = 23,040 samples
chunk_size = 23,040 × 4 bytes = 92,160 bytes

Example 2 (12 frames at 25fps, 48kHz 4-track 24-bit):

GOP_duration = 12 frames ÷ 25 fps = 0.480 seconds
samples_per_chunk = 0.480s × 48000 Hz = 23,040 samples
chunk_size_per_track = 23,040 × 3 bytes = 69,120 bytes
total_audio_size = 69,120 × 4 tracks = 276,480 bytes

Example 3 (48 frames at 50fps, 48kHz stereo 16-bit):

GOP_duration = 48 frames ÷ 50 fps = 0.960 seconds
samples_per_chunk = 0.960s × 48000 Hz = 46,080 samples
chunk_size = 46,080 × 4 bytes = 184,320 bytes
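The formula and the three examples above can be checked with a short Python sketch. `audio_chunk_size` and its parameters are hypothetical names; exact fractions are used so that 23.98 fps, assumed here to mean 24000/1001, does not introduce floating-point rounding into the byte count.

```python
from fractions import Fraction
from typing import Union


def audio_chunk_size(frames_per_gop: int, fps: Union[int, Fraction],
                     sample_rate: int = 48_000, bytes_per_sample: int = 4,
                     num_tracks: int = 1) -> int:
    """Bytes of PCM audio stored after one GOP's video frames.

    bytes_per_sample is per sample period within one track
    (4 for stereo 16-bit, 3 for each mono 24-bit track).
    """
    gop_duration = Fraction(frames_per_gop) / Fraction(fps)  # seconds, exact
    samples = gop_duration * sample_rate
    if samples.denominator != 1:
        raise ValueError(f"GOP does not hold a whole number of samples: {samples}")
    return int(samples) * bytes_per_sample * num_tracks


print(audio_chunk_size(12, 25))                                    # 92160
print(audio_chunk_size(12, 25, bytes_per_sample=3, num_tracks=4))  # 276480
print(audio_chunk_size(48, 50))                                    # 184320
print(audio_chunk_size(12, Fraction(24000, 1001)))                 # 96096 (assumed 23.98 fps)
```

Raising on a fractional sample count is a design choice for the sketch: it surfaces a mismatched frame rate instead of silently truncating the chunk size.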

For cameras with multiple mono audio tracks (e.g., 4-channel ipcm), the audio data in each GOP is stored sequentially by track:

[Track 1 audio][Track 2 audio][Track 3 audio][Track 4 audio]

Each track contains the same number of samples for the GOP duration.
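Since the tracks are stored back to back with equal sample counts, one GOP's audio chunk can be cut into equal per-track slices. A hypothetical sketch (`split_tracks` is not an untrunc function):

```python
from typing import List


def split_tracks(audio_chunk: bytes, num_tracks: int) -> List[bytes]:
    """Split one GOP's audio chunk into its per-track blocks.

    Assumes the tracks are stored back to back with equal sizes.
    """
    if num_tracks <= 0 or len(audio_chunk) % num_tracks:
        raise ValueError("chunk size must be a multiple of the track count")
    per_track = len(audio_chunk) // num_tracks
    return [audio_chunk[i * per_track:(i + 1) * per_track] for i in range(num_tracks)]


# 4 tracks × 2 samples × 3 bytes (24-bit) = 24 bytes
tracks = split_tracks(bytes(range(24)), 4)
print(tracks[1].hex())  # 060708090a0b
```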

Standard MP4 layout:

[ftyp][moov][mdat: V₀ A₀ V₁ A₁ V₂ A₂ ...]

Video and audio chunks are interleaved within mdat.

RSV layout:

[rtmd₀ V₀₋₁₁ A₀][rtmd₁ V₁₂₋₂₃ A₁][rtmd₂ V₂₄₋₃₅ A₂]...

Each GOP’s data (rtmd block, video frames, audio chunk) is stored as one contiguous block.
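Because each GOP's frames are contiguous and followed by its audio, one way to map RSV data into MP4 sample tables is one video chunk and one audio chunk per GOP, with the rtmd block simply left unreferenced. The sketch below assumes, hypothetically, that the RSV bytes are copied verbatim into mdat starting at `mdat_payload_start`, so every RSV offset shifts by that amount; `Gop`, `stsc_entries`, and `build_tables` are illustrative names, not untrunc's code.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Gop:
    frames: List[Tuple[int, int]]  # (offset in RSV file, size) of each video frame
    audio_offset: int              # offset of the GOP's audio chunk in the RSV file


def stsc_entries(samples_per_chunk: List[int]) -> List[Tuple[int, int, int]]:
    """Run-length encode as (first_chunk, samples_per_chunk, sample_description_index)."""
    entries: List[Tuple[int, int, int]] = []
    for chunk_number, count in enumerate(samples_per_chunk, start=1):  # chunks are 1-based
        if not entries or entries[-1][1] != count:
            entries.append((chunk_number, count, 1))
    return entries


def build_tables(gops: List[Gop], mdat_payload_start: int) -> Dict[str, list]:
    """One video chunk and one audio chunk per GOP."""
    for g in gops:
        for (off, size), (next_off, _) in zip(g.frames, g.frames[1:]):
            if off + size != next_off:
                raise ValueError("frames in a GOP must be contiguous to form one chunk")
    return {
        "video_stco": [mdat_payload_start + g.frames[0][0] for g in gops],
        "video_stsz": [size for g in gops for _, size in g.frames],
        "video_stsc": stsc_entries([len(g.frames) for g in gops]),
        "audio_stco": [mdat_payload_start + g.audio_offset for g in gops],
    }


gops = [Gop([(100, 10), (110, 20)], 130), Gop([(200, 5), (205, 5)], 210), Gop([(300, 7)], 307)]
tables = build_tables(gops, mdat_payload_start=1000)
print(tables["video_stco"], tables["video_stsc"])  # [1100, 1200, 1300] [(1, 2, 1), (3, 1, 1)]
```

stco holds 32-bit offsets, so for multi-gigabyte recoveries the same offsets would go into a co64 box instead.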

RSV files can be identified by checking offset 0 for the rtmd pattern. The implementation uses a specific header pattern for detection:

Offset 0-7: 00 1c 01 00 23 03 28 e1

This matches the most common rtmd header variant. For more robust detection, verify:

Offset 0-3: 00 1c 01 00 (prefix)
Offset 8-11: f0 01 00 10 (Sony tag)

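Note: Bytes 4-7 are variable, so a check that includes them (as the 8-byte pattern above does) only matches files whose first packet uses that variant; the prefix + Sony tag check is more robust. The implementation auto-detects RSV files when the 8-byte pattern is found at offset 0; otherwise, RSV mode can be forced with the -rsv command-line flag.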
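The difference between the two checks can be illustrated with a hypothetical Python sketch (the function names are illustrative, not untrunc's): the exact 8-byte signature rejects an observed header variant that the prefix + tag check accepts.

```python
SIGNATURE_8 = bytes.fromhex("001c0100230328e1")  # exact 8-byte pattern (one variant only)
RTMD_PREFIX = bytes.fromhex("001c0100")           # bytes 0-3
SONY_TAG = bytes.fromhex("f0010010")              # bytes 8-11


def matches_exact_signature(head: bytes) -> bool:
    return head[:8] == SIGNATURE_8


def matches_rtmd_fields(head: bytes) -> bool:
    return len(head) >= 12 and head[:4] == RTMD_PREFIX and head[8:12] == SONY_TAG


def looks_like_rsv(path: str) -> bool:
    """Read the first 12 bytes of a file and apply the prefix + tag check."""
    with open(path, "rb") as f:
        return matches_rtmd_fields(f.read(12))


variant = bytes.fromhex("001c010022f728edf0010010")  # observed variant header
print(matches_exact_signature(variant), matches_rtmd_fields(variant))  # False True
```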

RSV files lack:

  • ftyp atom (file type)
  • moov atom (movie metadata)
  • Standard MP4 box structure

The entire file consists of raw GOP blocks.

To recover an RSV file using untrunc:

  1. Obtain a reference file: a properly recorded MP4 from the same camera with matching codec parameters (H.264/AVC or H.265/HEVC).

  2. Run untrunc:

    untrunc reference.mp4 corrupt.RSV

    Or explicitly enable RSV mode:

    untrunc -rsv reference.mp4 corrupt.RSV
  3. Auto-detection:

    • RSV files are automatically detected if they start with the rtmd pattern at offset 0
    • The implementation auto-detects rtmd_packet_size by finding the distance between the first two rtmd patterns
    • The implementation auto-detects frames_per_gop by counting AUD patterns in the first GOP
    • Audio chunk size is calculated once from the first GOP’s frame count and reused for all GOPs
  4. Parse GOP structure:

    • Find rtmd packets by checking both prefix (001c0100) AND Sony tag (f0010010 at offset 8)
    • Count consecutive rtmd packets (using auto-detected packet size) to find video start
    • Extract video frames by finding AUD patterns:
      • H.264: 00 00 00 02 09 (NAL type 9)
      • HEVC: 00 00 00 03 46 (NAL type 35)
    • Audio chunk follows immediately after video frames (size calculated from GOP duration)
  5. Build MP4 container:

    • Use codec parameters from reference file
    • Create proper stbl atoms (stco, stsz, stsc, stts, stss, ctts)
    • Map frame offsets to the RSV data
  6. Handle GOP structure:

    • Count actual AUD patterns per GOP (frames are counted individually)
    • Count consecutive valid rtmd packets per GOP (rtmd count can vary)
    • Avoid false positive rtmd matches by verifying the Sony tag at offset 8
    • Note: Audio chunk size is calculated from the first GOP and reused for all GOPs. The implementation assumes GOP sizes are constant throughout the file - if they varied, this would be incorrect.
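Steps 4 and 6 can be sketched as a single GOP parser. This is an illustrative Python sketch, not untrunc's implementation: the names (`parse_gop`, `GopInfo`, `find_next_rtmd`) are hypothetical, it uses the H.264 AUD pattern by default, and it assumes the audio chunk ends exactly where the next GOP's verified rtmd header begins, which is one way to bound the video essence given a precomputed audio size.

```python
from typing import List, NamedTuple, Optional, Tuple

RTMD_PREFIX = bytes.fromhex("001c0100")
SONY_TAG = bytes.fromhex("f0010010")
H264_AUD = bytes.fromhex("0000000209")


class GopInfo(NamedTuple):
    rtmd_count: int
    frames: List[Tuple[int, int]]  # (offset, size) per video frame
    audio_offset: int
    next_gop: int                  # offset of the following GOP's rtmd block


def is_rtmd(buf: bytes, pos: int) -> bool:
    return buf[pos:pos + 4] == RTMD_PREFIX and buf[pos + 8:pos + 12] == SONY_TAG


def find_next_rtmd(buf: bytes, start: int) -> int:
    """First verified rtmd header at or after start, or -1."""
    pos = buf.find(RTMD_PREFIX, start)
    while pos != -1 and not is_rtmd(buf, pos):
        pos = buf.find(RTMD_PREFIX, pos + 1)
    return pos


def parse_gop(buf: bytes, start: int, packet_size: int, audio_size: int,
              aud: bytes = H264_AUD) -> Optional[GopInfo]:
    # 1. Count consecutive verified rtmd packets to find where video begins.
    pos, rtmd_count = start, 0
    while is_rtmd(buf, pos):
        pos += packet_size
        rtmd_count += 1
    if rtmd_count == 0:
        raise ValueError(f"no rtmd block at offset {start}")
    video_start = pos

    # 2. Assumption: the audio chunk ends where the next GOP's rtmd block begins.
    next_gop = find_next_rtmd(buf, video_start)
    if next_gop == -1:
        return None  # trailing GOP cannot be bounded this way; left to the caller
    audio_offset = next_gop - audio_size
    if audio_offset <= video_start:
        raise ValueError("audio_size does not fit between video start and next GOP")

    # 3. Split video into frames at AUD patterns (no NAL length walking).
    auds = []
    p = buf.find(aud, video_start, audio_offset)
    while p != -1:
        auds.append(p)
        p = buf.find(aud, p + 1, audio_offset)
    if not auds or auds[0] != video_start:
        raise ValueError("video essence does not start with an AUD")
    bounds = auds + [audio_offset]
    frames = [(a, b - a) for a, b in zip(bounds, bounds[1:])]
    return GopInfo(rtmd_count, frames, audio_offset, next_gop)


packet = bytes.fromhex("001c0100230328e1f0010010") + bytes(4)  # 16-byte toy packet
video = H264_AUD + b"\x10" + b"\xaa" * 4 + H264_AUD + b"\x30" + b"\xbb" * 2
buf = packet * 2 + video + b"\x11" * 8 + packet  # GOP 0 + start of GOP 1
print(parse_gop(buf, 0, packet_size=16, audio_size=8))
# GopInfo(rtmd_count=2, frames=[(32, 10), (42, 8)], audio_offset=50, next_gop=58)
```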

This format has been tested with:

  • Sony FX3 (ILME-FX3)
  • Sony FX30 (ILME-FX30)
  • Sony A7S III (ILCE-7SM3)

| Property | Typical value | Observed range |
|---|---|---|
| Video codec | H.264/AVC or H.265/HEVC | avc1, hvc1 |
| Resolution | 3840×2160 (4K) | - |
| Frame rate | 25 fps | 23.98, 25, 50 fps |
| Frames per GOP | 12 | 12, 24, 48 |
| Audio codec | PCM 16-bit (twos/sowt) | twos, sowt, ipcm (24-bit) |
| Audio channels | 2 (stereo) | 2 (stereo) or 4 × 1 (mono) |
| Audio rate | 48,000 Hz | - |
| Bitrate | ~100 Mbps | 100-250 Mbps |
| rtmd packet size | 19,456 bytes | 11,264 - 29,696 bytes |
| GOP data size | ~6 MB | 6 - 27 MB |

The untrunc implementation includes several auto-detection features:

  1. RSV File Detection: Automatically detects RSV files by checking for rtmd pattern at offset 0
  2. rtmd Packet Size: Auto-detects packet size by measuring distance between first two rtmd patterns (handles 11K-30K byte packets)
  3. Frames per GOP: Auto-detects by counting AUD patterns in the first GOP (handles 12-48 frames)
  4. Audio Track Discovery: Finds all audio tracks (twos, sowt, ipcm) and calculates total audio size
  5. Multi-Track Audio: Supports cameras with multiple mono tracks (e.g., 4×24-bit ipcm)
  6. Codec Support: Supports both H.264/AVC (avc1) and H.265/HEVC (hvc1) codecs
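Feature 2 (packet size detection) amounts to finding the second verified rtmd header. A hypothetical Python sketch, assuming the buffer starts at the first rtmd packet of the file; it requires the Sony tag as well as the prefix, so a bare `00 1c 01 00` inside packet data is skipped. It does not enforce the observed 11,264-29,696 byte range, though a real tool might.

```python
RTMD_PREFIX = bytes.fromhex("001c0100")
SONY_TAG = bytes.fromhex("f0010010")


def is_rtmd(buf: bytes, pos: int) -> bool:
    return buf[pos:pos + 4] == RTMD_PREFIX and buf[pos + 8:pos + 12] == SONY_TAG


def detect_rtmd_packet_size(buf: bytes) -> int:
    """Distance from the rtmd header at offset 0 to the next verified rtmd header."""
    if not is_rtmd(buf, 0):
        raise ValueError("buffer does not start with an rtmd header")
    pos = buf.find(RTMD_PREFIX, 1)
    while pos != -1:
        if is_rtmd(buf, pos):
            return pos
        pos = buf.find(RTMD_PREFIX, pos + 1)
    raise ValueError("no second rtmd header found")


# Toy 20-byte packets whose payload contains a bare prefix (a false positive).
packet = bytes.fromhex("001c0100230328e1f0010010") + RTMD_PREFIX + bytes(4)
print(detect_rtmd_packet_size(packet * 3))  # 20
```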

The implementation uses a 128MB buffer for GOP parsing to handle high-bitrate recordings:

  • 25fps H.264 4K: ~6 MB per GOP
  • 50fps HEVC 4K: ~25 MB per GOP
# Auto-detect RSV file
untrunc reference.mp4 corrupt.RSV
# Force RSV mode
untrunc -rsv reference.mp4 corrupt.RSV
# For truncated reference files, use -dcc to skip chunk validation
untrunc -dcc -rsv reference.mp4 corrupt.RSV

The -rsv flag forces RSV recovery mode even if auto-detection fails.

| Configuration | Video | Audio |
|---|---|---|
| H.264 25fps + stereo 16-bit | avc1 | twos |
| H.264 25fps + 4×mono 24-bit | avc1 | ipcm |
| HEVC 50fps + stereo 16-bit | hvc1 | twos |
| HEVC 23.98fps + stereo 16-bit | hvc1 | twos |