
RSV file format (technical)

Sony .rsv layout: the raw GOP stream and the true-MXF/XAVC variant, and how recovery maps them into MP4.

A .rsv is an incomplete recording a Sony camera leaves behind when capture is interrupted (power loss, card removal, crash). It holds valid video + audio essence but lacks the MP4 container (ftyp/moov) a player needs. Recovery rebuilds that container around the essence: codec, resolution, frame rate, audio layout, and start timecode are all derived from the .rsv itself — no other input is needed.

There are two completely different .rsv layouts, and the recovery tool auto-detects which one it’s looking at by the first bytes of the file:

| Variant | Starts with | Cameras | Essence framing |
|---|---|---|---|
| Raw GOP stream | 00 1C 01 00 … F0 01 00 10 | FX3, FX30, A7S III | GOP blocks (rtmd + video + audio, contiguous) |
| true-MXF / XAVC-I | 06 0E 2B 34 (MXF Universal Label) | FX6, FX9, FR7 | KLV packets, one edit unit (frame) at a time |

The rest of this page describes each.


Raw GOP stream (FX3, FX30, A7S III)

Written by cameras that record to an MP4-like internal layout. This variant has no container at all (no MXF wrapper, no ftyp/moov). It is just a raw, contiguous run of GOP blocks, each led by an rtmd (real-time metadata) block that the recovery keys on. Unlike MP4's interleaved chunks, this layout stores all parts of a Group of Pictures (GOP) contiguously.

[GOP 0][GOP 1][GOP 2] … [GOP N][incomplete tail]

Each GOP is three contiguous sections:

┌───────────────────┬───────────────────────┬────────────────────┐
│ rtmd block        │ video essence         │ audio chunk        │
│ N × packet_size   │ H.264/AVC or HEVC     │ PCM, size computed │
└───────────────────┴───────────────────────┴────────────────────┘
  1. rtmd block — timed metadata, a run of fixed-size packets (one per video frame). It also carries the codec config (SPS/PPS) in GOP 0.
  2. Video essence — H.264/AVC or HEVC access units, AVCC framed (4-byte big-endian length prefix + NAL; not Annex-B start codes).
  3. Audio chunk — PCM, one chunk per audio track, concatenated.
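The three sections are contiguous, and GOPs follow one another directly. The section boundaries of one GOP can therefore be derived from its start offset, the next GOP's start offset, and the per-GOP sizes. Below is a minimal Python sketch. It assumes there is no padding between sections, and `split_gop` and `GopLayout` are hypothetical names, not the tool's API:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GopLayout:
    rtmd: range          # frames_per_gop packets of packet_size bytes
    video: range         # AVCC video essence
    audio: List[range]   # one range per audio track, stored back to back


def split_gop(gop_start: int, next_gop_start: int, frames_per_gop: int,
              packet_size: int, track_chunk_size: int, num_tracks: int) -> GopLayout:
    """Split one GOP into rtmd / video / audio byte ranges.

    Assumption: the audio chunk ends exactly where the next GOP's rtmd block
    begins (no padding), so the video essence is whatever lies in between.
    """
    rtmd_end = gop_start + frames_per_gop * packet_size
    audio_start = next_gop_start - track_chunk_size * num_tracks
    if audio_start < rtmd_end:
        raise ValueError("GOP is smaller than its rtmd block plus audio chunk")
    tracks = [range(audio_start + i * track_chunk_size,
                    audio_start + (i + 1) * track_chunk_size)
              for i in range(num_tracks)]
    return GopLayout(range(gop_start, rtmd_end), range(rtmd_end, audio_start), tracks)


# 12 frames, 19,456-byte rtmd packets, stereo 16-bit (one 92,160-byte track)
layout = split_gop(0, 6_000_000, 12, 19_456, 92_160, 1)
print(layout.video)  # range(233472, 5907840)
```

For multi-track audio, `track_chunk_size` is the per-track chunk and the ranges come back in storage order, track 1 first.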

Observed ranges (FX3 / FX30 / A7S III, 4K)

| Property | Typical | Observed range |
|---|---|---|
| rtmd packet size | 19,456 B | 11,264 – 29,696 B |
| rtmd packets / GOP (= frames/GOP) | 12 | 12 – 48 |
| Frame rate | 25 fps | 23.976, 24, 25, 50, 59.94 |
| GOP data size | ~6 MB | 6 – 27 MB |
| Bitrate | ~100 Mbps | 100 – 250 Mbps |
| Resolution | 3840×2160 | |

GOP-constant assumption: frames_per_gop is auto-detected from the first GOP, and the audio chunk size is computed once and reused. Frames are still counted per GOP, but audio sizing assumes a constant GOP length. This was verified on a 50 GB, 8,655-GOP clip in which every GOP had 12 frames. A clip with varying GOP lengths would need per-GOP audio sizing.

Each rtmd packet begins with a constant signature used as a delimiter:

Offset:  0  1  2  3  4  5  6  7  8  9  10 11
Bytes:   00 1c 01 00 ?? ?? ?? ?? f0 01 00 10
         └ prefix ─┘ └ variable┘ └ Sony tag┘
  • Bytes 0–3 (00 1c 01 00) and bytes 8–11 (f0 01 00 10) are constant; bytes 4–7 vary (timecode/frame counter). Verify both constant ranges, because the 4-byte prefix alone produces false positives inside compressed essence. rtmd_packet_size is auto-detected as the distance between the first two valid headers.
  • Observed variants of bytes 4–7: 230328e1, 22f728ed, 230f28d5, …

This 12-byte signature is actually the start of a 28-byte (0x1C) Sony-private record header (the leading 00 1c is the header length) — not a standalone 12-byte header, and not an MXF System Item. The metadata block carries timecode, recording parameters, camera settings, and the codec config; it is Sony-proprietary, with no standard MXF descriptors.
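The two-range check and the packet-size detection can be sketched in Python as follows. The helper names are hypothetical. Counting consecutive valid headers at a fixed stride to get frames_per_gop is an assumption: it relies on each rtmd block being a gap-free run of equal-size packets.

```python
from typing import List, Tuple

RTMD_PREFIX = bytes.fromhex("001c0100")  # bytes 0-3
SONY_TAG = bytes.fromhex("f0010010")     # bytes 8-11


def is_rtmd_header(buf: bytes, pos: int) -> bool:
    """Check both constant ranges; bytes 4-7 (timecode/counter) are ignored."""
    return buf[pos:pos + 4] == RTMD_PREFIX and buf[pos + 8:pos + 12] == SONY_TAG


def find_rtmd_headers(buf: bytes, limit: int = 2) -> List[int]:
    """Offsets of the first `limit` valid headers; prefix-only hits are skipped."""
    found: List[int] = []
    pos = buf.find(RTMD_PREFIX)
    while pos != -1 and len(found) < limit:
        if is_rtmd_header(buf, pos):
            found.append(pos)
        pos = buf.find(RTMD_PREFIX, pos + 1)
    return found


def detect_rtmd_layout(buf: bytes) -> Tuple[int, int]:
    """Return (rtmd_packet_size, frames_per_gop) from the first rtmd block."""
    headers = find_rtmd_headers(buf, 2)
    if len(headers) < 2:
        raise ValueError("fewer than two valid rtmd headers found")
    first, second = headers
    packet_size = second - first
    frames = 0
    pos = first
    while is_rtmd_header(buf, pos):  # one packet per video frame
        frames += 1
        pos += packet_size
    return packet_size, frames
```

The buffer passed in must cover the whole first rtmd block, for example about 233 KB for 12 × 19,456-byte packets.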

  • Codecs: H.264 High / High 4:2:2 (avc1) and HEVC (hvc1). There are no in-band SPS/PPS — the codec config lives in the rtmd metadata, which the tool parses to build avcC/hvcC.
  • NAL format: AVCC (4-byte big-endian length + NAL).
  • Frame boundaries are found by the Access Unit Delimiter (AUD) NAL:
    • H.264: 00 00 00 02 09 XX (len=2, NAL type 9 = AUD)
    • HEVC: 00 00 00 03 46 01 XX (len=3, NAL type 35 = AUD)
| NAL type (H.264 / HEVC) | Name | Role |
|---|---|---|
| 9 / 35 | AUD | frame-start marker |
| 6 / 39 | SEI | supplemental info |
| 5 / 19–21 | IDR / IRAP | keyframe slice |
| 1 / 0–9 | non-IDR | P/B slice |

Typical composition — keyframe: [AUD][SEI…][IDR slice ×N]; inter-frame: [AUD][SEI][slice ×N].
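Here is a Python sketch that splits AVCC video essence into samples at AUD NALs and flags keyframes by NAL type, following the table above. The function names are hypothetical. The walk stops at the first length prefix that runs past the buffer, which is one simple way to handle an interrupted tail:

```python
import struct
from typing import Iterator, List, Tuple


def iter_avcc_nals(data: bytes) -> Iterator[Tuple[int, int]]:
    """Yield (payload_offset, length) for each 4-byte big-endian length-prefixed NAL."""
    pos = 0
    while pos + 4 <= len(data):
        (n,) = struct.unpack_from(">I", data, pos)
        if n == 0 or pos + 4 + n > len(data):
            break  # truncated or corrupt tail: stop instead of guessing
        yield pos + 4, n
        pos += 4 + n


def nal_type(first_byte: int, hevc: bool) -> int:
    # H.264: low 5 bits of the 1-byte header; HEVC: bits 1-6 of the 2-byte header
    return (first_byte >> 1) & 0x3F if hevc else first_byte & 0x1F


def split_access_units(data: bytes, hevc: bool) -> List[Tuple[int, int, bool]]:
    """Return (start, end, is_keyframe) per access unit; each unit starts at an AUD."""
    aud = 35 if hevc else 9
    units: List[Tuple[int, int, bool]] = []
    start, key, end = None, False, 0
    for off, n in iter_avcc_nals(data):
        t = nal_type(data[off], hevc)
        is_key_slice = (19 <= t <= 21) if hevc else (t == 5)
        if t == aud:
            if start is not None:
                units.append((start, off - 4, key))  # close the previous unit
            start, key = off - 4, False
        elif is_key_slice:
            key = True
        end = off + n
    if start is not None:
        units.append((start, end, key))
    return units
```

Each `(start, end)` range still carries its 4-byte length prefixes. Assuming avcC/hvcC declares 4-byte lengths, the bytes can be copied into mdat as one sample without rewriting.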

| Format | Codec | Bits | Channels | Bytes / sample-frame |
|---|---|---|---|---|
| Stereo 16-bit | twos / sowt | 16 | 2 | 4 |
| Stereo 24-bit | in24 | 24 | 2 | 6 |
| 4-track 24-bit | in24 (4× mono) | 24 | 4×1 | 12 total |

All at 48,000 Hz. Multi-track audio is stored sequentially by track within each GOP: [track1][track2][track3][track4]. The audio format (channels / bits / track count) is derived from the essence layout — no reference file needed.

Compute the audio chunk size with exact integer-rational arithmetic, not floating-point seconds:

samples_per_gop = round(frames_per_gop × 48000 × dur_per_sample / timescale)
chunk_size_per_track = samples_per_gop × bytes_per_sample
total_audio_size = chunk_size_per_track × num_audio_tracks

The float form (GOP_duration_sec × rate) is off by about one sample per GOP at NTSC-fractional rates (23.976 = 24000/1001, 29.97 = 30000/1001, 59.94 = 60000/1001). That error corrupts one frame per GOP and makes the audio drift. Both timescale and dur_per_sample come from rtmd tag 0x8106; the frame rate is timescale / dur_per_sample.

Worked examples (integer rates, where float and rational agree):

  • 12 frames @ 25 fps, stereo 16-bit: 12/25 × 48000 = 23,040 samples × 4 = 92,160 B
  • 12 frames @ 25 fps, 4× mono 24-bit: 23,040 × 3 × 4 = 276,480 B
  • 48 frames @ 50 fps, stereo 16-bit: 48/50 × 48000 = 46,080 × 4 = 184,320 B
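The same formula can be written in integer arithmetic. This is a Python sketch with hypothetical function names. The tie-breaking rule (round half up) is an assumption, since the page only says `round`. Integer rates are passed as `timescale=rate, dur_per_sample=1`, which keeps the ratio the formula needs even if tag 0x8106 encodes them differently. `bytes_per_sample` is the per-track sample-frame size from the table above. The last call adds a 23.976 case, where 12 × 48000 × 1001 / 24000 is exactly 24,024 samples:

```python
AUDIO_RATE = 48_000  # all raw-GOP audio is 48 kHz


def samples_per_gop(frames_per_gop: int, timescale: int, dur_per_sample: int) -> int:
    """round(frames_per_gop × 48000 × dur_per_sample / timescale) without floats."""
    num = frames_per_gop * AUDIO_RATE * dur_per_sample
    return (2 * num + timescale) // (2 * timescale)  # round half up


def audio_chunk_size(frames_per_gop: int, timescale: int, dur_per_sample: int,
                     bytes_per_sample: int, num_tracks: int = 1) -> int:
    """total_audio_size = samples_per_gop × bytes_per_sample × num_audio_tracks."""
    per_track = samples_per_gop(frames_per_gop, timescale, dur_per_sample) * bytes_per_sample
    return per_track * num_tracks


print(audio_chunk_size(12, 25, 1, 4))                # 92160  (stereo 16-bit, 25 fps)
print(audio_chunk_size(12, 25, 1, 3, num_tracks=4))  # 276480 (4× mono 24-bit)
print(audio_chunk_size(48, 50, 1, 4))                # 184320
print(audio_chunk_size(12, 24_000, 1_001, 4))        # 96096  (23.976 fps)
```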

true-MXF / XAVC-I (FX6, FX9, FR7)

Cameras that record native XAVC-I MXF (FX6, FX9, FR7) leave a genuine MXF Generic Container behind when interrupted. The file starts with an MXF Universal Label 06 0E 2B 34 …, and the essence is KLV-wrapped (16-byte key | BER length | value), one edit unit (video frame) at a time. There is no rtmd grid.

| Key bytes 4–7 | key[12] | Meaning | Handling |
|---|---|---|---|
| 02 05 01 01 | | Body Partition pack | skip |
| 01 01 01 02 | 01 | KLV fill (padding) | skip |
| 01 02 01 01 | 0x15 | GC Picture essence (one frame) | extract video |
| 01 02 01 01 | 0x16 | GC Sound essence, channel = key[15] | extract PCM |
| 01 02 01 01 | 0x17 | GC Data essence (ANC / metadata) | skip |
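Here is a Python sketch that walks KLV packets by their BER lengths and classifies them with the key table above. Everything outside GC Picture and GC Sound (GC Data, partition packs, fill, header metadata) is reported as `skip`. The names are hypothetical. Stopping silently at a packet that runs past the end of the buffer is an assumption about how an interrupted tail is handled:

```python
from typing import Iterator, Tuple

MXF_UL = bytes.fromhex("060e2b34")
GC_ESSENCE_PREFIX = bytes.fromhex("060e2b34010201010d010301")  # key bytes 0-11


def read_ber_length(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a BER length at pos; return (length, offset of the value)."""
    first = buf[pos]
    if first < 0x80:                 # short form: the byte is the length
        return first, pos + 1
    n = first & 0x7F                 # long form: n big-endian length bytes follow
    if not 1 <= n <= 8:
        raise ValueError(f"unsupported BER length byte 0x{first:02x}")
    end = pos + 1 + n
    if end > len(buf):
        raise EOFError("BER length runs past end of buffer")
    return int.from_bytes(buf[pos + 1:end], "big"), end


def classify(key: bytes) -> Tuple[str, int]:
    """Map a 16-byte key to (kind, channel); channel is key[15] for sound."""
    if key[:12] == GC_ESSENCE_PREFIX:
        item = key[12]
        if item == 0x15:
            return "picture", 0
        if item == 0x16:
            return "sound", key[15]
    return "skip", 0  # GC Data (0x17), partition packs, fill, metadata


def iter_klv(buf: bytes) -> Iterator[Tuple[str, int, bytes]]:
    """Yield (kind, channel, value) per KLV packet, stopping at a truncated tail."""
    pos = 0
    while pos + 17 <= len(buf):      # 16-byte key + at least one length byte
        key = buf[pos:pos + 16]
        if key[:4] != MXF_UL:
            raise ValueError(f"lost KLV sync at offset {pos}")
        try:
            length, value_start = read_ber_length(buf, pos + 16)
        except EOFError:
            return
        value_end = value_start + length
        if value_end > len(buf):
            return                   # recording interrupted mid-packet
        kind, channel = classify(key)
        yield kind, channel, buf[value_start:value_end]
        pos = value_end
```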

Because every packet’s length is explicit (BER), extraction is exact — the tool never scans for 06 0E 2B 34 inside compressed slices (which can match by chance). Packets are grouped into edit units: a picture starts a unit, the following sound packets attach to it, and a unit is committed when the next picture begins (or the stream ends). A truncated trailing unit (picture present but its PCM cut off mid-write) is dropped so the output decodes cleanly.
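The edit-unit grouping can be sketched on top of any packet walker that yields `(kind, channel, value)` tuples. `EditUnit` and `group_edit_units` are hypothetical names. The completeness test for the trailing unit is an assumption chosen for illustration: the stream must not end mid-packet, and the unit must have as many sound packets as the first committed unit.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Tuple


@dataclass
class EditUnit:
    picture: bytes
    sound: List[Tuple[int, bytes]] = field(default_factory=list)  # (channel, PCM)


def group_edit_units(packets: Iterable[Tuple[str, int, bytes]],
                     stream_truncated: bool) -> List[EditUnit]:
    """Group (kind, channel, value) packets into edit units (one per video frame).

    A picture opens a unit, following sound packets attach to it, and the unit is
    committed when the next picture begins. The final unit is kept only if the
    stream did not end mid-packet and it has as many sound packets as the first
    committed unit (assumed completeness test).
    """
    units: List[EditUnit] = []
    current: Optional[EditUnit] = None
    for kind, channel, value in packets:
        if kind == "picture":
            if current is not None:
                units.append(current)
            current = EditUnit(value)
        elif kind == "sound" and current is not None:
            current.sound.append((channel, value))
        # other kinds (data, fill, partitions) do not affect grouping
    if current is not None and not stream_truncated:
        if not units or len(current.sound) == len(units[0].sound):
            units.append(current)
    return units
```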

Each picture value is one Sony XAVC-I access unit in Annex-B framing (start codes), the opposite of the raw GOP variant's AVCC framing. It consists of a header of about 0x2600 bytes, then the slice data:

00 00 00 01 09 10 AUD
00 00 00 01 27 … SPS (H.264 High 4:2:2 Intra)
00 00 00 01 28 … PPS
00 … 00 zero padding to 0x2600
00 00 00 01 25 … IDR slice(s) — XAVC-I 4K is multi-slice
  • SPS/PPS repeat in every frame; the tool takes them once (first frame) for avcC and SPS-derived geometry.
  • For each frame it keeps only the VCL NALs (H.264 types 1–5), rewrites them with 4-byte length prefixes (AVCC), and concatenates them into the output sample. AUD/SPS/PPS/SEI are dropped; SPS/PPS live in avcC.
  • XAVC-I is all-intra → every sample is a keyframe (no ctts, no stss). Observed resolutions: 4096×2160 (FR7, DCI 4K) and 3840×2160 (FX6, UHD).
  • One mono PCM packet per channel per frame (channel in key[15]). Captures carry 8 channels (FX6/FR7 multi-XLR); unused inputs are digital silence.
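The per-frame video conversion in the list above can be sketched in Python. The sketch splits the Annex-B access unit on start codes, keeps only H.264 NAL types 1–5, and re-frames each NAL with a 4-byte big-endian length. It also strips trailing zero bytes from each NAL, which absorbs both the zero padding and the extra 00 of 4-byte start codes. That stripping is a common Annex-B parsing convention and an assumption here, and the function names are hypothetical:

```python
from typing import List

START_CODE = b"\x00\x00\x01"


def split_annexb(au: bytes) -> List[bytes]:
    """Split on 00 00 01 start codes and strip trailing zero bytes from each NAL."""
    starts = []
    i = au.find(START_CODE)
    while i != -1:
        starts.append(i + 3)
        i = au.find(START_CODE, i + 3)
    nals = []
    for k, s in enumerate(starts):
        end = starts[k + 1] - 3 if k + 1 < len(starts) else len(au)
        nals.append(au[s:end].rstrip(b"\x00"))
    return nals


def xavc_intra_to_avcc(au: bytes) -> bytes:
    """Keep VCL NALs (H.264 types 1-5) and re-frame each with a 4-byte length."""
    out = bytearray()
    for nal in split_annexb(au):
        if nal and 1 <= (nal[0] & 0x1F) <= 5:   # drops AUD, SPS, PPS, SEI
            out += len(nal).to_bytes(4, "big") + nal
    return bytes(out)
```

The first frame's SPS and PPS (types 7 and 8) can be picked out of the same `split_annexb` result for building avcC.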

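  • The packet byte length gives the frame rate: samples/frame = len / bytes_per_sample, and fps = 48000 / samples/frame (3 bytes per sample for these 24-bit tracks):

    | Packet length (B) | Samples/frame | fps |
    |---|---|---|
    | 6006 | 2002 | 23.976 (24000/1001) |
    | 2880 | 960 | 50 |
    | 5760 | 1920 | 25 |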
  • MXF PCM is little-endian. The tool byte-swaps LE→BE and emits in24 (24-bit) / twos (16-bit) at 48,000 Hz, one discrete mono track per channel.
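Both PCM steps above are shown below as a Python sketch with hypothetical names. It assumes every sound packet has the same length, as in the table. A rate such as 59.94 (800.8 samples per frame at 48 kHz) cannot be represented by one fixed packet length and is not handled:

```python
from fractions import Fraction

AUDIO_RATE = 48_000


def fps_from_packet(packet_len: int, bytes_per_sample: int = 3) -> Fraction:
    """fps = 48000 / (packet_len / bytes_per_sample), kept as an exact fraction."""
    if packet_len % bytes_per_sample:
        raise ValueError("packet length is not a whole number of samples")
    return Fraction(AUDIO_RATE, packet_len // bytes_per_sample)


def le_to_be(pcm: bytes, bytes_per_sample: int) -> bytes:
    """Reverse the byte order of every sample (little-endian MXF PCM -> big-endian)."""
    if len(pcm) % bytes_per_sample:
        raise ValueError("PCM length is not a whole number of samples")
    out = bytearray(len(pcm))
    for i in range(bytes_per_sample):
        # byte i of each output sample comes from byte (n - 1 - i) of the input sample
        out[i::bytes_per_sample] = pcm[bytes_per_sample - 1 - i::bytes_per_sample]
    return bytes(out)


print(fps_from_packet(6006))                           # 24000/1001
print(le_to_be(bytes([0x01, 0x02, 0x03]), 3).hex())   # 030201
```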


Recovery also restores, where present:

  • Start timecode → written as a tmcd track so players (QuickTime, ffprobe, Resolve) show the original timecode, with the drop-frame flag preserved. In the raw GOP variant it comes from the private record header; in the MXF variant from the front-of-file SDTI-CP System Item (SMPTE-12M, BCD).
  • Camera model + serial → reported during recovery (e.g. ILME-FX6V 4016215). In the MXF variant this is read from the in-band ANC (GC Data) essence.
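Here is a Python sketch that decodes a packed SMPTE 12M (BCD) timecode and converts it to a frame count, which is what a QuickTime tmcd sample holds (a 32-bit frame counter). The byte order (frames, seconds, minutes, hours) and the drop-frame flag at bit 6 of the frames byte follow the usual packed 12M layout. Treat both as assumptions about the data the tool reads. The names are hypothetical:

```python
from typing import NamedTuple


class Timecode(NamedTuple):
    hours: int
    minutes: int
    seconds: int
    frames: int
    drop_frame: bool


def _bcd(byte: int, tens_mask: int) -> int:
    """Two BCD digits: tens in the (masked) high nibble, units in the low nibble."""
    return ((byte >> 4) & tens_mask) * 10 + (byte & 0x0F)


def decode_smpte12m(raw: bytes) -> Timecode:
    """Decode 4 packed bytes in the assumed order frames, seconds, minutes, hours."""
    ff, ss, mm, hh = raw[:4]
    return Timecode(
        hours=_bcd(hh, 0x3),
        minutes=_bcd(mm, 0x7),
        seconds=_bcd(ss, 0x7),
        frames=_bcd(ff, 0x3),
        drop_frame=bool(ff & 0x40),  # assumed drop-frame flag position
    )


def tmcd_frame_number(tc: Timecode, nominal_fps: int) -> int:
    """Frames since 00:00:00:00; drop-frame skips labels in 9 of every 10 minutes."""
    total_minutes = tc.hours * 60 + tc.minutes
    count = (total_minutes * 60 + tc.seconds) * nominal_fps + tc.frames
    if tc.drop_frame:
        dropped_per_minute = nominal_fps // 15  # 2 at 30 (29.97), 4 at 60 (59.94)
        count -= dropped_per_minute * (total_minutes - total_minutes // 10)
    return count


tc = decode_smpte12m(bytes([0x40, 0x00, 0x00, 0x01]))  # 01:00:00;00, drop-frame
print(tmcd_frame_number(tc, 30))                       # 107892
```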

Variant detection

| Variant | Offset 0 | Also verify |
|---|---|---|
| Raw GOP stream | 00 1C 01 00 | F0 01 00 10 at offset 8 (bytes 4–7 vary; ignore them) |
| true-MXF / XAVC-I | 06 0E 2B 34 (MXF UL) | a GC Picture key 06 0E 2B 34 01 02 01 01 0D 01 03 01 15 within the first 16 MiB |

Neither variant has an ftyp or moov box — that’s exactly what’s missing and what recovery rebuilds.
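The detection table translates into a short check. This is a Python sketch with hypothetical names, and it assumes the caller passes the first 16 MiB of the file:

```python
RAW_GOP_PREFIX = bytes.fromhex("001c0100")   # offset 0
RAW_GOP_TAG = bytes.fromhex("f0010010")      # offset 8
MXF_UL = bytes.fromhex("060e2b34")
GC_PICTURE_KEY = bytes.fromhex("060e2b34010201010d01030115")
PROBE_LIMIT = 16 * 1024 * 1024               # 16 MiB


def detect_variant(head: bytes) -> str:
    """Return "raw-gop", "mxf", or "unknown" from the start of an .rsv file."""
    if head[:4] == RAW_GOP_PREFIX and head[8:12] == RAW_GOP_TAG:
        return "raw-gop"   # bytes 4-7 vary, so they are not compared
    if head[:4] == MXF_UL and GC_PICTURE_KEY in head[:PROBE_LIMIT]:
        return "mxf"
    return "unknown"


# Usage:
# with open("corrupt.RSV", "rb") as f:
#     variant = detect_variant(f.read(PROBE_LIMIT))
```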

Recovery streams in constant memory, so even 100 GB+ files never have to fit in RAM:

```sh
# Auto-detects the variant; everything is derived from the .rsv itself.
rsv-repair corrupt.RSV -o recovered.mp4
```

In the browser, the same engine runs as multithreaded WebAssembly — the input is read in slices and the recovered MP4 is streamed straight to the file you choose.

What it does:

  1. Detect the variant from the first bytes (rtmd vs MXF UL).
  2. Derive parameters from the essence: geometry from the SPS, frame rate from rtmd tag 0x8106 (raw GOP) or from the sound-packet length (MXF), audio layout from the essence, and ctts from the bitstream POC (raw GOP only; MXF is all-intra).
  3. Walk the essence — for rtmd, count rtmd packets then find video AUDs and the audio chunk per GOP; for MXF, walk KLV edit units, keeping VCL NALs and byte-swapping PCM.
  4. Build the MP4 — synthesize moov (stbl: stsz/stco/stsc/stts/stss/ctts as applicable), plus a tmcd timecode track, and write ftyp + mdat + moov.
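To make step 4 more concrete, here is a Python sketch of a few of the sample-table boxes it names, using hypothetical helper names. One detail matters for 100 GB+ outputs. stco stores 32-bit offsets and a plain box header stores a 32-bit size, so past 4 GiB the MP4 format calls for co64 and a 64-bit `largesize` mdat header. This page does not say exactly how rsv-repair handles that case.

```python
import struct
from typing import List, Sequence


def box(kind: bytes, payload: bytes) -> bytes:
    """Plain ISO BMFF box: 32-bit size, 4-character type, payload."""
    return struct.pack(">I4s", 8 + len(payload), kind) + payload


def full_box(kind: bytes, version: int, flags: int, payload: bytes) -> bytes:
    """Box with the 1-byte version and 3-byte flags prefix."""
    return box(kind, struct.pack(">I", (version << 24) | flags) + payload)


def stts(durations: Sequence[int]) -> bytes:
    """Run-length encode per-sample durations into (sample_count, delta) entries."""
    entries: List[List[int]] = []
    for d in durations:
        if entries and entries[-1][1] == d:
            entries[-1][0] += 1
        else:
            entries.append([1, d])
    body = b"".join(struct.pack(">II", c, d) for c, d in entries)
    return full_box(b"stts", 0, 0, struct.pack(">I", len(entries)) + body)


def stsz(sizes: Sequence[int]) -> bytes:
    """sample_size = 0 signals that a per-sample size table follows."""
    body = b"".join(struct.pack(">I", s) for s in sizes)
    return full_box(b"stsz", 0, 0, struct.pack(">II", 0, len(sizes)) + body)


def chunk_offsets(offsets: Sequence[int]) -> bytes:
    """stco holds 32-bit offsets; use co64 once any offset needs more."""
    if any(o > 0xFFFFFFFF for o in offsets):
        body = b"".join(struct.pack(">Q", o) for o in offsets)
        return full_box(b"co64", 0, 0, struct.pack(">I", len(offsets)) + body)
    body = b"".join(struct.pack(">I", o) for o in offsets)
    return full_box(b"stco", 0, 0, struct.pack(">I", len(offsets)) + body)


def mdat_header(payload_len: int) -> bytes:
    """mdat header; size 1 means a 64-bit largesize follows the type."""
    if 8 + payload_len <= 0xFFFFFFFF:
        return struct.pack(">I4s", 8 + payload_len, b"mdat")
    return struct.pack(">I4sQ", 1, b"mdat", 16 + payload_len)
```

For a constant frame rate, `stts` collapses to a single entry, for example 5 samples of delta 1001 at 23.976 with a 24000 timescale.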

Raw GOP stream — Sony FX3 (ILME-FX3), FX30 (ILME-FX30), A7S III (ILCE-7SM3):

| Video | Audio | Status |
|---|---|---|
| H.264 25p | stereo 16-bit (twos) | |
| H.264 25p | 4× mono 24-bit (in24) | |
| H.264 50p | stereo 24-bit | |
| H.264 59.94p | stereo 16-bit | |
| HEVC 50p | stereo 16-bit | |
| HEVC 23.976p | stereo 16-bit | |

true-MXF / XAVC-I — Sony FX6, FX9, FR7:

| Camera | Video | Audio | Status |
|---|---|---|---|
| FR7 | H.264 XAVC-I 4096×2160 23.976p | 8× mono 24-bit | |
| FX6 | H.264 XAVC-I 3840×2160 50p | 8× mono 24-bit | |
| FX9 | H.264 XAVC-I 4096×2160 25p | 8× mono 24-bit | |

References

  • ITU-T H.264 / ISO/IEC 14496-10 (AVC, incl. High 4:2:2 Intra); ITU-T H.265 (HEVC); ISO/IEC 14496-12 (MP4)
  • SMPTE ST 377-1 (MXF), ST 379-2 (Generic Container), ST 381 (MPEG/AVC in GC), ST 382 (PCM in GC — note MXF wave PCM is little-endian)
  • Sony XAVC format documentation