RSV file format (technical)
Sony .rsv layout: the raw GOP stream and the true-MXF/XAVC variant, and how recovery maps them into MP4.
A .rsv is an incomplete recording a Sony camera leaves behind when capture is interrupted (power loss, card removal, crash). It holds valid video + audio essence but lacks the MP4 container (ftyp/moov) a player needs. Recovery rebuilds that container around the essence: codec, resolution, frame rate, audio layout, and start timecode are all derived from the .rsv itself — no other input is needed.
There are two completely different .rsv layouts, and the recovery tool auto-detects which one it’s looking at by the first bytes of the file:
| Variant | Starts with | Cameras | Essence framing |
|---|---|---|---|
| Raw GOP stream | 00 1C 01 00 … F0 01 00 10 | FX3, FX30, A7S III | GOP blocks (rtmd + video + audio, contiguous) |
| true-MXF / XAVC-I | 06 0E 2B 34 (MXF Universal Label) | FX6, FX9, FR7 | KLV packets, one edit unit (frame) at a time |
The rest of this page describes each.
Variant A — raw GOP stream
Written by cameras that record to an MP4-like internal layout. This variant has no container — no MXF wrapper, no ftyp/moov — just a raw, contiguous run of GOP blocks, each led by an rtmd (real-time metadata) packet that the recovery keys on. Unlike MP4’s interleaved chunks, a .rsv of this kind uses a GOP-based block structure: all parts of a Group of Pictures are stored contiguously.
File structure

    [GOP 0][GOP 1][GOP 2] … [GOP N][incomplete tail]

Each GOP is three contiguous sections:

    ┌───────────────────┬───────────────────────┬────────────────────┐
    │ rtmd block        │ video essence         │ audio chunk        │
    │ N × packet_size   │ H.264/AVC or HEVC     │ PCM, computed      │
    └───────────────────┴───────────────────────┴────────────────────┘

- rtmd block: timed metadata, a run of fixed-size packets (one per video frame). It also carries the codec config (SPS/PPS) in GOP 0.
- Video essence: H.264/AVC or HEVC access units, AVCC framed (4-byte big-endian length prefix + NAL; not Annex-B start codes).
- Audio chunk: PCM, one chunk per audio track, concatenated.
Observed ranges (FX3 / FX30 / A7S III, 4K)
| Property | Typical | Observed range |
|---|---|---|
| rtmd packet size | 19,456 B | 11,264 – 29,696 B |
| rtmd packets / GOP (= frames/GOP) | 12 | 12 – 48 |
| Frame rate | 25 fps | 23.976, 24, 25, 50, 59.94 |
| GOP data size | ~6 MB | 6 – 27 MB |
| Bitrate | ~100 Mbps | 100 – 250 Mbps |
| Resolution | 3840×2160 | — |
GOP-constant assumption: frames_per_gop is auto-detected from the first GOP, and the audio chunk size is computed once and reused. Frames are still counted per GOP, but audio sizing assumes a constant GOP length (verified on a 50 GB, 8,655-GOP clip in which every GOP had 12 frames). A clip with varying GOP lengths would need per-GOP audio sizing.
rtmd block
Each rtmd packet begins with a constant signature used as a delimiter:

    Offset:  0  1  2  3   4  5  6  7   8  9 10 11
    Bytes:  00 1c 01 00  ?? ?? ?? ??  f0 01 00 10
            └ prefix ─┘  └ variable┘ └ Sony tag ┘

- Bytes 0–3 `00 1c 01 00` and bytes 8–11 `f0 01 00 10` are constant; bytes 4–7 vary (timecode/frame counter). Verify both ranges: the 4-byte prefix alone produces false positives inside compressed essence. `rtmd_packet_size` is auto-detected as the distance between the first two valid headers.
- Observed variants of bytes 4–7: `230328e1`, `22f728ed`, `230f28d5`, …
This 12-byte signature is actually the start of a 28-byte (0x1C) Sony-private record header (the leading 00 1c is the header length) — not a standalone 12-byte header, and not an MXF System Item. The metadata block carries timecode, recording parameters, camera settings, and the codec config; it is Sony-proprietary, with no standard MXF descriptors.
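The signature check and the packet-size detection described above can be sketched in Python. This is a minimal illustration that assumes the start of the file is already in memory as `buf`; the names `is_rtmd_header` and `detect_rtmd_packet_size` are hypothetical and are not taken from the recovery tool.

```python
from typing import Optional

RTMD_PREFIX = bytes.fromhex("001c0100")    # bytes 0-3, constant
RTMD_SONY_TAG = bytes.fromhex("f0010010")  # bytes 8-11, constant


def is_rtmd_header(buf: bytes, pos: int) -> bool:
    """True if a full rtmd signature starts at pos (bytes 4-7 vary and are ignored)."""
    return (
        buf[pos:pos + 4] == RTMD_PREFIX
        and buf[pos + 8:pos + 12] == RTMD_SONY_TAG
    )


def detect_rtmd_packet_size(buf: bytes) -> Optional[int]:
    """Distance between the first two valid headers, or None if fewer than two are found."""
    first = None
    pos = buf.find(RTMD_PREFIX)
    while pos != -1:
        # Checking both constant ranges rejects prefix-only false positives.
        if is_rtmd_header(buf, pos):
            if first is None:
                first = pos
            else:
                return pos - first
        pos = buf.find(RTMD_PREFIX, pos + 1)
    return None
```

A bare 4-byte prefix inside the first packet is skipped because the bytes at offset 8 do not match the Sony tag.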
Video essence (H.264/AVC or H.265/HEVC)
- Codecs: H.264 High / High 4:2:2 (`avc1`) and HEVC (`hvc1`). There are no in-band SPS/PPS; the codec config lives in the rtmd metadata, which the tool parses to build `avcC`/`hvcC`.
- NAL format: AVCC (4-byte big-endian length + NAL).
- Frame boundaries are found by the Access Unit Delimiter (AUD) NAL:
  - H.264: `00 00 00 02 09 XX` (len=2, NAL type 9 = AUD)
  - HEVC: `00 00 00 03 46 01 XX` (len=3, NAL type 35 = AUD)

| NAL type (H.264 / HEVC) | Name | Role |
|---|---|---|
| 9 / 35 | AUD | frame-start marker |
| 6 / 39 | SEI | supplemental info |
| 5 / 19–21 | IDR / IRAP | keyframe slice |
| 1 / 0–9 | non-IDR | P/B slice |
Typical composition — keyframe: [AUD][SEI…][IDR slice ×N]; inter-frame: [AUD][SEI][slice ×N].
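As an illustration of AUD-based frame splitting, the sketch below walks length-prefixed NAL units and starts a new frame at each AUD. It covers H.264 only and assumes `essence` holds just the video section of one GOP (no rtmd packets, no audio chunk); the function name `split_h264_frames` is hypothetical.

```python
import struct
from typing import List


def split_h264_frames(essence: bytes) -> List[List[bytes]]:
    """Group AVCC-framed NAL units into frames, one frame per AUD (NAL type 9)."""
    frames: List[List[bytes]] = []
    pos = 0
    while pos + 4 <= len(essence):
        (nal_len,) = struct.unpack_from(">I", essence, pos)  # 4-byte big-endian length
        end = pos + 4 + nal_len
        if nal_len == 0 or end > len(essence):
            break  # implausible or truncated NAL: stop at the incomplete tail
        nal = essence[pos + 4:end]
        nal_type = nal[0] & 0x1F  # H.264: low 5 bits of the NAL header byte
        if nal_type == 9 or not frames:
            frames.append([])  # an AUD opens a new frame
        frames[-1].append(nal)
        pos = end
    return frames
```

For HEVC the same walk would apply with the type taken from `(nal[0] >> 1) & 0x3F` and 35 as the AUD type.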
Audio essence (PCM)
| Format | Codec | Bits | Channels | Bytes / sample-frame |
|---|---|---|---|---|
| Stereo 16-bit | twos / sowt | 16 | 2 | 4 |
| Stereo 24-bit | in24 | 24 | 2 | 6 |
| 4-track 24-bit | in24 (4× mono) | 24 | 4×1 | 12 total |
All at 48,000 Hz. Multi-track audio is stored sequentially by track within each GOP: [track1][track2][track3][track4]. The audio format (channels / bits / track count) is derived from the essence layout — no reference file needed.
Chunk-size calculation
Use exact integer-rational rounding, not floating-point seconds:

    samples_per_gop      = round(frames_per_gop × 48000 × dur_per_sample / timescale)
    chunk_size_per_track = samples_per_gop × bytes_per_sample
    total_audio_size     = chunk_size_per_track × num_audio_tracks

Here `bytes_per_sample` is the size of one sample-frame within a track (4 for a stereo 16-bit track, 3 for a mono 24-bit track). The float form (`GOP_duration_sec × rate`) is off by ~1 sample/GOP for NTSC-fractional rates (23.976 = 24000/1001, 29.97, 59.94), which corrupts one frame per GOP and drifts the audio. `timescale` / `dur_per_sample` come from rtmd tag `0x8106`.
Worked examples (integer rates, where float and rational agree):
- 12 frames @ 25 fps, stereo 16-bit: `12/25 × 48000 = 23,040` samples × 4 = 92,160 B
- 12 frames @ 25 fps, 4× mono 24-bit: `23,040 × 3 × 4` = 276,480 B
- 48 frames @ 50 fps, stereo 16-bit: `48/50 × 48000 = 46,080` × 4 = 184,320 B
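The calculation can be written with integer arithmetic only, which sidesteps the floating-point error for NTSC-fractional rates. This is a sketch: the function names are hypothetical, and rounding half up is an assumption, since the text only says `round`.

```python
SAMPLE_RATE = 48_000  # all observed audio is 48 kHz


def samples_per_gop(frames_per_gop: int, timescale: int, dur_per_sample: int) -> int:
    """round(frames × 48000 × dur_per_sample / timescale) using integers only."""
    num = frames_per_gop * SAMPLE_RATE * dur_per_sample
    # Round half up without floats (assumed rounding mode).
    return (2 * num + timescale) // (2 * timescale)


def total_audio_size(frames_per_gop: int, timescale: int, dur_per_sample: int,
                     bytes_per_sample_frame: int, num_audio_tracks: int) -> int:
    """Bytes of PCM per GOP across all tracks."""
    per_track = samples_per_gop(frames_per_gop, timescale, dur_per_sample) * bytes_per_sample_frame
    return per_track * num_audio_tracks


# 25 fps expressed as timescale=25, dur_per_sample=1; 23.976 fps as 24000/1001.
print(total_audio_size(12, 25, 1, 4, 1))      # stereo 16-bit
print(total_audio_size(12, 25, 1, 3, 4))      # 4x mono 24-bit
print(samples_per_gop(12, 24000, 1001))       # 23.976 fps, 12-frame GOP
```

The first two calls reproduce the worked examples (92,160 B and 276,480 B).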
Variant B — true-MXF / XAVC-I
Cameras that record native XAVC-I MXF (FX6, FX9, FR7) leave a genuine MXF Generic Container behind when recording is interrupted: the file starts with an MXF Universal Label `06 0E 2B 34 …` and the essence is KLV-wrapped (16-byte key | BER length | value), one edit unit (video frame) at a time. There is no rtmd grid.
KLV structure (one edit unit = one frame)
| key bytes 4–7 | key[12] | Meaning | Handling |
|---|---|---|---|
| 02 05 01 01 | — | Body Partition pack | skip |
| 01 01 01 02 | 0x01 | KLV fill (padding) | skip |
| 01 02 01 01 | 0x15 | GC Picture essence (one frame) | extract video |
| 01 02 01 01 | 0x16 | GC Sound essence, channel = key[15] | extract PCM |
| 01 02 01 01 | 0x17 | GC Data essence (ANC / metadata) | skip |
Because every packet’s length is explicit (BER), extraction is exact — the tool never scans for 06 0E 2B 34 inside compressed slices (which can match by chance). Packets are grouped into edit units: a picture starts a unit, the following sound packets attach to it, and a unit is committed when the next picture begins (or the stream ends). A truncated trailing unit (picture present but its PCM cut off mid-write) is dropped so the output decodes cleanly.
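A length-driven KLV walk can be sketched as below. It is an illustration under stated assumptions, not the tool's implementation: the names `read_ber_length`, `iter_klv`, and `classify` are hypothetical, indefinite-length BER is not handled, and each value is read whole rather than streamed.

```python
import io
from typing import BinaryIO, Iterator, Tuple

GC_ESSENCE = bytes.fromhex("01020101")  # key bytes 4-7 of GC essence elements


def read_ber_length(stream: BinaryIO) -> int:
    """Decode a BER length: short form (< 0x80) or long form (0x80 | byte count)."""
    first = stream.read(1)
    if not first:
        raise EOFError("stream ended before the length field")
    if first[0] < 0x80:
        return first[0]
    count = first[0] & 0x7F
    if count == 0:
        raise ValueError("indefinite BER length is not supported in this sketch")
    raw = stream.read(count)
    if len(raw) != count:
        raise EOFError("stream ended inside the length field")
    return int.from_bytes(raw, "big")


def iter_klv(stream: BinaryIO) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (key, value) pairs; stop quietly at a truncated trailing packet."""
    while True:
        key = stream.read(16)
        if len(key) < 16:
            return
        try:
            length = read_ber_length(stream)
        except EOFError:
            return
        value = stream.read(length)
        if len(value) < length:
            return  # value cut off mid-write
        yield key, value


def classify(key: bytes) -> str:
    """Map a key to its handling, using key bytes 4-7 and key[12]."""
    if key[4:8] == GC_ESSENCE:
        return {0x15: "picture", 0x16: "sound", 0x17: "data"}.get(key[12], "other")
    return "skip"
```

Because the walk jumps from packet to packet by declared length, it never has to search for the Universal Label inside compressed data.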
Video essence (GC Picture value)
The picture value is a Sony XAVC-I access unit in Annex-B framing (start codes), the opposite of the raw GOP variant’s AVCC framing. It consists of a ~0x2600-byte header followed by the slice data:

    00 00 00 01 09 10    AUD
    00 00 00 01 27 …     SPS (H.264 High 4:2:2 Intra)
    00 00 00 01 28 …     PPS
    00 … 00              zero padding to 0x2600
    00 00 00 01 25 …     IDR slice(s); XAVC-I 4K is multi-slice

- SPS/PPS repeat in every frame; the tool takes them once (first frame) for `avcC` and SPS-derived geometry.
- For each frame it keeps only the VCL NALs (H.264 type 1–5), converts them to 4-byte length-prefixed, and concatenates them as the output sample (AUD/SPS/PPS/SEI are dropped; SPS/PPS live in `avcC`).
- XAVC-I is all-intra → every sample is a keyframe (no `ctts`, no `stss`). Observed resolutions: 4096×2160 (FR7, DCI 4K) and 3840×2160 (FX6, UHD).
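The Annex-B to length-prefixed conversion described above could look like the following sketch. The names `split_annexb` and `to_avcc_sample` are hypothetical; the code relies on the H.264 guarantee that `00 00 01` never occurs inside a NAL unit (emulation prevention) and that a NAL never ends in a zero byte.

```python
import re
import struct
from typing import List

_START_CODE = re.compile(b"\x00\x00\x01")


def split_annexb(access_unit: bytes) -> List[bytes]:
    """Split an Annex-B access unit into NAL units (start codes and zero padding removed)."""
    starts = [m.end() for m in _START_CODE.finditer(access_unit)]
    nals = []
    for i, start in enumerate(starts):
        end = starts[i + 1] - 3 if i + 1 < len(starts) else len(access_unit)
        # Trailing zeros are padding or the leading zero of a 4-byte start code.
        nal = access_unit[start:end].rstrip(b"\x00")
        if nal:
            nals.append(nal)
    return nals


def to_avcc_sample(access_unit: bytes) -> bytes:
    """Keep only VCL NALs (types 1-5) and emit them with 4-byte big-endian length prefixes."""
    out = bytearray()
    for nal in split_annexb(access_unit):
        if 1 <= (nal[0] & 0x1F) <= 5:
            out += struct.pack(">I", len(nal)) + nal
    return bytes(out)
```

The SPS and PPS found by `split_annexb` on the first frame (types 7 and 8) would be the inputs for building `avcC`.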
Audio essence (GC Sound values)
- One mono PCM packet per channel per frame (channel in `key[15]`). Captures carry 8 channels (FX6/FR7 multi-XLR); unused inputs are digital silence.
- The packet byte length gives the frame rate: `samples/frame = len / bytes_per_sample`, `fps = 48000 / samples`:

  | Packet len | samples/frame | fps |
  |---|---|---|
  | 6006 | 2002 | 23.976 (24000/1001) |
  | 2880 | 960 | 50 |
  | 5760 | 1920 | 25 |

- MXF PCM is little-endian. The tool byte-swaps LE→BE and emits `in24` (24-bit) / `twos` (16-bit) at 48,000 Hz, one discrete mono track per channel.
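Both steps, deriving the frame rate from the packet length and swapping the sample byte order, are small enough to sketch directly. The function names are hypothetical, and 24-bit mono packets are assumed as the default, matching the table above.

```python
from fractions import Fraction


def fps_from_sound_packet(packet_len: int, bytes_per_sample: int = 3,
                          sample_rate: int = 48_000) -> Fraction:
    """Exact frame rate implied by one mono sound packet per frame."""
    if packet_len == 0 or packet_len % bytes_per_sample:
        raise ValueError("packet length is not a whole number of samples")
    samples_per_frame = packet_len // bytes_per_sample
    return Fraction(sample_rate, samples_per_frame)


def swap_sample_endianness(pcm: bytes, bytes_per_sample: int = 3) -> bytes:
    """Reverse the byte order of every sample (little-endian to big-endian)."""
    if len(pcm) % bytes_per_sample:
        raise ValueError("PCM length is not a whole number of samples")
    out = bytearray(len(pcm))
    for i in range(bytes_per_sample):
        # Byte i of each output sample comes from the mirrored byte of the input sample.
        out[i::bytes_per_sample] = pcm[bytes_per_sample - 1 - i::bytes_per_sample]
    return bytes(out)
```

Returning a `Fraction` keeps 23.976 fps as the exact value 24000/1001 instead of a rounded float.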
Rescued metadata (both variants)
Recovery also restores, where present:

- Start timecode → written as a `tmcd` track so players (QuickTime, ffprobe, Resolve) show the original timecode, with the drop-frame flag preserved. In the raw GOP variant it comes from the private record header; in the MXF variant from the front-of-file SDTI-CP System Item (SMPTE-12M, BCD).
- Camera model + serial → reported during recovery (e.g. `ILME-FX6V 4016215`). In the MXF variant this is read from the in-band ANC (GC Data) essence.
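For readers unfamiliar with BCD timecode, the sketch below decodes a 4-byte SMPTE 12M value. The byte order (frames, seconds, minutes, hours) and the drop-frame flag in bit 6 of the frames byte are assumptions based on the commonly used packing; the source does not state the layout or where the four bytes sit inside the System Item, and the function name is hypothetical.

```python
from typing import Tuple


def decode_smpte_bcd_timecode(tc: bytes) -> Tuple[str, bool]:
    """Decode 4 BCD bytes (assumed order: frames, seconds, minutes, hours)."""
    if len(tc) < 4:
        raise ValueError("need at least 4 timecode bytes")

    def bcd(byte: int, tens_mask: int) -> int:
        # High nibble holds the tens digit (masked to drop flag bits), low nibble the units.
        return ((byte >> 4) & tens_mask) * 10 + (byte & 0x0F)

    frames = bcd(tc[0], 0x3)
    seconds = bcd(tc[1], 0x7)
    minutes = bcd(tc[2], 0x7)
    hours = bcd(tc[3], 0x3)
    drop_frame = bool(tc[0] & 0x40)  # assumed position of the drop-frame flag
    sep = ";" if drop_frame else ":"
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}{sep}{frames:02d}", drop_frame
```

This ignores the extra flag handling used for rates above 30 fps, so it should be read as an outline of the BCD decoding only.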
File identification
| Variant | Offset 0 | Also verify |
|---|---|---|
| Raw GOP stream | 00 1C 01 00 | F0 01 00 10 at offset 8 (bytes 4–7 vary — ignore them) |
| true-MXF / XAVC-I | 06 0E 2B 34 (MXF UL) | a GC Picture key 06 0E 2B 34 01 02 01 01 0D 01 03 01 15 within the first 16 MiB |
Neither variant has an ftyp or moov box — that’s exactly what’s missing and what recovery rebuilds.
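The identification table maps onto a short check of the file head. In this sketch the enum `RsvVariant` and the function `detect_variant` are hypothetical names, and `head` is assumed to contain the leading bytes of the file (up to 16 MiB for the MXF check).

```python
from enum import Enum


class RsvVariant(Enum):
    RAW_GOP = "raw GOP stream"
    TRUE_MXF = "true-MXF / XAVC-I"
    UNKNOWN = "unknown"


RTMD_PREFIX = bytes.fromhex("001c0100")
RTMD_SONY_TAG = bytes.fromhex("f0010010")
MXF_UL = bytes.fromhex("060e2b34")
GC_PICTURE_KEY_PREFIX = bytes.fromhex("060e2b34010201010d01030115")
SCAN_LIMIT = 16 * 1024 * 1024  # 16 MiB


def detect_variant(head: bytes) -> RsvVariant:
    """Classify a .rsv from its leading bytes, following the table above."""
    # Raw GOP: both constant ranges must match; bytes 4-7 vary and are ignored.
    if head[0:4] == RTMD_PREFIX and head[8:12] == RTMD_SONY_TAG:
        return RsvVariant.RAW_GOP
    # MXF: Universal Label at offset 0 plus a GC Picture key early in the file.
    if head[0:4] == MXF_UL and head.find(GC_PICTURE_KEY_PREFIX, 0, SCAN_LIMIT) != -1:
        return RsvVariant.TRUE_MXF
    return RsvVariant.UNKNOWN
```

A caller would typically read the first 16 MiB once, for example with `open(path, "rb").read(SCAN_LIMIT)`, and pass that buffer in.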
Recovery
Recovery streams in constant memory, so even 100 GB+ files never have to fit in RAM:

    # Auto-detects the variant; everything is derived from the .rsv itself.
    rsv-repair corrupt.RSV -o recovered.mp4

In the browser, the same engine runs as multithreaded WebAssembly: the input is read in slices and the recovered MP4 is streamed straight to the file you choose.
What it does:
- Detect the variant from the first bytes (rtmd vs MXF UL).
- Derive parameters from the essence: geometry from the SPS, frame rate from tag `0x8106` (rtmd) or the sound-packet length (MXF), audio layout from the essence, `ctts` from the bitstream POC (rtmd only; MXF is all-intra).
- Walk the essence: for rtmd, count rtmd packets, then find the video AUDs and the audio chunk in each GOP; for MXF, walk KLV edit units, keeping VCL NALs and byte-swapping PCM.
- Build the MP4: synthesize `moov` (`stbl`: `stsz`/`stco`/`stsc`/`stts`/`stss`/`ctts` as applicable), plus a `tmcd` timecode track, and write `ftyp + mdat + moov`.
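To make the last step more concrete, the sketch below shows how MP4 boxes are serialized: a 4-byte big-endian size, a 4-character type, then the payload. The helper names are hypothetical and the brand values are placeholders; the source does not say which brands the tool writes.

```python
import struct
from typing import Sequence


def box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Serialize one box: 32-bit size (header included), 4-byte type, payload."""
    size = 8 + len(payload)
    if size > 0xFFFFFFFF:
        raise ValueError("payload needs the 64-bit largesize form, not covered here")
    return struct.pack(">I4s", size, box_type) + payload


def ftyp(major: bytes = b"isom", minor: int = 0x200,
         compatible: Sequence[bytes] = (b"isom", b"iso2", b"avc1", b"mp41")) -> bytes:
    """File-type box; the brand values are illustrative placeholders."""
    return box(b"ftyp", major + struct.pack(">I", minor) + b"".join(compatible))


def stts_constant(sample_count: int, sample_delta: int) -> bytes:
    """Time-to-sample box with a single run, as for constant-frame-rate video."""
    # version/flags = 0, entry_count = 1, then one (sample_count, sample_delta) pair
    return box(b"stts", struct.pack(">IIII", 0, 1, sample_count, sample_delta))
```

The other sample tables (`stsz`, `stco`, `stsc`, `stss`, `ctts`) follow the same pattern of a version/flags word, an entry count, and fixed-size entries.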
Known cameras / tested configurations
Raw GOP stream — Sony FX3 (ILME-FX3), FX30 (ILME-FX30), A7S III (ILCE-7SM3):
| Video | Audio | Status |
|---|---|---|
| H.264 25p | stereo 16-bit (twos) | ✓ |
| H.264 25p | 4× mono 24-bit (in24) | ✓ |
| H.264 50p | stereo 24-bit | ✓ |
| H.264 59.94p | stereo 16-bit | ✓ |
| HEVC 50p | stereo 16-bit | ✓ |
| HEVC 23.976p | stereo 16-bit | ✓ |
true-MXF / XAVC-I — Sony FX6, FX9, FR7:
| Camera | Video | Audio | Status |
|---|---|---|---|
| FR7 | H.264 XAVC-I 4096×2160 23.976p | 8× mono 24-bit | ✓ |
| FX6 | H.264 XAVC-I 3840×2160 50p | 8× mono 24-bit | ✓ |
| FX9 | H.264 XAVC-I 4096×2160 25p | 8× mono 24-bit | ✓ |
References
- ITU-T H.264 / ISO/IEC 14496-10 (AVC, incl. High 4:2:2 Intra); ITU-T H.265 (HEVC); ISO/IEC 14496-12 (MP4)
- SMPTE ST 377-1 (MXF), ST 379-2 (Generic Container), ST 381 (MPEG/AVC in GC), ST 382 (PCM in GC — note MXF wave PCM is little-endian)
- Sony XAVC format documentation