RSV file format (technical)
Sony .rsv layout: GOP blocks, rtmd, codecs, and how recovery maps data into MP4.
Overview
RSV files are incomplete video files that Sony cameras (e.g., the Sony FX3) leave behind when recording is interrupted unexpectedly (power loss, card removal, etc.). The .rsv extension denotes a reserved, backup, or temporary file. An RSV file contains valid video and audio data but lacks the container structure (such as the moov atom of an MP4, or an MXF wrapper) needed for playback.
Unlike standard MP4 files which use interleaved chunks, RSV files use a GOP-based block structure where all components of a GOP (Group of Pictures) are stored contiguously.
File Structure
High-Level Layout

    [GOP 0][GOP 1][GOP 2]...[GOP N][incomplete data]

Each GOP contains three sections in order:
- rtmd block - Timed metadata (variable packet count, typically 12 packets)
- Video essence - H.264/AVC or H.265/HEVC video frames (frame count detected from first GOP, assumed constant)
- Audio essence - PCM audio samples (size calculated once from first GOP’s frame count, reused for all GOPs)
GOP Structure Detail

    ┌─────────────────────────────────────────────────────────────┐
    │                            GOP N                            │
    ├─────────────────┬──────────────────────┬────────────────────┤
    │   rtmd Block    │     Video Frames     │    Audio Chunk     │
    │  Variable size  │    Variable size     │   Variable size    │
    │  (N × packet)   │   (variable count)   │    (calculated)    │
    └─────────────────┴──────────────────────┴────────────────────┘

Observed variations:
| Setting | Typical | Observed Range |
|---|---|---|
| rtmd packet size | 19,456 bytes | 11,264 - 29,696 bytes |
| rtmd packets per GOP | 12 | 12 - 48 |
| Frames per GOP | 12 | 12 - 48 |
| Frame rate | 25 fps | 23.98, 25, 50 fps |
| GOP data size | ~6 MB | 6 - 27 MB |
Example (12-frame GOP at 25fps):
- rtmd block: ~233,472 bytes (12 packets × 19,456 bytes)
- Video frames: ~5-6 MB (12 frames, variable size)
- Audio chunk: ~92,160 bytes (stereo 16-bit) or ~276,480 bytes (4-channel 24-bit)
rtmd Block (Timed Metadata)
Structure
- Total size: Variable (depends on packet count and packet size)
- Packet count: Typically matches frames per GOP (observed: 12, 24, 48)
- Packet size: Auto-detected from file (observed: 11,264, 19,456, 29,696 bytes)
Packet Header Format
Each rtmd packet has a 12-byte header structure:

    Offset:  0  1  2  3  4  5  6  7  8  9 10 11
    Bytes:  00 1c 01 00 ?? ?? ?? ?? f0 01 00 10
            ├─────────┤ ├─────────┤ ├─────────┤
              Prefix     Variable    Sony tag
            (constant)   metadata   (constant)

| Offset | Bytes | Description |
|---|---|---|
| 0-3 | 00 1c 01 00 | Constant prefix (identifies rtmd packet type) |
| 4-7 | variable | Camera metadata (timecode, frame counter, etc.) |
| 8-11 | f0 01 00 10 | Sony-specific tag (constant across all packets) |
Identifying Real rtmd Packets
Important: The 4-byte prefix 00 1c 01 00 can appear randomly in video/audio data as false positives. To reliably identify real rtmd packets, verify both:
- Bytes 0-3 match 00 1c 01 00 (prefix)
- Bytes 8-11 match f0 01 00 10 (Sony tag)
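The two-part check above can be sketched in Python as follows. This is a minimal illustration and not the untrunc code; the names `is_rtmd_header` and `find_rtmd_headers` are hypothetical, and the sketch assumes the data is already in memory as a `bytes` object.

```python
RTMD_PREFIX = bytes.fromhex("001c0100")  # bytes 0-3 of the header
SONY_TAG = bytes.fromhex("f0010010")     # bytes 8-11 of the header


def is_rtmd_header(data: bytes, offset: int = 0) -> bool:
    """Return True if a full 12-byte rtmd header starts at `offset`."""
    header = data[offset:offset + 12]
    if len(header) < 12:
        return False
    # Bytes 4-7 are variable camera metadata and are deliberately ignored.
    return header[0:4] == RTMD_PREFIX and header[8:12] == SONY_TAG


def find_rtmd_headers(data: bytes, start: int = 0):
    """Yield offsets where both the prefix and the Sony tag match."""
    pos = data.find(RTMD_PREFIX, start)
    while pos != -1:
        if is_rtmd_header(data, pos):
            yield pos
        # A bare prefix match without the Sony tag is skipped as a false positive.
        pos = data.find(RTMD_PREFIX, pos + 1)
```

Checking the Sony tag only after a cheap prefix search keeps the scan fast while still rejecting prefix bytes that happen to occur inside video or audio data.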
Observed header variants (bytes 4-7 vary):

    001c0100 230328e1 f0010010... (most common)
    001c0100 22f728ed f0010010... (variant)
    001c0100 230f28d5 f0010010... (variant)

Content
The rtmd data contains camera metadata such as:
- Timecode information
- Recording parameters
- Camera settings
- GPS data (if enabled)
Video Essence (H.264/AVC or H.265/HEVC)
Supported Codecs
- H.264/AVC (codec name: avc1) - Most common
- H.265/HEVC (codec name: hvc1) - Also supported
Frame Structure
- Codec: H.264/AVC High Profile or H.265/HEVC
- NAL unit format: Length-prefixed (4-byte big-endian length + NAL data)
- Frames per GOP: Auto-detected from the first GOP (typically 12), assumed constant throughout file
Frame Identification
H.264/AVC frames begin with an Access Unit Delimiter (AUD) NAL:

    00 00 00 02 09 XX

Where:
- 00 00 00 02 = NAL length (2 bytes)
- 09 = NAL type 9 (AUD)
- XX = AUD payload (typically 10 or 30)

H.265/HEVC frames begin with an Access Unit Delimiter (AUD) NAL:

    00 00 00 03 46 01 XX

Where:
- 00 00 00 03 = NAL length (3 bytes)
- 46 01 = two-byte HEVC NAL unit header (the first byte, 46, encodes NAL type 35 = AUD)
- XX = AUD payload
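Locating frame starts from these AUD patterns might look like the sketch below. It is an illustration under stated assumptions, not the untrunc implementation: the function names are hypothetical, the HEVC search uses the 5-byte pattern 00 00 00 03 46, and the input is assumed to be the video portion of one GOP held in memory.

```python
from typing import List

H264_AUD = bytes.fromhex("0000000209")  # length 2, NAL header byte 09 (type 9)
HEVC_AUD = bytes.fromhex("0000000346")  # length 3, first NAL header byte 46 (type 35)


def h264_nal_type(header_byte: int) -> int:
    """H.264 keeps the NAL type in the low 5 bits of the header byte."""
    return header_byte & 0x1F


def hevc_nal_type(first_header_byte: int) -> int:
    """HEVC keeps the NAL type in bits 1-6 of the first header byte."""
    return (first_header_byte >> 1) & 0x3F


def find_frame_starts(video: bytes, codec: str = "avc1") -> List[int]:
    """Return the offset of every AUD pattern, one per frame."""
    pattern = H264_AUD if codec == "avc1" else HEVC_AUD
    starts = []
    pos = video.find(pattern)
    while pos != -1:
        starts.append(pos)
        pos = video.find(pattern, pos + len(pattern))
    return starts
```

The length of `find_frame_starts(...)` for the first GOP corresponds to the frame count that the document calls frames_per_gop, and the difference between consecutive offsets gives each frame's size.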
NAL Unit Types Present
| Type | Name | Description |
|---|---|---|
| 9 | AUD | Access Unit Delimiter (frame start marker) |
| 6 | SEI | Supplemental Enhancement Information |
| 5 | IDR | Instantaneous Decoder Refresh (keyframe slice) |
| 1 | Non-IDR | Regular slice (P/B frame) |
Frame Composition
A typical frame contains:
- AUD NAL (2 bytes in H.264: one header byte plus one payload byte)
- SEI NAL(s) (metadata)
- Slice NAL(s) (actual video data)
Keyframe (IDR) example:

    [AUD len=2][SEI len=19][SEI len=26][SEI len=29][SEI len=14][SEI len=5][IDR slice ×8]

Inter-frame (P/B) example:

    [AUD len=2][SEI len=14][Non-IDR slice ×9]

Important Note on NAL Parsing
The compressed slice data within NAL units can contain any byte pattern, including patterns that look like NAL length prefixes. Do NOT attempt to parse NAL units by reading length fields through slice data. Instead, search for AUD patterns to find frame boundaries.
GOP Sizes
The implementation auto-detects frames_per_gop by counting AUD patterns in the first GOP (typically 12 frames). It then assumes this frame count is consistent throughout the entire file.
Implementation behavior:
- frames_per_gop is detected from the first GOP only
- audio_chunk_size is calculated once from this detected value and reused for all GOPs
- Frames are counted individually per GOP (this would detect varying sizes, but does not affect the audio chunk calculation)
- The implementation assumes GOP sizes are constant - if they varied, audio chunk size calculation would be incorrect
Note: Analysis of one 50GB RSV file (8,655 GOPs) showed all GOPs had exactly 12 frames, but the implementation makes no guarantee about consistency. If GOP sizes varied, the code would need to calculate audio_chunk_size per-GOP based on the actual frame count.
Audio Essence (PCM)
Supported Formats
| Format | Codec | Bits | Channels | Bytes/Sample |
|---|---|---|---|---|
| Stereo 16-bit | twos/sowt | 16 | 2 | 4 |
| Multi-track 24-bit | ipcm | 24 | 4×1 (mono) | 3 per track |
Common Configurations
Stereo 16-bit (twos/sowt) - Most common:
- Sample rate: 48,000 Hz
- Channels: 2 (stereo)
- Bits per sample: 16
- Bytes per sample: 4 (2 channels × 2 bytes)
Multi-track 24-bit (ipcm) - Professional cameras:
- Sample rate: 48,000 Hz
- Tracks: 4 separate mono tracks
- Bits per sample: 24
- Bytes per sample per track: 3
- Total audio per GOP: 4 × per-track size
Chunk Structure
- Chunk size: Calculated from GOP duration and audio format
- Samples per chunk: GOP_duration × sample_rate
- Total audio size: samples × bytes_per_sample × num_tracks
Calculation
Audio chunk size is calculated based on GOP duration:

    GOP_duration_sec     = frames_per_gop / fps
    samples_per_chunk    = GOP_duration_sec × audio_sample_rate
    chunk_size_per_track = samples_per_chunk × bytes_per_sample
    total_audio_size     = chunk_size_per_track × num_audio_tracks

Example 1 (12 frames at 25fps, 48kHz stereo 16-bit):

    GOP_duration      = 12 frames ÷ 25 fps = 0.480 seconds
    samples_per_chunk = 0.480s × 48000 Hz = 23,040 samples
    chunk_size        = 23,040 × 4 bytes = 92,160 bytes

Example 2 (12 frames at 25fps, 48kHz 4-track 24-bit):

    GOP_duration         = 12 frames ÷ 25 fps = 0.480 seconds
    samples_per_chunk    = 0.480s × 48000 Hz = 23,040 samples
    chunk_size_per_track = 23,040 × 3 bytes = 69,120 bytes
    total_audio_size     = 69,120 × 4 tracks = 276,480 bytes

Example 3 (48 frames at 50fps, 48kHz stereo 16-bit):

    GOP_duration      = 48 frames ÷ 50 fps = 0.960 seconds
    samples_per_chunk = 0.960s × 48000 Hz = 46,080 samples
    chunk_size        = 46,080 × 4 bytes = 184,320 bytes

Multi-Track Audio Layout
For cameras with multiple mono audio tracks (e.g., 4-channel ipcm), the audio data in each GOP is stored sequentially by track:

    [Track 1 audio][Track 2 audio][Track 3 audio][Track 4 audio]

Each track contains the same number of samples for the GOP duration.
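The chunk-size formulas and the sequential track layout can be combined in a short Python sketch. The function names are hypothetical and this is not the untrunc code. It uses `Fraction` so that fractional frame rates stay exact; treating 23.98 fps as 24000/1001 is an assumption, not something the source states.

```python
from fractions import Fraction


def audio_chunk_size(frames_per_gop, fps, sample_rate=48000,
                     bytes_per_sample=4, num_tracks=1):
    """Total audio bytes stored after the video frames of one GOP."""
    gop_duration = Fraction(frames_per_gop) / Fraction(fps)  # seconds
    samples_per_chunk = int(gop_duration * sample_rate)
    chunk_size_per_track = samples_per_chunk * bytes_per_sample
    return chunk_size_per_track * num_tracks


def track_offsets(audio_start, chunk_size_per_track, num_tracks):
    """Start offset of each mono track, stored back to back in the GOP."""
    return [audio_start + i * chunk_size_per_track for i in range(num_tracks)]


# Example 1: 12 frames at 25 fps, stereo 16-bit (4 bytes per sample frame)
stereo = audio_chunk_size(12, 25)
# Example 2: 12 frames at 25 fps, four mono 24-bit tracks
four_track = audio_chunk_size(12, 25, bytes_per_sample=3, num_tracks=4)
```

If frame counts ever varied between GOPs, calling `audio_chunk_size` with each GOP's actual frame count would give the per-GOP value instead of reusing the first GOP's result.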
Comparison with MP4
MP4 (Interleaved)

    [ftyp][moov][mdat: V₀ A₀ V₁ A₁ V₂ A₂ ...]

Video and audio chunks are interleaved within mdat.
RSV (Block-based)

    [rtmd₀ V₀₋₁₁ A₀][rtmd₁ V₁₂₋₂₃ A₁][rtmd₂ V₂₄₋₃₅ A₂]...

Each GOP’s data is stored as a contiguous block.
File Identification
Magic Bytes
RSV files can be identified by checking offset 0 for the rtmd pattern. The implementation uses a specific header pattern for detection:

    Offset 0-7: 00 1c 01 00 23 03 28 e1

This matches the most common rtmd header variant. For more robust detection, verify:

    Offset 0-3:  00 1c 01 00 (prefix)
    Offset 8-11: f0 01 00 10 (Sony tag)

Note: Bytes 4-7 are variable and should not be used for identification. The implementation auto-detects RSV files when this pattern is found at offset 0, or can be forced with the -rsv command-line flag.
No Standard Container
RSV files lack:
- ftyp atom (file type)
- moov atom (movie metadata)
- Standard MP4 box structure
The entire file consists of raw GOP blocks.
Recovery Process
To recover an RSV file using untrunc:
- Obtain reference file: You need a properly recorded MP4 from the same camera with matching codec parameters (H.264/AVC or H.265/HEVC).
- Run untrunc:

      untrunc reference.mp4 corrupt.RSV

  Or explicitly enable RSV mode:

      untrunc -rsv reference.mp4 corrupt.RSV

- Auto-detection:
  - RSV files are automatically detected if they start with the rtmd pattern at offset 0
  - The implementation auto-detects rtmd_packet_size by finding the distance between the first two rtmd patterns
  - The implementation auto-detects frames_per_gop by counting AUD patterns in the first GOP
  - Audio chunk size is calculated once from the first GOP’s frame count and reused for all GOPs
- Parse GOP structure:
  - Find rtmd packets by checking both prefix (001c0100) AND Sony tag (f0010010 at offset 8)
  - Count consecutive rtmd packets (using auto-detected packet size) to find video start
  - Extract video frames by finding AUD patterns:
    - H.264: 00 00 00 02 09 (NAL type 9)
    - HEVC: 00 00 00 03 46 (NAL type 35)
  - Audio chunk follows immediately after video frames (size calculated from GOP duration)
- Build MP4 container:
  - Use codec parameters from reference file
  - Create proper stbl atoms (stco, stsz, stsc, stts, stss, ctts)
  - Map frame offsets to the RSV data
- Handle GOP structure:
  - Count actual AUD patterns per GOP (frames are counted individually)
  - Count consecutive valid rtmd packets per GOP (rtmd count can vary)
  - Avoid false positive rtmd matches by verifying the Sony tag at offset 8
  - Note: Audio chunk size is calculated from the first GOP and reused for all GOPs. The implementation assumes GOP sizes are constant throughout the file - if they varied, this would be incorrect.
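The packet-size detection and the "count consecutive rtmd packets to find video start" steps could be sketched roughly as below. This is a simplified illustration that assumes the relevant part of the file is in memory; the function names are hypothetical and do not come from untrunc.

```python
RTMD_PREFIX = bytes.fromhex("001c0100")
SONY_TAG = bytes.fromhex("f0010010")


def _is_rtmd(data: bytes, pos: int) -> bool:
    """Prefix at bytes 0-3 and Sony tag at bytes 8-11, relative to pos."""
    return (data[pos:pos + 4] == RTMD_PREFIX
            and data[pos + 8:pos + 12] == SONY_TAG)


def detect_rtmd_packet_size(data: bytes):
    """Distance between the first two rtmd headers, or None if not found."""
    if not _is_rtmd(data, 0):
        return None
    pos = data.find(RTMD_PREFIX, 12)
    while pos != -1:
        if _is_rtmd(data, pos):
            return pos  # the first header sits at offset 0
        pos = data.find(RTMD_PREFIX, pos + 1)
    return None


def count_rtmd_packets(data: bytes, gop_start: int, packet_size: int) -> int:
    """Count back-to-back rtmd packets starting at gop_start."""
    if packet_size <= 0:
        raise ValueError("packet_size must be positive")
    count = 0
    pos = gop_start
    while _is_rtmd(data, pos):
        count += 1
        pos += packet_size
    return count


def video_start(data: bytes, gop_start: int, packet_size: int) -> int:
    """Offset of the first video byte: right after the rtmd block."""
    return gop_start + count_rtmd_packets(data, gop_start, packet_size) * packet_size
```

Stepping in whole packets rather than rescanning for the prefix means a stray 00 1c 01 00 inside a packet body cannot be mistaken for the next packet.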
Known Cameras
This format has been tested with:
- Sony FX3 (ILME-FX3)
- Sony FX30 (ILME-FX30)
- Sony A7S III (ILCE-7SM3)
Typical File Characteristics
| Property | Typical Value | Observed Range |
|---|---|---|
| Video codec | H.264/AVC or H.265/HEVC | avc1, hvc1 |
| Resolution | 3840×2160 (4K) | - |
| Frame rate | 25 fps | 23.98, 25, 50 fps |
| Frames per GOP | 12 | 12, 24, 48 |
| Audio codec | PCM 16-bit (twos/sowt) | twos, sowt, ipcm (24-bit) |
| Audio channels | 2 (stereo) | 2 stereo or 4×1 mono |
| Audio rate | 48000 Hz | - |
| Bitrate | ~100 Mbps | 100-250 Mbps |
| rtmd packet size | 19,456 bytes | 11,264 - 29,696 bytes |
| GOP data size | ~6 MB | 6 - 27 MB |
References
- ITU-T H.264 / ISO/IEC 14496-10 (AVC)
- ISO/IEC 14496-12 (ISO base media file format, the basis of the MP4 container)
- Sony XAVC format documentation
Implementation Notes
Auto-Detection Features
The untrunc implementation includes several auto-detection features:
- RSV File Detection: Automatically detects RSV files by checking for rtmd pattern at offset 0
- rtmd Packet Size: Auto-detects packet size by measuring distance between first two rtmd patterns (handles 11K-30K byte packets)
- Frames per GOP: Auto-detects by counting AUD patterns in the first GOP (handles 12-48 frames)
- Audio Track Discovery: Finds all audio tracks (twos, sowt, ipcm) and calculates total audio size
- Multi-Track Audio: Supports cameras with multiple mono tracks (e.g., 4×24-bit ipcm)
- Codec Support: Supports both H.264/AVC (avc1) and H.265/HEVC (hvc1) codecs
Buffer Requirements
The implementation uses a 128MB buffer for GOP parsing to handle high-bitrate recordings:
- 25fps H.264 4K: ~6 MB per GOP
- 50fps HEVC 4K: ~25 MB per GOP
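These per-GOP figures are roughly what a simple bitrate estimate gives. The sketch below is a back-of-the-envelope check, not part of untrunc. The function name is hypothetical, the 200 Mbps figure and the 48-frame GOP for the 50fps case are assumptions chosen from within the observed ranges, and the estimate covers video only (the rtmd block and audio chunk add a few hundred KB).

```python
def estimate_gop_video_bytes(bitrate_mbps, frames_per_gop, fps):
    """Approximate video bytes in one GOP at a constant average bitrate."""
    bits = bitrate_mbps * 1_000_000 * frames_per_gop / fps
    return bits / 8


# 25fps H.264 at ~100 Mbps with 12-frame GOPs (0.48 s per GOP)
h264_gop = estimate_gop_video_bytes(100, 12, 25)

# 50fps HEVC at an assumed ~200 Mbps with 48-frame GOPs (0.96 s per GOP)
hevc_gop = estimate_gop_video_bytes(200, 48, 50)

print(f"H.264 GOP: {h264_gop / 1e6:.1f} MB, HEVC GOP: {hevc_gop / 1e6:.1f} MB")
```

Both estimates sit well below 128MB, so under these assumptions the buffer holds several complete GOPs at once.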
Command-Line Usage

    # Auto-detect RSV file
    untrunc reference.mp4 corrupt.RSV

    # Force RSV mode
    untrunc -rsv reference.mp4 corrupt.RSV

    # For truncated reference files, use -dcc to skip chunk validation
    untrunc -dcc -rsv reference.mp4 corrupt.RSV

The -rsv flag forces RSV recovery mode even if auto-detection fails.
Tested Configurations
| Configuration | Video | Audio | Status |
|---|---|---|---|
| H.264 25fps + stereo 16-bit | avc1 | twos | ✓ |
| H.264 25fps + 4×mono 24-bit | avc1 | ipcm | ✓ |
| HEVC 50fps + stereo 16-bit | hvc1 | twos | ✓ |
| HEVC 23.98fps + stereo 16-bit | hvc1 | twos | ✓ |