# Data Format Specification - Aviation Safety AI
## Overview
This document specifies the data formats required for the Aviation Safety AI framework. All data must comply with aviation standards (ARINC, EUROCAE) and research reproducibility requirements.
## Core Data Requirements
### Minimum Required Parameters
```
Essential Parameters (8 Hz minimum sampling):
┌──────────────────────┬────────────┬──────────┬─────────────────────────┐
│ Parameter            │ Symbol     │ Range    │ Units                   │
├──────────────────────┼────────────┼──────────┼─────────────────────────┤
│ Pitch Angle          │ P          │ ±20°     │ Degrees                 │
│ Bank Angle           │ B          │ ±45°     │ Degrees                 │
│ Engine Power         │ W          │ 0-100%   │ Percent N1              │
│ Altitude             │ ALT        │ 0-50000  │ Feet                    │
│ Airspeed             │ IAS        │ 0-400    │ Knots                   │
│ Vertical Speed       │ VS         │ ±6000    │ Feet/minute             │
│ Heading              │ HDG        │ 0-360    │ Degrees                 │
└──────────────────────┴────────────┴──────────┴─────────────────────────┘
```
### Extended Parameter Set (127 Parameters)
```
Flight Control Parameters (Group 1-20):
┌──────────────────────┬────────────┬──────────┬─────────────────────────┐
│ Parameter            │ Frequency  │ Accuracy │ Description             │
├──────────────────────┼────────────┼──────────┼─────────────────────────┤
│ Elevator Position    │ 16 Hz      │ ±0.1°    │ Control surface         │
│ Aileron Position     │ 16 Hz      │ ±0.1°    │ Roll control            │
│ Rudder Position      │ 16 Hz      │ ±0.2°    │ Yaw control             │
│ Flap Position        │ 4 Hz       │ ±1°      │ High-lift devices       │
│ Spoiler Position     │ 8 Hz       │ ±5%      │ Speed brakes            │
│ Trim Position        │ 8 Hz       │ ±0.5°    │ Control trim            │
└──────────────────────┴────────────┴──────────┴─────────────────────────┘
Engine Parameters (Group 21-40):
┌──────────────────────┬────────────┬──────────┬─────────────────────────┐
│ Parameter            │ Frequency  │ Accuracy │ Description             │
├──────────────────────┼────────────┼──────────┼─────────────────────────┤
│ N1 (Fan Speed)       │ 8 Hz       │ ±0.5%    │ Engine thrust indicator │
│ EGT (Exhaust Gas)    │ 4 Hz       │ ±5°C     │ Engine temperature      │
│ Fuel Flow            │ 4 Hz       │ ±1%      │ Consumption rate        │
│ Oil Pressure         │ 2 Hz       │ ±2 psi   │ Lubrication system      │
│ Vibration            │ 32 Hz      │ ±0.1 g   │ Engine health           │
└──────────────────────┴────────────┴──────────┴─────────────────────────┘
System Parameters (Group 41-80):
┌──────────────────────┬────────────┬──────────┬─────────────────────────┐
│ Parameter            │ Frequency  │ Accuracy │ Description             │
├──────────────────────┼────────────┼──────────┼─────────────────────────┤
│ Hydraulic Pressure   │ 4 Hz       │ ±50 psi  │ System pressure         │
│ Electrical Load      │ 8 Hz       │ ±1%      │ Power consumption       │
│ Cabin Pressure       │ 2 Hz       │ ±0.1 psi │ Passenger comfort       │
│ Oxygen System        │ 1 Hz       │ Binary   │ Emergency system        │
│ Fire Detection       │ 2 Hz       │ Binary   │ Safety system           │
└──────────────────────┴────────────┴──────────┴─────────────────────────┘
Environmental Parameters (Group 81-100):
┌──────────────────────┬────────────┬──────────┬─────────────────────────┐
│ Parameter            │ Frequency  │ Accuracy │ Description             │
├──────────────────────┼────────────┼──────────┼─────────────────────────┤
│ Temperature          │ 2 Hz       │ ±0.5°C   │ Outside air             │
│ Wind Speed           │ 4 Hz       │ ±2 kts   │ Air mass movement       │
│ Turbulence           │ 16 Hz      │ ±0.1 g   │ Vertical acceleration   │
│ Icing Detection      │ 2 Hz       │ Binary   │ Weather hazard          │
│ Visibility           │ 1 Hz       │ ±100 m   │ Meteorological          │
└──────────────────────┴────────────┴──────────┴─────────────────────────┘
Crew Interaction Parameters (Group 101-127):
┌──────────────────────┬────────────┬──────────┬─────────────────────────┐
│ Parameter            │ Frequency  │ Accuracy │ Description             │
├──────────────────────┼────────────┼──────────┼─────────────────────────┤
│ Control Input        │ 16 Hz      │ ±1%      │ Pilot commands          │
│ Autopilot Mode       │ 4 Hz       │ Enum     │ Automation state        │
│ Alert Count          │ 1 Hz       │ Integer  │ Warning/Caution/Advisory│
│ Checklist Progress   │ 1 Hz       │ Percent  │ Procedure completion    │
│ Radio Communication  │ Event      │ Timestamp│ ATC interactions        │
└──────────────────────┴────────────┴──────────┴─────────────────────────┘
```
## Data Formats
### Primary Format: CSV (Comma-Separated Values)
```
Required CSV Structure:
timestamp,pitch,bank,power,altitude,airspeed,...parameter_127
2023-10-15T14:30:00.000Z,-1.23,2.45,78.9,35000,280,...value
2023-10-15T14:30:00.125Z,-1.21,2.41,78.8,35001,279,...value
2023-10-15T14:30:00.250Z,-1.19,2.38,78.9,35002,280,...value
Metadata Header (optional):
Flight: QF32
Aircraft: A380-842 (VH-OQA)
Date: 2010-11-04
Sampling: 8 Hz
Parameters: 127
Units: Aviation customary (degrees, feet, knots, percent)
```
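As an illustration of the structure above, the following is a minimal sketch of reading such a file with Python's standard `csv` and `datetime` modules. The function names `parse_timestamp` and `read_flight_csv` are hypothetical, and the sketch assumes that the file has no metadata header lines and that every column other than `timestamp` is numeric.

```python
import csv
import io
from datetime import datetime


def parse_timestamp(text):
    """Parse an ISO 8601 UTC timestamp such as 2023-10-15T14:30:00.125Z."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11,
    # so the suffix is rewritten as an explicit UTC offset.
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def read_flight_csv(stream):
    """Return (timestamps, columns) for a CSV stream with a header row."""
    reader = csv.DictReader(stream)
    names = [name for name in reader.fieldnames if name != "timestamp"]
    timestamps = []
    columns = {name: [] for name in names}
    for row in reader:
        timestamps.append(parse_timestamp(row["timestamp"]))
        for name in names:
            columns[name].append(float(row[name]))
    return timestamps, columns


# Three rows in the required layout, truncated to four columns.
SAMPLE = """\
timestamp,pitch,bank,power
2023-10-15T14:30:00.000Z,-1.23,2.45,78.9
2023-10-15T14:30:00.125Z,-1.21,2.41,78.8
2023-10-15T14:30:00.250Z,-1.19,2.38,78.9
"""

timestamps, columns = read_flight_csv(io.StringIO(SAMPLE))
step_seconds = (timestamps[1] - timestamps[0]).total_seconds()
print(step_seconds)       # spacing between the first two samples, in seconds
print(columns["pitch"])   # pitch values as floats
```

At 8 Hz the spacing between consecutive rows should be 0.125 s, which is what the sample rows encode.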
### Binary Format: HDF5 (Hierarchical Data Format)
```
Hierarchy Structure:
/flights/QF32/
├── metadata (attributes)
│   ├── aircraft: "A380-842"
│   ├── date: "2010-11-04T10:01:00Z"
│   └── duration: 105.5 (minutes)
├── parameters (dataset)
│   ├── pitch: float32[50640]  # 8 Hz × 105.5 min × 60 s/min
│   ├── bank: float32[50640]
│   ├── power: float32[50640]
│   └── ... (127 total)
└── events (dataset)
    ├── engine_explosion: 0.0 (seconds from start)
    ├── return_decision: 600.0
    └── landing: 6300.0
```
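The dataset length in the hierarchy follows directly from the sampling rate and the flight duration. The sketch below reproduces that arithmetic and maps the event times to sample indices. The helper names `expected_samples` and `event_sample_index` are hypothetical, and the sketch assumes a fixed sampling rate with the first sample at time zero.

```python
def expected_samples(sampling_rate_hz, duration_minutes):
    """Number of samples in a dataset recorded at a fixed rate."""
    return round(sampling_rate_hz * duration_minutes * 60)


def event_sample_index(event_time_s, sampling_rate_hz):
    """Index of the sample closest to an event time (seconds from start)."""
    return round(event_time_s * sampling_rate_hz)


n_samples = expected_samples(8, 105.5)

# Event times copied from the hierarchy above.
events = {"engine_explosion": 0.0, "return_decision": 600.0, "landing": 6300.0}
indices = {name: event_sample_index(t, 8) for name, t in events.items()}

print(n_samples)
print(indices)
```

A check of this kind is useful when validating a file: every event index should fall inside the dataset length.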
### Aviation Standard Formats
```
ARINC 717 Format (Legacy FDR):
• Word length: 12 bits
• Frame structure: 4 subframes × 64 words
• Sampling: 64 words per second (one subframe per second), 8 Hz parameters
• Encoding: Harvard biphase
ARINC 767 Format (Modern FDR):
• Frame length: 1024 words
• Word length: 12 bits
• Sampling: Variable (1-256 Hz)
• Parameters: 500+ possible
EUROCAE ED-155 Format:
• File extension: .aed
• Compression: LZ77 optional
• Encryption: AES-256 optional
• Metadata: XML header
```
## Data Quality Requirements
### Accuracy Standards
```
Parameter Accuracy Requirements:
┌──────────────────────┬────────────┬──────────┬─────────────────────────┐
│ Criticality Level    │ Accuracy   │ Latency  │ Example Parameters      │
├──────────────────────┼────────────┼──────────┼─────────────────────────┤
│ Safety Critical      │ ±0.1%      │ <10 ms   │ Pitch, Bank, Airspeed   │
│ Operation Critical   │ ±0.5%      │ <50 ms   │ Altitude, Heading       │
│ Performance          │ ±1.0%      │ <100 ms  │ Fuel, Temperature       │
│ Informational        │ ±2.0%      │ <500 ms  │ Cabin, Entertainment    │
└──────────────────────┴────────────┴──────────┴─────────────────────────┘
```
### Sampling Requirements
```
Minimum Sampling Rates:
┌──────────────────────┬────────────┬─────────────────────────┐
│ Parameter Type       │ Minimum    │ Nyquist Consideration   │
├──────────────────────┼────────────┼─────────────────────────┤
│ Control Surfaces     │ 16 Hz      │ 32 Hz for 8 Hz dynamics │
│ Engine Parameters    │ 8 Hz       │ 16 Hz for 4 Hz events   │
│ Structural Loads     │ 32 Hz      │ 64 Hz for vibration     │
│ Crew Actions         │ 4 Hz       │ 8 Hz for human response │
│ Environmental        │ 2 Hz       │ 4 Hz for weather        │
└──────────────────────┴────────────┴─────────────────────────┘
```
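The Nyquist criterion behind this table says that a signal sampled at rate f_s can only represent components below f_s/2. The sketch below makes that relationship explicit. The function names are hypothetical, and reading the "Nyquist Consideration" column as a recommended rate with a twofold margin over the Nyquist minimum is an assumption, not something the table states.

```python
def nyquist_limit_hz(sampling_rate_hz):
    """Highest signal frequency representable at the given sampling rate."""
    return sampling_rate_hz / 2.0


def required_rate_hz(signal_bandwidth_hz, oversampling=2.0):
    """Sampling rate for a given bandwidth; oversampling=2.0 is the Nyquist minimum."""
    return signal_bandwidth_hz * oversampling


# Minimum rates copied from the table above.
MINIMUM_RATES_HZ = {
    "Control Surfaces": 16,
    "Engine Parameters": 8,
    "Structural Loads": 32,
    "Crew Actions": 4,
    "Environmental": 2,
}

for name, rate in MINIMUM_RATES_HZ.items():
    print(f"{name}: {rate} Hz resolves dynamics below {nyquist_limit_hz(rate)} Hz")

# Assumed reading of the third column: 4x oversampling of 8 Hz dynamics.
print(required_rate_hz(8, oversampling=4.0))
```

Under that reading, the 16 Hz minimum for control surfaces is exactly the Nyquist rate for 8 Hz dynamics, and the 32 Hz figure leaves a factor-of-two margin.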
## Preprocessing Requirements
### Data Cleaning
```
Required Cleaning Steps:
1. Gap Detection & Imputation
   • Gaps < 1 second: Linear interpolation
   • Gaps 1-10 seconds: Pattern-based imputation
   • Gaps > 10 seconds: Flag as missing
2. Outlier Detection
   • Method: 5σ threshold (99.9999% confidence)
   • Handling: Winsorization (cap at 5σ)
3. Time Alignment
   • Resolution: 125 ms (8 Hz grid)
   • Method: Linear interpolation to grid
   • Tolerance: ±10 ms maximum misalignment
4. Unit Conversion
   • Angles: Degrees (consistent)
   • Distance: Feet or meters (specified)
   • Time: UTC timestamps
   • Rates: Consistent time base
```
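The first three cleaning steps can be sketched in plain Python as follows. All function names are hypothetical. The sketch assumes times in seconds from the start of the recording, strictly increasing timestamps, and a reference mean and standard deviation supplied by the caller for the 5σ cap. Pattern-based imputation is not defined in this specification, so gaps of 1-10 seconds are only labeled here.

```python
def classify_gap(gap_seconds):
    """Map a gap length to the handling rule listed in step 1."""
    if gap_seconds < 1.0:
        return "linear_interpolation"
    if gap_seconds <= 10.0:
        return "pattern_based_imputation"
    return "flag_missing"


def find_gaps(times, nominal_step=0.125, tolerance=0.010):
    """Return (start, end, rule) for intervals longer than one grid step."""
    gaps = []
    for t0, t1 in zip(times, times[1:]):
        if t1 - t0 > nominal_step + tolerance:
            gaps.append((t0, t1, classify_gap(t1 - t0)))
    return gaps


def winsorize(values, mean, std, n_sigma=5.0):
    """Cap values at mean ± n_sigma * std (step 2)."""
    low, high = mean - n_sigma * std, mean + n_sigma * std
    return [min(max(v, low), high) for v in values]


def resample_linear(times, values, step=0.125):
    """Linearly interpolate onto a regular grid starting at times[0] (step 3)."""
    count = int((times[-1] - times[0]) / step + 1e-9) + 1
    grid, resampled = [], []
    j = 0
    for i in range(count):
        t = times[0] + i * step
        # Advance to the source interval [times[j], times[j + 1]] containing t.
        while j + 2 < len(times) and times[j + 1] < t:
            j += 1
        weight = (t - times[j]) / (times[j + 1] - times[j])
        grid.append(t)
        resampled.append(values[j] + weight * (values[j + 1] - values[j]))
    return grid, resampled


gaps = find_gaps([0.0, 0.125, 0.25, 0.875, 3.0, 20.0])
capped = winsorize([-0.5, 30.0], mean=-0.5, std=2.1)
grid, pitch = resample_linear([0.0, 0.125, 0.5], [0.0, 1.0, 4.0])
print([rule for _, _, rule in gaps])
print(capped)
print(grid, pitch)
```

Passing the mean and standard deviation in explicitly, rather than estimating them from the data being cleaned, keeps a single large outlier from inflating its own threshold.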
### Normalization
```
Standard Normalization Scheme:
For each parameter x:
  x_normalized = (x - μ_training) / σ_training
Training Statistics (Example):
┌──────────────────────┬────────────┬──────────┐
│ Parameter            │ μ (mean)   │ σ (std)  │
├──────────────────────┼────────────┼──────────┤
│ Pitch (P)            │ -0.5°      │ 2.1°     │
│ Bank (B)             │ 1.2°       │ 5.3°     │
│ Power (W)            │ 75.4%      │ 12.3%    │
│ Altitude             │ 35000 ft   │ 5000 ft  │
│ Airspeed             │ 280 kts    │ 40 kts   │
└──────────────────────┴────────────┴──────────┘
```
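A minimal sketch of this z-score normalization, using the example training statistics from the table. The dictionary `TRAINING_STATS` and the functions `normalize` and `denormalize` are hypothetical names chosen for illustration.

```python
# (mean, std) pairs copied from the example training statistics above.
TRAINING_STATS = {
    "pitch": (-0.5, 2.1),         # degrees
    "bank": (1.2, 5.3),           # degrees
    "power": (75.4, 12.3),        # percent
    "altitude": (35000.0, 5000.0),  # feet
    "airspeed": (280.0, 40.0),    # knots
}


def normalize(name, value):
    """Apply x_normalized = (x - mean_training) / std_training."""
    mean, std = TRAINING_STATS[name]
    return (value - mean) / std


def denormalize(name, z):
    """Invert the normalization to recover the physical value."""
    mean, std = TRAINING_STATS[name]
    return z * std + mean


print(normalize("altitude", 40000.0))   # one standard deviation above the mean
print(normalize("airspeed", 280.0))     # exactly at the training mean
```

The statistics must come from the training set only; recomputing them on evaluation data would make normalized values incomparable between runs.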
## Validation Rules
### Format Validation
```
CSV File Validation:
1. Header Check
   • Required columns present
   • Column names match specification
   • No duplicate columns
2. Data Type Check
   • Numeric fields contain numbers
   • Timestamps in ISO 8601 format
   • Enumerated fields within range
3. Completeness Check
   • No empty rows (except trailing)
   • Consistent column count
   • Sequential timestamps
HDF5 File Validation:
1. Structure Validation
   • Required groups present
   • Datasets have correct dimensions
   • Attributes contain metadata
2. Data Integrity
   • Datasets not corrupted
   • Data within valid ranges
   • Consistent sampling rates
```
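The CSV checks above could be sketched as a single pass over the file. This is not the behavior of the provided `data_validator.py`, whose internals are not described here. The function `validate_csv` and the list `REQUIRED_COLUMNS` are hypothetical, the required columns are taken from the minimal working example, and every non-timestamp field is assumed to be numeric, so enumerated fields are not covered.

```python
import csv
import io
from datetime import datetime

REQUIRED_COLUMNS = ["timestamp", "pitch", "bank", "power"]  # assumed minimal set


def validate_csv(stream, required=REQUIRED_COLUMNS):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    reader = csv.reader(stream)
    header = next(reader, None)
    if header is None:
        return ["file is empty"]
    missing = [name for name in required if name not in header]
    if missing:
        errors.append(f"missing columns: {missing}")
    if len(set(header)) != len(header):
        errors.append("duplicate column names")
    ts_index = header.index("timestamp") if "timestamp" in header else None
    previous = None
    blank_lines = []  # reported only if a data row follows them
    for row in reader:
        line = reader.line_num
        if not row:
            blank_lines.append(line)
            continue
        errors.extend(f"line {n}: empty row" for n in blank_lines)
        blank_lines.clear()
        if len(row) != len(header):
            errors.append(f"line {line}: expected {len(header)} fields, got {len(row)}")
            continue
        for index, cell in enumerate(row):
            if index == ts_index:
                try:
                    if not cell.endswith("Z"):
                        raise ValueError("timestamp must be UTC with a Z suffix")
                    stamp = datetime.fromisoformat(cell[:-1] + "+00:00")
                except ValueError:
                    errors.append(f"line {line}: bad timestamp {cell!r}")
                    continue
                if previous is not None and stamp <= previous:
                    errors.append(f"line {line}: timestamp not increasing")
                previous = stamp
            else:
                try:
                    float(cell)
                except ValueError:
                    errors.append(f"line {line}: {header[index]} is not numeric")
    return errors


GOOD = (
    "timestamp,pitch,bank,power\n"
    "2023-10-15T14:30:00.000Z,-1.2,2.4,78.9\n"
    "2023-10-15T14:30:00.125Z,-1.2,2.4,78.9\n"
    "\n"
)
good_errors = validate_csv(io.StringIO(GOOD))
print(good_errors)
```

Trailing blank lines are tolerated because blank rows are only reported once a later data row shows they were not at the end of the file.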
### Content Validation
```
Range Validation Rules:
• Pitch: -20° ≤ P ≤ +20°
• Bank: -45° ≤ B ≤ +45°
• Power: 0% ≤ W ≤ 100%
• Altitude: 0 ≤ ALT ≤ 50000 ft
• Airspeed: 0 ≤ IAS ≤ 400 kts
Rate-of-Change Limits:
• Pitch rate: |dP/dt| ≤ 10°/s
• Bank rate: |dB/dt| ≤ 15°/s
• Power rate: |dW/dt| ≤ 20%/s
Physical Consistency:
• Climb performance: VS ≤ f(IAS, W)
• Turn coordination: B ≈ f(HDG_rate, IAS)
• Energy management: ALT + IAS²/(2g) ≈ constant
```
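The range and rate-of-change rules translate directly into code, as in the sketch below. The function names are hypothetical. The rate check assumes a fixed 8 Hz grid and uses a simple first difference. The energy term follows the rule as written, using IAS in place of true airspeed, with the unit conversions (knots to ft/s, g in ft/s²) added as assumptions so that both terms are in feet.

```python
RANGES = {
    "pitch": (-20.0, 20.0),       # degrees
    "bank": (-45.0, 45.0),        # degrees
    "power": (0.0, 100.0),        # percent
    "altitude": (0.0, 50000.0),   # feet
    "airspeed": (0.0, 400.0),     # knots
}
RATE_LIMITS = {"pitch": 10.0, "bank": 15.0, "power": 20.0}  # units per second

KNOT_IN_FT_PER_S = 1.68781
G_FT_PER_S2 = 32.174


def range_violations(name, values):
    """Indices of samples outside the valid range for a parameter."""
    low, high = RANGES[name]
    return [i for i, v in enumerate(values) if not low <= v <= high]


def rate_violations(name, values, sampling_rate_hz=8.0):
    """Indices of samples whose change from the previous sample is too fast."""
    limit = RATE_LIMITS[name]
    return [
        i
        for i in range(1, len(values))
        if abs(values[i] - values[i - 1]) * sampling_rate_hz > limit
    ]


def energy_height_ft(altitude_ft, airspeed_kt):
    """ALT + V^2 / (2g), with speed converted from knots to ft/s."""
    speed = airspeed_kt * KNOT_IN_FT_PER_S
    return altitude_ft + speed * speed / (2.0 * G_FT_PER_S2)


print(range_violations("bank", [0.0, 50.0, -46.0]))
print(rate_violations("pitch", [0.0, 1.0, 3.0]))
print(round(energy_height_ft(35000.0, 280.0)))
```

At 8 Hz, a pitch change of 2° between consecutive samples corresponds to 16°/s, which exceeds the 10°/s limit, so only the third sample is flagged in the example.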
## Example Files
### Minimal Working Example
```csv
timestamp,pitch,bank,power
2023-10-15T14:30:00.000Z,-1.2,2.4,78.9
2023-10-15T14:30:00.125Z,-1.2,2.4,78.9
2023-10-15T14:30:00.250Z,-1.1,2.3,78.9
2023-10-15T14:30:00.375Z,-1.1,2.3,78.8
2023-10-15T14:30:00.500Z,-1.1,2.3,78.8
```
### Complete Example Structure
```python
{
    "metadata": {
        "flight_number": "QF32",
        "aircraft_type": "A380-842",
        "registration": "VH-OQA",
        "date": "2010-11-04",
        "origin": "WSSS",
        "destination": "YSSY",
        "duration_minutes": 105.5,
        "sampling_rate_hz": 8,
        "parameters_count": 127,
        "data_source": "FDR",
        "data_quality": "A"  # A=excellent, B=good, C=fair, D=poor
    },
    "data": {
        "time": [0.0, 0.125, 0.25, ...],  # seconds from start
        "pitch": [-1.2, -1.2, -1.1, ...],  # degrees
        "bank": [2.4, 2.4, 2.3, ...],  # degrees
        "power": [78.9, 78.9, 78.9, ...],  # percent
        "altitude": [35000, 35001, 35002, ...],  # feet
        # ... 122 more parameters
    },
    "events": [
        {
            "name": "engine_explosion",
            "time": 0.0,
            "parameters": {
                "vibration": 8.7,  # g
                "n1_left": 0.0,  # percent
                "n1_right": 78.9  # percent
            }
        },
        {
            "name": "return_decision",
            "time": 600.0,
            "parameters": {
                "crew_discussion": "Return to Singapore",
                "fuel_remaining": 45.2  # percent
            }
        }
    ]
}
```
## Tools and Utilities
### Provided Tools
```
Data Conversion Tools:
• csv_to_hdf5.py: Convert CSV to HDF5 format
• fdr_decoder.py: Decode ARINC 717/767 formats
• data_validator.py: Validate against specification
• data_visualizer.py: Visualize flight parameters
Quality Assessment:
• completeness_check.py: Check data gaps
• accuracy_assessment.py: Compare with truth data
• anomaly_detection.py: Find data quality issues
```
### Integration Libraries
```python
# Python API for data handling
from aviation_safety.data import FlightData, DataValidator

# Load and validate data
data = FlightData.load('flight_data.csv')
validator = DataValidator(specification='full_127_params')
report = validator.validate(data)

if report.passed:
    processed = data.preprocess()
    # analyze_flight is not defined in this snippet; it must be imported
    # or defined before running (its location is not specified here).
    analysis = analyze_flight(processed)
else:
    print(f"Validation failed: {report.errors}")
```
## Compliance and Certification
### Regulatory Compliance
```
FAA Requirements (14 CFR Part 121):
• FDR parameters: Compliant with Appendix M
• Sampling rates: Meet or exceed requirements
• Accuracy: Within specified tolerances
EASA Requirements (CS 25.1459, CAT.IDE.A.190):
• Flight recorder specifications: ED-112 compliant
• Data retention: 25 hours minimum
• Crash protection: 3400 g, 1100°C, 30 days seawater
ICAO Requirements (Annex 6):
• Quick access recorder: Available data
• Encryption: Optional but recommended
• Data integrity: Checksums and validation
```
### Research Ethics
```
Data Privacy:
• Anonymization: Remove crew/passenger identifiers
• Aggregation: Fleet-level analysis only
• Consent: Airline-level agreements
• Security: Encrypted storage and transmission
Reproducibility:
• Raw data: Archived with checksums
• Processing scripts: Version controlled
• Parameters: Documented transformations
• Random seeds: Fixed for reproducibility
```
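Two of the reproducibility items, checksummed archives and fixed random seeds, can be sketched with the standard library alone. The choice of SHA-256 is an assumption, since the specification only says "checksums", and the function names `sha256_of_file` and `seeded_sample` are hypothetical.

```python
import hashlib
import os
import random
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large recordings do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def seeded_sample(seed, count=3):
    """Draw random numbers from a generator with a fixed seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(count)]


# Write a small stand-in for a raw data file, hash it, then clean up.
payload = b"timestamp,pitch,bank,power\n"
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(payload)
    path = handle.name
try:
    checksum = sha256_of_file(path)
finally:
    os.remove(path)

print(checksum == hashlib.sha256(payload).hexdigest())
print(seeded_sample(42) == seeded_sample(42))
```

Using a dedicated `random.Random` instance rather than the module-level generator keeps the seed local to one analysis step, so other code cannot disturb the sequence.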
## Revision History
```
Version 2.1 (2025-12-28)
• Added HDF5 format specification
• Expanded parameter descriptions
• Added validation rules
Version 2.0 (2025-11-15)
• Standardized 127 parameter set
• Defined accuracy requirements
• Added preprocessing specifications
Version 1.0 (2025-09-01)
• Initial release
• Basic CSV format definition
• Minimum parameter requirements
```