Decoding METAR Data: Insights from the Ocean Protocol Data Challenge

Published in Ocean Protocol · 6 min read · Mar 11, 2024

Introduction

The Meteorological Aerodrome Report (METAR) is a globally standardized format for communicating current weather conditions at airports or permanent weather observation stations. This format plays a vital role in aviation, aiding in flight planning and ensuring safety. Composed as a concise alphanumeric code, the METAR delivers comprehensive meteorological data specific to an airport at a particular moment. These reports, consisting of letters and numbers, are issued every 30 or 60 minutes by airports and airfields.
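To make the "concise alphanumeric code" concrete, here is a minimal Python sketch that decodes the most common groups of a US-style METAR. The sample report is invented for illustration and is not the actual KMIA observation; the `parse_metar` function and its field names are hypothetical, and the sketch ignores many real-world groups (fractional visibility, weather phenomena, remarks).

```python
import re

# Illustrative report only: these values are made up, not a real KMIA observation.
SAMPLE = "KMIA 091453Z 12010G18KT 10SM FEW035 SCT250 27/21 A3005"


def parse_metar(report: str) -> dict:
    """Decode a handful of common METAR groups; unrecognized tokens are skipped."""
    tokens = report.split()
    decoded = {"station": tokens[0], "clouds": []}
    for token in tokens[1:]:
        if m := re.fullmatch(r"(\d{2})(\d{2})(\d{2})Z", token):
            # Day of month, hour, and minute in UTC.
            decoded["day"], decoded["hour"], decoded["minute"] = map(int, m.groups())
        elif m := re.fullmatch(r"(\d{3}|VRB)(\d{2,3})(?:G(\d{2,3}))?KT", token):
            direction, speed, gust = m.groups()
            decoded["wind_dir_deg"] = None if direction == "VRB" else int(direction)
            decoded["wind_speed_kt"] = int(speed)
            decoded["wind_gust_kt"] = int(gust) if gust else None
        elif m := re.fullmatch(r"(\d{1,2})SM", token):
            # Whole statute miles only; fractions such as 1/2SM are not handled here.
            decoded["visibility_sm"] = int(m.group(1))
        elif m := re.fullmatch(r"(FEW|SCT|BKN|OVC)(\d{3})", token):
            # Cloud base is reported in hundreds of feet.
            decoded["clouds"].append((m.group(1), int(m.group(2)) * 100))
        elif m := re.fullmatch(r"(M?\d{2})/(M?\d{2})", token):
            # A leading "M" marks a negative Celsius value.
            temp, dew = (-int(v[1:]) if v.startswith("M") else int(v) for v in m.groups())
            decoded["temp_c"], decoded["dew_point_c"] = temp, dew
        elif m := re.fullmatch(r"A(\d{4})", token):
            # Altimeter setting in hundredths of inches of mercury.
            decoded["altimeter_inhg"] = int(m.group(1)) / 100
    return decoded


print(parse_metar(SAMPLE))
```

Parsing token by token, rather than with one large pattern, keeps the sketch tolerant of optional or missing groups, which is common in a decade of historical reports.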

METAR, Miami International Airport (KMIA) on March 9, 2024, at 15:00 UTC

In the recently concluded data challenge hosted on Desights.ai, participants used exploratory data analysis (EDA) and advanced artificial intelligence (AI) techniques to enhance aviation weather forecasting accuracy. Focusing on Miami International Airport (KMIA), participants analyzed historical METAR reports spanning an entire decade.

Image showing the decoded METAR for KMIA on March 9, 2024, at 15:00 UTC, as provided by metar-taf.com

The challenge began with exploratory data analysis, in which participants examined trends, patterns, and anomalies within the METAR data. Through this process, they gained insights into Miami’s meteorological landscape, including seasonal variations and significant weather events impacting KMIA operations.

Transitioning to AI and machine learning (ML), participants developed models for precise weather prediction at KMIA. They addressed critical questions about prediction accuracy and model performance across weather phenomena. Additionally, they explored the influence of geographical and seasonal factors on prediction outcomes, aiming to enhance the applicability of their models in real-world aviation scenarios.

Winners Podium

The podium submissions exemplified ingenuity and precision in aviation weather forecasting. Participants showcased their expertise through innovative approaches, leveraging advanced machine-learning techniques to deliver accurate predictions. Let’s explore the top three submissions that stood out for their excellence and creativity in tackling this complex data challenge.

Andrey Bessalov — 1st Place

Andrey demonstrated a meticulous and methodologically sound approach to weather forecasting, using three critical stages:

1. Data Preprocessing and Feature Engineering: Andrey undertook an extensive data preprocessing exercise to clean and prepare the METAR data for analysis. It involved handling missing values, encoding categorical data, normalizing measurements, and converting various data types. He also applied feature engineering to extract additional information from the data, including time-based features and rolling averages.

2. Exploratory Data Analysis (EDA) and Model Development: He performed an exploratory analysis to understand the characteristics of the data, such as the distribution of various weather parameters and their correlations, and identified patterns and anomalies such as seasonal variations. Andrey then trained machine-learning models to predict METAR data for the next hour, comparing different model types (linear regression, decision trees, and neural networks) and choosing the best based on performance. He analyzed which features were most influential in predicting weather conditions and evaluated the models on accuracy.

3. Individual Weather Component Analysis and Final Algorithm: The project involved detailed analysis and modeling of various weather components, such as air temperature, dew point, air pressure, wind speed and direction, visibility, rain, fog, thunderstorms, and clouds. Andrey built a separate XGBoost model for each component to predict its value one hour ahead. He trained the models on data from 2014 to 2022 and validated them on data from 2023. The final step was an algorithm that processes the data, creates features, makes predictions for individual components, and then converts these predictions into METAR format for practical use.
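The stages above can be sketched in plain Python. This is not Andrey's code: the row layout, the `add_features` and `split_by_year` helpers, the three-hour window, and the synthetic temperatures are all assumptions made for illustration. The sketch shows time-based features, a trailing rolling average, a next-hour target, and the chronological split (2014 to 2022 for training, 2023 for validation) described in the text.

```python
import math
from datetime import datetime, timedelta


def add_features(rows, window=3):
    """Add cyclical hour features, a trailing rolling mean, and a next-hour target.

    Assumes `rows` is sorted by time with one observation per hour.
    """
    featured = []
    for i, row in enumerate(rows):
        # Encode the hour on a circle so 23:00 and 00:00 end up close together.
        hour_angle = 2 * math.pi * row["time"].hour / 24
        # Trailing window: the current value and up to `window - 1` earlier ones.
        recent = [r["temp_c"] for r in rows[max(0, i - window + 1): i + 1]]
        featured.append({
            **row,
            "hour_sin": math.sin(hour_angle),
            "hour_cos": math.cos(hour_angle),
            "month": row["time"].month,
            "temp_roll_mean": sum(recent) / len(recent),
            # Target: the next observation's temperature; None for the last row.
            "temp_next": rows[i + 1]["temp_c"] if i + 1 < len(rows) else None,
        })
    return featured


def split_by_year(rows, first_validation_year=2023):
    """Chronological split: earlier years for training, later ones for validation."""
    train = [r for r in rows if r["time"].year < first_validation_year]
    valid = [r for r in rows if r["time"].year >= first_validation_year]
    return train, valid


# Synthetic hourly temperatures straddling the 2022/2023 boundary (not challenge data).
start = datetime(2022, 12, 31, 21)
rows = [{"time": start + timedelta(hours=i), "temp_c": 20.0 + i} for i in range(6)]

featured = add_features(rows)
train, valid = split_by_year(featured)
print(len(train), "training rows,", len(valid), "validation rows")
```

Splitting by year rather than at random keeps future observations out of the training set, which matters for any model evaluated on next-hour forecasts.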

Dominikus Brian — 2nd Place

Dominikus’ primary objectives were to explore the relationships between different meteorological data, uncover statistical patterns influencing weather, and demonstrate the use of historical METAR report data in building a predictive machine-learning model.

His submission outlined the key findings from the exploratory data analysis and machine-learning model development. It employed a Long Short-Term Memory (LSTM) neural network to predict next-hour weather conditions around Miami International Airport, revealing that models with shorter memory spans are generally more effective. The study achieved high accuracy in predicting temperature, dew point, air pressure, and visibility, but struggled to predict wind direction and cloud types accurately. Additionally, it observed yearly weather trends, such as an increase in mean temperatures, and seasonal variations like stronger winds in the latter half of the year. The study also included correlation analyses showing relationships between different weather factors.
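One way to read "memory span" is as the number of past observations fed to the model for each prediction. That reading is an assumption, since the report's architecture is not described here. Under it, the sketch below shows how an hourly series can be cut into input windows of different lengths before training a sequence model. The `make_windows` helper and the temperature values are hypothetical.

```python
def make_windows(series, lookback):
    """Pair each run of `lookback` consecutive values with the value that follows it."""
    inputs, targets = [], []
    for end in range(lookback, len(series)):
        inputs.append(series[end - lookback:end])
        targets.append(series[end])
    return inputs, targets


# Synthetic hourly temperatures in degrees Celsius (not challenge data).
temps = [24.0, 24.5, 25.0, 25.5, 26.0, 26.5]

short_x, short_y = make_windows(temps, lookback=2)
long_x, long_y = make_windows(temps, lookback=4)

print(len(short_x), "training pairs with a 2-hour look-back")
print(len(long_x), "training pairs with a 4-hour look-back")
```

A shorter look-back yields more training pairs from the same series and gives the model fewer stale inputs to weigh. Whether that explains the reported result is not stated in the source.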

His report explored the use of machine learning in weather forecasting in depth, focusing on Miami International Airport. It highlighted the effectiveness of models with shorter memory spans and the interconnectedness of various weather factors. The insights gained are valuable for understanding meteorological dynamics specific to the airport and for developing a machine-learning pipeline for weather prediction. Future plans include refining the model’s efficiency and exploring its applicability to other airports, broadening the study’s utility in aviation and related fields.

Anamaria Loznianu — 3rd Place

Anamaria presented a structured approach to predicting weather conditions using METAR data. She applied predictive models, including MultiOutputRegressor, RandomForestRegressor, and LSTM (Long Short-Term Memory), to forecast weather values like wind direction, speed, gusts, visibility, temperature, dew point, and pressure. The report mentioned significant discrepancies in some predictions, such as temperature, while acknowledging the better accuracy of models like RandomForestRegressor and LSTM.

In addition to historical data, Anamaria explored a novel approach: using real-time data from airports surrounding Miami to refine its weather predictions. She aggregated the latest METAR data from eight nearby airports and analyzed whether weather patterns at these locations influenced Miami’s conditions.

Her report concluded that while historical data is vital for recognizing patterns, forecasting weather in the short term is most effective when combining current data with the overall direction of weather systems. While the MultiOutputRegressor showed significant discrepancies, especially in temperature predictions, the RandomForestRegressor and LSTM models delivered more accurate results with significantly lower error rates.

Interesting Facts

Data Preprocessing and Parsing

The challenge required processing approximately 90,000 METAR lines from the Miami International Airport (KMIA) spanning over ten years.

General Weather Trends

The study observed seasonal temperature variations, with Miami experiencing warm, humid summers and mild, dry winters due to its tropical monsoon climate. Rainfall is significant during the summer, while winters are the peak tourist season due to comfortable weather. The city is also susceptible to hurricanes and tropical storms.

Yearly Weather Trends

The report noted an increase in mean temperatures from an average of 25.01°C in 2014 to 26.24°C by 2023. It also highlighted seasonal variations, such as stronger winds in the second half of the year, poorer visibility in fall and winter, lower air pressure, and predominantly eastward wind directions toward the end of the year.

Seasonal Analysis

The data showed clear seasonal patterns. The highest temperatures occurred in summer (July-August) and during late afternoons (16:00 to 18:00), while the lowest temperatures were in winter (December-January) and early mornings (6:00 to 9:00).
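Patterns like these typically come from grouping observations by month and by hour of day and comparing the group means. The sketch below shows that aggregation on four synthetic rows. The `mean_by` helper, the row layout, and the temperatures are assumptions for illustration, not challenge data or any participant's code.

```python
from collections import defaultdict
from datetime import datetime


def mean_by(rows, key):
    """Average `temp_c` over groups defined by key(row["time"])."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row["time"])].append(row["temp_c"])
    return {group: sum(values) / len(values) for group, values in groups.items()}


# Synthetic observations for illustration only.
rows = [
    {"time": datetime(2023, 1, 15, 7), "temp_c": 18.0},
    {"time": datetime(2023, 1, 15, 17), "temp_c": 24.0},
    {"time": datetime(2023, 7, 15, 7), "temp_c": 27.0},
    {"time": datetime(2023, 7, 15, 17), "temp_c": 33.0},
]

by_month = mean_by(rows, lambda t: t.month)
by_hour = mean_by(rows, lambda t: t.hour)

warmest_month = max(by_month, key=by_month.get)
coolest_hour = min(by_hour, key=by_hour.get)
print("warmest month:", warmest_month, "coolest hour:", coolest_hour)
```

METAR timestamps are in UTC, so an hour-of-day analysis like this needs the times converted to local time first if the results are meant to describe mornings and afternoons in Miami.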

Challenges in Predicting Certain Weather Features

Among the primary features analyzed (wind direction, wind speed, visibility, cloud types, cloud heights, temperature, dew point, and air pressure), wind direction and cloud types were the most challenging to predict accurately. In contrast, temperature, dew point, air pressure, and visibility were predicted with high accuracy.
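One property that complicates scoring wind direction is that it is circular: 350° and 10° are only 20° apart, yet a plain absolute difference reports 340°. The source does not say which metric participants used, so the sketch below is only an illustration of an angular error that respects the wrap-around. The function names are hypothetical.

```python
def angular_error(predicted_deg, actual_deg):
    """Smallest absolute difference between two compass directions (0 to 180 degrees)."""
    diff = abs(predicted_deg - actual_deg) % 360
    return min(diff, 360 - diff)


def mean_angular_error(predicted, actual):
    """Average angular error over paired lists of directions in degrees."""
    errors = [angular_error(p, a) for p, a in zip(predicted, actual)]
    return sum(errors) / len(errors)


# A naive difference would score the first pair as 340 degrees off.
print(angular_error(350, 10))
print(mean_angular_error([350, 90], [10, 100]))
```

The same wrap-around applies to model inputs, which is why directions are often encoded as sine and cosine components rather than raw degrees.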

2024 Championship

Did you know that our reward structure has changed? The prize pool is now $10,000 for every challenge, and it pays the top 10 participants.

In addition to the monetary rewards, we’ve also revamped our points system to better recognize our top performers’ achievements. Points will now be awarded to the top 10 finishers in each event, with 100 championship points up for grabs.

Collect points in each challenge for the 2024 Championship. The top 10 in last year’s Championship received an additional $10 for every point earned during the year.

The top 3 in each challenge may work with Ocean on a dApp that monetizes their algorithm. Unlike other platforms, data scientists keep the IP. Now, we help them bring it to market, as well.

About Ocean Protocol

Ocean was founded to level the playing field for AI and data. Ocean tools enable people to privately & securely publish, exchange, and consume data.

Follow Ocean on Twitter or Telegram to keep up to date. Chat directly with the Ocean community on Discord — or track Ocean’s progress on GitHub.