Data Challenge Start: Aviation Weather Forecasting Using METAR Data

Published in Ocean Protocol, Feb 1, 2024

METAR reports are issued at airports around the world, typically every hour of the day. Can you predict the next few hours with the highest accuracy?

2024 marks the 3rd year of the Ocean Protocol Data Challenge Program initiative. ‘Aviation Weather Forecasting Using METAR Data’ is the second data challenge of 2024 and the second opportunity to score points in the Championship Leaderboard this season. The challenge launches today, Feb 1, 2024, and the deadline to participate is Feb 20, 2024, 23:59:59 UTC. The challenge description and submission guidelines are available on the Desights platform.

The dataset for the ‘Aviation Weather Forecasting Using METAR Data’ challenge holds hourly METAR reports from KMIA, Miami International Airport. Although METAR is a standard report issued at airports worldwide, this challenge uses only this one airport. The data consists of Miami's historical METAR logs from 2014–2023. Because every METAR follows the same standardized structure, models developed in this challenge could be run in real time to predict METAR conditions 1–12 hours ahead of the present. This is a unique opportunity for data scientists to dive into real-world data and uncover insights that could shape the future of aviation safety and understanding, airline efficiency, and how pilots fly.
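Since models will forecast 1–12 hours ahead, it helps to know how well the simplest possible forecast does at each lead time. The sketch below scores a persistence baseline (assume conditions h hours from now equal conditions now) with mean absolute error. It assumes a single numeric variable sampled hourly with no gaps; the function name `persistence_mae` and the sample temperatures are hypothetical, not part of the challenge materials.

```python
from typing import Dict, Iterable, List


def persistence_mae(series: List[float], horizons: Iterable[int] = range(1, 13)) -> Dict[int, float]:
    """MAE of a persistence forecast for each lead time h (in hours).

    The persistence forecast for hour t + h is simply the value observed at hour t.
    Assumes `series` is hourly, time-ordered, and gap-free.
    """
    results: Dict[int, float] = {}
    for h in horizons:
        if h < 1 or h >= len(series):
            continue  # lead time not usable with this much data
        errors = [abs(series[t + h] - series[t]) for t in range(len(series) - h)]
        results[h] = sum(errors) / len(errors)
    return results


# Hypothetical hourly temperatures in degrees C (not taken from the challenge dataset).
temps = [24.0, 24.5, 25.0, 26.0, 27.0, 27.5, 27.0, 26.0]
scores = persistence_mae(temps, horizons=range(1, 4))
for h, err in scores.items():
    print(f"{h} h ahead: MAE = {err:.2f}")
```

Error usually grows with lead time, and a model that cannot beat persistence at a given horizon adds little there, so this is a cheap sanity check before any training.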

Challenge Overview

Objective: Building on the insights gained from Exploratory Data Analysis (EDA), participants in this data science competition will get hands-on with real-world artificial intelligence (AI) and machine learning (ML). Their primary objective is to develop models that accurately predict future weather conditions at KMIA (Miami International Airport). By harnessing the extensive historical METAR data, these AI/ML models can become valuable tools for aviation operations and safety.

Data Set: The dataset of historical METAR data points can be downloaded from the Ocean Market on the Mumbai Test Network (Polygon Testnet) and on Polygon Mainnet. You can also download it directly through Desights.

Machine Learning Model: Participants have free rein in choosing their models. Python-native options include ARIMA/SARIMA using 'statsmodels', LSTM (Long Short-Term Memory) networks using TensorFlow or Keras, random forest regression using scikit-learn, or gradient boosting machines using XGBoost. When implementing these models, you’ll typically start by preprocessing your time series data (e.g., handling missing observations and normalization), followed by feature engineering (such as extracting time components and creating lag features). After that, you can train your model, tune its hyperparameters, and validate its performance using metrics like RMSE, MAE, or MAPE in pursuit of the most accurate METAR prediction.
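The feature-engineering and evaluation steps described above can be sketched with the standard library alone. This is a minimal illustration, assuming hourly, time-ordered, gap-free rows; `build_features` and the sample values are hypothetical. In practice, pandas `Series.shift(k)` produces the same lag columns, and scikit-learn's `sklearn.metrics` module provides `mean_absolute_error` and `mean_squared_error`.

```python
import math
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Sequence


def build_features(times: Sequence[datetime], values: Sequence[float],
                   lags: Sequence[int] = (1, 2, 3, 24)) -> List[Dict[str, Optional[float]]]:
    """One feature row per hourly observation: calendar components plus lagged values.

    Assumes rows are sorted, hourly, and gap-free, so "k rows back" means "k hours back".
    lag_k is None until k hours of history exist.
    """
    rows: List[Dict[str, Optional[float]]] = []
    for i, ts in enumerate(times):
        row: Dict[str, Optional[float]] = {
            "hour": ts.hour,            # diurnal cycle
            "month": ts.month,          # seasonal cycle
            "dayofweek": ts.weekday(),  # Monday = 0 ... Sunday = 6
        }
        for k in lags:
            row[f"lag_{k}"] = values[i - k] if i >= k else None
        rows.append(row)
    return rows


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> int:
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    return len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    n = _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    n = _check(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent; undefined if any true value is 0."""
    n = _check(y_true, y_pred)
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n


# Hypothetical hourly readings starting at midnight on 2023-01-01 (naive timestamps).
start = datetime(2023, 1, 1, 0, 0)
times = [start + timedelta(hours=i) for i in range(5)]
values = [20.0, 21.0, 22.0, 23.0, 24.0]
rows = build_features(times, values, lags=(1, 2))
print(rows[3])  # hour, month, day of week, and the two previous readings

y_true, y_pred = [2.0, 4.0, 8.0], [3.0, 4.0, 6.0]
print(mae(y_true, y_pred), rmse(y_true, y_pred), mape(y_true, y_pred))
```

Because MAPE divides by the true value, it breaks down for variables that are often zero or near zero (calm winds, temperatures near 0 °C); MAE and RMSE are safer defaults in those cases.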

It’s also good practice to perform cross-validation to assess the robustness of your model. For time series, each validation fold should come after the data the model was trained on, so the model never learns from the future. Given the complexity and variability of METAR data, it may also help to ensemble different models to improve predictions.
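Ordinary shuffled k-fold cross-validation leaks future information into training when applied to time series. A common alternative is an expanding window: each fold trains on everything before a cutoff and tests on the block right after it. The sketch below implements that idea, similar in spirit to scikit-learn's `TimeSeriesSplit`, and uses it to compare two hypothetical stand-in models with their simple-average ensemble. The models, fold sizes, and toy series are illustrative assumptions, not the challenge's evaluation protocol.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

# A forecaster takes the training history and returns `n_ahead` point forecasts.
Forecaster = Callable[[Sequence[float], int], List[float]]


def expanding_window_splits(n: int, n_splits: int, test_size: int) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges; each test block starts right after its training data."""
    first_test_start = n - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough observations for the requested splits")
    for k in range(n_splits):
        start = first_test_start + k * test_size
        yield range(0, start), range(start, start + test_size)


def cross_validate(series: Sequence[float], model: Forecaster,
                   n_splits: int = 3, test_size: int = 2) -> List[float]:
    """MAE of `model` on each fold, using all data before the fold as history."""
    scores = []
    for train_idx, test_idx in expanding_window_splits(len(series), n_splits, test_size):
        train = [series[i] for i in train_idx]
        actual = [series[i] for i in test_idx]
        forecast = model(train, len(actual))
        scores.append(sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual))
    return scores


def average_ensemble(predictions: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of several models' forecasts."""
    if not predictions or any(len(p) != len(predictions[0]) for p in predictions):
        raise ValueError("need one or more prediction lists of equal length")
    return [sum(vals) / len(vals) for vals in zip(*predictions)]


# Two deliberately simple stand-in models (hypothetical; swap in ARIMA, XGBoost, etc.).
def persistence(train: Sequence[float], n_ahead: int) -> List[float]:
    return [train[-1]] * n_ahead


def historical_mean(train: Sequence[float], n_ahead: int) -> List[float]:
    return [sum(train) / len(train)] * n_ahead


def ensemble(train: Sequence[float], n_ahead: int) -> List[float]:
    return average_ensemble([persistence(train, n_ahead), historical_mean(train, n_ahead)])


series = [float(x) for x in range(1, 11)]  # toy upward-trending series
for name, model in [("persistence", persistence), ("historical_mean", historical_mean),
                    ("ensemble", ensemble)]:
    print(name, cross_validate(series, model))
```

On this trending toy series, persistence has the lowest error and averaging in the historical mean makes things worse, which is exactly the kind of result cross-validation should surface before you commit to an ensemble.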

METAR

METAR is a format for reporting weather information. METAR reports are predominantly used by aircraft pilots, airlines, observation stations, and meteorologists, who use aggregated METAR information to assist in weather forecasting.

Raw METAR is the most common format in the world for the transmission of observational weather data. It is highly standardized through the International Civil Aviation Organization (ICAO), which allows it to be understood throughout most of the world.
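If you work with raw report strings, a first step is splitting them into fields. The sketch below uses Python's `re` module to pull out a handful of core groups (station, observation time, wind, visibility, cloud layers, temperature/dew point, altimeter) in the US-style format. It is a simplified parser built on stated assumptions, not a full decoder: it skips present weather, fractional or metric visibility, runway visual range, and the remarks section, and the sample report is made up for illustration.

```python
import re
from typing import Any, Dict

# Simplified, hypothetical patterns for one whitespace-separated METAR group each.
STATION = re.compile(r"^[A-Z]{4}$")                             # ICAO identifier, e.g. KMIA
TIME = re.compile(r"^(\d{2})(\d{2})(\d{2})Z$")                  # day, hour, minute (UTC)
WIND = re.compile(r"^(\d{3}|VRB)(\d{2,3})(?:G(\d{2,3}))?KT$")   # direction, speed, gust
VIS_SM = re.compile(r"^(\d{1,2})SM$")                           # whole statute miles only
CLOUD = re.compile(r"^(FEW|SCT|BKN|OVC)(\d{3})(?:CB|TCU)?$")    # cover + height in 100s of ft
TEMP = re.compile(r"^(M?\d{2})/(M?\d{2})?$")                    # temperature/dew point
ALTIM = re.compile(r"^A(\d{4})$")                               # hundredths of inHg


def _celsius(token: str) -> int:
    # METAR writes below-zero temperatures with an "M" prefix: M05 means -5 degrees C.
    return -int(token[1:]) if token.startswith("M") else int(token)


def parse_metar(report: str) -> Dict[str, Any]:
    """Extract a few core groups from a raw METAR; unrecognized groups are skipped."""
    fields: Dict[str, Any] = {"clouds": []}
    tokens = report.split()
    if tokens and tokens[0] in ("METAR", "SPECI"):
        fields["type"] = tokens.pop(0)
    for tok in tokens:
        if tok == "RMK":
            break  # the remarks section is free-form and is not parsed here
        if "station" not in fields and STATION.match(tok):
            fields["station"] = tok
        elif m := TIME.match(tok):
            fields["day"], fields["hour"], fields["minute"] = map(int, m.groups())
        elif m := WIND.match(tok):
            direction, speed, gust = m.groups()
            fields["wind_dir_deg"] = None if direction == "VRB" else int(direction)
            fields["wind_speed_kt"] = int(speed)
            fields["wind_gust_kt"] = int(gust) if gust else None
        elif m := VIS_SM.match(tok):
            fields["visibility_sm"] = int(m.group(1))
        elif m := CLOUD.match(tok):
            fields["clouds"].append((m.group(1), int(m.group(2)) * 100))  # feet above ground
        elif m := TEMP.match(tok):
            fields["temp_c"] = _celsius(m.group(1))
            fields["dewpoint_c"] = _celsius(m.group(2)) if m.group(2) else None
        elif m := ALTIM.match(tok):
            fields["altimeter_inhg"] = int(m.group(1)) / 100
    return fields


# A made-up report for illustration; not taken from the challenge dataset.
example = "METAR KMIA 011753Z 09012G20KT 10SM FEW025 SCT250 27/19 A3012 RMK AO2"
print(parse_metar(example))
```

The assignment expressions (`:=`) require Python 3.8 or later.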

For a walkthrough of how a METAR (METeorological Aerodrome Report) is structured, see “This is how you read a METAR” on metar-taf.com.

For additional clarity, see the METAR/TAF LIST OF ABBREVIATIONS AND ACRONYMS.

Prizes

In 2024, we’ve increased the prize pool to $10,000 in USDC (also receivable in OCEAN) per challenge. It is distributed among the top 10 finishers, so more participants can win. Additionally, each Data Challenge cycle offers the opportunity to score points in the Data Challenge Championship Season Leaderboard. The prize breakdown for every cycle, in monetary value plus leaderboard points for End-Of-Season awards, is as follows:

Points scored in each Data Challenge accumulate toward the leaderboard standings for the season championship. The leaderboard is updated one week after each Data Challenge concludes: https://oceanprotocol.com/earn/data-challenges/

How to Participate

Sign Up to Desights: Create a web3-style profile on Desights to join this and future data challenges.

Timeline: The challenge runs from February 1, 2024, to February 20, 2024.

Submission Guidelines: Please follow the ‘Evaluation Criteria’ and ‘Report Guidelines’ sections of the challenge overview on Desights for proper submission.

For questions, comments, and community data science dialogue, reach out in the #data-science-hub channel on our Discord: https://discord.gg/yFRPH9PCN4. We also post updates and new challenges there. Stay tuned to our blog at blog.oceanprotocol.com for updates, discussions, and progress throughout the year!

To see past, current, and future data challenges sponsored by Ocean, please visit https://oceanprotocol.com/earn/data-challenges.

About Ocean Protocol

Ocean was founded to level the playing field for AI and data. Ocean tools enable people to privately & securely publish, exchange, and consume data.

Follow Ocean on Twitter or Telegram to keep up to date. Chat directly with the Ocean community on Discord. Or, track Ocean progress now on GitHub.