Open Petroleum Datasets · Parquet Format
High-performance columnar storage · Ready for analysis · Compressed & optimized
This repository provides the Equinor Volve field data in Parquet format, converted from the original dataset. Volve was the first oil field on the Norwegian continental shelf to have its complete dataset openly disclosed, offering real-world data for data engineering and analysis workflows.
The data is structured into three normalized tables: daily metrics, monthly aggregations, and well metadata. It is well suited to learning SQL, data engineering, and building analytical pipelines.
Query the Parquet files directly without loading into a database. Here's a simple example using DuckDB:
```python
import duckdb

# Connect to DuckDB (in-memory)
con = duckdb.connect()

# Query daily production data
result = con.execute("""
    SELECT
        w.wellbore_name,
        SUM(d.oil_volume) AS total_oil,
        SUM(d.gas_volume) AS total_gas,
        SUM(d.water_volume) AS total_water
    FROM 'volve/daily_production.parquet' d
    JOIN 'volve/wells.parquet' w
        ON d.npd_wellbore_code = w.npd_wellbore_code
    WHERE d.date BETWEEN '2008-01-01' AND '2008-12-31'
    GROUP BY w.wellbore_name
    ORDER BY total_oil DESC
    LIMIT 10
""").fetchall()

for row in result:
    print(row)
```
No database setup required. DuckDB reads Parquet files directly and efficiently, making it perfect for exploratory analysis and prototyping.
The dataset consists of three interconnected tables tracking well production metrics at different time granularities.
108 well log files from the FORCE 2020 Machine Learning Competition.
The FORCE 2020 Machine Learning Competition dataset contains well log data from 108 wells in the Norwegian Continental Shelf. Originally released for lithofacies prediction challenges, this dataset provides comprehensive petrophysical measurements ideal for machine learning and data science applications.
Each well file contains depth-indexed measurements including gamma ray, resistivity, density, neutron porosity, and sonic logs, along with lithofacies classifications.
Query the well log files directly. Here's an example analyzing a single well:
```python
import duckdb

# Connect to DuckDB (in-memory)
con = duckdb.connect()

# Analyze well 15-9-13
result = con.execute("""
    SELECT
        WELL,
        MIN(DEPTH_MD) AS min_depth,
        MAX(DEPTH_MD) AS max_depth,
        AVG(GR) AS avg_gamma_ray,
        AVG(RHOB) AS avg_density,
        COUNT(*) AS samples
    FROM 'force_2020/wells/15-9-13.parquet'
    GROUP BY WELL
""").fetchall()

for row in result:
    print(row)

# Query multiple wells at once with a glob pattern
multi_well = con.execute("""
    SELECT WELL, FORMATION, COUNT(*) AS samples
    FROM 'force_2020/wells/*.parquet'
    WHERE FORMATION IS NOT NULL
    GROUP BY WELL, FORMATION
    ORDER BY WELL, samples DESC
""").fetchall()
```
Each well file contains 29 columns with petrophysical measurements and metadata.