PETRODATA REPOSITORY

Open Petroleum Datasets · Parquet Format

Download Parquet Files

High-performance columnar storage · Ready for analysis · Compressed & optimized

About This Dataset

This repository provides the Equinor Volve field data in Parquet format, converted from the original release. The Volve dataset was Norway's first fully disclosed oil field dataset, offering real-world data for data engineering and analysis workflows.

The data is structured into three normalized tables: daily metrics, monthly aggregations, and well metadata. It is well suited to learning SQL, practicing data engineering, or building analytical pipelines.

Quick Start with DuckDB

Query the Parquet files directly, without loading them into a database first. Here's a simple example using DuckDB:

import duckdb

# Connect to DuckDB (in-memory)
con = duckdb.connect()

# Query daily production data
result = con.execute("""
    SELECT
        w.wellbore_name,
        SUM(d.oil_volume) as total_oil,
        SUM(d.gas_volume) as total_gas,
        SUM(d.water_volume) as total_water
    FROM 'volve/daily_production.parquet' d
    JOIN 'volve/wells.parquet' w
        ON d.npd_wellbore_code = w.npd_wellbore_code
    WHERE d.date BETWEEN '2008-01-01' AND '2008-12-31'
    GROUP BY w.wellbore_name
    ORDER BY total_oil DESC
    LIMIT 10
""").fetchall()

for row in result:
    print(row)

No database setup required. DuckDB reads Parquet files directly and efficiently, making it perfect for exploratory analysis and prototyping.
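The query above hard-codes the year 2008. As a minimal sketch of how the same ranking could be reused for other years, the helper below binds the date range as query parameters. The function names, the `data_dir` argument, and the `limit` default are illustrative assumptions, not part of the repository; the table and column names come from the schema on this page.

```python
from datetime import date


def year_bounds(year: int) -> tuple[date, date]:
    """Return the first and last calendar day of `year` (inclusive bounds)."""
    return date(year, 1, 1), date(year, 12, 31)


def top_oil_wells(con, year: int, data_dir: str = "volve", limit: int = 10):
    """Rank wells by total oil volume for one calendar year.

    `con` is expected to be a DuckDB connection (e.g. duckdb.connect()).
    `data_dir` is an assumed local folder holding the downloaded files.
    """
    start, end = year_bounds(year)
    sql = f"""
        SELECT w.wellbore_name, SUM(d.oil_volume) AS total_oil
        FROM '{data_dir}/daily_production.parquet' d
        JOIN '{data_dir}/wells.parquet' w
            ON d.npd_wellbore_code = w.npd_wellbore_code
        WHERE d.date BETWEEN ? AND ?
        GROUP BY w.wellbore_name
        ORDER BY total_oil DESC
        LIMIT {int(limit)}
    """
    # The dates are bound as parameters rather than pasted into the SQL text.
    return con.execute(sql, [start, end]).fetchall()


# Usage (requires the duckdb package and the downloaded Parquet files):
#   import duckdb
#   for name, total in top_oil_wells(duckdb.connect(), 2008):
#       print(name, total)
```

This assumes the `date` column is stored as a DATE type; if it were stored as text, the comparison would fall back to string ordering.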

Database Schema

The dataset consists of three related tables: well metadata, plus production metrics at daily and monthly granularity. Both production tables reference wells through npd_wellbore_code.

wells

Well metadata and facility information

npd_wellbore_code
wellbore_code
wellbore_name
npd_field_code
npd_field_name
npd_facility_code
npd_facility_name

daily_production

Daily well production metrics and parameters

date
npd_wellbore_code
on_stream_hours
avg_downhole_pressure
avg_dp_tubing
avg_annulus_pressure
avg_wellhead_pressure
avg_downhole_temperature
avg_wellhead_temperature
avg_choke_size_percent
avg_choke_unit
dp_choke_size
oil_volume
gas_volume
water_volume
water_injection_volume
flow_kind
well_type
→ npd_wellbore_code references wells.npd_wellbore_code

monthly_production

Aggregated monthly production volumes

date
npd_wellbore_code
on_stream_hours
oil_volume_sm3
gas_volume_sm3
water_volume_sm3
gas_injection_sm3
water_injection_sm3
→ npd_wellbore_code references wells.npd_wellbore_code
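Because daily_production and monthly_production describe the same wells at two granularities, one way to get familiar with the schema is to roll the daily rows up by month and compare them with the monthly table. The sketch below does this for oil only. It assumes that `date` is a DATE column in both tables, that `oil_volume` and `oil_volume_sm3` use the same unit, and that each monthly row's date falls inside the month it describes; none of these is confirmed on this page. The function names and the 1% tolerance are illustrative.

```python
RECONCILE_SQL = """
    WITH daily_rollup AS (
        SELECT
            d.npd_wellbore_code,
            date_trunc('month', d.date) AS month,
            SUM(d.oil_volume) AS oil_from_daily
        FROM '{data_dir}/daily_production.parquet' d
        GROUP BY d.npd_wellbore_code, date_trunc('month', d.date)
    )
    SELECT r.npd_wellbore_code, r.month, r.oil_from_daily, m.oil_volume_sm3
    FROM daily_rollup r
    JOIN '{data_dir}/monthly_production.parquet' m
        ON m.npd_wellbore_code = r.npd_wellbore_code
        AND date_trunc('month', m.date) = r.month
    ORDER BY r.npd_wellbore_code, r.month
"""


def relative_gap(daily_total, monthly_total):
    """Relative difference between two totals, or None if it is undefined."""
    if daily_total is None or monthly_total is None:
        return None
    daily_total, monthly_total = float(daily_total), float(monthly_total)
    if monthly_total == 0:
        return None
    return abs(daily_total - monthly_total) / abs(monthly_total)


def mismatched_months(con, data_dir="volve", tolerance=0.01):
    """Return (well, month, daily_sum, monthly_value) rows that disagree.

    `con` is expected to be a DuckDB connection; `data_dir` is an assumed
    local folder containing the downloaded Parquet files.
    """
    rows = con.execute(RECONCILE_SQL.format(data_dir=data_dir)).fetchall()
    flagged = []
    for code, month, from_daily, monthly in rows:
        gap = relative_gap(from_daily, monthly)
        if gap is not None and gap > tolerance:
            flagged.append((code, month, from_daily, monthly))
    return flagged


# Usage (requires the duckdb package and the downloaded Parquet files):
#   import duckdb
#   for row in mismatched_months(duckdb.connect()):
#       print(row)
```

A non-empty result does not necessarily indicate bad data; it may simply mean one of the assumptions above does not hold for these files.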

Download Well Log Files

108 well log files from the FORCE 2020 Machine Learning Competition.

About FORCE 2020 Dataset

The FORCE 2020 Machine Learning Competition dataset contains well log data from 108 wells on the Norwegian Continental Shelf. Originally released for a lithofacies prediction challenge, it provides petrophysical measurements suited to machine learning and data science applications.

Each well file contains depth-indexed measurements including gamma ray, resistivity, density, neutron porosity, and sonic logs, along with lithofacies classifications.
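The lithofacies classifications are easier to read with names attached. The sketch below counts samples per lithology in one well file and labels them. It assumes the FORCE_2020_LITHOFACIES_LITHOLOGY column holds the competition's numeric codes, and the code-to-name lookup is the one commonly quoted from the competition material; both should be checked against the actual files. The function name and default path are illustrative.

```python
# Assumed lookup from FORCE 2020 numeric codes to lithology names.
LITHOLOGY_NAMES = {
    30000: "Sandstone",
    65030: "Sandstone/Shale",
    65000: "Shale",
    80000: "Marl",
    74000: "Dolomite",
    70000: "Limestone",
    70032: "Chalk",
    88000: "Halite",
    86000: "Anhydrite",
    99000: "Tuff",
    90000: "Coal",
    93000: "Basement",
}


def lithology_name(code) -> str:
    """Translate a numeric code (int or float) into a readable label."""
    key = int(code)
    return LITHOLOGY_NAMES.get(key, f"Unknown ({key})")


def lithology_counts(con, path="force_2020/wells/15-9-13.parquet"):
    """Return (lithology name, sample count) pairs for one well file.

    `con` is expected to be a DuckDB connection (e.g. duckdb.connect()).
    """
    sql = f"""
        SELECT FORCE_2020_LITHOFACIES_LITHOLOGY AS code, COUNT(*) AS samples
        FROM '{path}'
        WHERE FORCE_2020_LITHOFACIES_LITHOLOGY IS NOT NULL
        GROUP BY FORCE_2020_LITHOFACIES_LITHOLOGY
        ORDER BY samples DESC
    """
    return [(lithology_name(code), n) for code, n in con.execute(sql).fetchall()]


# Usage (requires the duckdb package and the downloaded Parquet files):
#   import duckdb
#   for name, n in lithology_counts(duckdb.connect()):
#       print(f"{name}: {n}")
```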

Quick Start with DuckDB

Query the well log files directly. Here's an example analyzing a single well:

import duckdb

# Connect to DuckDB (in-memory)
con = duckdb.connect()

# Analyze well 15-9-13
result = con.execute("""
    SELECT
        WELL,
        MIN(DEPTH_MD) as min_depth,
        MAX(DEPTH_MD) as max_depth,
        AVG(GR) as avg_gamma_ray,
        AVG(RHOB) as avg_density,
        COUNT(*) as samples
    FROM 'force_2020/wells/15-9-13.parquet'
    GROUP BY WELL
""").fetchall()

for row in result:
    print(row)

# Query multiple wells at once using a glob pattern
multi_well = con.execute("""
    SELECT WELL, FORMATION, COUNT(*) as samples
    FROM 'force_2020/wells/*.parquet'
    WHERE FORMATION IS NOT NULL
    GROUP BY WELL, FORMATION
    ORDER BY WELL, samples DESC
""").fetchall()

# Print the first few rows of the multi-well result
for row in multi_well[:10]:
    print(row)

Well Log Schema

Each well file contains 29 columns with petrophysical measurements and metadata.

Well Log Columns

29 columns per well file

WELL
DEPTH_MD
X_LOC
Y_LOC
Z_LOC
GROUP
FORMATION
dataset
CALI
RSHA
RMED
RDEP
RHOB
GR
SGR
NPHI
PEF
DTC
SP
BS
ROP
DTS
DCAL
DRHO
MUDWEIGHT
RMIC
ROPA
RXO
FORCE_2020_LITHOFACIES_LITHOLOGY

Log curves: GR (Gamma Ray), RHOB (Bulk Density), NPHI (Neutron Porosity), DTC/DTS (Sonic), RDEP/RMED/RSHA (Resistivity), CALI (Caliper), PEF (Photoelectric Factor)
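One column in the list above, GROUP, has the same name as the SQL keyword used in GROUP BY, so it needs to be written as a double-quoted identifier when queried. A minimal sketch follows; the helper names are illustrative, and the glob path is the one used in the Quick Start.

```python
def quote_ident(name: str) -> str:
    """Wrap an identifier in double quotes, doubling any embedded quote."""
    return '"' + name.replace('"', '""') + '"'


def samples_per_group(con, pattern="force_2020/wells/*.parquet"):
    """Count samples per well and stratigraphic group.

    `con` is expected to be a DuckDB connection (e.g. duckdb.connect()).
    """
    group_col = quote_ident("GROUP")  # becomes "GROUP", read as a column name
    sql = f"""
        SELECT WELL, {group_col} AS group_name, COUNT(*) AS samples
        FROM '{pattern}'
        WHERE {group_col} IS NOT NULL
        GROUP BY WELL, {group_col}
        ORDER BY WELL, samples DESC
    """
    return con.execute(sql).fetchall()


# Usage (requires the duckdb package and the downloaded Parquet files):
#   import duckdb
#   for row in samples_per_group(duckdb.connect())[:10]:
#       print(row)
```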