PETRODATA REPOSITORY

Open Petroleum Datasets · Parquet Format

Download Parquet Files

daily_production.parquet, monthly_production.parquet, wells.parquet

High-performance columnar storage · Ready for analysis · Compressed & optimized

About This Dataset

This repository provides the Equinor Volve field data in Parquet format, converted from the original release. Volve was the first oil field in Norway to have its data fully disclosed, which makes it a source of real-world data for data engineering and analysis workflows.

The data is structured into three normalized tables: daily metrics, monthly aggregations, and well metadata. This layout suits learning SQL, practicing data engineering, or building analytical pipelines.

Quick Start with DuckDB

Query the Parquet files directly, without loading them into a database. Here's a simple example using DuckDB:

import duckdb

# Connect to DuckDB (in-memory)
con = duckdb.connect()

# Query daily production data
result = con.execute("""
    SELECT
        w.wellbore_name,
        SUM(d.oil_volume) as total_oil,
        SUM(d.gas_volume) as total_gas,
        SUM(d.water_volume) as total_water
    FROM 'volve/daily_production.parquet' d
    JOIN 'volve/wells.parquet' w
        ON d.npd_wellbore_code = w.npd_wellbore_code
    WHERE d.date BETWEEN '2008-01-01' AND '2008-12-31'
    GROUP BY w.wellbore_name
    ORDER BY total_oil DESC
    LIMIT 10
""").fetchall()

for row in result:
    print(row)

No database setup required. DuckDB reads Parquet files directly and efficiently, making it perfect for exploratory analysis and prototyping.

Database Schema

The dataset consists of three interconnected tables tracking well production metrics at different time granularities.

wells

Well metadata and facility information

npd_wellbore_code

wellbore_code

wellbore_name

npd_field_code

npd_field_name

npd_facility_code

npd_facility_name

daily_production

Daily well production metrics and parameters

date

npd_wellbore_code

on_stream_hours

avg_downhole_pressure

avg_dp_tubing

avg_annulus_pressure

avg_wellhead_pressure

avg_downhole_temperature

avg_wellhead_temperature

avg_choke_size_percent

avg_choke_unit

dp_choke_size

oil_volume

gas_volume

water_volume

water_injection_volume

flow_kind

well_type

→ npd_wellbore_code references wells.npd_wellbore_code

monthly_production

Aggregated monthly production volumes

date

npd_wellbore_code

on_stream_hours

oil_volume_sm3

gas_volume_sm3

water_volume_sm3

gas_injection_sm3

water_injection_sm3

→ npd_wellbore_code references wells.npd_wellbore_code

Download Well Log Files

108 well log files from the FORCE 2020 Machine Learning Competition, grouped by quadrant.

Quadrant 7

2 wells

7-1-1.parquet, 7-1-2_S.parquet

Quadrant 15

4 wells

15-9-13.parquet, 15-9-14.parquet, 15-9-15.parquet, 15-9-17.parquet

Quadrant 16

15 wells

16-1-2.parquet, 16-1-6_A.parquet, 16-2-6.parquet, 16-2-11_A.parquet, 16-2-16.parquet, 16-4-1.parquet, 16-5-3.parquet, 16-7-4.parquet, 16-7-5.parquet, 16-8-1.parquet, 16-10-1.parquet, 16-10-2.parquet, 16-10-3.parquet, 16-10-5.parquet, 16-11-1_ST3.parquet

Quadrant 17

1 well

17-11-1.parquet

Quadrant 25

20 wells

25-2-7.parquet, 25-2-13_T4.parquet, 25-2-14.parquet, 25-3-1.parquet, 25-4-5.parquet, 25-5-1.parquet, 25-5-3.parquet, 25-5-4.parquet, 25-6-1.parquet, 25-6-2.parquet, 25-6-3.parquet, 25-7-2.parquet, 25-8-5_S.parquet, 25-8-7.parquet, 25-9-1.parquet, 25-10-10.parquet, 25-11-5.parquet, 25-11-15.parquet, 25-11-19_S.parquet, 25-11-24.parquet

Quadrant 26

1 well

26-4-1.parquet

Quadrant 29

2 wells

29-3-1.parquet, 29-6-1.parquet

Quadrant 30

3 wells

30-3-3.parquet, 30-3-5_S.parquet, 30-6-5.parquet

Quadrant 31

14 wells

31-2-1.parquet, 31-2-7.parquet, 31-2-8.parquet, 31-2-9.parquet, 31-2-19_S.parquet, 31-3-1.parquet, 31-3-2.parquet, 31-3-3.parquet, 31-3-4.parquet, 31-4-5.parquet, 31-4-10.parquet, 31-5-4_S.parquet, 31-6-5.parquet, 31-6-8.parquet

Quadrant 32

1 well

32-2-1.parquet

Quadrant 33

4 wells

33-5-2.parquet, 33-6-3_S.parquet, 33-9-1.parquet, 33-9-17.parquet

Quadrant 34

21 wells

34-2-4.parquet, 34-3-1_A.parquet, 34-3-3_A.parquet, 34-4-10_R.parquet, 34-5-1_A.parquet, 34-5-1_S.parquet, 34-6-1_S.parquet, 34-7-13.parquet, 34-7-20.parquet, 34-7-21.parquet, 34-8-1.parquet, 34-8-3.parquet, 34-8-7_R.parquet, 34-10-16_R.parquet, 34-10-19.parquet, 34-10-21.parquet, 34-10-33.parquet, 34-10-35.parquet, 34-11-1.parquet, 34-11-2_S.parquet, 34-12-1.parquet

Quadrant 35

19 wells

35-3-7_S.parquet, 35-4-1.parquet, 35-6-2_S.parquet, 35-8-4.parquet, 35-8-6_S.parquet, 35-9-2.parquet, 35-9-5.parquet, 35-9-6_S.parquet, 35-9-8.parquet, 35-9-10_S.parquet, 35-11-1.parquet, 35-11-6.parquet, 35-11-7.parquet, 35-11-10.parquet, 35-11-11.parquet, 35-11-12.parquet, 35-11-13.parquet, 35-11-15_S.parquet, 35-12-1.parquet

Quadrant 36

1 well

36-7-3.parquet
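The file names follow a quadrant-block-well pattern with an optional suffix such as _S or _ST3. The sketch below parses that pattern and groups file names by quadrant using only the standard library. The helper names are illustrative, and the reconstructed well name (for example "16/11-1 ST3") is an assumption about how names map to the WELL column, not something this page states.

```python
import re
from collections import defaultdict

# Matches names such as "16-11-1_ST3.parquet": quadrant-block-well[_suffix]
_WELL_FILE = re.compile(r"^(\d+)-(\d+)-(\d+)(?:_([A-Z0-9]+))?\.parquet$")


def parse_well_file(filename):
    """Split a well log file name into quadrant, block, and a guessed well name."""
    match = _WELL_FILE.match(filename)
    if match is None:
        raise ValueError(f"unexpected well file name: {filename!r}")
    quadrant, block, well, suffix = match.groups()
    # Assumed convention: "/" after the quadrant, a space before the suffix
    well_name = f"{quadrant}/{block}-{well}"
    if suffix:
        well_name += f" {suffix}"
    return {"quadrant": int(quadrant), "block": int(block), "well_name": well_name}


def group_by_quadrant(filenames):
    """Return {quadrant: [file names]} with quadrants in ascending order."""
    groups = defaultdict(list)
    for name in filenames:
        groups[parse_well_file(name)["quadrant"]].append(name)
    return dict(sorted(groups.items()))
```

Applied to the full file list, group_by_quadrant reproduces the per-quadrant grouping shown above, which is a quick way to confirm that a local download is complete.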

About FORCE 2020 Dataset

The FORCE 2020 Machine Learning Competition dataset contains well log data from 108 wells on the Norwegian Continental Shelf. Originally released for a lithofacies prediction challenge, it provides petrophysical measurements suited to machine learning and data science applications.

Each well file contains depth-indexed measurements including gamma ray, resistivity, density, neutron porosity, and sonic logs, along with lithofacies classifications.

Quick Start with DuckDB

Query the well log files directly. Here's an example analyzing a single well:

import duckdb

# Connect to DuckDB (in-memory)
con = duckdb.connect()

# Analyze well 15-9-13
result = con.execute("""
    SELECT
        WELL,
        MIN(DEPTH_MD) as min_depth,
        MAX(DEPTH_MD) as max_depth,
        AVG(GR) as avg_gamma_ray,
        AVG(RHOB) as avg_density,
        COUNT(*) as samples
    FROM 'force_2020/wells/15-9-13.parquet'
    GROUP BY WELL
""").fetchall()

for row in result:
    print(row)

# Query multiple wells at once using a glob pattern
multi_well = con.execute("""
    SELECT WELL, FORMATION, COUNT(*) AS samples
    FROM 'force_2020/wells/*.parquet'
    WHERE FORMATION IS NOT NULL
    GROUP BY WELL, FORMATION
    ORDER BY WELL, samples DESC
""").fetchall()

for row in multi_well:
    print(row)

Well Log Schema

Each well file contains 29 columns with petrophysical measurements and metadata.

Well Log Columns

29 columns per well file

WELL

DEPTH_MD

X_LOC

Y_LOC

Z_LOC

GROUP

FORMATION

dataset

CALI

RSHA

RMED

RDEP

RHOB

GR

SGR

NPHI

PEF

DTC

ROP

DTS

DCAL

DRHO

MUDWEIGHT

RMIC

ROPA

RXO

FORCE_2020_LITHOFACIES_LITHOLOGY

Log curves: GR (Gamma Ray), RHOB (Bulk Density), NPHI (Neutron Porosity), DTC/DTS (Sonic), RDEP/RMED/RSHA (Resistivity), CALI (Caliper), PEF (Photoelectric Factor)