duplicate bad entry in the bgc-s index #25

@eeholmes

I am not sure where to report this and it is a minor issue.

For wmo = 4902627, cyc = 18, the bgc-s index file (https://data-argo.ifremer.fr/argo_synthetic-profile_index.txt.gz) has an extra, duplicate row with bad data (NaN latitude, longitude and ocean, and no CHLA or BBP700 in the parameter list):

                                       file                date  latitude  \
30   meds/4902627/profiles/SR4902627_018.nc 2024-03-04 17:25:00    42.484   
31  meds/4902627/profiles/SR4902627_018D.nc 2024-02-23 23:52:00       NaN   

    longitude ocean  profiler_type institution  \
30    -60.269     A            836          ME   
31        NaN   NaN            836          ME   

                                          parameters parameter_data_mode  \
30  PRES PSAL TEMP DOXY CHLA BBP700 PH_IN_SITU_TOTAL             RRRRARR   
31              PRES PSAL TEMP DOXY PH_IN_SITU_TOTAL               RRRRR   

           date_update      wmo  cyc  
30 2025-08-27 21:56:11  4902627   18  
31 2025-08-27 21:56:00  4902627   18  

When using argopy in standard user mode, this WMO is filtered out: argopy picks the second (bad) row, which does not list CHLA, so CHLA_DATA_MODE becomes an empty string ("").
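To illustrate why the second row leads to an empty data mode, here is a minimal sketch of a positional lookup between the `parameters` and `parameter_data_mode` columns. It is not argopy's actual code: the helper name `data_mode` is hypothetical, and it assumes that each character of `parameter_data_mode` corresponds to the parameter at the same position, which is consistent with the two rows shown above.

```python
def data_mode(parameters, parameter_data_mode, name):
    """Return the data mode letter for `name`, or "" if the row does not list it.

    Assumption: the i-th character of `parameter_data_mode` belongs to the
    i-th whitespace-separated entry of `parameters`.
    """
    names = parameters.split()
    modes = dict(zip(names, parameter_data_mode))
    return modes.get(name, "")


# Values copied from rows 30 and 31 of the index excerpt above
good_row = ("PRES PSAL TEMP DOXY CHLA BBP700 PH_IN_SITU_TOTAL", "RRRRARR")
bad_row = ("PRES PSAL TEMP DOXY PH_IN_SITU_TOTAL", "RRRRR")

chla_good = data_mode(*good_row, "CHLA")
chla_bad = data_mode(*bad_row, "CHLA")
print(repr(chla_good), repr(chla_bad))
```

With the first row the lookup yields the adjusted mode for CHLA; with the second row CHLA is simply absent, so any filter that requires a non-empty CHLA data mode would drop the profile.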

I wasn't able to test the whole index for other duplicates. Perhaps there are others.
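A possible way to scan the whole index for other cases is sketched below, using only the Python standard library. It is an untested-against-the-full-index sketch under two assumptions: file names end in `<wmo>_<cycle>` plus an optional trailing letter and `.nc` (inferred from the two rows above), and a locally downloaded copy of the index is a gzipped, comma-separated text file whose comment lines start with `#` and whose first field is the file path. The helper names `find_duplicate_cycles` and `files_from_index` are hypothetical.

```python
import gzip
import re
from collections import defaultdict

# Assumed file name pattern, e.g. SR4902627_018.nc or SR4902627_018D.nc
PROFILE_RE = re.compile(r"(\d+)_(\d+)([A-Z]?)\.nc$")


def find_duplicate_cycles(files):
    """Return {(wmo, cyc): [paths]} for every float/cycle listed more than once."""
    groups = defaultdict(list)
    for path in files:
        match = PROFILE_RE.search(path)
        if match is None:
            continue  # header line or unexpected name: skip it
        wmo, cyc = int(match.group(1)), int(match.group(2))
        groups[(wmo, cyc)].append(path)
    return {key: paths for key, paths in groups.items() if len(paths) > 1}


def files_from_index(local_path):
    """Yield the first field of each non-comment line of a local .txt.gz index copy.

    Assumption: comment lines start with "#" and fields are comma-separated.
    """
    with gzip.open(local_path, "rt") as handle:
        for line in handle:
            if not line.startswith("#"):
                yield line.split(",", 1)[0].strip()


# Small in-memory check; the first entry is a hypothetical neighbouring cycle,
# the other two are the file names from the excerpt above.
sample = [
    "meds/4902627/profiles/SR4902627_017.nc",
    "meds/4902627/profiles/SR4902627_018.nc",
    "meds/4902627/profiles/SR4902627_018D.nc",
]
duplicates = find_duplicate_cycles(sample)
print(duplicates)
```

For the full index, something like `find_duplicate_cycles(files_from_index("argo_synthetic-profile_index.txt.gz"))` on a downloaded copy should list every float/cycle that appears in more than one row.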

MCVE Code Sample

from argopy import DataFetcher as ArgoDataFetcher

wmo = 4902627
cyc = 18
# Box: [lon_min, lon_max, lat_min, lat_max, pres_min, pres_max, date_min, date_max]
region = [-70, -40, 20, 60, 0, 1000, "2024-03-01", "2024-04-01"]

# 1) Fetch in expert mode (so nothing gets dropped by standard-mode filtering)
fetcher = ArgoDataFetcher(
    mode="expert",
    ds="bgc",
    src="erddap",
    params=["CHLA", "PRES"],
).region(region)

# 2) Show what the Argo synthetic-profile index says for the same float/cycle
idx = fetcher.fetcher.indexfs
idx.query.wmo([wmo])
idf = idx.to_dataframe(completed=False)
# Selecting on wmo and cyc returns both rows, so no file name is needed
idf[(idf["wmo"] == wmo) & (idf["cyc"] == cyc)]

This shows the output above.

Labels: CHLA (About Chlorophyll-A sensor or measurement), argo-BGC (About biogeochemical variables)
