Parsing Non-standard Data Formats in EEGUnity#

This tutorial covers non-standard file types supported by EEGUnity parser extensions.

Supported Formats#

EEGUnity can parse these non-standard sources during dataset scanning:

MATLAB files: .mat
HDF5 EEGLAB files: .set (stored as MATLAB v7.3/HDF5)
CSV or TXT time-series tables: .csv, .txt
WFDB records: .hea + .dat
EDF content with non-standard extension: .rec
BrainVision .vhdr with broken internal sidecar references (automatic patch fallback)

Step 1: Build Locator with Parser Extensions Enabled#

from eegunity import UnifiedDataset

ud = UnifiedDataset(
    dataset_path=r"path/to/dataset",
    domain_tag="my_dataset",
    num_workers=8,
    min_file_size=0,  # include small CSV/TXT files
)

locator = ud.get_locator()
print(locator[["File Path", "File Type", "Completeness Check"]].head())
print(locator["File Type"].value_counts(dropna=False))

Step 2: Inspect Specific File Types#

wfdb_rows = locator[locator["File Type"] == "wfdbData"]
csv_rows = locator[locator["File Type"] == "csvData"]
hdf5_set_rows = locator[locator["File Type"] == "eeglab_hdf5"]

print("WFDB rows:", len(wfdb_rows))
print("CSV/TXT rows:", len(csv_rows))
print("HDF5 .set rows:", len(hdf5_set_rows))

Step 3: Load a Non-standard Row with `get_data_row`#

from eegunity import get_data_row

# Example: read the first available WFDB row
row = wfdb_rows.iloc[0]
raw = get_data_row(row, preload=False)

print("Channels:", raw.info["nchan"])
print("Sampling rate:", raw.info["sfreq"])

The same get_data_row API works for .mat, csvData, eeglab_hdf5, and .rec rows.

Step 4: Batch Validate Readability#

def can_read(row):
    try:
        _ = get_data_row(row, preload=False)
        return "ok"
    except Exception as exc:
        return f"error: {type(exc).__name__}"

status = ud.eeg_batch.batch_process(
    con_func=lambda row: row["Completeness Check"] != "Unavailable",
    app_func=can_read,
    is_patch=True,
    result_type="value",
    execution_mode="thread",
)

ud.eeg_batch.set_metadata("Read Check", status)
print(ud.get_locator()[["File Path", "File Type", "Read Check"]].head())

Notes#

For WFDB parsing, install wfdb.
For HDF5 .set, install h5py.
min_file_size mainly affects CSV/TXT scanning; set it to 0 when testing small demo files.
BrainVision .vhdr sidecar mismatch is retried automatically with patched temporary headers.