Parsing Non-standard Data Formats in EEGUnity#
This tutorial covers non-standard file types supported by EEGUnity parser extensions.
Supported Formats#
EEGUnity can parse these non-standard sources during dataset scanning:
- MATLAB files: .mat
- HDF5 EEGLAB files: .set (stored as MATLAB v7.3/HDF5)
- CSV or TXT time-series tables: .csv, .txt
- WFDB records: .hea + .dat
- EDF content with non-standard extension: .rec
- BrainVision .vhdr with broken internal sidecar references (automatic patch fallback)
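To tell an HDF5-backed .set or .mat file from an older MATLAB file before scanning, one option is to look for the HDF5 format signature. The sketch below is independent of EEGUnity and does not describe how its parser detects the format. The helper name looks_like_hdf5 and the max_offset parameter are hypothetical. It relies on the HDF5 rule that the signature sits at byte offset 0, 512, 1024, and so on (MATLAB v7.3 files carry a 512-byte text header first).

```python
# HDF5 format signature (8 bytes).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path, max_offset=4096):
    """Return True if an HDF5 signature is found at a valid superblock offset."""
    with open(path, "rb") as fh:
        head = fh.read(max_offset + len(HDF5_SIGNATURE))

    offset = 0
    while offset <= max_offset:
        if head[offset:offset + len(HDF5_SIGNATURE)] == HDF5_SIGNATURE:
            return True
        # Valid offsets are 0, then 512 and successive doublings.
        offset = 512 if offset == 0 else offset * 2
    return False
```

A file for which this returns True would need h5py-style reading; a False result only means no signature was found within the first few kilobytes.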
Step 1: Build Locator with Parser Extensions Enabled#
from eegunity import UnifiedDataset
ud = UnifiedDataset(
    dataset_path=r"path/to/dataset",
    domain_tag="my_dataset",
    num_workers=8,
    min_file_size=0,  # include small CSV/TXT files
)
locator = ud.get_locator()
print(locator[["File Path", "File Type", "Completeness Check"]].head())
print(locator["File Type"].value_counts(dropna=False))
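To cross-check the File Type counts against what is actually on disk, a rough tally of file extensions can help. This is a standard-library sketch, not part of EEGUnity. The names NON_STANDARD_EXTS and count_non_standard_files are hypothetical, and the extension set is copied from the supported-formats list in this tutorial. It counts by extension only, so it cannot tell an HDF5 .set from a classic one.

```python
from collections import Counter
from pathlib import Path

# Extensions taken from the supported-formats list in this tutorial.
NON_STANDARD_EXTS = {".mat", ".set", ".csv", ".txt", ".hea", ".dat", ".rec", ".vhdr"}


def count_non_standard_files(dataset_path):
    """Count files under dataset_path whose extension is in NON_STANDARD_EXTS."""
    counts = Counter()
    for path in Path(dataset_path).rglob("*"):
        ext = path.suffix.lower()  # treat .CSV and .csv alike
        if path.is_file() and ext in NON_STANDARD_EXTS:
            counts[ext] += 1
    return counts
```

If an extension shows up here but no matching rows appear in the locator, that is a hint to revisit the scan settings such as min_file_size.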
Step 2: Inspect Specific File Types#
wfdb_rows = locator[locator["File Type"] == "wfdbData"]
csv_rows = locator[locator["File Type"] == "csvData"]
hdf5_set_rows = locator[locator["File Type"] == "eeglab_hdf5"]
print("WFDB rows:", len(wfdb_rows))
print("CSV/TXT rows:", len(csv_rows))
print("HDF5 .set rows:", len(hdf5_set_rows))
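If the WFDB row count is lower than expected, one possible cause is a .hea header without its .dat signal file. The sketch below lists such headers using only the standard library. The function name find_unpaired_wfdb_headers is hypothetical, and the same-basename pairing is an assumption: a WFDB header can name its signal files explicitly, so treat the result as a heuristic rather than a definitive check.

```python
from pathlib import Path


def find_unpaired_wfdb_headers(dataset_path):
    """Return .hea files that have no same-named .dat file beside them."""
    unpaired = []
    for hea in sorted(Path(dataset_path).rglob("*.hea")):
        # Assumption: the signal file shares the header's basename.
        if not hea.with_suffix(".dat").is_file():
            unpaired.append(hea)
    return unpaired
```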
Step 3: Load a Non-standard Row with get_data_row#
from eegunity import get_data_row
# Example: read the first available WFDB row
row = wfdb_rows.iloc[0]
raw = get_data_row(row, preload=False)
print("Channels:", raw.info["nchan"])
print("Sampling rate:", raw.info["sfreq"])
The same get_data_row call works for rows from .mat files, CSV/TXT tables (File Type csvData), HDF5 .set files (File Type eeglab_hdf5), and .rec files.
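To sanity-check the channel count and sampling rate printed for a WFDB row, the values can be compared against the record line of the .hea header, which is plain text. The sketch below is a minimal standard-library parser and not the code path EEGUnity or the wfdb package uses. The function name parse_wfdb_record_line is hypothetical. It reads only the record name, the number of signals, and the sampling frequency, and leaves the frequency as None when the header omits it.

```python
def parse_wfdb_record_line(header_text):
    """Parse record name, signal count, and sampling frequency from .hea text."""
    for line in header_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments before the record line
        fields = line.split()
        name = fields[0].split("/")[0]  # drop the optional segment count
        n_signals = int(fields[1])
        sfreq = None
        if len(fields) > 2:
            # The frequency field may carry "/counter_freq(base)" after it.
            sfreq = float(fields[2].split("/")[0])
        return {"record": name, "n_signals": n_signals, "sfreq": sfreq}
    raise ValueError("no record line found")
```

Passing the text of a header file, for example Path(...).read_text(), returns a small dict that can be compared with raw.info["nchan"] and raw.info["sfreq"].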
Step 4: Batch Validate Readability#
def can_read(row):
    try:
        _ = get_data_row(row, preload=False)
        return "ok"
    except Exception as exc:
        return f"error: {type(exc).__name__}"

status = ud.eeg_batch.batch_process(
    con_func=lambda row: row["Completeness Check"] != "Unavailable",
    app_func=can_read,
    is_patch=True,
    result_type="value",
    execution_mode="thread",
)
ud.eeg_batch.set_metadata("Read Check", status)
print(ud.get_locator()[["File Path", "File Type", "Read Check"]].head())
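Once the Read Check column is filled, a tally of the status strings shows at a glance which exception types dominate. The sketch below works on any iterable of statuses and is not an EEGUnity function. The name summarize_read_check is hypothetical, and it assumes that rows skipped by con_func appear as None or NaN, which the source does not confirm.

```python
from collections import Counter


def summarize_read_check(statuses):
    """Tally status strings; None/NaN entries are counted as 'skipped'."""
    summary = Counter()
    for status in statuses:
        # NaN is the only value that is not equal to itself.
        if status is None or status != status:
            summary["skipped"] += 1
        else:
            summary[str(status)] += 1
    return summary
```

For example, summarize_read_check(ud.get_locator()["Read Check"]) would group rows by "ok" and by each "error: ..." label produced by can_read.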
Notes#
- For WFDB parsing, install wfdb.
- For HDF5 .set, install h5py.
- min_file_size mainly affects CSV/TXT scanning; set it to 0 when testing small demo files.
- BrainVision .vhdr sidecar mismatch is retried automatically with patched temporary headers.
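Before a long scan, it can be worth confirming that the optional packages named in the notes are importable. The sketch below uses only importlib from the standard library. The names OPTIONAL_PARSER_DEPS and missing_parser_deps are hypothetical, and the descriptive labels in the mapping are illustrative.

```python
import importlib.util

# Package names come from the notes above; the labels are illustrative.
OPTIONAL_PARSER_DEPS = {
    "wfdb": "WFDB records (.hea + .dat)",
    "h5py": "HDF5 .set files",
}


def missing_parser_deps(deps=OPTIONAL_PARSER_DEPS):
    """Return the optional packages that cannot be found in this environment."""
    return [name for name in deps if importlib.util.find_spec(name) is None]
```

An empty list means both packages were found; otherwise the returned names are the ones to install.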