eegunity.modules.parser

Submodules

eegunity.modules.parser.eeg_parser module

eegunity.modules.parser.eeg_parser.apply_dataset_kernel(udataset, raw_data, row)

Apply the dataset kernel to one loaded raw object.

Parameters:
  • udataset (object) – UnifiedDataset-like object exposing get_shared_attr().

  • raw_data (mne.io.BaseRaw) – Loaded raw object after locator-driven metadata patching.

  • row (pandas.Series) – Locator row corresponding to raw_data.

Returns:

Kernel-processed raw object, or the original object if no kernel is bound or if kernel execution fails.

Return type:

mne.io.BaseRaw

Examples

>>> # raw = apply_dataset_kernel(unified_dataset, raw, row)  

eegunity.modules.parser.eeg_parser.apply_dataset_kernel_unavailable(udataset, row)

Try to build a raw for an Unavailable file via the kernel’s load() method.

This is the entry point for the Option B extended kernel interface. For files that EEGUnity’s parser cannot handle (Completeness Check == 'Unavailable'), a kernel may opt in by setting HANDLES_UNAVAILABLE = True and implementing a load(row) method. If load() returns a non-None mne.io.BaseRaw, the normal apply_dataset_kernel() enrichment step is called on that raw before returning, giving the same two-phase (load → apply) pipeline as for Completed files.

Parameters:
  • udataset (object) – UnifiedDataset-like object exposing get_shared_attr().

  • row (pandas.Series) – Locator row for the Unavailable file.

Returns:

Fully processed raw (after load + apply), or None if the kernel does not support this file or load() returns None.

Return type:

mne.io.BaseRaw or None
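
The two-phase (load → apply) flow described above can be sketched in plain Python. This is a minimal illustration, not EEGUnity's implementation: only HANDLES_UNAVAILABLE and load(row) come from the text; DemoKernel, its apply() method, load_unavailable() and the dict standing in for a locator row are hypothetical.

```python
class DemoKernel:
    """Hypothetical kernel that opts in to handling Unavailable files."""

    HANDLES_UNAVAILABLE = True

    def load(self, row):
        # Return a raw-like object, or None to decline this file.
        if row.get("File Path", "").endswith(".custom"):
            return {"source": row["File Path"], "enriched": False}
        return None

    def apply(self, raw, row):
        # Hypothetical stand-in for the apply_dataset_kernel() enrichment step.
        return {**raw, "enriched": True}


def load_unavailable(kernel, row):
    """Run load(), then the enrichment step; return None when unsupported."""
    if kernel is None or not getattr(kernel, "HANDLES_UNAVAILABLE", False):
        return None  # kernel missing or not opted in
    if not callable(getattr(kernel, "load", None)):
        return None  # opted in but no load() method
    raw = kernel.load(row)
    if raw is None:
        return None  # kernel declined this particular file
    return kernel.apply(raw, row)
```

A row whose path the kernel recognises comes back enriched; any other row, or a missing kernel, yields None.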

eegunity.modules.parser.eeg_parser.normalize_data(raw_data, mean_std_str, norm_type)

Normalize EEG data based on provided mean and standard deviation values.

Parameters:
  • raw_data (mne.io.Raw) – The raw EEG data to be normalized. The data should be in MNE Raw format.

  • mean_std_str (Union[str, Dict]) – A dictionary or string that contains mean and standard deviation values. If it’s a string, it will be evaluated into a dictionary. The dictionary keys should be channel names (for channel-wise normalization) or ‘all_eeg’ (for sample-wise normalization).

  • norm_type (str) –

    The type of normalization to perform:

    • ‘channel-wise’: Normalize each channel individually based on its mean and standard deviation.

    • ‘sample-wise’: Normalize all channels based on a common mean and standard deviation.

Returns:

The normalized raw EEG data.

Return type:

mne.io.Raw

Raises:

ValueError – If norm_type is not ‘channel-wise’ or ‘sample-wise’.
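
The difference between the two modes can be sketched on plain lists. This is an illustrative sketch only: it assumes z-score normalization, (x - mean) / std, and assumes each dictionary value is a (mean, std) tuple. Neither detail is stated above, and the helper name normalize() is hypothetical.

```python
import ast


def normalize(channels, mean_std, norm_type):
    """Z-score plain lists of samples.

    channels: dict mapping channel name -> list of floats.
    mean_std: dict (or its string form) mapping key -> (mean, std).
              The (mean, std) layout is an assumption of this sketch.
    """
    if isinstance(mean_std, str):
        # Assumption: the string is a Python literal; literal_eval avoids eval().
        mean_std = ast.literal_eval(mean_std)

    if norm_type == "channel-wise":
        # Each channel uses its own statistics, keyed by channel name.
        return {
            name: [(x - mean_std[name][0]) / mean_std[name][1] for x in samples]
            for name, samples in channels.items()
        }
    if norm_type == "sample-wise":
        # Every channel shares the statistics stored under 'all_eeg'.
        mean, std = mean_std["all_eeg"]
        return {
            name: [(x - mean) / std for x in samples]
            for name, samples in channels.items()
        }
    raise ValueError("norm_type must be 'channel-wise' or 'sample-wise'")
```

Any other norm_type raises ValueError, mirroring the behaviour documented above.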

eegunity.modules.parser.eeg_parser.set_montage_any(raw_data, verbose='CRITICAL')

Set the montage for the given raw data using a montage defined in a JSON file.

Parameters:
  • raw_data (mne.io.Raw) – The raw data object to which the montage will be applied.

  • verbose (str, optional) – The verbosity level for warnings or messages, by default ‘CRITICAL’.

Returns:

The updated raw data object with the applied montage.

Return type:

mne.io.Raw

eegunity.modules.parser.eeg_parser.create_montage_from_json(json_file)

Create a montage from a JSON file containing channel positions.

Parameters:

json_file (str) – The path to the JSON file containing channel names as keys and their positions as values.

Returns:

A montage object created from the channel positions defined in the JSON file.

Return type:

mne.channels.DigMontage

eegunity.modules.parser.eeg_parser.set_channel_type(raw_data, channel_str)

Apply locator channel schema to one raw object.

This function keeps EEGUnity’s locator-driven design intact: channel_str is treated as the source of truth, and raw metadata is overwritten according to the type:name pairs it contains.

Parameters:
  • raw_data (mne.io.Raw) – Raw data object to patch.

  • channel_str (str) – Comma-separated type:name string from locator.

Returns:

Patched raw object with renamed channels and updated channel kinds.

Return type:

mne.io.Raw

Raises:

ValueError – If any channel entry is not in type:name format or channel count mismatches.
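
The validation implied by the Raises entry can be sketched without MNE. This is a hypothetical helper, not the library's code: parse_channel_str() and its n_channels argument are illustrative names, and the exact error messages are assumptions.

```python
def parse_channel_str(channel_str, n_channels):
    """Split a comma-separated 'type:name' string into (type, name) pairs.

    Raises ValueError on a malformed entry or when the number of entries
    differs from n_channels (the channel count of the raw object).
    """
    pairs = []
    for entry in channel_str.split(","):
        entry = entry.strip()
        ch_type, sep, name = entry.partition(":")
        if not sep or not ch_type.strip() or not name.strip():
            raise ValueError(f"Channel entry {entry!r} is not in type:name format")
        pairs.append((ch_type.strip(), name.strip()))
    if len(pairs) != n_channels:
        raise ValueError(
            f"Locator lists {len(pairs)} channels but the raw object has {n_channels}"
        )
    return pairs
```

For example, "eeg:Fz, stim:event_code" with two channels parses into two pairs, while an untyped entry such as "Fz" is rejected.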

Examples

>>> # raw = set_channel_type(raw, "eeg:Fz, stim:event_code")  

eegunity.modules.parser.eeg_parser.get_data_row(row, norm_type=None, is_set_channel_type=None, is_set_montage=False, pick_types_params=None, unit_convert=None, read_raw_params=None, handle_nonstandard_params=None, preload=True, use_locator_channel_metadata=True)

Process and return raw EEG data based on the input row information.

This function handles both standard and non-standard data, with options for setting channel types, montage, normalization, and unit conversion.

Parameters:
  • row (dict) – Dictionary containing data attributes, such as file paths, file types, and channel names.

  • norm_type (str, optional) – Type of normalization to apply, if any. Defaults to None.

  • is_set_channel_type (bool or None, optional) –

    Determines whether to set channel types based on the provided information. Defaults to None.

    • True: channel types are set explicitly.

    • None: channel types are set only if the Channel Names field in the locator follows the “type:name” format (see UnifiedDataset.EEGBatch.format_channel_names() for details).

  • is_set_montage (bool, optional) – Whether to set montage (electrode coordinates). Defaults to False.

  • pick_types_params (dict, optional) – Dictionary specifying which channel types to include. The keys should match the parameters of raw.pick_types(). Defaults to None.

  • unit_convert (str, optional) – Conversion type for resetting channel units. Defaults to None.

  • read_raw_params (dict, optional) – Additional parameters to pass to mne.io.read_raw() for standard data loading.

  • handle_nonstandard_params (dict, optional) – Additional parameters to pass to handle_nonstandard_data() for non-standard data loading.

  • preload (bool, optional) – Whether to preload the data into memory. Defaults to True.

  • use_locator_channel_metadata (bool, optional) – Whether to use the locator’s Channel Names field to validate and rename channels immediately after loading the raw file. Defaults to True. Set to False when a dataset kernel is active and the locator may already describe the post-kernel Raw object rather than the on-disk file.

Returns:

The processed raw EEG data object.

Return type:

mne.io.BaseRaw

Raises:
  • ValueError – If the number of channels in the locator file does not match the metadata.

  • Warning – If pick_types_params is not None but is_set_channel_type is False, a warning advises setting is_set_channel_type=True.

eegunity.modules.parser.eeg_parser.set_infer_unit(raw_data, row)

Set the inferred unit for EEG channels in the raw data.

Parameters:
  • raw_data (mne.io.Raw) – The raw data object containing the EEG channels.

  • row (pandas.Series) – A row from a DataFrame containing the ‘Infer Unit’ field, which should be a dictionary with channel names as keys and units as values.

Returns:

The updated raw data object with the inferred units set for the specified channels.

Return type:

mne.io.Raw

Raises:

ValueError – If ‘Infer Unit’ is not a valid dictionary.

eegunity.modules.parser.eeg_parser.channel_name_parser(input_string)

Format and standardize channel names into type:name entries.

The parser supports two paths:

  1. Explicit typed input (recommended): entries already in type:name form, including MNE channel types such as seeg:LA1, ecog:G1, stim:event_code and misc:rt. These are preserved (with light prefix canonicalization).

  2. Heuristic input: untyped names are classified by EEGUnity rules (EEG/EOG/EMG/ECG/STIM/Unknown).

Parameters:

input_string (str) – A comma-separated string containing channel names to be formatted.

Returns:

A comma-separated string of formatted channel names. If duplicates are found, the original input string is returned.

Return type:

str

Warning

Warns if an invalid channel name is detected or if a duplicate formatted channel name is found.

Examples

>>> channel_name_parser("Fz, Cz, Pz")
'eeg:Fz, eeg:Cz, eeg:Pz'
>>> channel_name_parser("seeg:LA1, seeg:LA2")
'seeg:LA1, seeg:LA2'

eegunity.modules.parser.eeg_parser.handle_nonstandard_data(row, verbose='CRITICAL', preload=True)

Load non-standard EEG files and return an mne.io.Raw object.

Supported paths include MATLAB .mat, HDF5 EEGLAB .set rows marked as eeglab_hdf5, CSV/TXT rows marked as csvData, WFDB .hea rows marked as wfdbData, and EDF content saved as .rec.

Parameters:
  • row (pandas.Series) – Locator row containing at least File Path, Channel Names, Sampling Rate, and File Type.

  • verbose (str, optional) – MNE verbosity level. Defaults to 'CRITICAL'.

  • preload (bool, optional) – Whether to preload HDF5 EEGLAB data when reading eeglab_hdf5 rows.

Returns:

Parsed raw object.

Return type:

mne.io.BaseRaw

Examples

>>> raw = handle_nonstandard_data(locator_row, preload=False)  

eegunity.modules.parser.eeg_parser.extract_events(raw, event_source='auto', stim_channel=None, regexp='^(?![Bb][Aa][Dd]|[Ee][Dd][Gg][Ee]).*$')

Extract events from annotations and/or stim channels.

Parameters:
  • raw (mne.io.Raw) – The raw data object.

  • event_source ({'auto', 'annotations', 'stim'}, optional) –

    Event source strategy:

    • 'annotations': use mne.events_from_annotations() only.

    • 'stim': use mne.find_events() only.

    • 'auto' (default): annotations first, then stim fallback.

  • stim_channel (str | list[str] | None, optional) – Stim channel name(s) passed to mne.find_events(). If None, all channels with MNE type stim are used automatically.

  • regexp (str, optional) – Regular expression passed to mne.events_from_annotations(). Defaults to MNE’s standard pattern that skips bad/edge labels.

Returns:

  • events (numpy.ndarray) – Events array shaped (n_events, 3).

  • event_id (dict) – Event-id mapping dictionary.

Raises:

ValueError – If event_source is not one of the accepted values.
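
To see what the default regexp lets through, the pattern from the signature can be tried on a few annotation descriptions with the standard re module. The sample labels and the helper keep_descriptions() are made up for illustration, and the sketch assumes the pattern is matched from the start of each description.

```python
import re

# Default pattern copied from the extract_events() signature above.
DEFAULT_REGEXP = r"^(?![Bb][Aa][Dd]|[Ee][Dd][Gg][Ee]).*$"


def keep_descriptions(descriptions, regexp=DEFAULT_REGEXP):
    """Return the annotation descriptions that the pattern lets through."""
    pattern = re.compile(regexp)
    return [text for text in descriptions if pattern.match(text)]


# Hypothetical annotation labels.
labels = ["stimulus/left", "BAD_blink", "bad segment", "EDGE boundary", "response"]
kept = keep_descriptions(labels)  # labels starting with bad/edge are dropped
```

The negative lookahead rejects any description that begins with "bad" or "edge" in any letter case, so only the stimulus and response labels remain.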

Examples

>>> events, event_id = extract_events(raw, event_source='auto')  
>>> events, event_id = extract_events(raw, event_source='stim', stim_channel=['TRIG'])  

eegunity.modules.parser.eeg_parser.infer_channel_unit(ch_name, ch_data, ch_type)

Infer the unit type for a given channel based on its data and type.

Parameters:
  • ch_name (str) – The name of the channel.

  • ch_data (array-like) – The data of the channel, typically an array of amplitude values.

  • ch_type (str) – The type of the channel, such as ‘eeg’, ‘emg’, etc.

Returns:

The inferred unit type, such as “uV”, “mV”, or “V”, based on the channel data and type.

Return type:

str

eegunity.modules.parser.eeg_parser.convert_unit(data, unit)

Convert the units of EEG data in an MNE Raw object.

Parameters:
  • data (mne.io.Raw) – The raw EEG data to be converted.

  • unit (str) – The target unit to convert the data to. Must be one of ‘V’, ‘mV’, or ‘uV’.

Raises:

ValueError – If the provided unit is not valid.

Returns:

The raw EEG data with converted units.

Return type:

mne.io.Raw
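
The arithmetic behind a conversion among ‘V’, ‘mV’ and ‘uV’ can be sketched on a plain list of samples. This is an illustration under stated assumptions, not the library's code: convert_samples() and its explicit from_unit argument are hypothetical, since the function above takes only the target unit.

```python
# Power-of-ten exponent of each supported unit relative to volts.
UNIT_EXPONENT = {"V": 0, "mV": -3, "uV": -6}


def convert_samples(samples, from_unit, to_unit):
    """Rescale amplitude values from one unit to another."""
    for unit in (from_unit, to_unit):
        if unit not in UNIT_EXPONENT:
            raise ValueError(f"Unsupported unit {unit!r}; expected 'V', 'mV' or 'uV'")
    # Example: mV -> uV gives 10 ** (-3 - (-6)) = 10 ** 3.
    factor = 10.0 ** (UNIT_EXPONENT[from_unit] - UNIT_EXPONENT[to_unit])
    return [x * factor for x in samples]
```

Working with exponents rather than dividing two small floating-point scale factors keeps the multiplier a clean power of ten.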

eegunity.modules.parser.eeg_parser.process_brainvision_files(files_locator, verbose, num_workers=0)

Retry failed BrainVision .vhdr files by patching internal sidecar paths.

Targets .vhdr files that failed MNE reading because internal DataFile= or MarkerFile= paths reference pre-BIDS filenames that no longer exist.

Parameters:
  • files_locator (pandas.DataFrame) – Locator DataFrame; must already contain an ‘Error’ column.

  • verbose (str) – MNE verbosity level.

  • num_workers (int, optional) – Number of parallel worker threads (0 = sequential).

Returns:

Updated DataFrame with metadata filled for successfully re-read files.

Return type:

pandas.DataFrame
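
The kind of patching described above can be sketched as a text substitution on the header contents. This is a hedged illustration: patch_vhdr_text() is a hypothetical helper, and how EEGUnity chooses the replacement file names is not described here, so the names are passed in by the caller.

```python
import re


def patch_vhdr_text(header_text, data_file, marker_file):
    """Point the DataFile= and MarkerFile= entries at new file names."""
    # A function replacement keeps backslashes in file names literal.
    patched = re.sub(
        r"(?m)^DataFile=[^\r\n]*", lambda _: f"DataFile={data_file}", header_text
    )
    patched = re.sub(
        r"(?m)^MarkerFile=[^\r\n]*", lambda _: f"MarkerFile={marker_file}", patched
    )
    return patched
```

Only the two path entries change; every other header line, including line endings, is left as it was.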

Examples

>>> process_brainvision_files(locator_df, "CRITICAL", num_workers=2)  

eegunity.modules.parser.eeg_parser.process_mne_files(files_locator, verbose, num_workers=0)

Process MNE files based on a locator DataFrame.

Parameters:
  • files_locator (pandas.DataFrame) – DataFrame containing file paths and related metadata for processing.

  • verbose (str) – Verbosity level for MNE functions.

  • num_workers (int, optional) – Number of worker threads for parallel processing (default is 0, sequential).

Returns:

Updated DataFrame with metadata extracted from processed files.

Return type:

pandas.DataFrame

Examples

>>> process_mne_files(locator_df, verbose="CRITICAL", num_workers=0)  

eegunity.modules.parser.eeg_parser_config module

eegunity.modules.parser.eeg_parser_csv module

eegunity.modules.parser.eeg_parser_csv.calculate_interval(times)

Calculate the average interval between time points.

Parameters:

times (pandas.Series) – A pandas Series object containing time points. The time points can either be timezone-aware DatetimeTZDtype or naive pd.Timestamp objects.

Returns:

The average interval between consecutive time points in seconds. If the input series is empty or only has one time point, returns None.

Return type:

float or None
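
The computation can be sketched with standard-library datetime objects in place of a pandas Series. This is an illustrative sketch; average_interval_seconds() is a hypothetical name, and it relies on the fact that the mean of consecutive differences equals (last - first) / (n - 1).

```python
from datetime import datetime


def average_interval_seconds(times):
    """Mean gap between consecutive time points, in seconds, or None."""
    if len(times) < 2:
        return None  # no interval can be formed
    # Consecutive differences telescope to (last - first).
    total = (times[-1] - times[0]).total_seconds()
    return total / (len(times) - 1)


# Three samples spaced 4 ms apart.
stamps = [
    datetime(2024, 1, 1, 0, 0, 0, 0),
    datetime(2024, 1, 1, 0, 0, 0, 4000),
    datetime(2024, 1, 1, 0, 0, 0, 8000),
]
interval = average_interval_seconds(stamps)
```

The reciprocal of such an interval gives a sampling frequency estimate, which is the quantity a time column is typically used to recover.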

eegunity.modules.parser.eeg_parser_csv.is_datetime_format(s)

Check if a string follows a datetime format.

Parameters:

s (str) – The string to be evaluated for compatibility with the datetime format.

Returns:

True if the string matches the datetime format “%Y-%m-%d %H:%M:%S.%f”; otherwise False.

Return type:

bool
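
A check of this kind can be sketched with datetime.strptime from the standard library. The helper name matches_datetime_format() is hypothetical, and whether the library uses the same mechanism is an assumption.

```python
from datetime import datetime


def matches_datetime_format(s, fmt="%Y-%m-%d %H:%M:%S.%f"):
    """True if s parses with fmt, False otherwise."""
    try:
        datetime.strptime(s, fmt)
    except (ValueError, TypeError):
        # ValueError: wrong layout; TypeError: s is not a string.
        return False
    return True
```

Note that a timestamp without the fractional-seconds part does not match this format, because the ".%f" component is required.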

eegunity.modules.parser.eeg_parser_csv.identify_time_columns(df)

Identify potential time columns in a DataFrame.

Parameters:

df (pandas.DataFrame) – The input DataFrame containing potential time columns.

Returns:

If a single time column is identified, returns the column name and its sampling frequency as a float. If multiple time columns are found with the same sampling frequency, returns a list of column names and the common sampling frequency. Returns None if no valid time column is detected.

Return type:

str or list of str, float

eegunity.modules.parser.eeg_parser_csv.process_csv_files(files_locator, num_workers=0, min_file_size=5242880)

Process CSV files and update a DataFrame with file details.

Parameters:
  • files_locator (pandas.DataFrame) – A DataFrame containing the metadata of files, including their file paths and other details. The column ‘File Path’ is expected to contain paths to the files.

  • num_workers (int, optional) – Number of worker threads for parallel processing (default is 0, sequential).

  • min_file_size (int, optional) – Minimum file size in bytes for a CSV/TXT file to be processed (default is 5242880, i.e. 5 MiB). Files smaller than this threshold are skipped.

Returns:

Updated DataFrame with additional columns ‘File Type’, ‘Sampling Rate’, ‘Channel Names’, ‘Number of Channels’, and ‘Duration’ for each file. If a file cannot be processed, appropriate messages are printed.

Return type:

pandas.DataFrame

Examples

>>> import pandas as pd
>>> locator = pd.DataFrame([{"File Path": "sample.csv", "File Type": "unknown"}])
>>> process_csv_files(locator, num_workers=0, min_file_size=0)  

eegunity.modules.parser.eeg_parser_mat module

eegunity.modules.parser.eeg_parser_mat.process_hdf5_set_files(files_locator, num_workers=0)

Process EEGLAB .set files saved in HDF5 (MATLAB v7.3) format.

Targets .set files that failed MNE reading with an “HDF reader” error. Extracts metadata (channels, sampling rate, duration) via h5py without loading raw signal data.

Parameters:
  • files_locator (pandas.DataFrame) – Locator DataFrame; must already contain an ‘Error’ column populated by process_mne_files().

  • num_workers (int, optional) – Number of parallel worker threads (0 = sequential).

Returns:

Updated DataFrame with metadata filled for readable HDF5 .set files.

Return type:

pandas.DataFrame

Examples

>>> process_hdf5_set_files(locator_df, num_workers=2)  

eegunity.modules.parser.eeg_parser_mat.read_eeglab_hdf5(filepath, preload=True, verbose='CRITICAL')

Read an HDF5-format EEGLAB .set file into an MNE RawArray.

Used by handle_nonstandard_data() when file_type == ‘eeglab_hdf5’. When preload=False, a zero-filled array is returned (metadata and annotations only), which is sufficient for kernels that only need sidecar files and raw.annotations.

Parameters:
  • filepath (str) – Path to the HDF5 .set file.

  • preload (bool, optional) – If True, load the full EEG signal. If False, return a stub RawArray with annotations only (faster, suitable for metadata-only kernels).

  • verbose (str, optional) – MNE verbosity level.

Returns:

MNE Raw object with channel info and (when available) annotations.

Return type:

mne.io.RawArray

Examples

>>> raw = read_eeglab_hdf5("sample.set", preload=False)  

eegunity.modules.parser.eeg_parser_mat.process_mat_files(files_locator, num_workers=0)

Process MAT files and update a DataFrame with file details.

Parameters:
  • files_locator (pandas.DataFrame) – A DataFrame containing the metadata of files, including their file paths and other details. The column ‘File Path’ is expected to contain paths to the MAT files.

  • num_workers (int, optional) – Number of worker threads for parallel processing (default is 0, sequential).

Returns:

Updated DataFrame with additional columns ‘File Type’, ‘Sampling Rate’, ‘Channel Names’, ‘Number of Channels’, and ‘Duration’ for each file. If a file cannot be processed, appropriate messages are printed.

Return type:

pandas.DataFrame

Raises:
  • FileNotFoundError – If the MAT file cannot be located.

  • Exception – General exception for unexpected errors during file processing.

Examples

>>> process_mat_files(locator_df, num_workers=0)  

eegunity.modules.parser.eeg_parser_wfdb module

eegunity.modules.parser.eeg_parser_wfdb.process_wfdb_files(files_locator, num_workers=0)

Process WFDB header files and update a DataFrame with file details.

Parameters:
  • files_locator (pandas.DataFrame) – A DataFrame containing the metadata of files, including their file paths and other details. The column ‘File Path’ is expected to contain paths to the files. Only rows with ‘File Type’ equal to ‘unknown’ are processed.

  • num_workers (int, optional) – Number of worker threads for parallel processing (default is 0, sequential).

Returns:

Updated DataFrame with additional columns ‘File Type’, ‘Sampling Rate’, ‘Channel Names’, ‘Number of Channels’, ‘Data Shape’, and ‘Duration’ for each eligible WFDB file. Files without a companion .dat file or that cannot be parsed are left unchanged.

Return type:

pandas.DataFrame
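
The companion-file condition mentioned above can be sketched with pathlib. This is a hypothetical check, not the library's code: has_companion_dat() is an illustrative name, and it assumes the .dat file shares the header's stem and directory.

```python
from pathlib import Path


def has_companion_dat(hea_path):
    """True if a .dat file with the same stem sits next to the .hea header."""
    hea_path = Path(hea_path)
    return hea_path.suffix.lower() == ".hea" and hea_path.with_suffix(".dat").is_file()
```

A header that fails this kind of check would be one of the files left unchanged in the returned DataFrame.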

Examples

>>> process_wfdb_files(locator_df, num_workers=2)  

Package exports