eegunity.modules.parser

Submodules

eegunity.modules.parser.eeg_parser module

class eegunity.modules.parser.eeg_parser.EEGParser(main_instance)

Bases: _UDatasetSharedAttributes

check_locator(locator)

Validate the contents of the locator DataFrame.

Parameters:

locator (pd.DataFrame) – A DataFrame containing file metadata, including data shape, channel names, file type, file path, number of channels, sampling rate, and duration.

Returns:

The updated DataFrame with a ‘Completeness Check’ column recording, for each row, whether validation passed or which errors were found.

Return type:

pd.DataFrame

get_data(data_idx, **kwargs)

Retrieve the data for the locator row at the specified index.

Parameters:
  • data_idx (int) – Index of the row in the locator DataFrame to retrieve data from.

  • **kwargs – Additional keyword arguments that control how the retrieved data are processed.

Returns:

The data retrieved and processed according to the specified parameters.

Return type:

Any

eegunity.modules.parser.eeg_parser.channel_name_parser(input_string)

Format and standardize a list of channel names based on predefined rules.

Parameters:

input_string (str) – A comma-separated string containing channel names to be formatted.

Returns:

A comma-separated string of formatted channel names. If duplicates are found, the original input string is returned.

Return type:

str

Warning

Warns if an invalid channel name is detected or if a duplicate formatted channel name is found.
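The return contract above (formatted names on success, the untouched input string plus a warning on duplicates) can be sketched in plain Python. This page does not list EEGUnity's predefined formatting rules, so the rule used here (trim whitespace and drop an "EEG " prefix) and the function name format_channel_names_sketch are hypothetical stand-ins.

```python
import warnings


def format_channel_names_sketch(input_string: str) -> str:
    """Hypothetical illustration of channel_name_parser's return contract."""
    formatted = []
    for name in input_string.split(","):
        # Hypothetical rule: trim whitespace and drop an "EEG " prefix (any case).
        cleaned = name.strip()
        if cleaned.upper().startswith("EEG "):
            cleaned = cleaned[4:].strip()
        if not cleaned:
            warnings.warn(f"Invalid channel name: {name!r}")
        formatted.append(cleaned)

    # Two inputs collapsing to the same name: warn and keep the original string.
    if len(set(formatted)) != len(formatted):
        warnings.warn("Duplicate formatted channel name; returning the input unchanged.")
        return input_string
    return ",".join(formatted)


print(format_channel_names_sketch(" EEG Fp1, eeg Fp2 ,Cz"))  # Fp1,Fp2,Cz
print(format_channel_names_sketch("EEG Cz,Cz"))  # EEG Cz,Cz (and a warning)
```

Returning the input unchanged on a collision avoids silently merging two distinct channels under one name.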

eegunity.modules.parser.eeg_parser.convert_unit(data: Raw, unit: str) → Raw

Convert the units of EEG data in an MNE Raw object.

Parameters:
  • data (mne.io.Raw) – The raw EEG data to be converted.

  • unit (str) – The target unit to convert the data to. Must be one of ‘V’, ‘mV’, or ‘uV’.

Raises:

ValueError – If the provided unit is not valid.

Returns:

The raw EEG data with converted units.

Return type:

mne.io.Raw
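The arithmetic behind a conversion between the three accepted units is a power-of-ten rescaling. The sketch below works on a plain list of values rather than an mne.io.Raw object, and the helper names (_UNIT_EXPONENTS, rescale_values) are hypothetical. It only illustrates the scaling factors and the ValueError for unsupported units, not EEGUnity's implementation.

```python
# Hypothetical table: exponent of ten relative to volts for each supported unit.
_UNIT_EXPONENTS = {"V": 0, "mV": -3, "uV": -6}


def rescale_values(values, from_unit: str, to_unit: str):
    """Rescale amplitude values between 'V', 'mV' and 'uV'."""
    for unit in (from_unit, to_unit):
        if unit not in _UNIT_EXPONENTS:
            raise ValueError(f"Invalid unit {unit!r}; expected one of {sorted(_UNIT_EXPONENTS)}")
    # Going to a smaller unit (V -> mV) multiplies by 10**3, the reverse divides.
    factor = 10.0 ** (_UNIT_EXPONENTS[from_unit] - _UNIT_EXPONENTS[to_unit])
    return [v * factor for v in values]


print(rescale_values([0.002], "V", "mV"))  # roughly [2.0]
```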

eegunity.modules.parser.eeg_parser.create_montage_from_json(json_file)

Create a montage from a JSON file containing channel positions.

Parameters:

json_file (str) – The path to the JSON file containing channel names as keys and their positions as values.

Returns:

A montage object created from the channel positions defined in the JSON file.

Return type:

mne.channels.DigMontage
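A sketch of the JSON layout this function expects, assuming each value is an [x, y, z] coordinate triple (the exact coordinate convention is not stated here). For self-containedness, the hypothetical helper load_channel_positions parses a JSON string instead of reading a file path, and only builds the channel-to-position dictionary.

```python
import json


def load_channel_positions(json_text: str) -> dict:
    """Parse {channel_name: [x, y, z]} JSON into a dict of 3-tuples (assumed layout)."""
    raw_positions = json.loads(json_text)
    positions = {}
    for name, coords in raw_positions.items():
        if len(coords) != 3:
            raise ValueError(f"Channel {name!r} needs 3 coordinates, got {len(coords)}")
        positions[name] = tuple(float(c) for c in coords)
    return positions


example = '{"Cz": [0.0, 0.0, 0.1], "Fz": [0.0, 0.07, 0.07]}'
ch_pos = load_channel_positions(example)
print(ch_pos["Fz"])  # (0.0, 0.07, 0.07)
```

With MNE installed, a dictionary like ch_pos can be passed to mne.channels.make_dig_montage(ch_pos=ch_pos, coord_frame="head") to obtain an mne.channels.DigMontage. Whether EEGUnity uses that exact call is an assumption.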

eegunity.modules.parser.eeg_parser.extract_events(raw)

Extract events from an mne.io.Raw object.

Events are first extracted with mne.events_from_annotations. If that fails, mne.find_events is used instead, which yields events without descriptions.

Parameters:

raw (mne.io.Raw) – The raw data object.

Returns:

  • events (numpy.ndarray) – The events array, shaped (n_events, 3).

  • event_id (dict) – Dictionary of event IDs.
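The try-then-fall-back flow can be sketched without MNE by passing the two extraction strategies in as callables (in MNE these roles are played by mne.events_from_annotations and mne.find_events). The stand-in functions below are hypothetical, and building event_id from the distinct event codes on the fallback path is an assumption, since this page does not say what the fallback returns for event_id.

```python
def extract_events_with_fallback(raw, from_annotations, find_events):
    """Try annotation-based extraction first; fall back to trigger-based extraction."""
    try:
        return from_annotations(raw)
    except Exception:
        # Assumed fallback: events carry no descriptions, so label each code by its value.
        events = find_events(raw)
        event_id = {str(code): code for code in sorted({row[2] for row in events})}
        return events, event_id


def failing_annotations(raw):
    raise RuntimeError("no annotations found")


def fake_find_events(raw):
    # Each row follows MNE's layout: [sample index, previous value, event code].
    return [[100, 0, 1], [250, 0, 2], [400, 0, 1]]


events, event_id = extract_events_with_fallback(None, failing_annotations, fake_find_events)
print(event_id)  # {'1': 1, '2': 2}
```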

eegunity.modules.parser.eeg_parser.get_data_row(row: dict, norm_type: str | None = None, is_set_channel_type: bool | None = None, is_set_montage: bool = False, pick_types_params: dict | None = None, unit_convert: str | None = None, read_raw_params: dict | None = None, handle_nonstandard_params: dict | None = None, preload: bool = True) → BaseRaw

Process and return raw EEG data based on the input row information.

This function handles both standard and non-standard data, with options for setting channel types, montage, normalization, and unit conversion.

Parameters:
  • row (dict) – Dictionary containing data attributes, such as file paths, file types, and channel names.

  • norm_type (str, optional) – Type of normalization to apply, if any. Defaults to None.

  • is_set_channel_type (bool or None, optional) – Determines whether to set channel types based on the provided information. If True, channel types are set explicitly. If False, channel types are not set. If None, channel types are set only if the Channel Names in the locator follow the “type:name” format (see UnifiedDataset.EEGBatch.format_channel_names() for details). Defaults to None.

  • is_set_montage (bool, optional) – Whether to set montage (electrode coordinates). Defaults to False.

  • pick_types_params (dict, optional) – Dictionary specifying which channel types to include. The keys should match the parameters of raw.pick_types(). Defaults to None.

  • unit_convert (str, optional) – Conversion type for resetting channel units. Defaults to None.

  • read_raw_params (dict, optional) – Additional parameters to pass to mne.io.read_raw() for standard data loading. Defaults to None.

  • handle_nonstandard_params (dict, optional) – Additional parameters to pass to handle_nonstandard_data() for non-standard data loading. Defaults to None.

  • preload (bool, optional) – Whether to preload the data into memory. Defaults to True.

Returns:

The processed raw EEG data object.

Return type:

mne.io.BaseRaw

Raises:
  • ValueError – If the number of channels in the locator file does not match the metadata.

  • Warning – If pick_types_params is not None but is_set_channel_type is False, a warning is issued advising the user to set is_set_channel_type=True.
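One reading of the is_set_channel_type rule can be sketched as a small decision function. It assumes that "follows the type:name format" means every comma-separated entry has a non-empty type and name separated by a colon. The function name resolve_set_channel_type is hypothetical and is not part of EEGUnity.

```python
def resolve_set_channel_type(is_set_channel_type, channel_names: str) -> bool:
    """Hypothetical reading of how is_set_channel_type=None could be resolved."""
    if is_set_channel_type is not None:
        # An explicit True or False always wins.
        return bool(is_set_channel_type)
    # None: decide from the names themselves; every entry must look like "type:name".
    entries = [entry.strip() for entry in channel_names.split(",")]
    return all(
        ":" in entry and all(part for part in entry.split(":", 1)) for entry in entries
    )


print(resolve_set_channel_type(None, "eeg:Fz, eeg:Cz, eog:HEOG"))  # True
print(resolve_set_channel_type(None, "Fz, Cz"))  # False
```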

eegunity.modules.parser.eeg_parser.handle_nonstandard_data(row, verbose='CRITICAL')

Load non-standard EEG data files into MNE Raw format.

This function processes EEG data from either .mat files or .csv/.txt files marked as ‘csvData’. It extracts channel names and sampling rates from the provided row, and creates an MNE RawArray object containing the EEG data.

Parameters:
  • row (pd.Series) – A row from a DataFrame containing information about the file, including ‘File Path’, ‘Channel Names’, ‘Sampling Rate’, and ‘File Type’.

  • verbose (str, optional) – The verbosity level for MNE functions. Default is ‘CRITICAL’.

Returns:

An MNE Raw object containing the EEG data.

Return type:

mne.io.Raw

Raises:
  • ValueError – If the number of channels in the DataFrame does not match the channel names specified in the row.

  • Exception – If the file type is unsupported or there is an error in loading the data.
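For the CSV branch, the core steps are: split the channel names from the row, check them against the number of data columns, and transpose samples into a channels-by-samples layout. The sketch below assumes a header-less file with one sample per row and one column per channel. The helper csv_to_channel_arrays is hypothetical and stops short of building the MNE object.

```python
import csv
import io


def csv_to_channel_arrays(csv_text: str, channel_names: str, sampling_rate: float):
    """Turn sample-per-row CSV text into per-channel lists plus the duration in seconds."""
    names = [name.strip() for name in channel_names.split(",")]
    rows = [[float(value) for value in row] for row in csv.reader(io.StringIO(csv_text)) if row]
    # Mirrors the documented ValueError: column count must match the channel names.
    if not rows or any(len(row) != len(names) for row in rows):
        raise ValueError(f"Expected {len(names)} columns per row to match the channel names")
    # Transpose from (n_samples, n_channels) to (n_channels, n_samples), MNE's layout.
    data = [list(column) for column in zip(*rows)]
    duration = len(rows) / sampling_rate
    return data, names, duration


data, names, duration = csv_to_channel_arrays("1,2\n3,4\n5,6\n", "Fz, Cz", 2.0)
print(data, names, duration)  # [[1.0, 3.0, 5.0], [2.0, 4.0, 6.0]] ['Fz', 'Cz'] 1.5
```

With MNE, data and names could then go into mne.io.RawArray(data, mne.create_info(ch_names=names, sfreq=sampling_rate)); this is a plausible final step, not a confirmed detail of EEGUnity.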

eegunity.modules.parser.eeg_parser.infer_channel_unit(ch_name, ch_data, ch_type)

Infer the unit type for a given channel based on its data and type.

Parameters:
  • ch_name (str) – The name of the channel.

  • ch_data (array-like) – The data of the channel, typically an array of amplitude values.

  • ch_type (str) – The type of the channel, such as ‘eeg’, ‘emg’, etc.

Returns:

The inferred unit type, such as “uV”, “mV”, or “V”, based on the channel data and type.

Return type:

str
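Unit inference of this kind usually relies on typical signal magnitude: scalp EEG is on the order of tens of microvolts, so the same signal stored in volts, millivolts, or microvolts has very different median amplitudes. The sketch below is a hypothetical heuristic with made-up thresholds. Unlike infer_channel_unit, it ignores ch_name and ch_type, which the real function may use to pick type-specific ranges.

```python
import statistics


def guess_amplitude_unit(ch_data) -> str:
    """Guess 'V', 'mV' or 'uV' from the median absolute amplitude (hypothetical thresholds)."""
    typical = statistics.median(abs(x) for x in ch_data)
    if typical < 1e-3:
        return "V"    # tens of microvolts expressed in volts, e.g. 2e-05
    if typical < 1.0:
        return "mV"   # tens of microvolts expressed in millivolts, e.g. 0.02
    return "uV"       # tens of microvolts expressed in microvolts, e.g. 20.0


print(guess_amplitude_unit([2e-05, -3e-05, 1e-05]))  # V
```

The median is used instead of the maximum so that a few artifact spikes do not push the guess toward a smaller unit.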

eegunity.modules.parser.eeg_parser.normalize_data(raw_data, mean_std_str: str | Dict, norm_type: str)

Normalize EEG data based on provided mean and standard deviation values.

Parameters:
  • raw_data (mne.io.Raw) – The raw EEG data to be normalized. The data should be in MNE Raw format.

  • mean_std_str (Union[str, Dict]) – A dictionary or string that contains mean and standard deviation values. If it’s a string, it will be evaluated into a dictionary. The dictionary keys should be channel names (for channel-wise normalization) or ‘all_eeg’ (for sample-wise normalization).

  • norm_type (str) – The type of normalization to perform. It can be ‘channel-wise’, which normalizes each channel individually based on its own mean and standard deviation, or ‘sample-wise’, which normalizes all channels based on a common mean and standard deviation.

Returns:

The normalized raw EEG data.

Return type:

mne.io.Raw

Raises:

ValueError – If norm_type is not ‘channel-wise’ or ‘sample-wise’.
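The two normalization modes differ only in which statistics each channel uses. This sketch operates on a {channel: samples} dictionary instead of an mne.io.Raw object and assumes each statistics entry is a (mean, std) pair, which this page does not specify. It parses string input with ast.literal_eval, a deliberately safer substitute for eval(); how EEGUnity evaluates the string is not stated. The name normalize_channels is hypothetical.

```python
import ast


def normalize_channels(data: dict, mean_std, norm_type: str) -> dict:
    """Z-score {channel: samples} with precomputed (mean, std) pairs (assumed layout)."""
    if isinstance(mean_std, str):
        # Only Python literals are accepted, unlike eval().
        mean_std = ast.literal_eval(mean_std)
    if norm_type == "channel-wise":
        stats = {name: mean_std[name] for name in data}
    elif norm_type == "sample-wise":
        # One shared pair, stored under the 'all_eeg' key, for every channel.
        stats = {name: mean_std["all_eeg"] for name in data}
    else:
        raise ValueError("norm_type must be 'channel-wise' or 'sample-wise'")
    return {
        name: [(x - stats[name][0]) / stats[name][1] for x in samples]
        for name, samples in data.items()
    }


data = {"Fz": [1.0, 3.0], "Cz": [10.0, 14.0]}
print(normalize_channels(data, "{'all_eeg': (2.0, 2.0)}", "sample-wise"))
# {'Fz': [-0.5, 0.5], 'Cz': [4.0, 6.0]}
```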

eegunity.modules.parser.eeg_parser.process_mne_files(files_locator, verbose, num_workers=0)

Process MNE files based on a locator DataFrame.

Parameters:
  • files_locator (pandas.DataFrame) – DataFrame containing file paths and related metadata for processing.

  • verbose (str) – Verbosity level for MNE functions.

  • num_workers (int, optional) – Number of worker threads for parallel processing (default is 0, sequential).

Returns:

Updated DataFrame with metadata extracted from processed files.

Return type:

pandas.DataFrame
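The num_workers convention shared by process_mne_files, process_csv_files, and process_mat_files (0 means sequential, otherwise a pool of worker threads) can be sketched with concurrent.futures. Using ThreadPoolExecutor here is an assumption about the mechanism, and process_one stands in for whatever per-file routine extracts the metadata.

```python
from concurrent.futures import ThreadPoolExecutor


def map_files(process_one, file_paths, num_workers: int = 0):
    """Apply process_one to every path; num_workers=0 means run sequentially."""
    if num_workers <= 0:
        return [process_one(path) for path in file_paths]
    with ThreadPoolExecutor(max_workers=num_workers) as executor:
        # executor.map preserves input order, so results line up with the locator rows.
        return list(executor.map(process_one, file_paths))


print(map_files(str.upper, ["a.fif", "b.edf"], num_workers=2))  # ['A.FIF', 'B.EDF']
```

Threads suit this workload because reading files is largely I/O-bound. Keeping results in input order means they can be written back to the DataFrame row by row.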

eegunity.modules.parser.eeg_parser.set_channel_type(raw_data, channel_str)

Set the channel types for the given raw data based on the specified channel string.

Parameters:
  • raw_data (mne.io.Raw) – The raw data object containing the EEG, EMG, ECG, EOG, or other types of signals.

  • channel_str (str) – A comma-separated string of entries in the format ‘type:name’, where type is the channel type to assign (for example, ‘eeg’ or ‘eog’) and name is the new channel name.

Returns:

The updated raw data object with renamed channels and set channel types.

Return type:

mne.io.Raw

Raises:

ValueError – If the format of any channel in the channel string is invalid (not in ‘type:name’ format).
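Parsing the ‘type:name’ string yields two mappings: old name to new name, and new name to channel type. The sketch assumes the channels currently carry the full ‘type:name’ strings as their names. The helper parse_channel_string is hypothetical. With MNE, mappings like these fit raw.rename_channels() and raw.set_channel_types(), though whether EEGUnity applies them this way is an assumption.

```python
def parse_channel_string(channel_str: str):
    """Split 'type:name' entries into a rename map and a channel-type map."""
    rename, types = {}, {}
    for entry in channel_str.split(","):
        entry = entry.strip()
        ch_type, sep, name = entry.partition(":")
        # Mirrors the documented ValueError for entries not in 'type:name' form.
        if not sep or not ch_type or not name:
            raise ValueError(f"Invalid channel entry {entry!r}; expected 'type:name'")
        rename[entry] = name
        types[name] = ch_type
    return rename, types


rename, types = parse_channel_string("eeg:Fz, eog:HEOG")
print(rename)  # {'eeg:Fz': 'Fz', 'eog:HEOG': 'HEOG'}
print(types)   # {'Fz': 'eeg', 'HEOG': 'eog'}
```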

eegunity.modules.parser.eeg_parser.set_infer_unit(raw_data, row)

Set the inferred unit for EEG channels in the raw data.

Parameters:
  • raw_data (mne.io.Raw) – The raw data object containing the EEG channels.

  • row (pandas.Series) – A row from a DataFrame containing the ‘Infer Unit’ field, which should be a dictionary with channel names as keys and units as values.

Returns:

The updated raw data object with the inferred units set for the specified channels.

Return type:

mne.io.Raw

Raises:

ValueError – If ‘Infer Unit’ is not a valid dictionary.

eegunity.modules.parser.eeg_parser.set_montage_any(raw_data: Raw, verbose='CRITICAL')

Set the montage for the given raw data using a montage defined in a JSON file.

Parameters:
  • raw_data (mne.io.Raw) – The raw data object to which the montage will be applied.

  • verbose (str, optional) – The verbosity level for warnings or messages. Defaults to ‘CRITICAL’.

Returns:

The updated raw data object with the applied montage.

Return type:

mne.io.Raw

eegunity.modules.parser.eeg_parser_config module

eegunity.modules.parser.eeg_parser_csv module

eegunity.modules.parser.eeg_parser_csv.calculate_interval(times)

Calculate the average interval between time points.

Parameters:

times (pandas.Series) – A pandas Series object containing time points. The time points can either be timezone-aware DatetimeTZDtype or naive pd.Timestamp objects.

Returns:

The average interval between consecutive time points in seconds. If the input series is empty or only has one time point, returns None.

Return type:

float or None
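Because consecutive differences telescope, the mean interval equals (last − first) / (n − 1). The sketch below uses a plain list of datetime objects in place of a pandas Series; the name average_interval_seconds is hypothetical. Subtraction works as long as the timestamps are all naive or all timezone-aware.

```python
from datetime import datetime, timedelta


def average_interval_seconds(times):
    """Mean gap between consecutive datetimes in seconds, or None for fewer than two."""
    if len(times) < 2:
        return None
    # The sum of consecutive differences telescopes to last - first.
    total = times[-1] - times[0]
    return total.total_seconds() / (len(times) - 1)


start = datetime(2024, 1, 1, 12, 0, 0)
stamps = [start + timedelta(milliseconds=4 * i) for i in range(5)]
interval = average_interval_seconds(stamps)
print(interval, 1 / interval)  # roughly 0.004 s between samples, i.e. about 250 Hz
```

The reciprocal of this interval is the sampling frequency, the quantity identify_time_columns reports.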

eegunity.modules.parser.eeg_parser_csv.identify_time_columns(df)

Identify potential time columns in a DataFrame.

Parameters:

df (pandas.DataFrame) – The input DataFrame containing potential time columns.

Returns:

If a single time column is identified, returns the column name and its sampling frequency as a float. If multiple time columns are found with the same sampling frequency, returns a list of column names and the common sampling frequency. Returns None if no valid time column is detected.

Return type:

tuple(str or list of str, float) or None

eegunity.modules.parser.eeg_parser_csv.is_datetime_format(s)

Check if a string follows a datetime format.

Parameters:

s (str) – The string to be evaluated for compatibility with the datetime format.

Returns:

Returns True if the string matches the datetime format “%Y-%m-%d %H:%M:%S.%f”. Otherwise, returns False.

Return type:

bool
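A check against the documented format can be written with datetime.strptime. Whether EEGUnity uses strptime internally is an assumption, and the name matches_datetime_format is hypothetical. Note that %f requires a fractional-seconds part, so a timestamp without one does not match.

```python
from datetime import datetime

DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def matches_datetime_format(s) -> bool:
    """Return True if s parses with the format documented for is_datetime_format."""
    try:
        datetime.strptime(s, DATETIME_FORMAT)
    except (TypeError, ValueError):
        # TypeError covers non-string input; ValueError covers a format mismatch.
        return False
    return True


print(matches_datetime_format("2024-01-01 12:00:00.004"))  # True
print(matches_datetime_format("2024-01-01 12:00:00"))      # False (no fractional part)
```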

eegunity.modules.parser.eeg_parser_csv.process_csv_files(files_locator, num_workers=0)

Process CSV files and update a DataFrame with file details.

Parameters:
  • files_locator (pandas.DataFrame) – A DataFrame containing the metadata of files, including their file paths and other details. The column ‘File Path’ is expected to contain paths to the files.

  • num_workers (int, optional) – Number of worker threads for parallel processing (default is 0, sequential).

Returns:

Updated DataFrame with additional columns ‘File Type’, ‘Sampling Rate’, ‘Channel Names’, ‘Number of Channels’, and ‘Duration’ for each file. If a file cannot be processed, appropriate messages are printed.

Return type:

pandas.DataFrame

eegunity.modules.parser.eeg_parser_mat module

eegunity.modules.parser.eeg_parser_mat.process_mat_files(files_locator, num_workers=0)

Process MAT files and update a DataFrame with file details.

Parameters:
  • files_locator (pandas.DataFrame) – A DataFrame containing the metadata of files, including their file paths and other details. The column ‘File Path’ is expected to contain paths to the MAT files.

  • num_workers (int, optional) – Number of worker threads for parallel processing (default is 0, sequential).

Returns:

Updated DataFrame with additional columns ‘File Type’, ‘Sampling Rate’, ‘Channel Names’, ‘Number of Channels’, and ‘Duration’ for each file. If a file cannot be processed, appropriate messages are printed.

Return type:

pandas.DataFrame

Raises:
  • FileNotFoundError – If the MAT file cannot be located.

  • Exception – General exception for unexpected errors during file processing.

Module contents