How to Speed Up EEGUnity with Built-in Multithreading#
1. Introduction#
EEGUnity (>0.6.0) now provides built-in multithreading through the num_workers
parameter in UnifiedDataset.
The new version integrates multithreading directly into the core pipeline. This ensures:
Cleaner user code
Safer concurrency management
Better performance scaling
No nested or duplicated thread pools
The design philosophy is similar to PyTorch’s
DataLoader(num_workers=...).
2. Basic Usage#
To enable multithreading, simply set num_workers when creating a
UnifiedDataset.
from eegunity import UnifiedDataset

u_dataset = UnifiedDataset(
    dataset_path="your_dataset_root",
    domain_tag="your_domain_tag",
    num_workers=8,  # number of threads
)
If num_workers=0 (default), EEGUnity runs sequentially.
If num_workers>0, EEGUnity internally uses a thread pool to
parallelize supported operations.
No additional concurrency code is required.
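The switch between sequential and threaded execution can be pictured with a minimal sketch. The helper name run_tasks and the square function are hypothetical and exist only for illustration; this is not EEGUnity's internal code, just one common way such a num_workers switch is written with the standard library.

```python
from concurrent.futures import ThreadPoolExecutor


def run_tasks(func, items, num_workers=0):
    """Apply func to every item, sequentially or in a thread pool.

    Hypothetical helper for illustration; not part of the EEGUnity API.
    """
    items = list(items)
    if num_workers <= 0:
        # num_workers=0: plain sequential loop, easiest to debug.
        return [func(item) for item in items]
    # num_workers>0: pool.map returns results in input order,
    # even when individual tasks finish out of order.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(func, items))


def square(x):
    return x * x


sequential = run_tasks(square, range(5), num_workers=0)
threaded = run_tasks(square, range(5), num_workers=4)
print(sequential == threaded)
```

Both calls produce the same list, which is the property a caller relies on when changing only num_workers.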
3. Where Multithreading Is Applied#
Multithreading is automatically applied in the following stages:
3.1 Dataset Scanning (Parser Stage)#
When dataset_path is provided, EEGUnity scans directories and builds
the locator.
File parsing (e.g., .fif, .mat, .csv) is parallelized using
num_workers.
This can significantly speed up initialization of large datasets.
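As a rough illustration of parallel scanning, the sketch below lists candidate files once and then describes each file in a worker thread. The names scan_directory, describe_file, and EEG_SUFFIXES are assumptions made for this example, and the per-file step only reads cheap metadata instead of doing real EEG parsing.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Assumed set of extensions for this sketch; the formats EEGUnity
# actually recognizes may differ.
EEG_SUFFIXES = {".fif", ".mat", ".csv"}


def describe_file(path):
    """Stand-in for a per-file parser: collect cheap metadata only."""
    return {
        "path": str(path),
        "suffix": path.suffix.lower(),
        "size": path.stat().st_size,
    }


def scan_directory(root, num_workers=0):
    """Walk root and describe each candidate file, optionally in threads."""
    # Listing happens once, sequentially; sorting keeps row order stable.
    files = sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in EEG_SUFFIXES
    )
    if num_workers <= 0:
        return [describe_file(p) for p in files]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(describe_file, files))


# Small demo on a temporary directory with two matching files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.csv", "b.fif", "notes.txt"):
        Path(tmp, name).write_text("demo")
    rows = scan_directory(tmp, num_workers=4)

print([Path(row["path"]).name for row in rows])
```

Because the per-file work is independent, the only shared step is the initial directory listing, which stays outside the pool.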
3.2 Batch Processing (EEGBatch Stage)#
Functions that rely on batch_process() automatically inherit
multithreading, including:
export_h5Dataset()
save_as_other()
process_mean_std()
format_channel_names()
Each row in the locator can be processed in parallel.
4. Internal Execution Model#
EEGUnity uses a ThreadPoolExecutor internally when num_workers > 0.
However, not all steps can be parallelized safely.
Some operations must remain sequential, such as:
Writing to the same output file
Maintaining deterministic order
Updating shared state
EEGUnity solves this by:
Parallelizing independent row-level tasks
Keeping order-sensitive operations outside thread pools
Collecting results before final writing steps
This hybrid design ensures correctness while maximizing throughput.
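The pattern described here, parallel row-level work followed by a single ordered write, might look like the following sketch. The functions summarize_row and process_then_write, the row layout, and the JSON Lines output are all hypothetical choices for illustration and are not taken from EEGUnity's source.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def summarize_row(row):
    """Independent row-level task: safe to run in any thread."""
    values = row["values"]
    return {"id": row["id"], "mean": sum(values) / len(values)}


def process_then_write(rows, out_path, num_workers=0):
    """Compute per-row results in parallel, then write them sequentially."""
    if num_workers > 0:
        with ThreadPoolExecutor(max_workers=num_workers) as pool:
            # map() yields results in input order, whatever the finish order.
            results = list(pool.map(summarize_row, rows))
    else:
        results = [summarize_row(row) for row in rows]
    # Single writer, outside the pool: no two threads touch the file.
    with open(out_path, "w", encoding="utf-8") as handle:
        for result in results:
            handle.write(json.dumps(result) + "\n")
    return results


rows = [{"id": i, "values": [i, i + 2]} for i in range(4)]
with tempfile.TemporaryDirectory() as tmp:
    out_file = Path(tmp) / "summary.jsonl"
    process_then_write(rows, out_file, num_workers=3)
    lines = out_file.read_text(encoding="utf-8").splitlines()
    written_ids = [json.loads(line)["id"] for line in lines]

print(written_ids)
```

Collecting all results before writing trades some memory for a simple guarantee: the output order matches the locator order, with no locking around the file.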
5. Choosing num_workers#
There is no single optimal value. It depends on:
CPU core count
Dataset size
I/O speed (SSD vs HDD)
Task complexity
General Recommendations#
Start with num_workers = number_of_CPU_cores
For I/O-heavy workloads, slightly higher values may help
For CPU-heavy signal processing, stay near core count
For small datasets, parallelism may not provide noticeable benefit
Example:
import os
from eegunity import UnifiedDataset

u_dataset = UnifiedDataset(
    dataset_path="your_dataset_root",
    domain_tag="your_domain_tag",
    num_workers=os.cpu_count() or 1,  # cpu_count() can return None
)
Always benchmark on your own system.
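One simple way to benchmark is to time the same workload under several candidate values. The sketch below uses a hypothetical fake_io_task that just sleeps to mimic waiting on disk; the function names and candidate values are assumptions, and the timings it prints say nothing about real EEG data.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_task(_):
    """Stand-in for reading one file; sleeping mimics waiting on disk."""
    time.sleep(0.01)
    return 1


def time_workers(candidates, n_tasks=20):
    """Return {num_workers: elapsed_seconds} for each candidate value."""
    timings = {}
    for workers in candidates:
        start = time.perf_counter()
        if workers <= 0:
            # Sequential baseline, matching num_workers=0.
            for i in range(n_tasks):
                fake_io_task(i)
        else:
            with ThreadPoolExecutor(max_workers=workers) as pool:
                list(pool.map(fake_io_task, range(n_tasks)))
        timings[workers] = time.perf_counter() - start
    return timings


timings = time_workers([0, 2, 4, 8])
for workers, seconds in timings.items():
    print(f"num_workers={workers}: {seconds:.3f} s")
```

To measure a real pipeline, replace the body of fake_io_task with the actual operation, for example constructing a UnifiedDataset or running an export with each candidate num_workers, and compare the elapsed times.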
6. Practical Example#
from eegunity import UnifiedDataset

u_dataset = UnifiedDataset(
    dataset_path="your_dataset_root",
    domain_tag="your_domain_tag",
    num_workers=8,
)

# Export to HDF5 in parallel
u_dataset.eeg_batch.export_h5Dataset("output_path")
This will:
Parse dataset files in parallel
Process each locator row concurrently
Safely write results in the correct order
7. Important Notes#
Do not manually create external thread pools.
Avoid nesting additional concurrency layers.
Ensure sufficient memory is available when increasing num_workers.
If debugging, temporarily set num_workers=0 for deterministic behavior.
8. Summary#
The new built-in multithreading system:
Simplifies user code
Improves performance for large datasets
Ensures safe parallel execution
Requires only one parameter: num_workers
By delegating concurrency management to EEGUnity, users can focus on dataset processing logic instead of thread orchestration.