# Datalair Documentation
Datalair is a lightweight Python library designed to streamline the management, processing, and retrieval of data for research and other projects. It provides utilities for efficient downloading, handling, and storage of datasets.
## Features
- Data Downloading:
  - Utilities for downloading files or datasets from popular repositories such as GEO and ArrayExpress.
  - Robust handling for managing large or supplementary files with easy integration into workflows.
- Developer-Friendly:
  - Type safety and linting support for robust development with tools like `mypy` and `ruff`.
  - Built-in testing capabilities powered by `pytest`.
## Installation
Ensure you have Python 3.12 or newer before installing.
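
To confirm that the interpreter meets this requirement, a small check along the following lines can help. It is a minimal sketch using only the standard library; the names `MINIMUM_PYTHON` and `meets_minimum` are hypothetical and not part of Datalair.

```python
import sys
from typing import Optional, Tuple

MINIMUM_PYTHON = (3, 12)  # minimum version stated in this README


def meets_minimum(
    version: Optional[Tuple[int, ...]] = None,
    minimum: Tuple[int, int] = MINIMUM_PYTHON,
) -> bool:
    """Return True when `version` (default: the running interpreter) is at least `minimum`."""
    current = tuple(sys.version_info[:2]) if version is None else tuple(version[:2])
    return current >= minimum


if not meets_minimum():
    found = sys.version.split()[0]
    print(f"Python {MINIMUM_PYTHON[0]}.{MINIMUM_PYTHON[1]} or newer is required, found {found}")
```

Running the snippet with the interpreter you plan to use prints a message only when that interpreter is too old.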
```shell
# Add Datalair to a Poetry project directly from its Git repository
poetry add git+https://github.com/LuJoHae/datalair.git
```
---
## Quickstart
Here’s a quick example of how to use Datalair to define a dataset and derive its files:
```python
from datalair import Lair, Dataset

# Create the lair at ./lair if it does not exist yet
lair = Lair("./lair")
lair.create_if_not_exist()
lair.assert_ok_satus()


class MyDataset(Dataset):
    def derive(self, lair: "Lair") -> None:
        # Write this dataset's file into the path the lair provides for it
        output_dir = lair.get_path(self)
        with open(output_dir.joinpath("myfile.txt"), "w") as f:
            f.write("Hello World")


dataset = MyDataset()
lair.safe_derive(dataset)
filepaths = lair.get_dataset_filepaths(dataset)
```
Check the module-specific functions like `download_supplementary_from_geo` or `download_files_from_arrayexpress` for dataset-related tasks.
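
For orientation, GEO publishes the supplementary files of a series under a predictable directory on its public file server. The sketch below builds that directory URL from a series accession using only the standard library. It does not show how Datalair implements `download_supplementary_from_geo`, whose signature is not documented here; the helper name `geo_supplementary_url` and the constant `GEO_SERIES_ROOT` are hypothetical.

```python
import re

# Public HTTPS mirror of the GEO FTP area for series records
GEO_SERIES_ROOT = "https://ftp.ncbi.nlm.nih.gov/geo/series"


def geo_supplementary_url(accession: str) -> str:
    """Return the supplementary-file directory URL for a GEO series accession."""
    match = re.fullmatch(r"GSE(\d+)", accession.strip().upper())
    if match is None:
        raise ValueError(f"not a GEO series accession: {accession!r}")
    digits = match.group(1)
    # GEO groups series into buckets of 1000: the last three digits of the
    # accession number are replaced by "nnn" (GSE12345 -> GSE12nnn, GSE999 -> GSEnnn).
    bucket = f"GSE{digits[:-3]}nnn"
    return f"{GEO_SERIES_ROOT}/{bucket}/GSE{digits}/suppl/"
```

For example, `geo_supplementary_url("GSE12345")` returns `https://ftp.ncbi.nlm.nih.gov/geo/series/GSE12nnn/GSE12345/suppl/`, which can be opened in a browser to see which supplementary files a series provides before downloading them.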