
Lair Class API

Lair

Represents a data storage system, called a "lair", for managing structured datasets. It provides methods to check, create, derive, and delete datasets and to maintain their metadata and configuration. It also supports operations that verify data integrity and check directory permissions.


Attributes:

Name Type Description
_path Path

Absolute path to the main storage directory for the lair.

_archive_path Optional[Path]

Absolute path to an optional archive directory for storing supplementary data.

__enter__

__enter__() -> Self

Enters the runtime context after verifying that the lair is in an acceptable state. Use the lair in a with statement so that its status is validated before it is used.

Returns:

Name Type Description
Self Self

The instance itself, so that the context is correctly initialized for use.

__exit__

__exit__(exc_type: Optional[Type[BaseException]], exc_val: Optional[BaseException], exc_tb: Optional[TracebackType]) -> Literal[False]

Exits the runtime context. Always returns False, so any exception raised inside the with block is not suppressed and propagates to the caller.

Parameters:

Name Type Description Default
exc_type Optional[Type[BaseException]]

The class of the exception that occurred, if any; otherwise, None.

required
exc_val Optional[BaseException]

The exception instance that occurred, if any; otherwise, None.

required
exc_tb Optional[TracebackType]

The traceback object associated with the exception, if any; otherwise, None.

required

Returns:

Type Description
Literal[False]

Always returns False, which specifies that exceptions are not suppressed and propagate normally.
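The context-manager contract described for __enter__ and __exit__ can be illustrated with a minimal sketch. StatusCheckedContext is a hypothetical stand-in, not the Lair class, and its ok flag is an assumption standing in for the real status check.

```python
from types import TracebackType
from typing import Literal, Optional, Type


class StatusCheckedContext:
    """Hypothetical stand-in mirroring the documented __enter__/__exit__ contract."""

    def __init__(self, ok: bool = True) -> None:
        self.ok = ok  # assumption: stands in for status() == LairStatus.OK

    def __enter__(self) -> "StatusCheckedContext":
        # Validate the state before the body of the with block runs.
        assert self.ok, "store is not in an OK state"
        return self

    def __exit__(
        self,
        exc_type: Optional[Type[BaseException]],
        exc_val: Optional[BaseException],
        exc_tb: Optional[TracebackType],
    ) -> Literal[False]:
        # Returning False tells Python not to suppress the exception.
        return False
```

Because __exit__ returns False, an error raised inside the with block reaches the caller unchanged, while a failed status check stops the block before it runs.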

__init__

__init__(path: Path | str = Path('./lair'), archive_path: Optional[Path | str] = None)

Initializes a lair with the given storage path and optional archive path.

The constructor converts both paths to absolute, resolved Path objects. If archive_path is not provided, it remains None.

Parameters:

Name Type Description Default
path Path | str

The main storage path, stored as an absolute, resolved Path object. Defaults to './lair'.

Path('./lair')
archive_path Optional[Path | str]

The optional archive path, which, if provided, is stored as an absolute, resolved Path object. Defaults to None.

None
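The path normalization described for the constructor can be sketched as a standalone helper. normalize_lair_paths is a hypothetical name used only for illustration; Path.resolve() returns an absolute path with symlinks and ".." segments resolved, even when the path does not exist yet.

```python
from pathlib import Path
from typing import Optional, Union

PathLike = Union[Path, str]


def normalize_lair_paths(
    path: PathLike = Path("./lair"),
    archive_path: Optional[PathLike] = None,
) -> tuple[Path, Optional[Path]]:
    """Hypothetical helper: convert both arguments to absolute, resolved Paths."""
    main = Path(path).resolve()  # resolve() also makes the path absolute
    # The archive path stays None when it is not provided.
    archive = Path(archive_path).resolve() if archive_path is not None else None
    return main, archive
```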

__str__

__str__() -> str

Returns a human-readable string representation of the lair.

The string includes details of the lair's state and is intended for debugging and display.

Returns:

Name Type Description
str str

A string representation of the object.

assert_dataset_exists

assert_dataset_exists(dataset: Dataset) -> None

Asserts that the specified dataset exists in the lair.

Verifies that the dataset is present in the store and raises an AssertionError if it is not found.

Parameters:

Name Type Description Default
dataset Dataset

The dataset object to validate for existence.

required

Raises:

Type Description
AssertionError

If the specified dataset does not exist.

assert_dataset_missing

assert_dataset_missing(dataset: Dataset) -> None

Asserts that the specified dataset does not exist in the lair. Before checking, it verifies that the lair status is valid. If the dataset exists, an AssertionError is raised with a descriptive message.

Parameters:

Name Type Description Default
dataset Dataset

The dataset to verify for non-existence.

required

Raises:

Type Description
AssertionError

If the specified dataset already exists.

assert_ok_satus

assert_ok_satus() -> None

Asserts that the current status of the store is LairStatus.OK.

Verifies that the store is in a valid, acceptable state (LairStatus.OK). If the status is anything else, an AssertionError is raised whose message describes the store and its current status.

Raises:

Type Description
AssertionError

If the store status is not LairStatus.OK. The error message includes the store information and its current status.

check_store_permissions

check_store_permissions() -> bool

Checks whether the store's permissions are valid.

Reads the permission bits of the file or directory at the path returned by get_path() and checks that they are among the permissions accepted for a store.

Returns:

Name Type Description
bool bool

True if the permissions are valid, False otherwise.

create

create() -> None

Creates a new store directory and its configuration file.

Verifies that the store does not already exist, creates the store directory at the configured path, and then writes the configuration file inside the new directory.

Raises:

Type Description
FileExistsError

If the directory for the store already exists.
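The create step can be sketched with Path.mkdir(exist_ok=False), which raises FileExistsError when the directory is already there. CONFIG_FILENAME and the JSON content are hypothetical; the reference does not name the configuration file or its format.

```python
import json
from pathlib import Path

CONFIG_FILENAME = "lair_config.json"  # hypothetical name, not documented in this reference


def create_store(path: Path) -> None:
    """Create the store directory and an initial configuration file."""
    # exist_ok=False raises FileExistsError if the store already exists;
    # parents=True (an assumption) also creates missing parent directories.
    path.mkdir(parents=True, exist_ok=False)
    config = {"version": 1}  # hypothetical minimal configuration content
    (path / CONFIG_FILENAME).write_text(json.dumps(config))
```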

create_if_not_exist

create_if_not_exist() -> None

Creates the store if it does not already exist.

Checks the lair status and, if it is LairStatus.NOT_EXIST, creates the store; otherwise, nothing is done.

Returns:

Type Description
None

None

dataset_exists

dataset_exists(dataset: Dataset) -> bool

Checks whether the dataset exists and is a valid directory.

Verifies that the provided dataset is an instance of Dataset (or of a subclass), then checks that the dataset's path exists and is a directory.

Parameters:

Name Type Description Default
dataset Dataset

The dataset object to validate.

required

Returns:

Name Type Description
bool bool

True if the dataset exists and is a directory, False otherwise.

Raises:

Type Description
TypeError

If the provided dataset is not an instance of Dataset or one of its subclasses.
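The existence check can be sketched as a type check followed by Path.is_dir(), which returns False for missing paths and for regular files. The Dataset class and its relative_path attribute below are hypothetical stand-ins, not the library's own types.

```python
from pathlib import Path


class Dataset:
    """Hypothetical minimal stand-in for the library's Dataset base class."""

    def __init__(self, relative_path: str) -> None:
        self.relative_path = Path(relative_path)  # assumed attribute name


def dataset_exists(root: Path, dataset: Dataset) -> bool:
    if not isinstance(dataset, Dataset):  # isinstance also accepts subclasses
        raise TypeError(f"expected a Dataset, got {type(dataset).__name__}")
    # is_dir() is False for missing paths, so it covers the existence check too.
    return (root / dataset.relative_path).is_dir()
```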

delete

delete(force: bool = False) -> None

Deletes the store directory at the current path.

Removes the directory associated with the store. Unless force is True, the deletion is first checked against the store's status: the store must exist, must be a valid store, and must be empty; otherwise, an IOError is raised.

Parameters:

Name Type Description Default
force bool

If True, forces the deletion of the store directory without performing status checks or validations. Defaults to False.

False

Raises:

Type Description
IOError

If the store contains elements and force is False.

IOError

If the store does not exist and force is False.

IOError

If the path refers to a file that is not a valid store and force is False.
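The guarded deletion can be sketched as three checks followed by shutil.rmtree. This sketch assumes a hypothetical CONFIG_FILENAME and treats a store as empty when it holds nothing but that file; with force=True every check is skipped and errors from rmtree (such as a missing path) are ignored.

```python
import shutil
from pathlib import Path

CONFIG_FILENAME = "lair_config.json"  # hypothetical config file name


def delete_store(path: Path, force: bool = False) -> None:
    if not force:
        if not path.exists():
            raise IOError(f"store {path} does not exist")
        if not path.is_dir() or not (path / CONFIG_FILENAME).is_file():
            raise IOError(f"{path} is not a valid store")
        # Assumption: a store is empty when it contains only its config file.
        if any(entry.name != CONFIG_FILENAME for entry in path.iterdir()):
            raise IOError(f"store {path} is not empty")
    # ignore_errors=force: a forced delete of a missing path is a no-op.
    shutil.rmtree(path, ignore_errors=force)
```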

delete_all_empty_datasets_from_store

delete_all_empty_datasets_from_store(dry_run: bool = False) -> None

Deletes all empty datasets from the store directory. A dataset is considered empty if its directory contains only the files 'dataset.pkl' and 'metadata.json'. In dry-run mode, the method only reports the directories that would be deleted, without deleting them.

Parameters:

Name Type Description Default
dry_run bool

If True, prints the dataset directories that would be deleted instead of deleting them.

False
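The emptiness rule above (a directory holding only 'dataset.pkl' and 'metadata.json') can be sketched as follows. The sketch assumes that every directory containing a dataset.pkl is a dataset directory, and it returns the affected paths for inspection, whereas the documented method returns None.

```python
import shutil
from pathlib import Path

# File names taken from the description above.
BOOKKEEPING_FILES = {"dataset.pkl", "metadata.json"}


def delete_empty_datasets(root: Path, dry_run: bool = False) -> list[Path]:
    """Delete dataset directories that contain only the bookkeeping files."""
    # Assumption: any directory holding a dataset.pkl is a dataset directory.
    dataset_dirs = sorted({p.parent for p in root.rglob("dataset.pkl")})
    affected = []
    for dataset_dir in dataset_dirs:
        contents = {entry.name for entry in dataset_dir.iterdir()}
        if contents == BOOKKEEPING_FILES:
            if dry_run:
                print(f"would delete {dataset_dir}")
            else:
                shutil.rmtree(dataset_dir)
            affected.append(dataset_dir)
    return affected
```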

delete_from_store

delete_from_store(dataset: Dataset) -> None

Deletes a dataset from the storage.

Removes the specified dataset's directory from the file system, so the dataset is no longer available in the store.

Parameters:

Name Type Description Default
dataset Dataset

The dataset to be deleted.

required

derive

derive(dataset: Dataset) -> None

Derives a new dataset and stores it in the lair.

The method checks the lair status and verifies that the dataset does not already exist, creates the dataset directory, saves the dataset metadata and serialized implementation, and then invokes the dataset's own derive implementation. If any step of the derivation fails, the partially derived dataset is cleaned up from the store before the exception propagates, keeping the store consistent.

Parameters:

Name Type Description Default
dataset Dataset

The dataset object to be derived. It is expected to be a subclass of the Dataset base class.

required

Raises:

Type Description
AssertionError

If the dataset is not of the correct type or if its status or existence in the store cannot be verified.

Exception

If any error occurs during the dataset derivation process. The raised exception will propagate after executing necessary cleanup.
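The cleanup-on-failure pattern described for derive can be sketched with try/except and a bare raise, which re-raises the original exception after the partial directory is removed. derive_with_cleanup and the derive_impl callback are hypothetical; the metadata and implementation saving steps are omitted.

```python
import shutil
from pathlib import Path
from typing import Callable


def derive_with_cleanup(dataset_dir: Path, derive_impl: Callable[[Path], None]) -> None:
    """Create dataset_dir, run derive_impl in it, and remove it again on failure."""
    dataset_dir.mkdir(parents=True, exist_ok=False)  # refuse to reuse an existing dataset
    try:
        # The documented workflow also saves metadata and the serialized
        # implementation at this point; both are omitted in this sketch.
        derive_impl(dataset_dir)
    except Exception:
        shutil.rmtree(dataset_dir, ignore_errors=True)  # leave no half-written dataset
        raise  # propagate the original exception after cleanup
```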

get_archive_filepaths

get_archive_filepaths() -> Optional[dict[str, Path]]

Retrieves a dictionary mapping filenames to their corresponding file paths from the archive path.

First verifies that the lair is in a valid state. If _archive_path is None, the method returns None; otherwise, it returns a dictionary mapping each filename in the archive directory to its Path.

Returns:

Type Description
Optional[dict[str, Path]]

A dictionary mapping filenames to their corresponding Path objects from the archive path, or None if the archive path is not set.
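The archive lookup can be sketched as a dictionary comprehension over Path.iterdir(). archive_filepaths is a hypothetical name, and listing only regular files directly inside the archive directory is an assumption.

```python
from pathlib import Path
from typing import Optional


def archive_filepaths(archive_path: Optional[Path]) -> Optional[dict[str, Path]]:
    """Map each file name in the archive directory to its Path, or return None."""
    if archive_path is None:
        return None
    # Assumption: only regular files directly inside the archive directory are listed.
    return {entry.name: entry for entry in archive_path.iterdir() if entry.is_file()}
```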

get_dataset_filepaths

get_dataset_filepaths(dataset: Dataset) -> dict[str, Path]

Returns a dictionary mapping file names to their file paths for a given dataset, excluding the dataset's internal metadata and dataset files.

Parameters:

Name Type Description Default
dataset Dataset

The dataset object for which file paths are to be retrieved.

required

Returns:

Type Description
dict[str, Path]

A dictionary where keys are the file names and values are the file paths for the respective files within the dataset directory.
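The file listing can be sketched as a filtered iteration over the dataset directory. This sketch assumes that the excluded files are 'dataset.pkl' and 'metadata.json', the bookkeeping files named under delete_all_empty_datasets_from_store; dataset_filepaths is a hypothetical name.

```python
from pathlib import Path

# Assumption: the excluded files are the bookkeeping files named elsewhere in this reference.
EXCLUDED_FILES = {"dataset.pkl", "metadata.json"}


def dataset_filepaths(dataset_dir: Path) -> dict[str, Path]:
    """Map each user-facing file name in dataset_dir to its Path."""
    return {
        entry.name: entry
        for entry in dataset_dir.iterdir()
        if entry.is_file() and entry.name not in EXCLUDED_FILES
    }
```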

get_path

get_path(dataset: Optional[Dataset] = None) -> Path

Returns the lair's main path, or the path of the given dataset within the lair.

If no dataset is provided, the main path is returned. If an instance of Dataset (or of a subclass) is provided, the dataset's own path is appended to the main path. Any other argument raises a TypeError.

Parameters:

Name Type Description Default
dataset Optional[Dataset]

The dataset whose path is requested. Defaults to None, in which case the lair's main path is returned.

None

Returns:

Name Type Description
Path Path

The resulting path, either the default path or the combined path for the given dataset.

Raises:

Type Description
TypeError

If the provided dataset is not an instance of Dataset or one of its subclasses.
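The path resolution rules can be sketched as a standalone function. The Dataset stand-in and its relative_path attribute are hypothetical; the reference only says that the dataset's specific path is appended to the default path.

```python
from pathlib import Path
from typing import Optional


class Dataset:
    """Hypothetical stand-in; the relative_path attribute name is assumed."""

    def __init__(self, relative_path: str) -> None:
        self.relative_path = Path(relative_path)


def get_path(root: Path, dataset: Optional[Dataset] = None) -> Path:
    if dataset is None:
        return root  # no dataset: the lair's main path
    if not isinstance(dataset, Dataset):
        raise TypeError(f"expected a Dataset, got {type(dataset).__name__}")
    return root / dataset.relative_path  # the dataset's path inside the lair
```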

make_dataset_dir

make_dataset_dir(dataset: Dataset) -> Path

Creates the directory for the specified dataset.

The method first verifies that the dataset does not already exist in the store, then creates the dataset's directory and returns its path.

Parameters:

Name Type Description Default
dataset Dataset

An instance of the Dataset class representing the dataset for which the directory is to be created.

required

Returns:

Name Type Description
Path Path

The path to the newly created directory.

Raises:

Type Description
FileExistsError

If the directory already exists.

AssertionError

If the dataset already exists in the store.

safe_derive

safe_derive(dataset: Dataset, overwrite: bool = False) -> None

Derives the provided dataset without overwriting an existing one unless explicitly requested. If the dataset already exists and overwrite is True, the existing dataset is deleted and derived again; if overwrite is False, the existing dataset is left untouched. Otherwise, the dataset is derived as usual.

Parameters:

Name Type Description Default
dataset Dataset

The dataset to be processed. Must be an instance of Dataset or one of its subclasses.

required
overwrite bool

A boolean flag indicating whether to overwrite the existing dataset in the store. Defaults to False.

False
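The overwrite logic can be sketched as follows. The reference only states that an existing dataset is not overwritten unless requested; this sketch assumes that derivation is then skipped, and it returns a boolean for inspection. safe_derive here is a hypothetical standalone function taking a derive callback, not the method itself.

```python
import shutil
from pathlib import Path
from typing import Callable


def safe_derive(dataset_dir: Path, derive: Callable[[Path], None], overwrite: bool = False) -> bool:
    """Return True if the dataset was (re)derived, False if an existing one was kept."""
    if dataset_dir.is_dir():
        if not overwrite:
            return False  # assumption: keep the existing dataset and skip derivation
        shutil.rmtree(dataset_dir)  # overwrite requested: remove the old dataset first
    dataset_dir.mkdir(parents=True)
    derive(dataset_dir)
    return True
```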

save_dataset_implementation

save_dataset_implementation(dataset: Dataset) -> None

Serializes the given dataset implementation into the dataset's directory.

The dataset object is serialized with the dill module and written as a pickle file in the directory returned by get_path(dataset). An existing file with the same name is overwritten.

Parameters:

Name Type Description Default
dataset Dataset

The dataset object to be serialized and saved.

required
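The serialization step can be sketched with dill.dump, which has the same interface as pickle.dump but can also serialize objects such as lambdas and closures. This sketch falls back to the standard pickle module when dill is not installed; the default file name 'dataset.pkl' is taken from delete_all_empty_datasets_from_store, and save_implementation is a hypothetical name.

```python
from pathlib import Path

try:
    import dill as serializer  # the serializer named in this reference
except ImportError:  # fallback for environments without dill
    import pickle as serializer


def save_implementation(obj: object, dataset_dir: Path, filename: str = "dataset.pkl") -> Path:
    target = dataset_dir / filename
    with target.open("wb") as fh:  # "wb" truncates, so an existing file is overwritten
        serializer.dump(obj, fh)
    return target
```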

save_dataset_metadata

save_dataset_metadata(dataset: Dataset) -> None

Saves metadata of the given dataset including information about the Python environment, hardware, and operating system.

Captures Python details (version, implementation, active packages) and system information (platform, memory, current date and time) and saves them in JSON format as part of the dataset.

Parameters:

Name Type Description Default
dataset Dataset

The dataset for which the metadata will be saved.

required
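Metadata collection of this kind can be sketched with the standard library alone: platform for interpreter and OS details, importlib.metadata for installed packages, and datetime for a timestamp. Memory information is omitted because the standard library has no portable API for it; the key layout and the default file name 'metadata.json' (taken from delete_all_empty_datasets_from_store) are assumptions.

```python
import json
import platform
from datetime import datetime, timezone
from importlib import metadata as importlib_metadata
from pathlib import Path


def save_metadata(dataset_dir: Path, filename: str = "metadata.json") -> Path:
    info = {
        "python": {
            "version": platform.python_version(),
            "implementation": platform.python_implementation(),
            # Distributions installed in the running interpreter's environment.
            "packages": sorted(
                f"{dist.metadata['Name']}=={dist.version}"
                for dist in importlib_metadata.distributions()
            ),
        },
        "system": {
            "platform": platform.platform(),
            "machine": platform.machine(),
            "created_at": datetime.now(timezone.utc).isoformat(),
        },
    }
    target = dataset_dir / filename
    target.write_text(json.dumps(info, indent=2))
    return target
```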

status

status() -> LairStatus

Determines the current status of the lair by checking the existence and validity of its path and configuration file.

The status is derived from the presence of the lair path and a valid configuration file. If the path does not exist, the status is NOT_EXIST. If the path exists but the configuration file is missing, the status is MALFORMED. Otherwise, the status is OK.

Returns:

Name Type Description
LairStatus LairStatus

The status of the lair, which is one of the following:
- LairStatus.NOT_EXIST: the lair path does not exist.
- LairStatus.MALFORMED: the lair exists but the configuration file is missing.
- LairStatus.OK: the lair is properly set up and its path and configuration are intact.

LairStatus

Bases: Enum

Represents the possible statuses of a 'Lair' directory and its configuration.

Defines an enumeration for representing various statuses that a 'Lair' directory and its associated configuration can have. This is primarily used to indicate the state or validity of a directory with respect to its structure and configuration file.

Attributes:

Name Type Description
NOT_EXIST

Indicates that the directory does not exist.

NOT_DIRECTORY

Indicates that the specified path exists but is not a directory.

CONFIG_MISSING

Indicates that the required configuration file is missing in the directory.

MALFORMED

Indicates that the configuration file exists but is improperly formatted.

OK

Indicates that the directory and its configuration file are valid and properly formatted.
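A status check that distinguishes all five LairStatus members can be sketched as follows. It follows the finer-grained enum descriptions rather than the status() description, which reports a missing configuration file as MALFORMED. The enum values, the CONFIG_FILENAME name, and the assumption that the configuration is JSON are all hypothetical.

```python
import json
from enum import Enum, auto
from pathlib import Path

CONFIG_FILENAME = "lair_config.json"  # hypothetical config file name


class LairStatus(Enum):
    NOT_EXIST = auto()
    NOT_DIRECTORY = auto()
    CONFIG_MISSING = auto()
    MALFORMED = auto()
    OK = auto()


def lair_status(path: Path) -> LairStatus:
    if not path.exists():
        return LairStatus.NOT_EXIST
    if not path.is_dir():
        return LairStatus.NOT_DIRECTORY
    config = path / CONFIG_FILENAME
    if not config.is_file():
        return LairStatus.CONFIG_MISSING
    try:
        json.loads(config.read_text())  # assumption: the configuration is JSON
    except json.JSONDecodeError:
        return LairStatus.MALFORMED
    return LairStatus.OK
```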