Lair Class API

Lair

This class represents a data storage system called "Lair" for managing structured datasets. It provides functionality to check, create, delete, and manage datasets, as well as to maintain metadata and configuration. It also supports operations that verify data integrity and appropriate directory permissions.

Attributes:

Name	Type	Description
`_path`	`Path`	Absolute path to the main storage directory for the lair.
`_archive_path`	`Optional[Path]`	Absolute path to an optional archive directory for storing supplementary data.

__enter__

__enter__() -> Self

Manages the resource context and ensures the resource is in an acceptable state upon entering the context. This method should be used with a context manager to guarantee that the resource status is validated before it is used.

Returns:

Name	Type	Description
`Self`	`Self`	An instance of the class this method belongs to, ensuring the context is correctly initialized for usage.

__exit__

__exit__(exc_type: Optional[Type[BaseException]], exc_val: Optional[BaseException], exc_tb: Optional[TracebackType]) -> Literal[False]

Handles the context manager exit by explicitly returning False to indicate that exceptions should not be suppressed during context management.

Parameters:

Name	Type	Description	Default
`exc_type`	`Optional[Type[BaseException]]`	The class of the exception that occurred, if any; otherwise, None.	required
`exc_val`	`Optional[BaseException]`	The exception instance that occurred, if any; otherwise, None.	required
`exc_tb`	`Optional[TracebackType]`	The traceback object associated with the exception, if any; otherwise, None.	required

Returns:

Type	Description
`Literal[False]`	Always returns False, which specifies that exceptions will not be suppressed and will propagate normally.
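
The two context-manager methods can be illustrated with a small stand-in class. This is a sketch only: `SketchStore` and its `ok` flag are hypothetical and stand in for the real status check, which is not shown in this reference. It demonstrates that returning `False` from `__exit__` lets an exception raised inside the `with` block reach the caller.

```python
from types import TracebackType
from typing import Literal, Optional, Type


class SketchStore:
    """Hypothetical stand-in for Lair; not the library's implementation."""

    def __init__(self, ok: bool = True) -> None:
        self.ok = ok  # stands in for "the store status is OK"

    def __enter__(self) -> "SketchStore":
        # Validate the state on entry (assumed to be an assertion here).
        assert self.ok, "store is not in an OK state"
        return self

    def __exit__(
        self,
        exc_type: Optional[Type[BaseException]],
        exc_val: Optional[BaseException],
        exc_tb: Optional[TracebackType],
    ) -> Literal[False]:
        # Returning False tells Python not to suppress the exception.
        return False


def run_and_capture() -> str:
    """Show that an error raised inside the block propagates to the caller."""
    try:
        with SketchStore():
            raise ValueError("boom")
    except ValueError as err:
        return f"propagated: {err}"
    return "suppressed"
```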

__init__

__init__(path: Path | str = Path('./lair'), archive_path: Optional[Path | str] = None)

Initializes an instance of the class with specified paths.

The constructor sets up the main path and optional archive path, converting them to absolute and resolved Path objects. If the archive_path is not provided, it remains as None.

Parameters:

Name	Type	Description	Default
`path`	`Path \| str`	The main path, which is used and stored as an absolute, resolved Path object. Defaults to './lair'.	`Path('./lair')`
`archive_path`	`Optional[Path \| str]`	The optional archive path, which, if provided, is stored as an absolute, resolved Path object. Defaults to None.	`None`
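
The path handling described above can be sketched with `pathlib` alone. The helper names `normalize` and `normalize_optional` are illustrative and are not part of the Lair API; the sketch only assumes that "absolute and resolved" corresponds to `Path.resolve()`.

```python
from pathlib import Path
from typing import Optional, Union


def normalize(path: Union[Path, str]) -> Path:
    # resolve() makes the path absolute and collapses ".." and symlinks.
    return Path(path).resolve()


def normalize_optional(path: Optional[Union[Path, str]]) -> Optional[Path]:
    # None is passed through untouched, matching the documented default.
    return None if path is None else normalize(path)
```

With this sketch, `normalize("./lair")` yields an absolute path rooted at the current working directory, and `normalize_optional(None)` stays `None`.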

__str__

__str__() -> str

Returns a string representation of the object.

The method provides a meaningful string that represents the object. This can be used for debugging or display purposes. The returned string includes the object's specific state or data.

Returns:

Name	Type	Description
`str`	`str`	A string representation of the object.

assert_dataset_exists

assert_dataset_exists(dataset: Dataset) -> None

Asserts that a specified dataset exists in the current context.

This method verifies that the provided dataset is present and accessible in the store. If the dataset is not found, an assertion error is raised.

Parameters:

Name	Type	Description	Default
`dataset`	`Dataset`	The dataset object to validate for existence.	required

Raises:

Type	Description
`AssertionError`	If the specified dataset does not exist.

assert_dataset_missing

assert_dataset_missing(dataset: Dataset) -> None

Asserts that the specified dataset does not exist. The method first checks that the store status is valid. If the dataset exists, an assertion error is raised with an appropriate error message.

Parameters:

Name	Type	Description	Default
`dataset`	`Dataset`	The dataset to verify for non-existence.	required

assert_ok_satus

assert_ok_satus() -> None

Asserts that the current status of the store is LairStatus.OK.

This method verifies the status of the store object to ensure it is in a valid and acceptable state (LairStatus.OK). If the status does not match, an assertion error is raised with a detailed error message containing the store information and its current status.

Raises:

Type	Description
`AssertionError`	If the store status is not LairStatus.OK. The error message contains the store information and its current status.

check_store_permissions

check_store_permissions() -> bool

Checks if the store permissions are valid.

This method evaluates the permissions of a file or directory at the path returned by get_path(). It ensures the permissions fall within the range of valid store permissions.

Returns:

Name	Type	Description
`bool`	`bool`	True if the permissions are valid, False otherwise.
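
A permission check of this kind might look like the following sketch on a POSIX system. The set `ASSUMED_VALID_MODES` is purely hypothetical, since the modes that Lair accepts are not listed in this reference, and `permissions_ok` is an illustrative name rather than the library's method.

```python
import os
import stat
import tempfile
from pathlib import Path

# Hypothetical allow-list; the modes Lair actually accepts are not documented here.
ASSUMED_VALID_MODES = {0o700, 0o750, 0o755, 0o770, 0o775}


def permissions_ok(path: Path) -> bool:
    # S_IMODE strips the file-type bits, leaving only the permission bits.
    mode = stat.S_IMODE(path.stat().st_mode)
    return mode in ASSUMED_VALID_MODES


def demo_permissions() -> tuple[bool, bool]:
    """Check one accepted and one rejected mode on a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp) / "lair"
        store.mkdir()
        os.chmod(store, 0o755)
        accepted = permissions_ok(store)
        os.chmod(store, 0o777)  # world-writable, not in the assumed allow-list
        rejected = permissions_ok(store)
        return accepted, rejected
```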

create

create() -> None

Creates a new store directory and its configuration file.

This method is responsible for creating the necessary directory structure and associated configuration file for a store. It ensures the store does not already exist and creates the directory at the specified path. The configuration file is created within the newly created directory.

Raises:

Type	Description
`FileExistsError`	If the directory for the store already exists.
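
The create step can be sketched as follows. The configuration file name `lair.cfg` and the empty file contents are assumptions made for illustration; this reference does not state the real name or format.

```python
import tempfile
from pathlib import Path

CONFIG_NAME = "lair.cfg"  # hypothetical name; the real one is not documented here


def create_store(path: Path) -> None:
    # exist_ok=False makes mkdir raise FileExistsError if the directory exists.
    path.mkdir(exist_ok=False)
    # Write an empty placeholder configuration inside the new directory.
    (path / CONFIG_NAME).write_text("")


def demo_create() -> tuple[bool, bool]:
    """Create a store once, then confirm that a second attempt is refused."""
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp) / "lair"
        create_store(store)
        try:
            create_store(store)
            raised = False
        except FileExistsError:
            raised = True
        return (store / CONFIG_NAME).is_file(), raised
```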

create_if_not_exist

create_if_not_exist() -> None

Checks the current status and creates the instance if it does not exist.

This method verifies the status of the instance, and if its status is identified as NOT_EXIST, it proceeds to create the instance to ensure its existence. The method does not return any value.

Returns:

Type	Description
`None`	None

dataset_exists

dataset_exists(dataset: Dataset) -> bool

Checks whether the dataset exists and is a valid directory.

This method verifies if the provided dataset is a subclass of Dataset, checks the existence of the dataset's path, and confirms that the path is a directory.

Parameters:

Name	Type	Description	Default
`dataset`	`Dataset`	The dataset object to validate.	required

Returns:

Name	Type	Description
`bool`	`bool`	True if the dataset exists and is a directory, False otherwise.

Raises:

Type	Description
`TypeError`	If the provided dataset is not a subclass of `Dataset`.

delete

delete(force: bool = False) -> None

Deletes the store directory at the current path.

This method removes the directory associated with the store object. Unless `force` is True, the deletion depends on the current status of the store: the method confirms that the store contains no elements before deleting it, and raises an error if the store does not exist or is not a valid store.

Parameters:

Name	Type	Description	Default
`force`	`bool`	If True, forces the deletion of the store directory without performing status checks or validations. Defaults to False.	`False`

Raises:

Type	Description
`IOError`	If the store contains elements and `force` is False.
`IOError`	If the store does not exist and `force` is False.
`IOError`	If the path refers to a file that is not a valid store and `force` is False.

delete_all_empty_datasets_from_store

delete_all_empty_datasets_from_store(dry_run: bool = False) -> None

Deletes all empty datasets from the store directory. A dataset is considered empty if its directory only contains the files 'dataset.pkl' and 'metadata.json'. The method can optionally run in dry-run mode to only log the directories that would have been deleted without actually deleting them.

Parameters:

Name	Type	Description	Default
`dry_run`	`bool`	If True, prints the dataset directories that would be deleted instead of deleting them.	`False`
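
The emptiness rule and the dry-run switch can be sketched as below. The function names are illustrative, and the sketch reads "only contains" as "contains exactly these two files", which is an interpretation rather than confirmed behavior.

```python
import shutil
import tempfile
from pathlib import Path

BOOKKEEPING = {"dataset.pkl", "metadata.json"}  # file names taken from the text above


def is_empty_dataset(directory: Path) -> bool:
    # Empty means: exactly the two bookkeeping files and nothing else (assumed).
    return {p.name for p in directory.iterdir()} == BOOKKEEPING


def prune(store: Path, dry_run: bool = False) -> list[str]:
    """Return the names of empty dataset directories, deleting them unless dry_run."""
    selected = []
    for child in sorted(store.iterdir()):
        if child.is_dir() and is_empty_dataset(child):
            selected.append(child.name)
            if not dry_run:
                shutil.rmtree(child)
    return selected


def demo_prune() -> tuple[list[str], list[str], list[str]]:
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp)
        for name, extra in (("empty", None), ("full", "data.csv")):
            directory = store / name
            directory.mkdir()
            for filename in BOOKKEEPING:
                (directory / filename).write_text("")
            if extra:
                (directory / extra).write_text("x")
        planned = prune(store, dry_run=True)  # nothing is deleted here
        after_dry_run = sorted(p.name for p in store.iterdir())
        prune(store)  # now the empty dataset is removed
        remaining = sorted(p.name for p in store.iterdir())
        return planned, after_dry_run, remaining
```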

delete_from_store

delete_from_store(dataset: Dataset) -> None

Deletes a dataset from the storage.

This method removes the specified dataset directory from the file system, ensuring it is no longer available in the storage.

Parameters:

Name	Type	Description	Default
`dataset`	`Dataset`	The dataset to be deleted.	required

derive

derive(dataset: Dataset) -> None

Derives a new dataset by following a structured workflow and validates its creation.

This method performs several steps such as checking the dataset's status and existence, creating necessary directories, saving metadata and implementation details, and invoking the dataset-specific derive implementation. In case of errors during the derivation process, it cleans up the dataset store to maintain consistency.

Parameters:

Name	Type	Description	Default
`dataset`	`Dataset`	The dataset object to be derived. It is expected to be a subclass of the Dataset base class.	required

Raises:

Type	Description
`AssertionError`	If the dataset is not of the correct type or if its status or existence in the store cannot be verified.
`Exception`	If any error occurs during the dataset derivation process. The raised exception will propagate after executing necessary cleanup.
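
The clean-up-on-failure pattern described for `derive` can be sketched with a plain directory and a callback. `derive_with_cleanup` and the `build` callback are hypothetical stand-ins for the dataset-specific derive step; the metadata and implementation saving steps are left out.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def derive_with_cleanup(dataset_dir: Path, build: Callable[[Path], None]) -> None:
    # Fail early if the target already exists, then create it.
    dataset_dir.mkdir(exist_ok=False)
    try:
        build(dataset_dir)  # stands in for the dataset-specific derive step
    except Exception:
        # Remove the partial directory so the store stays consistent,
        # then re-raise so the caller still sees the original error.
        shutil.rmtree(dataset_dir, ignore_errors=True)
        raise


def demo_cleanup() -> tuple[bool, bool]:
    def good(directory: Path) -> None:
        (directory / "data.txt").write_text("ok")

    def bad(directory: Path) -> None:
        (directory / "partial.txt").write_text("half")
        raise RuntimeError("derivation failed")

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        derive_with_cleanup(root / "good", good)
        try:
            derive_with_cleanup(root / "bad", bad)
        except RuntimeError:
            pass  # the error propagated after clean-up, as intended
        return (root / "good").exists(), (root / "bad").exists()
```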

get_archive_filepaths

get_archive_filepaths() -> Optional[dict[str, Path]]

Retrieves a dictionary mapping filenames to their corresponding file paths from the archive path.

This method ensures that the system is in a valid state before proceeding. If the _archive_path attribute is None, the method returns None. Otherwise, it constructs and returns a dictionary where each key is a filename and the value is the corresponding file path from the specified archive directory.

Returns:

Type	Description
`Optional[dict[str, Path]]`	A dictionary mapping filenames to their corresponding `Path` objects from the archive path, or `None` if the archive path is not set.

get_dataset_filepaths

get_dataset_filepaths(dataset: Dataset) -> dict[str, Path]

Returns a dictionary mapping file names to their respective file paths for a given dataset, excluding specific metadata and dataset files.

Parameters:

Name	Type	Description	Default
`dataset`	`Dataset`	The dataset object for which file paths are to be retrieved.	required

Returns:

Type	Description
`dict[str, Path]`	A dictionary where keys are the file names and values are the file paths for the respective files within the dataset directory.
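
A name-to-path mapping with exclusions might be built as in this sketch. The excluded names are assumed to be `dataset.pkl` and `metadata.json`, based on the bookkeeping files mentioned elsewhere in this reference; whether the real method recurses into subdirectories is not stated, so the sketch lists only top-level files.

```python
import tempfile
from pathlib import Path

EXCLUDED = {"dataset.pkl", "metadata.json"}  # assumed set of bookkeeping files


def data_files(directory: Path) -> dict[str, Path]:
    # Map each remaining file name to its path, skipping the bookkeeping files.
    return {
        p.name: p
        for p in sorted(directory.iterdir())
        if p.is_file() and p.name not in EXCLUDED
    }


def demo_files() -> list[str]:
    with tempfile.TemporaryDirectory() as tmp:
        directory = Path(tmp)
        for name in ("dataset.pkl", "metadata.json", "table.csv"):
            (directory / name).write_text("")
        return list(data_files(directory))
```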

get_path

get_path(dataset: Optional[Dataset] = None) -> Path

Gets the path associated with the dataset.

This method retrieves either the default path or the path corresponding to the given dataset. If no dataset is provided, the default path is returned. If a dataset is provided and it is a subclass of Dataset, the dataset's specific path is concatenated to the default path. If the provided dataset parameter is not a subclass of Dataset, a TypeError is raised.

Parameters:

Name	Type	Description	Default
`dataset`	`Optional[Dataset]`	The dataset object which defines a specific path. If not provided, defaults to `None`, returning the default path.	`None`

Returns:

Name	Type	Description
`Path`	`Path`	The resulting path, either the default path or the combined path for the given dataset.

Raises:

Type	Description
`TypeError`	If the provided dataset is not a subclass of `Dataset`.

make_dataset_dir

make_dataset_dir(dataset: Dataset) -> Path

Creates a directory for the specified dataset if it does not already exist.

The method first verifies that the dataset is not already marked as existing. Once confirmed, it determines the directory path corresponding to the dataset and creates the directory. The created directory path is then returned.

Parameters:

Name	Type	Description	Default
`dataset`	`Dataset`	An instance of the Dataset class representing the dataset for which the directory is to be created.	required

Returns:

Name	Type	Description
`Path`	`Path`	The path to the newly created directory.

Raises:

Type	Description
`FileExistsError`	If the directory already exists.
`AssertionError`	If the dataset already exists in the store.

safe_derive

safe_derive(dataset: Dataset, overwrite: bool = False) -> None

Executes a safe derivation process for the provided dataset, ensuring that existing datasets are not overwritten unless explicitly specified. This method combines checks for pre-existing datasets, handling deletion if necessary, and invokes dataset derivation logic. The overwrite parameter controls whether an existing dataset should be overwritten.

Parameters:

Name	Type	Description	Default
`dataset`	`Dataset`	The dataset to be processed. Must be an instance or subclass of the Dataset type.	required
`overwrite`	`bool`	A boolean flag indicating whether to overwrite the existing dataset in the store. Defaults to False.	`False`

save_dataset_implementation

save_dataset_implementation(dataset: Dataset) -> None

Saves the given dataset implementation to a specified file path.

This method serializes the provided dataset object and saves it as a pickle file using the dill module. The file is stored in a directory defined by the path generated for the dataset. It overwrites the contents if a file with the same name already exists.

Parameters:

Name	Type	Description	Default
`dataset`	`Dataset`	The dataset object to be serialized and saved.	required

save_dataset_metadata

save_dataset_metadata(dataset: Dataset) -> None

Saves metadata of the given dataset including information about the Python environment, hardware, and operating system.

This method captures Python-related metadata (version, implementation, active packages), and system information (platform, memory, current date and time) and saves it in JSON format as part of the dataset.

Parameters:

Name	Type	Description	Default
`dataset`	`Dataset`	The dataset for which the metadata will be saved.	required
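
Most of the metadata listed above can be gathered with the standard library, as in this sketch. The key names and the JSON layout are assumptions, not Lair's actual schema, and memory size is omitted because the standard library has no portable call for it.

```python
import json
import platform
from datetime import datetime, timezone
from importlib import metadata


def collect_metadata() -> dict:
    """Gather Python and system details; the key names are illustrative only."""
    packages = sorted(
        f"{dist.metadata['Name']}=={dist.version}"
        for dist in metadata.distributions()
    )
    return {
        "python": {
            "version": platform.python_version(),
            "implementation": platform.python_implementation(),
            "packages": packages,
        },
        "system": {
            "platform": platform.platform(),
            "machine": platform.machine(),
        },
        "created": datetime.now(timezone.utc).isoformat(),
    }


def to_json(meta: dict) -> str:
    # indent=2 keeps the saved file readable when inspected by hand.
    return json.dumps(meta, indent=2)
```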

status

status() -> LairStatus

Determines the current status of the lair by checking the existence and validity of its path and configuration file.

The method evaluates the state of the lair based on the presence of its path and a valid configuration file. If the path does not exist, the lair status is considered as NOT_EXIST. If the path exists but the configuration file is missing, it is marked as MALFORMED. Otherwise, the lair is deemed OK.

Returns:

Name	Type	Description
`LairStatus`	`LairStatus`	The status of the lair, one of: LairStatus.NOT_EXIST (the lair path does not exist); LairStatus.MALFORMED (the lair exists but the configuration file is missing); LairStatus.OK (the lair is properly set up and its path and configuration are intact).
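
The decision order described for `status` can be sketched as follows. `SketchStatus` is a local stand-in that covers only the three outcomes named above, and the configuration file name `lair.cfg` is hypothetical.

```python
import enum
import tempfile
from pathlib import Path

CONFIG_NAME = "lair.cfg"  # hypothetical name; the real one is not documented here


class SketchStatus(enum.Enum):
    """Local stand-in covering only the three outcomes described above."""

    NOT_EXIST = "not_exist"
    MALFORMED = "malformed"
    OK = "ok"


def classify(path: Path) -> SketchStatus:
    if not path.exists():
        return SketchStatus.NOT_EXIST
    if not (path / CONFIG_NAME).is_file():
        return SketchStatus.MALFORMED  # path exists but the config file is missing
    return SketchStatus.OK


def demo_status() -> list[str]:
    """Walk one directory through all three states."""
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp) / "lair"
        seen = [classify(store).name]
        store.mkdir()
        seen.append(classify(store).name)
        (store / CONFIG_NAME).write_text("")
        seen.append(classify(store).name)
        return seen
```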

LairStatus

Bases: Enum

Represents the possible statuses of a 'Lair' directory and its configuration.

Defines an enumeration for representing various statuses that a 'Lair' directory and its associated configuration can have. This is primarily used to indicate the state or validity of a directory with respect to its structure and configuration file.

Attributes:

Name	Type	Description
`NOT_EXIST`		Indicates that the directory does not exist.
`NOT_DIRECTORY`		Indicates that the specified path exists but is not a directory.
`CONFIG_MISSING`		Indicates that the required configuration file is missing in the directory.
`MALFORMED`		Indicates that the configuration file exists but is improperly formatted.
`OK`		Indicates that the directory and its configuration file are valid and properly formatted.