Lair Class API
Lair
This class represents a data storage system called "Lair" for managing structured datasets. It can check, create, and delete datasets, and it maintains their metadata and configuration. It also provides checks for the store's status and directory permissions.
Attributes:
| Name | Type | Description |
|---|---|---|
| _path | Path | Absolute path to the main storage directory for the lair. |
| _archive_path | Optional[Path] | Absolute path to an optional archive directory for storing supplementary data. |
__enter__
__enter__() -> Self
Manages the resource context and ensures the resource is in an acceptable state upon entering the context. This method should be used with a context manager to guarantee that the resource status is validated before it is used.
Returns:
| Name | Type | Description |
|---|---|---|
| Self | Self | An instance of the class this method belongs to, ensuring the context is correctly initialized for usage. |
__exit__
__exit__(exc_type: Optional[Type[BaseException]], exc_val: Optional[BaseException], exc_tb: Optional[TracebackType]) -> Literal[False]
Handles the context manager exit by explicitly returning False to indicate that exceptions should not be suppressed during context management.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| exc_type | Optional[Type[BaseException]] | The class of the exception that occurred, if any; otherwise, None. | required |
| exc_val | Optional[BaseException] | The exception instance that occurred, if any; otherwise, None. | required |
| exc_tb | Optional[TracebackType] | The traceback object associated with the exception, if any; otherwise, None. | required |
Returns:
| Type | Description |
|---|---|
| Literal[False] | Always returns False, which specifies that exceptions will not be suppressed and will propagate normally. |
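The context-manager protocol described for `__enter__` and `__exit__` can be sketched with a stand-in class. `StoreSketch` and `StoreStatus` are hypothetical names used only for illustration, and the use of a plain `assert` on entry is an assumption; the sketch only shows that returning False from `__exit__` lets exceptions propagate.

```python
from enum import Enum
from types import TracebackType
from typing import Literal, Optional, Type


class StoreStatus(Enum):
    """Hypothetical stand-in for LairStatus."""

    NOT_EXIST = "not_exist"
    OK = "ok"


class StoreSketch:
    """Hypothetical stand-in for Lair; only the context protocol is modelled."""

    def __init__(self, status: StoreStatus) -> None:
        self._status = status

    def __enter__(self) -> "StoreSketch":
        # Assumption: the state is validated before the with-block runs.
        assert self._status is StoreStatus.OK, f"store status is {self._status}"
        return self

    def __exit__(
        self,
        exc_type: Optional[Type[BaseException]],
        exc_val: Optional[BaseException],
        exc_tb: Optional[TracebackType],
    ) -> Literal[False]:
        # Returning False tells Python not to suppress the exception.
        return False


def run_failing_block() -> str:
    """Raise inside the with-block and report whether the error reached the caller."""
    try:
        with StoreSketch(StoreStatus.OK):
            raise ValueError("boom")
    except ValueError as err:
        return f"propagated: {err}"
    return "suppressed"
```

Calling `run_failing_block()` shows that the `ValueError` raised inside the `with` block is still seen by the surrounding `try`.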
__init__
__init__(path: Path | str = Path('./lair'), archive_path: Optional[Path | str] = None)
Initializes an instance of the class with specified paths.
The constructor sets up the main path and optional archive path, converting them to absolute and resolved Path objects. If the archive_path is not provided, it remains as None.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Path \| str | The main path, which is used and stored as an absolute, resolved Path object. Defaults to './lair'. | Path('./lair') |
| archive_path | Optional[Path \| str] | The optional archive path, which, if provided, is stored as an absolute, resolved Path object. Defaults to None. | None |
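The path handling described for the constructor can be approximated with `pathlib` alone. This is a minimal sketch, not the class's actual code; `normalize_paths` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Optional, Union


def normalize_paths(
    path: Union[Path, str] = Path("./lair"),
    archive_path: Optional[Union[Path, str]] = None,
) -> tuple[Path, Optional[Path]]:
    """Return absolute, resolved versions of the given paths."""
    # resolve() makes the path absolute and collapses ".." components.
    main = Path(path).resolve()
    # The archive path stays None when it is not provided.
    archive = Path(archive_path).resolve() if archive_path is not None else None
    return main, archive
```

For example, `normalize_paths("data/lair")` would return an absolute path ending in `lair` together with `None` for the archive.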
__str__
__str__() -> str
Returns a string representation of the object.
The method provides a meaningful string that represents the object. This can be used for debugging or display purposes. The returned string includes the object's specific state or data.
Returns:
| Name | Type | Description |
|---|---|---|
| str | str | A string representation of the object. |
assert_dataset_exists
assert_dataset_exists(dataset: Dataset) -> None
Asserts that a specified dataset exists in the current context.
This method verifies the existence of the provided dataset by checking internal conditions or performing necessary validations. It ensures the dataset is present and accessible, raising an assertion error if the dataset is not found.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset | Dataset | The dataset object to validate for existence. | required |
Raises:
| Type | Description |
|---|---|
| AssertionError | If the specified dataset does not exist. |
assert_dataset_missing
assert_dataset_missing(dataset: Dataset) -> None
Asserts that the specified dataset does not exist. If the dataset exists, an assertion error is raised with an appropriate error message. The method first checks that the store status is valid.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset | Dataset | The dataset to verify for non-existence. | required |
assert_ok_satus
assert_ok_satus() -> None
Asserts that the current status of the store is LairStatus.OK.
This method verifies the status of the store object to ensure it is in a valid and acceptable state (LairStatus.OK). If the status does not match, an assertion error is raised with a detailed error message containing the store information and its current status.
Raises:
| Type | Description |
|---|---|
| AssertionError | If the store status is not LairStatus.OK. The error message contains the store information and its current status. |
check_store_permissions
check_store_permissions() -> bool
Checks if the store permissions are valid.
This method evaluates the permissions of a file or directory at the path
returned by get_path(). It ensures the permissions fall within the
range of valid store permissions.
Returns:
| Name | Type | Description |
|---|---|---|
| bool | bool | True if the permissions are valid, False otherwise. |
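A permission check of this kind can be sketched with `stat`. The accepted range below (`VALID_STORE_MODES`) is purely an assumption for illustration, since the page does not state which modes the class accepts; `permissions_are_valid` and `demo` are hypothetical names. The demo assumes a POSIX file system.

```python
import stat
import tempfile
from pathlib import Path

# Assumption: any mode from 0o700 to 0o777 counts as valid.
VALID_STORE_MODES = range(0o700, 0o1000)


def permissions_are_valid(path: Path) -> bool:
    """Return True if the permission bits of path fall in the assumed range."""
    mode = stat.S_IMODE(path.stat().st_mode)  # keep only the permission bits
    return mode in VALID_STORE_MODES


def demo() -> tuple[bool, bool]:
    """Check one accepted and one rejected mode on a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp)
        store.chmod(0o750)
        accepted = permissions_are_valid(store)
        store.chmod(0o500)
        rejected = permissions_are_valid(store)
        store.chmod(0o700)  # restore so the directory can be cleaned up
    return accepted, rejected
```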
create
create() -> None
Creates a new store directory and its configuration file.
This method is responsible for creating the necessary directory structure and associated configuration file for a store. It ensures the store does not already exist and creates the directory at the specified path. The configuration file is created within the newly created directory.
Raises:
| Type | Description |
|---|---|
| FileExistsError | If the directory for the store already exists. |
create_if_not_exist
create_if_not_exist() -> None
Checks the current status and creates the instance if it does not exist.
This method verifies the status of the instance, and if its status is
identified as NOT_EXIST, it proceeds to create the instance to ensure its
existence. The method does not return any value.
Returns:
| Type | Description |
|---|---|
| None | This method does not return a value. |
dataset_exists
dataset_exists(dataset: Dataset) -> bool
Checks whether the dataset exists and is a valid directory.
This method verifies if the provided dataset is a subclass of Dataset, checks the
existence of the dataset's path, and confirms that the path is a directory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset | Dataset | The dataset object to validate. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| bool | bool | True if the dataset exists and is a directory, False otherwise. |
Raises:
| Type | Description |
|---|---|
| TypeError | If the provided dataset is not a subclass of Dataset. |
delete
delete(force: bool = False) -> None
Deletes the store directory at the current path.
This method removes the directory associated with the store object. If the
force parameter is not set to True, the deletion is subject to checks based
on the current status of the store. The method ensures the store is empty or
does not contain elements before performing deletion, and it raises an
appropriate error if the store does not exist or is not valid.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| force | bool | If True, forces the deletion of the store directory without performing status checks or validations. Defaults to False. | False |
Raises:
| Type | Description |
|---|---|
| IOError | If the store contains elements and force is False. |
| IOError | If the store does not exist and force is False. |
| IOError | If the path refers to a file that is not a valid store and force is False. |
delete_all_empty_datasets_from_store
delete_all_empty_datasets_from_store(dry_run: bool = False) -> None
Deletes all empty datasets from the store directory. A dataset is considered empty if its directory only contains the files 'dataset.pkl' and 'metadata.json'. The method can optionally run in dry-run mode to only log the directories that would have been deleted without actually deleting them.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dry_run | bool | If True, prints the dataset directories that would be deleted instead of deleting them. | False |
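The emptiness rule and dry-run mode can be sketched as follows. This sketch assumes that dataset directories are direct children of the store and that "only contains" means the directory holds exactly those two files; `delete_empty_dataset_dirs` and `demo` are hypothetical names, not part of the class.

```python
import shutil
import tempfile
from pathlib import Path

# File names taken from the description above.
BOOKKEEPING_FILES = frozenset({"dataset.pkl", "metadata.json"})


def is_empty_dataset_dir(directory: Path) -> bool:
    """A directory is 'empty' if it holds exactly the two bookkeeping files."""
    return {entry.name for entry in directory.iterdir()} == BOOKKEEPING_FILES


def delete_empty_dataset_dirs(store: Path, dry_run: bool = False) -> list[Path]:
    """Delete (or, with dry_run, only report) empty dataset directories."""
    matched = []
    for child in sorted(store.iterdir()):
        if child.is_dir() and is_empty_dataset_dir(child):
            if dry_run:
                print(f"would delete {child}")
            else:
                shutil.rmtree(child)
            matched.append(child)
    return matched


def demo() -> tuple[list[str], list[str]]:
    """Build one empty and one populated dataset directory, then clean up."""
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp)
        for name, extra in (("empty_ds", []), ("full_ds", ["data.csv"])):
            directory = store / name
            directory.mkdir()
            for filename in [*BOOKKEEPING_FILES, *extra]:
                (directory / filename).touch()
        planned = [p.name for p in delete_empty_dataset_dirs(store, dry_run=True)]
        delete_empty_dataset_dirs(store)
        remaining = sorted(p.name for p in store.iterdir())
    return planned, remaining
```

In the demo, the dry run only reports `empty_ds`, and the real run removes it while leaving `full_ds` in place.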
delete_from_store
delete_from_store(dataset: Dataset) -> None
Deletes a dataset from the storage.
This method removes the specified dataset directory from the file system, ensuring it is no longer available in the storage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset | Dataset | The dataset to be deleted. | required |
derive
derive(dataset: Dataset) -> None
Derives a new dataset by following a structured workflow and validates its creation.
This method performs several steps such as checking the dataset's status and existence,
creating necessary directories, saving metadata and implementation details, and invoking
the dataset-specific derive implementation. In case of errors during the derivation
process, it cleans up the dataset store to maintain consistency.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset | Dataset | The dataset object to be derived. It is expected to be a subclass of the Dataset base class. | required |
Raises:
| Type | Description |
|---|---|
| AssertionError | If the dataset is not of the correct type or if its status or existence in the store cannot be verified. |
| Exception | If any error occurs during the dataset derivation process. The raised exception will propagate after executing necessary cleanup. |
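The clean-up-then-propagate behaviour can be illustrated with a small stand-alone sketch. `derive_with_cleanup` and the `build` callback are hypothetical stand-ins for the dataset-specific derive step; the real method also saves metadata and the dataset implementation, which this sketch leaves out.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def derive_with_cleanup(dataset_dir: Path, build: Callable[[Path], None]) -> None:
    """Create the dataset directory, run build, and remove the directory on failure."""
    dataset_dir.mkdir(parents=True, exist_ok=False)
    try:
        build(dataset_dir)
    except Exception:
        # Remove the partially written dataset, then let the error propagate.
        shutil.rmtree(dataset_dir, ignore_errors=True)
        raise


def demo() -> tuple[bool, bool]:
    """Run a failing build and report (error propagated, directory still exists)."""

    def failing_build(directory: Path) -> None:
        (directory / "partial.txt").write_text("incomplete")
        raise RuntimeError("derivation failed")

    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "my_dataset"
        try:
            derive_with_cleanup(target, failing_build)
            propagated = False
        except RuntimeError:
            propagated = True
        return propagated, target.exists()
```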
get_archive_filepaths
get_archive_filepaths() -> Optional[dict[str, Path]]
Retrieves a dictionary mapping filenames to their corresponding file paths from the archive path.
This method ensures that the system is in a valid state before proceeding. If
the _archive_path attribute is None, the method returns None. Otherwise,
it constructs and returns a dictionary where each key is a filename and the
value is the corresponding file path from the specified archive directory.
Returns:
| Type | Description |
|---|---|
| Optional[dict[str, Path]] | A dictionary mapping filenames to their corresponding file paths, or None if the archive path is not set. |
get_dataset_filepaths
get_dataset_filepaths(dataset: Dataset) -> dict[str, Path]
Returns a dictionary mapping file names to their respective file paths for a given dataset, excluding specific metadata and dataset files.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset | Dataset | The dataset object for which file paths are to be retrieved. | required |
Returns:
| Type | Description |
|---|---|
| dict[str, Path] | A dictionary where keys are the file names and values are the file paths for the respective files within the dataset directory. |
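A name-to-path mapping of this shape can be built with a dictionary comprehension. The excluded names below are an assumption (the same 'dataset.pkl' and 'metadata.json' files mentioned for empty datasets), and `list_dataset_files` is a hypothetical helper that takes a directory rather than a Dataset object.

```python
import tempfile
from pathlib import Path

# Assumption: these are the metadata and dataset files that are excluded.
EXCLUDED_FILES = frozenset({"dataset.pkl", "metadata.json"})


def list_dataset_files(dataset_dir: Path) -> dict[str, Path]:
    """Map each file name in dataset_dir to its path, skipping excluded files."""
    return {
        entry.name: entry
        for entry in sorted(dataset_dir.iterdir())
        if entry.is_file() and entry.name not in EXCLUDED_FILES
    }


def demo() -> list[str]:
    """List the keys for a directory holding two data files plus bookkeeping files."""
    with tempfile.TemporaryDirectory() as tmp:
        dataset_dir = Path(tmp)
        for filename in ("dataset.pkl", "metadata.json", "a.csv", "b.csv"):
            (dataset_dir / filename).touch()
        return list(list_dataset_files(dataset_dir))
```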
get_path
get_path(dataset: Optional[Dataset] = None) -> Path
Gets the path associated with the dataset.
This method retrieves either the default path or the path corresponding to
the given dataset. If no dataset is provided, the default path is returned.
If a dataset is provided and it is a subclass of Dataset, the dataset's
specific path is concatenated to the default path. If the provided dataset
parameter is not a subclass of Dataset, a TypeError is raised.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset | Optional[Dataset] | The dataset object which defines a specific path. If not provided, defaults to None. | None |
Returns:
| Name | Type | Description |
|---|---|---|
| Path | Path | The resulting path, either the default path or the combined path for the given dataset. |
Raises:
| Type | Description |
|---|---|
| TypeError | If the provided dataset is not a subclass of Dataset. |
make_dataset_dir
make_dataset_dir(dataset: Dataset) -> Path
Creates a directory for the specified dataset if it does not already exist.
The method first verifies that the dataset is not already marked as existing. Once confirmed, it determines the directory path corresponding to the dataset and creates the directory. The created directory path is then returned.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset | Dataset | An instance of the Dataset class representing the dataset for which the directory is to be created. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| Path | Path | The path to the newly created directory. |
Raises:
| Type | Description |
|---|---|
| FileExistsError | If the directory already exists. |
| AssertionError | If the dataset is marked as missing. |
safe_derive
safe_derive(dataset: Dataset, overwrite: bool = False) -> None
Executes a safe derivation process for the provided dataset, ensuring that existing datasets are not overwritten unless explicitly specified. This method combines checks for pre-existing datasets, handling deletion if necessary, and invokes dataset derivation logic. The overwrite parameter controls whether an existing dataset should be overwritten.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset | Dataset | The dataset to be processed. Must be an instance or subclass of the Dataset type. | required |
| overwrite | bool | A boolean flag indicating whether to overwrite the existing dataset in the store. Defaults to False. | False |
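The role of the overwrite flag can be sketched on plain directories. What the real method does when the dataset exists and overwrite is False is not stated on this page; the sketch assumes it skips the derivation silently. `build_unless_present` and its `build` callback are hypothetical names.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def build_unless_present(
    dataset_dir: Path, build: Callable[[Path], object], overwrite: bool = False
) -> bool:
    """Run build for dataset_dir; return False if an existing dataset was kept."""
    if dataset_dir.exists():
        if not overwrite:
            return False  # assumption: existing data is left untouched
        shutil.rmtree(dataset_dir)  # overwrite requested: delete first
    dataset_dir.mkdir(parents=True)
    build(dataset_dir)
    return True


def demo() -> tuple[bool, bool, bool, str]:
    """Build once, skip once, then overwrite; report the final file content."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "my_dataset"
        first = build_unless_present(target, lambda d: (d / "v.txt").write_text("1"))
        skipped = build_unless_present(target, lambda d: (d / "v.txt").write_text("2"))
        replaced = build_unless_present(
            target, lambda d: (d / "v.txt").write_text("3"), overwrite=True
        )
        return first, skipped, replaced, (target / "v.txt").read_text()
```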
save_dataset_implementation
save_dataset_implementation(dataset: Dataset) -> None
Saves the given dataset implementation to a specified file path.
This method serializes the provided dataset object and saves it as a pickle file using the dill module. The file is stored in a directory defined by the path generated for the dataset. It overwrites the contents if a file with the same name already exists.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset | Dataset | The dataset object to be serialized and saved. | required |
save_dataset_metadata
save_dataset_metadata(dataset: Dataset) -> None
Saves metadata of the given dataset including information about the Python environment, hardware, and operating system.
This method captures Python-related metadata (version, implementation, active packages), and system information (platform, memory, current date and time) and saves it in JSON format as part of the dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset | Dataset | The dataset for which the metadata will be saved. | required |
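Metadata of this kind can be gathered with the standard library. This is a rough sketch: the key names are assumptions rather than the actual layout of the saved file, and memory information is omitted because the standard library has no portable call for it.

```python
import json
import platform
from datetime import datetime, timezone
from importlib import metadata


def collect_environment_metadata() -> dict:
    """Gather Python and system details in a JSON-serializable dict."""
    packages = sorted(
        f"{name}=={dist.metadata.get('Version')}"
        for dist in metadata.distributions()
        if (name := dist.metadata.get("Name"))  # skip entries without a name
    )
    return {
        "python": {
            "version": platform.python_version(),
            "implementation": platform.python_implementation(),
            "packages": packages,
        },
        "system": {
            "platform": platform.platform(),
            "created_at": datetime.now(timezone.utc).isoformat(),
        },
    }


def metadata_as_json() -> str:
    """Serialize the collected metadata the way a metadata file might store it."""
    return json.dumps(collect_environment_metadata(), indent=2)
```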
status
status() -> LairStatus
Determines the current status of the lair by checking the existence and validity of its path and configuration file.
The method evaluates the state of the lair based on the presence of its path and a valid configuration file. If the path does not exist, the lair status is considered as NOT_EXIST. If the path exists but the configuration file is missing, it is marked as MALFORMED. Otherwise, the lair is deemed OK.
Returns:
| Name | Type | Description |
|---|---|---|
| LairStatus | LairStatus | The status of the lair, which can be one of the following: LairStatus.NOT_EXIST (the lair path does not exist); LairStatus.MALFORMED (the lair exists but the configuration file is missing); LairStatus.OK (the lair is properly set up and its path and configuration are intact). |
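The three-way decision described above can be sketched as a small function. `StatusSketch`, `store_status`, and the configuration file name `config.json` are all hypothetical; the page does not name the actual configuration file.

```python
import tempfile
from enum import Enum, auto
from pathlib import Path


class StatusSketch(Enum):
    """Hypothetical stand-in for the three LairStatus values used by status()."""

    NOT_EXIST = auto()
    MALFORMED = auto()
    OK = auto()


def store_status(path: Path, config_name: str = "config.json") -> StatusSketch:
    """Classify a store directory by its path and configuration file."""
    if not path.exists():
        return StatusSketch.NOT_EXIST
    if not (path / config_name).is_file():
        return StatusSketch.MALFORMED  # path exists, configuration file missing
    return StatusSketch.OK


def demo() -> list[str]:
    """Walk one directory through all three states."""
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp) / "lair"
        states = [store_status(store).name]
        store.mkdir()
        states.append(store_status(store).name)
        (store / "config.json").write_text("{}")
        states.append(store_status(store).name)
    return states
```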
LairStatus
Bases: Enum
Represents the possible statuses of a 'Lair' directory and its configuration.
Defines an enumeration for representing various statuses that a 'Lair' directory and its associated configuration can have. This is primarily used to indicate the state or validity of a directory with respect to its structure and configuration file.
Attributes:
| Name | Type | Description |
|---|---|---|
| NOT_EXIST | | Indicates that the directory does not exist. |
| NOT_DIRECTORY | | Indicates that the specified path exists but is not a directory. |
| CONFIG_MISSING | | Indicates that the required configuration file is missing in the directory. |
| MALFORMED | | Indicates that the configuration file exists but is improperly formatted. |
| OK | | Indicates that the directory and its configuration file are valid and properly formatted. |