Download Class API
This module provides functions for downloading files from public databases such as GEO and ArrayExpress, or from any generic URL. It supports the FTP and HTTP(S) protocols and tracks progress during downloads.
Functions:
| Name | Description |
|---|---|
| download_file | (url: str, filepath: Path) -> None: Downloads a file from the specified URL and saves it to the given file path. |
| download_supplementary_from_geo | (gse_id: str, local_dir: Path) -> None: Downloads supplementary files for a given GSE ID from the GEO database via FTP. |
| download_files_from_arrayexpress | (arrayexpress_id: str, local_dir: Path) -> None: Downloads files from an ArrayExpress dataset via FTP. |
Dependencies
- ftplib: For handling FTP-based downloads.
- pathlib: For handling file paths in a platform-independent manner.
- requests: For handling HTTP(S) file downloads.
- tqdm: For visually tracking download progress.
download_file
download_file(url: str, filepath: Path) -> None
Downloads a file from the given URL and saves it to the specified file path.
This function sends a GET request to the provided URL and streams the response in chunks, so the whole file is never held in memory at once. It displays download progress using the tqdm library and writes the data to the specified file path.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| url | str | The URL of the file to be downloaded. | required |
| filepath | Path | The path of the file where the downloaded content will be saved. | required |
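The streaming behaviour described for download_file can be sketched roughly as follows. This is a minimal illustration, not the module's actual source: the names `write_chunks` and `download_file_sketch`, the 8192-byte chunk size, and the 30-second timeout are all assumptions made for the example.

```python
from pathlib import Path
from typing import Callable, Iterable


def write_chunks(
    chunks: Iterable[bytes],
    filepath: Path,
    on_progress: Callable[[int], None] = lambda n: None,
) -> int:
    """Write byte chunks to filepath one at a time; return the bytes written."""
    written = 0
    with open(filepath, "wb") as fh:
        for chunk in chunks:
            if not chunk:  # skip empty keep-alive chunks
                continue
            fh.write(chunk)
            written += len(chunk)
            on_progress(len(chunk))  # e.g. advance a progress bar
    return written


def download_file_sketch(url: str, filepath: Path, chunk_size: int = 8192) -> None:
    """Hypothetical stand-in for download_file (chunk size and timeout are assumed)."""
    # Third-party imports are kept local so write_chunks works without them.
    import requests
    from tqdm import tqdm

    with requests.get(url, stream=True, timeout=30) as response:
        response.raise_for_status()
        # Content-Length may be absent; tqdm accepts total=None in that case.
        total = int(response.headers.get("content-length", 0)) or None
        with tqdm(total=total, unit="B", unit_scale=True, desc=filepath.name) as bar:
            write_chunks(response.iter_content(chunk_size=chunk_size), filepath, bar.update)
```

Separating the write loop from the HTTP request keeps the file-writing part easy to exercise without network access.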
download_files_from_arrayexpress
download_files_from_arrayexpress(arrayexpress_id: str, local_dir: Path) -> None
Downloads files from an ArrayExpress dataset to a local directory.
This function connects to the ArrayExpress FTP server, navigates to the
specific dataset directory using the given arrayexpress_id, and downloads
all the files into the specified local directory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| arrayexpress_id | str | The unique identifier of the ArrayExpress dataset. | required |
| local_dir | Path | The local directory where the files will be downloaded. | required |
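The "change into the dataset directory, then download every file" pattern described for download_files_from_arrayexpress might look something like the sketch below. The function name `fetch_directory` is hypothetical, and the FTP host and dataset path are deliberately left to the caller because this page does not state them. The `ftp` argument is assumed to behave like a logged-in `ftplib.FTP` object (`cwd`, `nlst`, `retrbinary`).

```python
from pathlib import Path, PurePosixPath
from typing import List


def fetch_directory(ftp, remote_dir: str, local_dir: Path) -> List[Path]:
    """Download every entry listed in remote_dir into local_dir.

    Assumes remote_dir contains only regular files; a RETR on a
    subdirectory would raise an ftplib error.
    """
    local_dir.mkdir(parents=True, exist_ok=True)
    ftp.cwd(remote_dir)
    saved = []
    for name in ftp.nlst():
        # Some servers return full paths from NLST; keep only the file name.
        target = local_dir / PurePosixPath(name).name
        with open(target, "wb") as fh:
            ftp.retrbinary(f"RETR {name}", fh.write)
        saved.append(target)
    return saved
```

Passing the connection in as an argument, rather than opening it inside the function, makes it possible to try the loop against a stub object before pointing it at a real server.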
download_supplementary_from_geo
download_supplementary_from_geo(gse_id: str, local_dir: Path) -> None
Downloads supplementary files from the GEO (Gene Expression Omnibus) database for a given GSE ID and saves them to a specified local directory.
The function connects to the National Center for Biotechnology Information (NCBI) FTP server, navigates to the directory containing supplementary files for the specified GSE ID, and downloads each file to the local directory. If the local directory does not exist, it is created.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| gse_id | str | GEO Series (GSE) accession ID used to locate the supplementary files in the GEO database. | required |
| local_dir | Path | Local directory where the downloaded files will be stored. | required |
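As a rough illustration of the steps described for download_supplementary_from_geo, the sketch below derives a supplementary-file directory from a GSE ID and downloads its contents. It assumes the public GEO FTP layout, in which series are grouped into directories named by replacing the last three digits of the accession with "nnn". Whether the module builds the path this way is not shown on this page, and the names `geo_suppl_dir` and `download_supplementary_sketch` are hypothetical.

```python
from ftplib import FTP
from pathlib import Path

GEO_FTP_HOST = "ftp.ncbi.nlm.nih.gov"  # NCBI FTP server


def geo_suppl_dir(gse_id: str) -> str:
    """Return the assumed FTP directory holding supplementary files for gse_id."""
    gse_id = gse_id.strip().upper()
    digits = gse_id[3:]
    if not (gse_id.startswith("GSE") and digits.isdigit()):
        raise ValueError(f"not a GSE accession: {gse_id!r}")
    # GSE12345 -> GSE12nnn, GSE1 -> GSEnnn (assumed grouping scheme)
    stub = "GSE" + digits[:-3] + "nnn"
    return f"/geo/series/{stub}/{gse_id}/suppl"


def download_supplementary_sketch(gse_id: str, local_dir: Path) -> None:
    """Hypothetical stand-in for download_supplementary_from_geo."""
    local_dir.mkdir(parents=True, exist_ok=True)  # create the directory if missing
    with FTP(GEO_FTP_HOST) as ftp:
        ftp.login()  # anonymous login
        ftp.cwd(geo_suppl_dir(gse_id))
        for name in ftp.nlst():
            with open(local_dir / name, "wb") as fh:
                ftp.retrbinary(f"RETR {name}", fh.write)
```

Validating the accession before connecting means a malformed ID fails fast with a ValueError instead of an FTP error from the server.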