Download Class API

This module downloads files from public databases such as GEO and ArrayExpress, or from any generic URL. It supports FTP and HTTP(S) and tracks progress during downloads.

Functions:

  • download_file(url: str, filepath: Path) -> None: Downloads a file from the specified URL and saves it to the given file path.
  • download_supplementary_from_geo(gse_id: str, local_dir: Path) -> None: Downloads supplementary files for a given GSE ID from the GEO database via FTP.
  • download_files_from_arrayexpress(arrayexpress_id: str, local_dir: Path) -> None: Downloads files from an ArrayExpress dataset via FTP.

Dependencies
  • ftplib: For handling FTP-based downloads.
  • pathlib: For handling file paths in a platform-independent manner.
  • requests: For handling HTTP(S) file downloads.
  • tqdm: For visually tracking download progress.

download_file

download_file(url: str, filepath: Path) -> None

Downloads a file from the given URL and saves it to the specified file path.

This function sends a GET request to the provided URL and streams the response in chunks, so the whole file is never held in memory at once. It displays download progress with the tqdm library while writing the data to the specified file path.

Parameters:

  • url (str, required): The URL of the file to be downloaded.
  • filepath (Path, required): The path of the file where the downloaded content will be saved.
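
The chunked streaming described above can be sketched as follows. This is a minimal illustration, not the module's own code: the module relies on requests and tqdm, whereas this sketch swaps in urllib.request from the standard library to stay dependency-free, and uses a plain callback where a tqdm bar would be updated. The name download_file_sketch and the chunk_size and on_progress parameters are hypothetical.

```python
from pathlib import Path
from typing import Callable, Optional
from urllib.request import urlopen


def download_file_sketch(
    url: str,
    filepath: Path,
    chunk_size: int = 8192,
    on_progress: Optional[Callable[[int], None]] = None,
) -> None:
    """Stream `url` to `filepath` in fixed-size chunks (illustrative only)."""
    # urlopen stands in for the module's requests-based GET request.
    with urlopen(url) as response, open(filepath, "wb") as out:
        while True:
            # Read at most chunk_size bytes so memory use stays bounded.
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            # Report the number of bytes just written; a tqdm bar's
            # update method could be passed in here.
            if on_progress is not None:
                on_progress(len(chunk))
```

Passing a tqdm bar's update method as on_progress would give a progress display similar to the one the module describes.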

download_files_from_arrayexpress

download_files_from_arrayexpress(arrayexpress_id: str, local_dir: Path) -> None

Downloads files from an ArrayExpress dataset to a local directory.

This function connects to the ArrayExpress FTP server, navigates to the specific dataset directory using the given arrayexpress_id, and downloads all the files into the specified local directory.

Parameters:

  • arrayexpress_id (str, required): The unique identifier of the ArrayExpress dataset.
  • local_dir (Path, required): The local directory where the files will be downloaded.
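
The connect, navigate, and download-all pattern described above might look like the sketch below, built on ftplib. It is an assumption-laden illustration rather than the module's implementation: the helper name download_ftp_directory is hypothetical, the remote directory is passed in explicitly because the server-side layout for a given arrayexpress_id is not stated here, and the directory is assumed to contain only plain files.

```python
from ftplib import FTP
from pathlib import Path
from typing import List


def download_ftp_directory(ftp: FTP, remote_dir: str, local_dir: Path) -> List[str]:
    """Download every entry listed in `remote_dir` into `local_dir`.

    Assumes `local_dir` already exists and that `remote_dir` holds only
    plain files (RETR fails on subdirectories).
    """
    ftp.cwd(remote_dir)
    downloaded = []
    for name in ftp.nlst():
        # retrbinary calls out.write once per received block.
        with open(local_dir / name, "wb") as out:
            ftp.retrbinary(f"RETR {name}", out.write)
        downloaded.append(name)
    return downloaded


# Usage against a real server (not run here; the path is a placeholder):
#     with FTP("ftp.ebi.ac.uk") as ftp:
#         ftp.login()  # anonymous login
#         download_ftp_directory(ftp, "<dataset directory>", Path("data"))
```

Taking an already connected FTP object as an argument keeps the listing and retrieval logic separate from connection handling, which also makes it easy to exercise with a stand-in object.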

download_supplementary_from_geo

download_supplementary_from_geo(gse_id: str, local_dir: Path) -> None

Downloads supplementary files from the GEO (Gene Expression Omnibus) database for a given GSE ID and saves them to a specified local directory.

The function connects to the National Center for Biotechnology Information (NCBI) FTP server, navigates to the directory containing supplementary files for the specified GSE ID, and downloads each file to the local directory. If the local directory does not exist, it is created.

Parameters:

  • gse_id (str, required): GEO Series (GSE) accession used to locate the supplementary files in the GEO database.
  • local_dir (Path, required): Local directory where the downloaded files will be stored.
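
As a rough sketch of the navigation step, the helper below builds the supplementary-file directory for a GSE ID, assuming the module follows GEO's public FTP layout, in which series are grouped into folders named after the accession with its last three digits replaced by "nnn". The text above does not spell this layout out, so treat it as an assumption; the function names geo_suppl_dir and prepare_local_dir are hypothetical.

```python
import re
from pathlib import Path

GEO_FTP_HOST = "ftp.ncbi.nlm.nih.gov"  # NCBI's public FTP server


def geo_suppl_dir(gse_id: str) -> str:
    """Return the assumed FTP directory holding a series' supplementary files."""
    match = re.fullmatch(r"GSE(\d+)", gse_id)
    if match is None:
        raise ValueError(f"not a GSE accession: {gse_id!r}")
    digits = match.group(1)
    # Drop the last three digits: GSE12345 -> GSE12nnn, GSE999 -> GSEnnn.
    stub = f"GSE{digits[:-3]}nnn"
    return f"/geo/series/{stub}/{gse_id}/suppl/"


def prepare_local_dir(local_dir: Path) -> Path:
    """Create the local directory (and any parents) if it is missing."""
    local_dir.mkdir(parents=True, exist_ok=True)
    return local_dir
```

Under that assumption, geo_suppl_dir("GSE12345") yields "/geo/series/GSE12nnn/GSE12345/suppl/", which could then be passed to an FTP client's cwd call.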