dataset module

This module provides a convenient way to programmatically work with Zenodo, facilitating the publication and management of research data on this platform. Researchers can create datasets, associate local files, upload data to Zenodo, and keep the dataset metadata up-to-date, all using Python code. This can streamline the process of sharing research data and integrating it with Zenodo’s services.

Examples

Make sure to run the examples in Zenodo’s sandbox:

# Import LocalFiles and Zenodo classes
from zen import LocalFiles, Zenodo

# Add local files to the dataset
local_file_paths = ['examples/file1.csv', 'examples/file2.csv']
ds = LocalFiles(local_file_paths, dataset_path='examples/dataset.json')

# Set up a Zenodo deposition for the dataset. If no deposition is associated
# with the dataset, a new one is created. You can also pass an existing
# deposition to associate with your local files dataset.
# Replace 'your_access_token' with your actual access token.
zen = Zenodo(url=Zenodo.sandbox_url, token='your_access_token')
ds.set_deposition(api=zen, create_if_not_exists=True)
ds.save()

# The LocalFiles object is a container for local files
len(ds)  # Number of files in the dataset
ds.storage_size  # Total size of the files in bytes
ds.filenames  # Show the list of file names
ds[0]  # Access the first local file in the dataset

# Accessing the deposition associated with your local files
dep = ds.deposition
dep.data  # Dictionary representing Zenodo Deposition raw data
dep.id  # Deposition ID
dep.files  # DepositionFiles class to interact with remote files
dep.metadata  # Metadata class to interact with deposition metadata

# Upload the local files not already uploaded to Zenodo
ds.upload()

# To open the dataset in a later Python session, use `from_file()` method
ds = LocalFiles.from_file('examples/dataset.json')

# If some local files have changed, `upload()` detects it and uploads only
# the files that are new or out of date on Zenodo.
# Calling `set_deposition()` again just retrieves from Zenodo the deposition
# already linked to the current dataset.
ds.set_deposition(api=zen)
ds.upload()

# Adding additional files to the dataset
new_local_file_paths = ['./examples/file3.csv']
ds.add(new_local_file_paths)

# Save to persist changes in local dataset
ds.save()

# Interacting with files in Zenodo
dep = ds.deposition  # Deposition linked to the reloaded dataset
dep.files  # Access API endpoints to interact with files
dep.files.create('./examples/file3.csv')  # Upload a new file
# The call above has no side effect on the ds dataset
dep.refresh()  # Refresh the deposition object to reflect the files on Zenodo

# List local files that are new or modified relative to the Zenodo deposition
[f for f in ds if f not in dep]

# Lists all files in Zenodo deposition not present in the local dataset
[f for f in dep if f not in ds]

# Upload only new/modified files to Zenodo deposition
ds.upload()

# Delete a file from Zenodo deposition
# This does not affect the local files dataset
dep.files[2].delete()
dep.refresh()

# Update the deposition metadata
dep.metadata.upload_type = 'dataset'
dep.metadata.title = 'My title'
dep.metadata.description = 'My description'
dep.metadata.creators.clear()
dep.metadata.creators.add('Doe, John', 'Zenodo')
dep.metadata.license = 'cc-by-4.0'
dep.update()

# Interacting with the Zenodo deposition
# Discard any edits to the deposition on Zenodo.
# If the deposition has not been published, it is deleted from Zenodo.
dep.discard()
# Publish the deposition
# After publication, you cannot add, remove, or update any files
dep.publish()
# Create a new version of the deposition to add more files
dep.new_version()
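The membership tests in the two list comprehensions above compare local files with remote ones. The following is a minimal sketch of that comparison using plain dictionaries in place of LocalFile and ZenodoFile objects. It assumes a file counts as present only when both its filename and its checksum match, which may not be exactly how zen implements `in`; the helper names `not_in` and `_key` and the checksum values are illustrative only.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for LocalFile/ZenodoFile: a dict with these two keys.
FileRecord = Dict[str, Optional[str]]


def _key(f: FileRecord) -> Tuple[Optional[str], Optional[str]]:
    # Assumption: a file is "in" a dataset when name and checksum both match.
    return (f["filename"], f["checksum"])


def not_in(files: List[FileRecord], others: List[FileRecord]) -> List[str]:
    """Names of files in `files` that have no (filename, checksum) match in `others`."""
    other_keys = {_key(o) for o in others}
    return [f["filename"] for f in files if _key(f) not in other_keys]


local = [
    {"filename": "file1.csv", "checksum": "aaa"},
    {"filename": "file2.csv", "checksum": "bbb"},  # modified locally
    {"filename": "file3.csv", "checksum": "ccc"},  # never uploaded
]
remote = [
    {"filename": "file1.csv", "checksum": "aaa"},
    {"filename": "file2.csv", "checksum": "old"},  # outdated copy
    {"filename": "file4.csv", "checksum": "ddd"},  # only on Zenodo
]

to_upload = not_in(local, remote)    # ['file2.csv', 'file3.csv']
remote_only = not_in(remote, local)  # ['file2.csv', 'file4.csv']
```

Under this rule, a remote file whose checksum no longer matches its local counterpart (here `file2.csv`) shows up in both lists.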

Note

  • Before using this submodule, make sure you have a valid Zenodo account and access token.

  • Always refer to the Zenodo API documentation for detailed information about available endpoints and parameters.

  • For more information, visit: https://developers.zenodo.org/

class zen.dataset.BaseFile(file: Dict[str, Any] | str)

Bases: dict

Base Class for Representing Files with Associated Metadata.

This class represents a file, whether it’s local or remote, and includes associated metadata. It serves as the foundation for the LocalFile and ZenodoFile classes.

The BaseFile class is designed to encapsulate file-related information and metadata. It can be instantiated with either a dictionary representing file information or a file path as a string. When given a file path, it automatically extracts the filename and creates download links. For dictionary input, the required ‘filename’ and ‘links’ entries must be present.
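To illustrate the normalization described above, the sketch below turns either kind of input into a plain dictionary with ‘filename’ and ‘links’ entries. The helper name `file_record` and the `download` key inside `links` are hypothetical; BaseFile’s actual link layout may differ.

```python
import os
from typing import Any, Dict, Union


def file_record(file: Union[Dict[str, Any], str]) -> Dict[str, Any]:
    """Normalize a file path or file dict into a dict with 'filename' and 'links'."""
    if isinstance(file, str):
        # Hypothetical layout: the path itself serves as the download link.
        return {"filename": os.path.basename(file), "links": {"download": file}}
    # Dictionary input must already carry the required entries.
    missing = {"filename", "links"} - set(file)
    if missing:
        raise ValueError(f"missing required entries: {sorted(missing)}")
    return dict(file)
```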

Subclasses of BaseFile provide specific functionality for handling local and remote files.

Parameters:

file (Union[Dict[str,Any],str]) – The dictionary representing a file or a file path.

property checksum: str | None

The checksum value of the file, if available.

property filedate: str | None

The last modification date of the file.

property filename: str

The base name of the file.

property filesize: int | None

The size of the file in bytes.

property is_local: bool

Indicates whether the file is stored locally (on the user’s machine).

property is_remote: bool

Indicates whether the file is stored remotely (accessible via a URL).

The supported URL schemes are defined in the utils.valid_schemas variable.
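A minimal sketch of the remote check, assuming it is based on the URL scheme. The scheme set below is a hypothetical placeholder for utils.valid_schemas, whose real contents are not listed here, and treating every other string as a local path is also an assumption.

```python
from urllib.parse import urlparse

# Hypothetical contents; the real set lives in zen's utils.valid_schemas.
VALID_SCHEMES = {"http", "https", "ftp"}


def is_remote(url: str) -> bool:
    """Return True when the URL uses one of the supported remote schemes."""
    return urlparse(url).scheme.lower() in VALID_SCHEMES


def is_local(url: str) -> bool:
    # Assumption: anything that is not a supported remote URL is a local path.
    return not is_remote(url)
```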

property links

Links to various aspects of the file, such as download links.

property url: str

The URL to access the file data.

class zen.dataset.Deposition(api: Zenodo, data: Dict[str, Any])

Bases: _BaseDataset

Represents a Zenodo Deposition.

This class defines a Zenodo deposition, providing methods to create, retrieve, and interact with specific Zenodo depositions using the Zenodo API.

Parameters:
  • api (Zenodo) – The Zenodo instance used to interact with the Zenodo API.

  • data (Dict[str,Any]) – The deposition data, including ‘id’, ‘metadata’, ‘files’, and ‘links’ entries.

Examples

  1. Create a new Zenodo deposition:

>>> from zen import Zenodo
>>> zen = Zenodo(url=Zenodo.sandbox_url, token='your_api_token')
>>> meta = {
...     'title': 'My New Deposition',
...     'description': 'A test deposition for demonstration purposes.'
... }
>>> dep = zen.depositions.create(metadata=meta)
>>> print(dep.id)  # print the deposition id

  2. Retrieve an existing Zenodo deposition by id:

>>> deposition_id = dep.id
>>> existing_dep = zen.depositions.retrieve(deposition_id)

  3. Modify deposition metadata:

>>> dep.metadata.title = 'New Deposition Title'
>>> dep.metadata.access_right.set_open('cc-by')
>>> dep.update()  # Commit changes

Discard the example deposition:

>>> dep.discard()

property api: Zenodo

The Zenodo object to interact with Zenodo API.

property concept_id: str

The concept ID of the deposition, or None if not available.

property data: Dict[str, Any]

The data of the deposition as represented by Zenodo.

delete() None

Delete the deposition from Zenodo.

discard() Self

Discard any edits to the deposition on Zenodo.

Returns:

The current Deposition object.

Return type:

Deposition

property doi: str

The DOI of the deposition, or None if not available.

edit() Self

Set the deposition to the ‘edit’ state on Zenodo.

Returns:

The current Deposition object.

Return type:

Deposition

property files: DepositionFiles

The files of the deposition.

property id: int

The ID of the deposition.

property is_editing: bool

Determines whether the deposition is currently in an editing state.

property is_published: bool

Determines whether the deposition has already been published.

property metadata: Metadata

The metadata of the deposition.

new_version() Self

Create a new version of the deposition on Zenodo.

Returns:

The new version of the current Deposition object.

Return type:

Deposition

publish() Self

Publish the deposition on Zenodo.

Returns:

The current Deposition object.

Return type:

Deposition

refresh() Self

Refresh the Deposition Data from Zenodo.

This method sends a request to the Zenodo API to fetch the most up-to-date details about the deposition, including metadata, files, and links. It then updates the current Deposition object with the refreshed data.

Returns:

The current refreshed Deposition object.

Return type:

Deposition

property title: str

The title of the deposition.

update(metadata: Metadata | Dict[str, Any] | None = None, replacements: Dict[str, Any] | None = None) Self

Update the metadata of the deposition on Zenodo.

This method updates the metadata of the deposition on the Zenodo platform. You can provide new metadata as a Metadata object or a dictionary. If no metadata is provided, the existing metadata of the deposition is used. The replacements argument is used to render the metadata before updating; for more details, see the Metadata.render() method.

Parameters:
  • metadata (Optional[Union[Metadata,Dict[str,Any]]]=None) – The new metadata to update the deposition.

  • replacements (Optional[Dict[str,Any]]=None) – A dictionary of placeholder replacements. If not provided, an empty dictionary is used.
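For reference, Zenodo’s public REST API replaces a deposition’s metadata with a PUT request to /api/deposit/depositions/<id> whose JSON body has a ‘metadata’ key. The sketch below builds, but does not send, such a request with the standard library. The helper names are hypothetical, the `render` helper assumes placeholders are filled with str.format_map (only an approximation of Metadata.render()), and none of this is presented as zen’s actual implementation.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

SANDBOX = "https://sandbox.zenodo.org"  # Zenodo's sandbox host


def render(metadata: Dict[str, Any], replacements: Dict[str, Any]) -> Dict[str, Any]:
    # Assumption: placeholders such as '{index_min}' in string values are
    # filled with str.format_map; non-string values are passed through.
    return {k: v.format_map(replacements) if isinstance(v, str) else v
            for k, v in metadata.items()}


def build_update_request(dep_id: int, metadata: Dict[str, Any], token: str,
                         replacements: Optional[Dict[str, Any]] = None,
                         base_url: str = SANDBOX) -> urllib.request.Request:
    """Build (but do not send) a PUT request that replaces a deposition's metadata."""
    body = {"metadata": render(metadata, replacements or {})}
    return urllib.request.Request(
        f"{base_url}/api/deposit/depositions/{dep_id}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
    )
```

Passing the request to urllib.request.urlopen() would send it; use a sandbox token when experimenting.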

Returns:

The current Deposition object with updated metadata.

Return type:

Deposition

class zen.dataset.DepositionFiles(deposition: Deposition, files: List[Dict, str] | None = None)

Bases: _FileDataset

Represents the files associated with a Zenodo deposition.

This class provides methods for managing files, such as listing and refreshing files, creating and deleting files, and querying specific files within the deposition.

Parameters:
  • deposition (Deposition) – The Deposition object to which these files are associated.

  • files (Optional[List[Dict,str]]=None) – Optional initial list of file data or file paths (default is None).

create(file: str, bucket_filename: str | None = None, progress: bool = True) Self

Uploads a file to the deposition through the Zenodo API.

Parameters:
  • file (str) – The local file path of the file to upload.

  • bucket_filename (Optional[str]=None) – The desired filename for the file in the deposition’s bucket.

  • progress (bool=True) – Whether to display a progress bar. Defaults to True.
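Zenodo’s files API accepts uploads as a PUT of the raw file content to the deposition’s bucket URL (the ‘bucket’ entry in the deposition’s links) followed by the target filename. The sketch below builds such a request and mirrors the bucket_filename parameter, assuming that leaving it unset reuses the local basename. The helper name is hypothetical, and reading the whole file into memory is a simplification that a real client with a progress bar would likely avoid.

```python
import os
import urllib.parse
import urllib.request
from typing import Optional


def build_bucket_upload(bucket_url: str, path: str, token: str,
                        bucket_filename: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a PUT request that uploads a file into a deposition bucket."""
    # Assumption: without bucket_filename, the local base name is reused.
    name = bucket_filename or os.path.basename(path)
    with open(path, "rb") as fh:
        payload = fh.read()  # simplification: whole file in memory, no progress bar
    return urllib.request.Request(
        f"{bucket_url.rstrip('/')}/{urllib.parse.quote(name)}",
        data=payload,
        method="PUT",
        headers={"Content-Type": "application/octet-stream",
                 "Authorization": f"Bearer {token}"},
    )
```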

Returns:

The current object.

Return type:

DepositionFiles

property data: List[ZenodoFile]

List of files stored by the dataset.

delete(file: ZenodoFile) None

Deletes a file from the Zenodo deposition.

Parameters:

file (ZenodoFile) – The file to be deleted from Zenodo.

property index: Dict[str, ZenodoFile]

Dictionary of the files stored by the dataset, keyed by file name.

list() Self

Lists and refreshes the file list of the Zenodo deposition.

Parameters:

None

Returns:

The list of files of the Zenodo deposition.

Return type:

DepositionFiles

property storage_size: int

Calculate the total data size of the files.

class zen.dataset.LocalFile(file: Dict[str, Any] | str)

Bases: BaseFile

Represents a local file within a Dataset.

This class represents a local file that is part of a dataset. It extends the functionality of the BaseFile class to provide methods for updating file metadata, uploading the file to a Zenodo deposition, and managing local files.

The LocalFile class is specifically designed to represent and manage local files within a dataset. It inherits the core functionality from the BaseFile class and can be instantiated with either a dictionary representing file information or a file path as a string.

Parameters:

file (Union[Dict[str,Any], str]) – The file information or file path. If the last component of the path (i.e., its basename) uniquely identifies the file, the file can be passed as a path string; otherwise, it has to be passed as a dictionary.

Example

>>> from zen.dataset import LocalFile
>>> local_file = LocalFile('examples/file1.csv')
>>> local_file.update_metadata()  # Get the file date and size
>>> print(local_file.filename)
file1.csv

property checksum: str | None

The checksum value of the file, if available.

property filedate: str | None

The last modification date of the file.

property filesize: int | None

The size of the file in bytes.

parse_template(template: str) dict

Parses and extracts properties from the file name.

Parses the filename using a string template. Any template text outside curly braces must match the corresponding portion of the filename exactly. The template strings inside curly braces, i.e., string placeholders, must each match at least one character of the filename. A template matches a filename when every character of the filename is matched, in order, by the template’s literal substrings and placeholders.

Parameters:

template (str) – The template used to parse the filename and extract file properties.

Returns:

A dictionary containing the parsed properties.

Return type:

dict
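The matching rules above can be expressed as a regular expression: literal text is escaped, and each placeholder becomes a non-greedy group of at least one character, with the whole filename required to match. This is a sketch of the rules as described, not zen’s code. It assumes each placeholder name appears once and is a valid Python identifier, and raising ValueError on a mismatch is an assumption. Unlike the real method, the sketch takes the filename as an explicit argument.

```python
import re
from string import Formatter
from typing import Dict


def parse_template(template: str, filename: str) -> Dict[str, str]:
    """Extract placeholder values from `filename` using a '{name}' template."""
    pattern = ""
    for literal, field, _spec, _conv in Formatter().parse(template):
        pattern += re.escape(literal)       # literal text must match exactly
        if field is not None:
            pattern += f"(?P<{field}>.+?)"  # placeholder: at least one character
    match = re.fullmatch(pattern, filename)  # the whole filename must be consumed
    if match is None:
        # Assumption: a non-matching filename is reported as an error.
        raise ValueError(f"{filename!r} does not match template {template!r}")
    return match.groupdict()
```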

property placeholders

property properties: Dict[str, Any] | None

Additional properties or metadata associated with the file.

This is mainly used to store values used in file template expansion. For more details, see the LocalFiles.expand() method.

update(other: LocalFile) None

Updates the current file’s metadata with metadata from another file.

Parameters:

other (LocalFile) – The file from which values will be taken to merge.

Raises:

ValueError – If the two files do not have the same filename.

update_metadata() None

Update the current file metadata.

Retrieves and updates the file’s metadata, such as its size and modification date. It automatically detects whether the file is stored locally or is accessible remotely via a URL. If the file size or last modification date has changed, the checksum is cleared and recomputed during upload.

Raises:

ValueError – if the file is not accessible.
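The change-detection rule described for update_metadata() can be sketched with os.stat() and hashlib. Assumptions: the record is a plain dictionary, the modification date is stored as the raw st_mtime timestamp rather than a formatted string, and checksums use an ‘md5:<hex>’ form. Remote files (fetched over a URL) are not handled, and both helper names are hypothetical.

```python
import hashlib
import os
from typing import Any, Dict


def refresh_local_metadata(record: Dict[str, Any], path: str) -> Dict[str, Any]:
    """Refresh size and date of a local file record, clearing a stale checksum."""
    if not os.path.isfile(path):
        raise ValueError(f"file is not accessible: {path}")
    st = os.stat(path)
    changed = (record.get("filesize") != st.st_size
               or record.get("filedate") != st.st_mtime)
    if changed:
        record["checksum"] = None  # recomputed before the next upload
    record["filesize"] = st.st_size
    record["filedate"] = st.st_mtime  # assumption: raw timestamp, not a date string
    return record


def md5_checksum(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute an 'md5:<hex>' checksum, reading the file in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return "md5:" + digest.hexdigest()
```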

upload(deposition: Deposition, progress: bool = True, force: bool = False, max_retries: int = 15, min_delay: int = 10, max_delay: int = 60) Self

Uploads the local file to a Zenodo deposition.

Uploads the local file to a Zenodo deposition, updating its metadata in the process. If the file is remote, it is temporarily downloaded for uploading. The checksum is calculated before upload, and the file’s metadata is updated accordingly. If force is False, the file is not uploaded when it is already present in the deposition with the same checksum.

Parameters:
  • deposition (Deposition) – The deposition to which the file is uploaded.

  • progress (bool=True) – Whether to display a progress bar. Defaults to True.

  • force (bool=False) – Whether to upload the file even if it has already been uploaded.

  • max_retries (int, optional) – Maximum number of retry attempts if the upload fails. Defaults to 15.

  • min_delay (int, optional) – Minimum delay in seconds between retry attempts. Defaults to 10 seconds.

  • max_delay (int, optional) – Maximum delay in seconds between retry attempts. Defaults to 60 seconds.
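Two behaviours described above can be sketched independently: skipping files whose checksum already exists in the deposition unless force is set, and retrying failed uploads. The retry helper assumes a uniformly random delay between min_delay and max_delay and that max_retries counts retries after the first attempt; zen’s actual backoff policy is not documented here. Catching every Exception is a simplification, since a real client would retry only on transient network errors. Both helper names are hypothetical.

```python
import random
import time
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def needs_upload(local_checksum: Optional[str], remote_checksums: Iterable[str],
                 force: bool = False) -> bool:
    """Upload unless an identical copy (same checksum) is already in the deposition."""
    return force or local_checksum is None or local_checksum not in set(remote_checksums)


def with_retries(action: Callable[[], T], max_retries: int = 15,
                 min_delay: float = 10, max_delay: float = 60,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `action`, retrying failures after a random delay in [min_delay, max_delay]."""
    attempt = 0
    while True:
        try:
            return action()
        except Exception:
            attempt += 1
            if attempt > max_retries:
                raise  # retries exhausted: re-raise the last error
            sleep(random.uniform(min_delay, max_delay))
```

Injecting `sleep` keeps the helper testable without real waiting.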

Returns:

The uploaded file with its updated metadata.

Return type:

LocalFile

property url: str

The URL to access the file data.

class zen.dataset.LocalFiles(files: List[Dict[str, Any] | str] | None = None, template: str | None = None, dataset_path: str | None = None)

Bases: _FileDataset

Represents a dataset associating local files to a Zenodo deposition.

This class provides methods for creating, updating, and managing local and Zenodo datasets. As your local dataset changes, this class simplifies the process of updating and maintaining it on Zenodo. You can quickly push updates without manually managing the complexities of Zenodo’s API.

The LocalFiles class provides access to the local files associated with the dataset. It can be initialized with a list of local files or a template file name that can be expanded into a set of file paths using string placeholders. During template expansion, properties are extracted from the filenames and stored in the respective file in the ‘properties’ key. Call save() method to make sure that any local update is saved in the dataset JSON file.

The deposition property represents the Zenodo deposition associated with the dataset. It provides access to the Zenodo deposition object for the dataset, which is important for managing the dataset on the Zenodo platform. After creating a new dataset, you can bind it to a specific Zenodo deposition using set_deposition() method.

Parameters:
  • files (Optional[List[Union[Dict[str,Any],str]]]=None) – The list of local files or file paths.

  • template (Optional[str]=None) – The filename template to extract properties from the list of file names.

  • dataset_path (Optional[str]=None) – The filename of the dataset.

Example

  1. Create a local dataset:

>>> from zen import Zenodo, LocalFiles
>>> local_file_paths = ['examples/file1.csv', 'examples/file2.csv']
>>> ds = LocalFiles(local_file_paths, dataset_path='examples/dataset.json')

  2. Create or retrieve a deposition:

>>> zen = Zenodo(url=Zenodo.sandbox_url, token='your_api_token')
>>> ds.set_deposition(api=zen, create_if_not_exists=True)
>>> dep = ds.deposition

On the first run, this creates the deposition. After that, it just loads the saved deposition from the local dataset file.

  3. Save to a file:

>>> ds.save()

  4. Upload files to the Zenodo deposition:

>>> ds.upload()  # upload files

  5. Add and upload additional files:

>>> ds.add(['examples/file3.csv'])
>>> ds.upload()  # upload just new/modified files
>>> ds.upload(force=True)  # upload everything again

Discard the example deposition:

>>> dep.discard()

add(files: List[Dict[str, Any] | str], template: str | None = None) Self

Add or Update Files in the Current Dataset.

This method allows you to add new files to the current dataset. If a file with the same name already exists, it will be updated in the dataset. You can also provide a filename template to extract properties from the list of filenames.

The add() method is used to add new files to the current dataset. If any of the provided files have the same name as an existing file in the dataset, the existing file is updated with the new data.

You can also specify a template to extract properties from the filenames in the list. This template is used to associate properties with the respective files in the dataset.
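The add-or-update behaviour can be sketched as a merge keyed on the base file name. Files are plain dictionaries here, the `path` entry and the helper name `add_files` are hypothetical, and template-based property extraction is left out.

```python
import os
from typing import Any, Dict, List


def add_files(dataset: List[Dict[str, Any]], paths: List[str]) -> List[Dict[str, Any]]:
    """Append new files and update existing ones, matching on the base file name."""
    by_name = {f["filename"]: f for f in dataset}
    for path in paths:
        incoming = {"filename": os.path.basename(path), "path": path}
        current = by_name.get(incoming["filename"])
        if current is None:
            dataset.append(incoming)  # new name: add it to the dataset
            by_name[incoming["filename"]] = incoming
        else:
            current.update(incoming)  # same name: refresh the stored entry
    return dataset
```

Running it on the files from the example below yields three entries, matching the documented result.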

Parameters:
  • files (List[Union[Dict[str,Any],str]]) – The list of files to be added.

  • template (Optional[str]=None) – The filename template to extract properties from the list of filenames.

Returns:

The dataset with the added or updated files.

Return type:

LocalFiles

Example

>>> ds = LocalFiles(['examples/file1.csv', 'examples/file3.csv'])
>>> new_file_list = ['examples/file1.csv', 'examples/file2.csv']
>>> ds.add(new_file_list, template='file{index}.csv')
>>> print(len(ds))
3

property dataset_path: str | None

File path of the current dataset.

property deposition: Deposition

Deposition associated with the dataset.

expand(**kwargs) Self

Expand the Placeholders in the Dataset.

This method expands the dataset by replacing placeholders in the filenames (templates) with provided keyword arguments. Placeholders are special tokens enclosed in curly braces within the template string, e.g., ‘{placeholder}’. They represent values that can vary, and by replacing them with specific values, you can generate a list of files in the dataset.

The expand() method replaces placeholders in the dataset with the provided values, effectively creating multiple versions of the dataset. Placeholders are identified by their names, which must match the keyword arguments provided. If more than one keyword argument is passed at the same time, the expansion is done simultaneously: first using the first value of each argument, then the second value, and so on. Hence, the number of expanded files equals the number of values passed to each keyword argument, and all keyword arguments must have the same number of values.

Calling the expand() method multiple times, each time with a different placeholder, takes a different approach. Instead of expanding simultaneously, it produces all possible combinations of values: each call combines the values you provide with every filename produced by the previous calls. Once all placeholders in the filename templates are expanded, you arrive at the list of unique files in the dataset, and further expansion is no longer possible.

The expand() method is a versatile tool for customizing filenames in the dataset and for associating properties with each file. Each expanded placeholder becomes a property of the file object. A list of all properties is available through the properties property.

Parameters:

**kwargs – Keyword arguments that represent the values used to replace placeholders in the file template. Each keyword argument should correspond to a placeholder in the file template, and the value should be a list of replacement values. When providing multiple keyword arguments, all of them must have the same number of elements to ensure consistent replacement across all placeholders.

Raises:
  • ValueError – If there are no placeholders to be expanded or if a placeholder is not found.

  • ValueError – If input lists have different lengths.

Returns:

The expanded LocalFiles object.

Return type:

LocalFiles
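The two expansion modes can be sketched with str.format_map(): simultaneous expansion pairs values position by position, while chaining calls yields every combination. The `_KeepMissing` mapping leaves placeholders that were not supplied untouched so a later call can fill them. This is an illustrative sketch that assumes placeholders carry no format specs; it returns bare filenames and does not record the ‘properties’ entries that zen stores.

```python
from typing import Any, List


class _KeepMissing(dict):
    """Mapping that leaves unknown placeholders untouched, e.g. '{year}'."""

    def __missing__(self, key: str) -> str:
        return "{" + key + "}"


def expand(templates: List[str], **kwargs: List[Any]) -> List[str]:
    """Fill placeholders position by position (the simultaneous mode)."""
    lengths = {len(values) for values in kwargs.values()}
    if len(lengths) != 1:
        raise ValueError("pass at least one placeholder; value lists must have equal length")
    n = lengths.pop()
    expanded = []
    for template in templates:
        for i in range(n):
            values = _KeepMissing({k: v[i] for k, v in kwargs.items()})
            expanded.append(template.format_map(values))
    return expanded


t = "file{index}_{year}.csv"
together = expand([t], index=[10, 20], year=["1990", "2000"])  # 2 files
step1 = expand([t], index=[10, 20])       # year still unexpanded
combined = expand(step1, year=["1990", "2000"])  # 4 files
```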

Examples

  1. Simultaneously Expanding Placeholders:

>>> from zen import LocalFiles
>>> template = 'file{index}_{year}.csv'
>>> ds = LocalFiles.from_template(template)
>>> ds.expand(index=[10, 20], year=['1990', '2000'])
>>> # Multiple files: 'file10_1990.csv', 'file20_2000.csv'
>>> len(ds)
2

  2. Sequentially Expanding Placeholders:

>>> ds = LocalFiles.from_template(template)
>>> ds.expand(index=[10, 20])
>>> # Multiple files: 'file10_{year}.csv', 'file20_{year}.csv'
>>> # applies each value to each previously expanded list.
>>> ds.expand(year=['1990', '2000'])
>>> # Multiple files: 'file10_1990.csv', 'file20_1990.csv',
>>> #     'file10_2000.csv', 'file20_2000.csv'
>>> len(ds)
4

property filenames: List[str]

List with all file names in the dataset.

filter(fn_filter: Callable[[LocalFile], bool]) LocalFiles

Filter Files Based on Custom Criteria.

This method filters the files in the dataset based on a filter function (fn_filter). Any modifications made to the files using the fn_filter function will not have a side effect on the resulting dataset.

The filter() method allows you to filter files in the dataset based on custom criteria. The filter function receives each file of the current dataset as its argument. If the function returns True, the file is kept in the returned dataset; otherwise, it is left out of the returned dataset.

Parameters:

fn_filter (Callable[[LocalFile], bool]) – A matching criteria function that determines whether a file matches the filter or not. The function must be able to receive one argument, a file from the object being filtered, and return a boolean.

Returns:

A copy of the dataset with the filtered files.

Return type:

LocalFiles

Example

>>> from zen import LocalFiles
>>> def custom_filter(file):
...     file.update_metadata()
...     return file.filesize > 34250  # Only keep files larger than ~34 KB.
...
>>> ds = LocalFiles(['examples/file1.csv', 'examples/file2.csv'])
>>> filtered_ds = ds.filter(custom_filter)
>>> print(len(filtered_ds))
1

classmethod from_file(dataset_path: str) LocalFiles

Loads the dataset metadata.

Loads the deposition and local files metadata from a local file.

Parameters:

dataset_path (str) – The filename to read the dataset.

Returns:

A new object with the dataset loaded from file.

Return type:

LocalFiles

Example

>>> ds = LocalFiles.from_file('examples/dataset.json')
>>> ds.filenames

classmethod from_template(template: str, dataset_path: str | None = None) Self

Create a New Dataset Based on a File Name Template.

This method creates a dataset from a provided file name template. The template can be expanded into a list of files, and these files are associated with the new dataset.

The from_template() method allows you to create a dataset using a file name template. This template is used to generate a list of files that are then associated with the dataset.

Placeholders within the template, enclosed in curly braces (e.g., ‘{index}’), are replaced with specific values, allowing you to create a set of files with structured names. For example, if the template is ‘file{index}.csv’, calling expand(index=list(range(12))) results in file names ‘file0.csv’, ‘file1.csv’, …, ‘file11.csv’. After expansion, each expanded placeholder value is stored in the ‘properties’ entry of each file.

Parameters:
  • template (str) – The file name template to be expanded as a list of files.

  • dataset_path (Optional[str]=None) – The filename of the dataset.

Returns:

A new dataset based on the template.

Return type:

LocalFiles

Example

>>> ds = LocalFiles.from_template('file{index}_{year}.csv')
>>> print(ds.placeholders)
{'index', 'year'}
>>> ds.expand(index=[10, 20, 30])
>>> ds.expand(year=['2019', '2020'])
>>> print(len(ds))
6

merge(other: _FileDataset, remove_unmatched: bool = False) Self

Merge Files into the Current Dataset.

This method merges files from another dataset into the current dataset. You can choose to remove unmatched files from the current dataset.

The merge() method allows you to merge files from another dataset (other) into the current dataset. If a file with the same name already exists in the current dataset, it will be updated with the data from the other dataset.

You can choose to remove unmatched files from the current dataset by setting remove_unmatched to True. This means that any files in the current dataset that do not exist in the other dataset will be removed.

If there are placeholders in the dataset that have not been evaluated, merging will not be allowed. Use the expand() function to expand placeholders before merging.
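The merge rules above can be sketched with files as plain dictionaries keyed by ‘filename’. The sketch assumes values from other take precedence for matching names and that files found only in other are appended at the end; it does not perform the placeholder check, and the function is a stand-in rather than zen’s method.

```python
from typing import Any, Dict, List


def merge(current: List[Dict[str, Any]], other: List[Dict[str, Any]],
          remove_unmatched: bool = False) -> List[Dict[str, Any]]:
    """Merge `other` into `current` by filename, optionally dropping unmatched files."""
    other_by_name = {f["filename"]: f for f in other}
    merged = []
    for f in current:
        match = other_by_name.pop(f["filename"], None)
        if match is not None:
            merged.append({**f, **match})  # same name: values from `other` win
        elif not remove_unmatched:
            merged.append(f)               # keep files absent from `other`
    merged.extend(other_by_name.values())  # files only present in `other`
    return merged
```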

Parameters:
  • other (_FileDataset) – The _FileDataset object from which files are imported.

  • remove_unmatched (bool=False) – Whether to remove files in the current dataset that have no match in other.

Returns:

The dataset with the new imported/updated/deleted files.

Return type:

LocalFiles

Example

>>> ds = LocalFiles(['examples/file1.csv', 'examples/file2.csv'])
>>> other_ds = LocalFiles(['examples/file2.csv', 'examples/file3.csv'])
>>> ds.merge(other_ds, remove_unmatched=False)
>>> print(len(ds))
3

modify_url(fn_modifier: Callable[[str], str] | None = None, prefix: str | None = None, suffix: str | None = None) Self

Modify URLs in the dataset using various methods.

You can modify URLs in one of the following ways:

  • Using a modifier function: Pass a function (fn_modifier) that takes a URL as input and returns a modified URL.

  • Adding a prefix: Use the prefix parameter to insert a common prefix at the beginning of each URL.

  • Adding a suffix: Utilize the suffix parameter to append a common suffix at the end of each URL.
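A minimal sketch of the three modification modes applied to a list of URL strings. The order used when several options are combined (modifier, then prefix, then suffix) is an assumption, as is skipping empty prefixes and suffixes; `modify_urls` is a hypothetical stand-in for the method.

```python
from typing import Callable, List, Optional


def modify_urls(urls: List[str], fn_modifier: Optional[Callable[[str], str]] = None,
                prefix: Optional[str] = None, suffix: Optional[str] = None) -> List[str]:
    """Apply a modifier function, then a prefix, then a suffix to every URL."""
    result = []
    for url in urls:
        if fn_modifier is not None:
            url = fn_modifier(url)
        if prefix:
            url = prefix + url
        if suffix:
            url = url + suffix
        result.append(url)
    return result
```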

Parameters:
  • fn_modifier (Optional[Callable[[str],str]]=None) – A function to modify URLs. If not provided, URLs remain unchanged.

  • prefix (Optional[str]=None) – The prefix to insert at the beginning of URLs.

  • suffix (Optional[str]=None) – The suffix to insert at the end of URLs.

Returns:

The current LocalFiles with updated URLs based on the specified modifications.

Return type:

LocalFiles

Examples

>>> from zen import LocalFiles
>>> ds = LocalFiles(['file1.csv', 'file2.csv'])
>>> ds.modify_url(prefix='http://example.com/')

This adds the prefix ‘http://example.com/’ to the beginning of all URLs in the dataset.

>>> def custom_modifier(url):
...     return url.replace('http://', 'https://')
...
>>> ds.modify_url(fn_modifier=custom_modifier)

This example modifies all URLs in the dataset by replacing ‘http://’ with ‘https://’.

>>> ds.modify_url(suffix='/download')

This appends the suffix ‘/download’ to the end of all URLs in the dataset.

property placeholders: Set[str]

Set of the placeholders given in the file name template.

property properties: Set[str]

Set of all properties present in the dataset’s files.

remove(index: int)

Remove a File from the Current Dataset.

This method removes a file from the current dataset.

Parameters:

index (int) – The index of the file to be removed.

Returns:

The dataset with the removed file.

Return type:

LocalFiles

Example

>>> ds = LocalFiles(['examples/file1.csv', 'examples/file3.csv'])
>>> ds.remove(1)
>>> print(len(ds))
1

save(file: str | None = None) LocalFiles

Saves the dataset metadata.

Saves the deposition and local files metadata to a local file, to be loaded later using the LocalFiles.from_file() class method.

Parameters:

file (Optional[str]=None) – The filename to save the dataset.

Returns:

The current dataset object.

Return type:

LocalFiles

Examples

>>> from zen import LocalFiles, Zenodo
>>> local_file_paths = ['examples/file1.csv', 'examples/file2.csv']
>>> ds = LocalFiles(local_file_paths, dataset_path='examples/dataset.json')
>>> ds.save()

set_dataset_path(dataset_path: str)

Set the dataset file path.

Set the dataset JSON file to persist the dataset’s metadata.

Parameters:

dataset_path (str) – The filename of the dataset.

Returns:

The current dataset with the updated dataset file path.

Return type:

LocalFiles

set_deposition(api: Zenodo, metadata: Metadata | Dict[str, Any] | None = None, deposition: Deposition | Dict[str, Any] | int | None = None, create_if_not_exists: bool = False) Deposition

Set the dataset deposition.

Set the deposition of the dataset. If the dataset has no linked deposition, the deposition parameter is None, and the create_if_not_exists parameter is True, a new deposition is created. The linked deposition is saved in the dataset file.

Parameters:
  • api (Zenodo) – Zenodo object to access Zenodo’s API.

  • metadata (Optional[Union[MetaGeneric,Dict[str,Any]]]=None) – The metadata for the new deposition. Ignored if deposition is provided or the dataset already has a deposition.

  • deposition (Optional[Union[Deposition,Dict[str,Any],int]]=None) – An existing deposition to bind with the current dataset.

  • create_if_not_exists (bool=False) – If there is no deposition linked to the current dataset, it creates a new deposition on Zenodo. Ignored if the deposition parameter is provided.

Returns:

The current dataset with the updated deposition.

Return type:

LocalFiles

Raises:
  • ValueError – If the dataset already has a deposition and a different deposition is passed to the function.

  • ValueError – If the dataset does not have a deposition, no deposition is passed to the deposition parameter, and create_if_not_exists is False.
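The rules for choosing between binding, retrieving, and creating a deposition can be summarized as a small decision function. This is a hypothetical sketch over deposition IDs only; the real method also accepts Deposition objects and dictionaries and performs the corresponding API calls.

```python
from typing import Optional


def resolve_deposition(linked_id: Optional[int], passed_id: Optional[int],
                       create_if_not_exists: bool) -> str:
    """Return the action set_deposition would take under the documented rules."""
    if passed_id is not None:
        if linked_id is not None and linked_id != passed_id:
            raise ValueError("dataset is already linked to a different deposition")
        return "bind"      # use the deposition that was passed in
    if linked_id is not None:
        return "retrieve"  # refresh the deposition already linked to the dataset
    if create_if_not_exists:
        return "create"    # no deposition anywhere: create a new one
    raise ValueError("no deposition linked; pass one or set create_if_not_exists=True")
```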

property storage_size: int

Calculates the total data size of the files.

summary(properties: List[str] | None = None, **kwargs)

Summarizes the file properties of the current dataset.

To list all properties present in the dataset’s files, use the properties property. This method can be used to generate a personalized metadata description of the dataset.

Parameters:
  • properties (Optional[List[str]]=None) – A list of properties to be summarized.

  • **kwargs – Alternative summarizing functions to generate the resulting dictionary. By default, min and max values are computed. The name of summarizing function becomes part of the name of the summarized property value.
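A minimal sketch of the summarizing logic. It assumes each file’s properties are a flat dictionary, that keyword functions replace the default min/max pair, and that Metadata.render() fills placeholders much like str.format_map(). Because properties extracted from filenames are strings, min and max compare them lexicographically. The function name `summarize` is hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional


def summarize(files: List[Dict[str, Any]], properties: Optional[List[str]] = None,
              **fns: Callable[[List[Any]], Any]) -> Dict[str, Any]:
    """Summarize per-file properties into '<property>_<function>' entries."""
    if not fns:
        fns = {"min": min, "max": max}  # documented defaults
    if properties is None:
        # Every property seen in any file, in a stable (sorted) order.
        properties = sorted({key for f in files for key in f})
    summary: Dict[str, Any] = {}
    for prop in properties:
        values = [f[prop] for f in files if prop in f]
        for name, fn in fns.items():
            summary[f"{prop}_{name}"] = fn(values)
    return summary


files = [{"index": "1"}, {"index": "2"}, {"index": "3"}]
summary = summarize(files)
description = "Dataset index from {index_min} to {index_max}".format_map(summary)
```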

Returns:

The dictionary with summarized values by property and function.

Return type:

dict

Examples

  1. Generate file list and summary

>>> from zen import LocalFiles
>>> ds = LocalFiles.from_template('file{index}.csv')
>>> ds.expand(index=[1, 2, 3])
>>> ds.summary()
{'index_min': '1', 'index_max': '3'}

  2. Create a templated metadata and render a personalized description

>>> from zen.metadata import Dataset
>>> meta = Dataset(
...     title='My title',
...     description='Dataset index from {index_min} to {index_max}'
... )
>>> meta.render(ds.summary())
{'upload_type': 'dataset',
 'title': 'My title',
 'description': 'Dataset index from 1 to 3',
 'prereserve_doi': True}

update_metadata() LocalFile

Update the metadata for all files in the dataset.

Iterates over all files in the dataset and updates their metadata. This ensures that each file’s metadata (e.g. size and date) is current based on the latest available information.

Returns:

The current dataset with updated metadata for each file.

Return type:

LocalFile

upload(progress: bool = True, force: bool = False, max_retries: int = 15, min_delay: int = 10, max_delay: int = 60) None

Upload files to a Zenodo deposition and update files’ metadata.

This method enables you to upload files to a Zenodo deposition, ensuring that their metadata is up-to-date.

Parameters:
  • progress (bool=True) – Whether to display a progress bar. Defaults to True.

  • force (bool=False) – Whether to upload all files, even those that have already been uploaded.

  • max_retries (int, optional) – Maximum number of retry attempts if the upload fails. Defaults to 15.

  • min_delay (int, optional) – Minimum delay in seconds between retry attempts. Defaults to 10 seconds.

  • max_delay (int, optional) – Maximum delay in seconds between retry attempts. Defaults to 60 seconds.

Returns:

None

Examples

  1. Upload files to a Zenodo deposition:

>>> from zen import LocalFiles, Zenodo
>>> ds = LocalFiles.from_file('examples/dataset.json')
>>> zen = Zenodo(url=Zenodo.sandbox_url, token='your_api_token')
>>> ds.set_deposition(api=zen)
>>> ds.upload()

This example uploads the files to the Zenodo deposition linked to the dataset and shows a progress bar during the upload. It does not re-upload files that already exist in the deposition.

  2. Forcefully re-upload all files to a Zenodo deposition:

>>> ds.upload(force=True)

In this case, the force parameter is set to True, which ensures that all files are re-uploaded, even if they already exist in the deposition. A progress bar is displayed during the process.

class zen.dataset.ZenodoFile(file: Dict[str, Any], deposition: Deposition)

Bases: BaseFile

Represents a File Associated with a Zenodo Deposition.

This class is designed to represent a file that is associated with a Zenodo deposition. It extends the functionality of the BaseFile class and provides methods for refreshing file data from Zenodo, deleting the file from Zenodo, and retrieving the checksum of the file.

The ZenodoFile class is tailored for files that are part of a Zenodo deposition. It is instantiated with a dictionary representing file information and the corresponding Zenodo deposition to which the file is linked.

This class offers methods to interact with Zenodo’s API, allowing you to refresh file data, delete files from a Zenodo deposition, and retrieve checksums.

Parameters:
  • file (Dict[str,Any]) – The file dictionary.

  • deposition (Deposition) – The Zenodo deposition to which this Zenodo file belongs.

Example

  1. Create a new Zenodo deposition and upload a file:

>>> from zen import Zenodo
>>> zen = Zenodo(url=Zenodo.sandbox_url, token='your_api_token')
>>> dep = zen.depositions.create()
>>> dep.files.create('examples/file1.csv')

  2. Download a file from the deposition to the current directory:

>>> zenodo_file = dep.files[0]
>>> zenodo_file.download()

Discard the example deposition:

>>> dep.discard()

property checksum: str

The checksum of the Zenodo file.

delete() None

Delete the file from Zenodo.

download(dirname: str | None = None) str

Downloads a file to a local directory.

Parameters:

dirname (Optional[str]=None) – The local directory to download the file to (defaults to the current directory).

Returns:

The path of downloaded file.

Return type:

str

Raises:

ValueError – If the directory does not exist or the file does not have a valid download link.

refresh() Self

Refresh the file data from Zenodo.