tdw_catalog

class tdw_catalog.Catalog(*args, **kwargs)

A Catalog is the primary client object for interacting with the ThinkData Works Catalog platform.

Parameters

api_key: Optional[str]

Your personal API key for the Catalog platform. Although this parameter is optional, an API key is required: the parameter may only be omitted when the key is supplied via the CATALOG_API_KEY environment variable instead.

auth_url: Optional[str]

An optional auth URL for the Catalog platform. This parameter only needs to be supplied when connecting to a dedicated Catalog deployment, and can be populated via the CATALOG_AUTH_URL environment variable instead. Defaults to the auth URL for the ThinkData Works SaaS Catalog platform (https://account.ee.namara.io).

api_url: Optional[str]

An optional API URL for the Catalog platform. This parameter only needs to be supplied when connecting to a dedicated Catalog deployment, and can be populated via the CATALOG_API_URL environment variable instead. Defaults to the API URL for the ThinkData Works SaaS Catalog platform (https://api.ee.namara.io).
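The precedence described above (explicit argument, then environment variable, then SaaS default) can be sketched as a small helper. resolve_catalog_settings is hypothetical and not part of tdw_catalog; it only mirrors the documented defaults, and the assumption that an explicit argument wins over a set environment variable is not stated in the parameter descriptions.

```python
import os
from typing import Mapping, Optional

# Defaults documented for the ThinkData Works SaaS Catalog platform.
DEFAULT_AUTH_URL = "https://account.ee.namara.io"
DEFAULT_API_URL = "https://api.ee.namara.io"


def resolve_catalog_settings(
    api_key: Optional[str] = None,
    auth_url: Optional[str] = None,
    api_url: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> dict:
    """Hypothetical helper: explicit arguments first, then CATALOG_* variables,
    then the SaaS defaults. The API key has no default and must come from somewhere."""
    key = api_key or env.get("CATALOG_API_KEY")
    if not key:
        raise ValueError("an API key is required: pass api_key or set CATALOG_API_KEY")
    return {
        "api_key": key,
        "auth_url": auth_url or env.get("CATALOG_AUTH_URL", DEFAULT_AUTH_URL),
        "api_url": api_url or env.get("CATALOG_API_URL", DEFAULT_API_URL),
    }
```

Assuming the constructor accepts these keyword arguments, as the parameter list suggests, the result could be passed on as Catalog(**resolve_catalog_settings()).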

create_organization(title: str) → organization.Organization

Creates an Organization

Parameters

title: str

The title for the new Organization

Returns

Organization

The created Organization

Raises

CatalogException

If there is an issue communicating with the Catalog server, or an issue with the server itself

get_organization(id: str) → organization.Organization

Retrieve a specific Organization

Parameters

id: str

The UUID of the Organization

Returns

Organization

The Organization which has the provided id, if it exists and the caller is a member of it

Raises

CatalogPermissionDeniedException

If the caller is not a member of the given Organization, or if it does not exist

CatalogException

If there is an issue communicating with the Catalog server, or an issue with the server itself

list_organizations(filter: ListOrganizationsFilter | None = None) → List[organization.Organization]

Retrieve the list of Organizations to which the caller belongs.

Parameters

filter: Optional[ListOrganizationsFilter]

An optional filter on the returned Organizations (None by default).

Returns

list[Organization]

The list of Organizations to which the caller belongs, ordered by title (ascending).

Raises

CatalogException

If there is an issue communicating with the Catalog server, or an issue with the server itself

connection

class tdw_catalog.connection.ConnectionSchedule(interval: HourlyInterval | DailyInterval | WeeklyInterval | MonthlyInterval | YearlyInterval, timezone: str)

Bases: object

A ConnectionSchedule describes the frequency with which to re-ingest ingested data or re-analyze virtualized data

Attributes

interval: HourlyInterval | DailyInterval | WeeklyInterval | MonthlyInterval | YearlyInterval

The interval that this schedule represents

timezone: str

The timezone in which to interpret times in the interval

class tdw_catalog.connection.DailyInterval(minute: int, hour: int)

Bases: HourlyInterval

A DailyInterval causes a ConnectionSchedule to execute at a specific minute and hour each day.

Attributes

minute: int

The minute of the hour to execute at, between 0 and 59

hour: int

The hour of the day to execute at, between 0 and 23

class tdw_catalog.connection.HourlyInterval(minute: int)

Bases: object

An HourlyInterval causes a ConnectionSchedule to execute at a specific minute every hour.

Attributes

minute: int

The minute of the hour to execute at, between 0 and 59
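To make the HourlyInterval and DailyInterval semantics concrete, the sketch below computes the next matching time after a given moment. next_run is a hypothetical helper, not a tdw_catalog function; it assumes that now is already an aware datetime expressed in the schedule's timezone, and it ignores daylight-saving edge cases. The platform's own scheduler may behave differently.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def next_run(now: datetime, minute: int, hour: Optional[int] = None) -> datetime:
    """Return the first time strictly after `now` that matches the interval.

    hour=None models an HourlyInterval (every hour at `minute`);
    an int models a DailyInterval (every day at hour:minute).
    """
    if not 0 <= minute <= 59:
        raise ValueError("minute must be between 0 and 59")
    if hour is None:
        candidate = now.replace(minute=minute, second=0, microsecond=0)
        step = timedelta(hours=1)
    else:
        if not 0 <= hour <= 23:
            raise ValueError("hour must be between 0 and 23")
        candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
        step = timedelta(days=1)
    # If today's (or this hour's) slot has already passed, move to the next one.
    if candidate <= now:
        candidate += step
    return candidate


now = datetime(2024, 3, 1, 10, 45, tzinfo=timezone.utc)
print(next_run(now, minute=15))          # 2024-03-01 11:15:00+00:00 (hourly at :15)
print(next_run(now, minute=30, hour=2))  # 2024-03-02 02:30:00+00:00 (daily at 02:30)
```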

class tdw_catalog.connection.IngestionConnection(client, **kwargs)

Bases: _Connection

IngestionConnections are used to attach ingested data to a Dataset, describing the mechanism and necessary credentials for accessing that data. Data is ingested via an IngestionConnection, pulled either from an uploaded file or from a remote location such as a cloud storage bucket.

Attributes

id: str

The IngestionConnection's unique ID

source_id: str

The unique ID of the Source to which this IngestionConnection belongs

source: Source

The Source associated with this IngestionConnection. A Source or source_id can be provided, but not both.

user_id: str

The unique User ID of the user who created this IngestionConnection

label: str

The descriptive label for this IngestionConnection

description: Optional[str] = None

An optional extended description for this IngestionConnection

portal: ConnectionPortalType

The method of data access employed by this IngestionConnection

url: str

A canonical URL that points to the location of data resources within the portal

warehouse: Optional[str]

Datasets created using this IngestionConnection will ingest to this Warehouse by default (this can be overridden at ingest time).

credential_id: Optional[str]

The Credential ID that should be used along with the portal to access Datasets when ingesting.

credential: Optional[credential.Credential]

The Credential associated with this IngestionConnection. Omitted when virtualizing. A Credential or credential_id can be provided, but not both.

ingest_schedules: Optional[List[ConnectionSchedule]]

Optional ConnectionSchedules which, when specified, indicate the frequency with which to re-ingest ingested data. Specific Datasets using this IngestionConnection may override this set of ConnectionSchedules.

disabled: Optional[bool]

When true, disables the schedule on this IngestionConnection. The IngestionConnection itself can still be used for manual ingestion or data virtualization.

created_at: datetime

The datetime at which this IngestionConnection was created

updated_at: datetime

The datetime at which this IngestionConnection was last updated

class tdw_catalog.connection.MonthlyInterval(minute: int, hour: int, dayOfMonth: int)

Bases: DailyInterval

A MonthlyInterval causes a ConnectionSchedule to execute on a specific day of the month, at a specific minute and hour, every month.

Attributes

minute: int

The minute of the hour to execute at, between 0 and 59

hour: int

The hour of the day to execute at, between 0 and 23

dayOfMonth: int

The day of the month to execute on, between 1 and 31, or -1 for the last day of each month
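Month lengths vary, which is why -1 is useful for targeting month-end. The documentation does not say how a fixed day such as 31 is treated in shorter months, so this sketch (resolve_day_of_month, a hypothetical helper built on the standard calendar module) reports None for that case instead of guessing.

```python
import calendar
from typing import Optional


def resolve_day_of_month(year: int, month: int, day_of_month: int) -> Optional[int]:
    """Map a MonthlyInterval-style dayOfMonth onto a concrete day of a given month.

    -1 means "last day of the month". Returns None when a fixed day does not
    exist in that month (e.g. 31 in April).
    """
    if day_of_month != -1 and not 1 <= day_of_month <= 31:
        raise ValueError("dayOfMonth must be between 1 and 31, or -1")
    last_day = calendar.monthrange(year, month)[1]  # (weekday of day 1, days in month)
    if day_of_month == -1:
        return last_day
    return day_of_month if day_of_month <= last_day else None


print(resolve_day_of_month(2024, 2, -1))  # 29 (2024 is a leap year)
print(resolve_day_of_month(2024, 4, 31))  # None: April has 30 days
```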

class tdw_catalog.connection.VirtualizationConnection(client, **kwargs)

Bases: _Connection

VirtualizationConnections are used to attach virtualized data to a Dataset, describing the mechanism and necessary credentials for accessing that data. Data is accessed from a remote location without being copied into the platform.

Attributes

id: str

The VirtualizationConnection's unique ID

source_id: str

The unique ID of the Source to which this VirtualizationConnection belongs

source: Source

The Source associated with this VirtualizationConnection. A Source or source_id can be provided, but not both.

user_id: str

The unique User ID of the user who created this VirtualizationConnection

label: str

The descriptive label for this VirtualizationConnection

description: Optional[str] = None

An optional extended description for this VirtualizationConnection

portal: ConnectionPortalType

The method of data access employed by this VirtualizationConnection

url: str

A canonical URL that points to the location of data resources within the portal

warehouse: Optional[str]

Virtualized Datasets created using this VirtualizationConnection will always access data from this Warehouse (it must be supplied for virtualization). Non-virtualized Datasets created using this connection will ingest to this Warehouse by default (this can be overridden at ingest time).

credential_id: Optional[str]

The Credential ID that should be used along with the portal to access Datasets when ingesting. Omitted when virtualizing.

credential: Optional[credential.Credential]

The Credential associated with this VirtualizationConnection. Omitted when virtualizing. A Credential or credential_id can be provided, but not both.

default_schema: str

The schema to search for tables and views

metrics_collection_schedules: Optional[List[ConnectionSchedule]]

Optional ConnectionSchedules which, when specified, indicate the frequency with which to re-analyze virtualized data. Specific Datasets using this VirtualizationConnection may override this set of ConnectionSchedules.

disabled: Optional[bool]

When true, disables the schedule on this VirtualizationConnection. The VirtualizationConnection itself can still be used for manual ingestion or data virtualization.

created_at: datetime

The datetime at which this VirtualizationConnection was created

updated_at: datetime

The datetime at which this VirtualizationConnection was last updated

class tdw_catalog.connection.WeeklyInterval(minute: int, hour: int, dayOfWeek: int)

Bases: DailyInterval

A WeeklyInterval causes a ConnectionSchedule to execute on a specific day of the week, at a specific minute and hour, every week.

Attributes

minute: int

The minute of the hour to execute at, between 0 and 59

hour: int

The hour of the day to execute at, between 0 and 23

dayOfWeek: int

The day of the week to execute on, between 0 (Sunday) and 6 (Saturday)
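Python numbers weekdays differently (date.weekday() returns Monday=0 through Sunday=6), so a date has to be shifted before it matches the Sunday-first dayOfWeek numbering. catalog_day_of_week is a hypothetical helper for that conversion; d.isoweekday() % 7 gives the same result.

```python
from datetime import date


def catalog_day_of_week(d: date) -> int:
    """Return d's day of the week as 0 (Sunday) through 6 (Saturday)."""
    # weekday(): Monday=0 ... Sunday=6; adding one and wrapping makes Sunday 0.
    return (d.weekday() + 1) % 7


print(catalog_day_of_week(date(2024, 1, 7)))   # 0: 7 January 2024 was a Sunday
print(catalog_day_of_week(date(2024, 1, 13)))  # 6: a Saturday
```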

class tdw_catalog.connection.YearlyInterval(minute: int, hour: int, dayOfMonth: int, month: int)

Bases: MonthlyInterval

A YearlyInterval causes a ConnectionSchedule to execute on a specific day of a specific month, at a specific minute and hour, every year.

Attributes

minute: int

The minute of the hour to execute at, between 0 and 59

hour: int

The hour of the day to execute at, between 0 and 23

dayOfMonth: int

The day of the month to execute on, between 1 and 31, or -1 for the last day of the month

month: int

The month of the year to execute in, between 1 and 12
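Before constructing a YearlyInterval, it can help to validate the values against the ranges documented for it and its parent classes. YearlySpec is a hypothetical local dataclass, not part of tdw_catalog. Its final check, which rejects days that never occur in the chosen month (such as 30 February), is an extra safeguard that the documentation does not mention.

```python
import calendar
from dataclasses import dataclass


@dataclass(frozen=True)
class YearlySpec:
    """Hypothetical local mirror of YearlyInterval's fields, validated on creation."""

    minute: int
    hour: int
    dayOfMonth: int
    month: int

    def __post_init__(self) -> None:
        if not 0 <= self.minute <= 59:
            raise ValueError("minute must be between 0 and 59")
        if not 0 <= self.hour <= 23:
            raise ValueError("hour must be between 0 and 23")
        if not 1 <= self.month <= 12:
            raise ValueError("month must be between 1 and 12")
        if self.dayOfMonth != -1 and not 1 <= self.dayOfMonth <= 31:
            raise ValueError("dayOfMonth must be between 1 and 31, or -1")
        # Extra check (not from the docs): 2000 is a leap year, so February allows 29.
        if self.dayOfMonth > calendar.monthrange(2000, self.month)[1]:
            raise ValueError("dayOfMonth never occurs in that month")


spec = YearlySpec(minute=0, hour=9, dayOfMonth=-1, month=2)  # 09:00 on the last day of February
```

Once validated, the same values could be passed to YearlyInterval(minute=..., hour=..., dayOfMonth=..., month=...).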

credential

class tdw_catalog.credential.CatalogCredential(client, **kwargs)

Bases: Credential

A CatalogCredential permits a Source to access datasets which exist on another ThinkData Works Catalog server.

Attributes

catalog_api_key: str

The API key for the target Catalog. Can be updated, but not read.

class tdw_catalog.credential.Credential(client, **kwargs)

Bases: EntityBase, _OrganizationRelation

Credentials are used in conjunction with Sources to ingest data into Datasets.

Attributes

id: str

The Credential's unique ID

organization_id: str

The unique ID of the Organization to which this Credential belongs

user_id: str

The unique user ID of the user who created this Credential

name: str

A name for this Credential

description: str

An optional description of this Credential

created_at: datetime

The datetime at which this Credential was created

updated_at: datetime

The datetime at which this Credential was last updated

delete() → None

Delete this Credential. This Credential object should not be used after delete() returns successfully.

Parameters

None

Raises

CatalogPermissionDeniedException

If the caller is not allowed to delete this Credential

CatalogInvalidArgumentException

If the given Credential does not exist

CatalogException

If the call to the Catalog server fails

classmethod get(client: Catalog, organization_id: str, id: str)

Retrieve a Credential belonging to an Organization

Parameters

client: Catalog

The Catalog client to use to get the Credential

organization_id: str

The unique ID of the Organization

id: str

The unique ID of the Credential

Returns

Credential

The Credential associated with the given ID

save() → None

Update this Credential, saving any changes to its name, description or type-specific fields.

Parameters

None

Returns

None

Raises

CatalogPermissionDeniedException

If the caller is not allowed to update this Credential

CatalogException

If the call to the Catalog server fails

class tdw_catalog.credential.CredentialFactory(client: Catalog, organization_id: str)

Bases: object

A CredentialFactory creates specific types of Credentials within a specific Organization

catalog_credential(name: str, description: str | None, catalog_api_key: str) → CatalogCredential

Constructs a CatalogCredential

Parameters

name: str

A name for this Credential

description: Optional[str]

An optional description of this Credential

catalog_api_key: str

The API key for the target Catalog

Returns

CatalogCredential

The created CatalogCredential

Raises

CatalogPermissionDeniedException

If the caller is not allowed to create Credentials

CatalogInvalidArgumentException

If one or more of the given credential parameters are invalid

CatalogException

If the call to the Catalog server fails

ftp_credential(name: str, description: str | None, username: str, password: str) → FTPCredential

Constructs an FTPCredential

Parameters

name: str

A name for this Credential

description: Optional[str]

An optional description of this Credential

username: str

The username for the target FTP server

password: str

The password for the target FTP server

Returns

FTPCredential

The created FTPCredential

Raises

CatalogPermissionDeniedException

If the caller is not allowed to create Credentials

CatalogInvalidArgumentException

If one or more of the given credential parameters are invalid

CatalogException

If the call to the Catalog server fails

google_storage_credential(name: str, description: str | None, region: str, project: str, client_secrets: str) → GoogleStorageCredential

Constructs a GoogleStorageCredential

Parameters

name: str

A name for this Credential

description: Optional[str]

An optional description of this Credential

project: str

The name of the Google Cloud project in which the bucket can be found

region: str

The Google Cloud region in which the bucket can be found (e.g. us-central1)

client_secrets: str

The client secrets for the Google Storage bucket. Can be updated, but not read.

Returns

GoogleStorageCredential

The created GoogleStorageCredential

Raises

CatalogPermissionDeniedException

If the caller is not allowed to create Credentials

CatalogInvalidArgumentException

If one or more of the given credential parameters are invalid

CatalogException

If the call to the Catalog server fails

s3_credential(name: str, description: str | None, region: str, access_key_id: str, secret_access_key: str) → S3Credential

Constructs an S3Credential

Parameters

name: str

A name for this Credential

description: Optional[str]

An optional description of this Credential

region: str

The AWS S3 region in which the bucket resides

access_key_id: str

The AWS Access Key for the S3 bucket. Can be updated, but not read.

secret_access_key: str

The AWS Secret Access Key for the S3 bucket. Can be updated, but not read.

Returns

S3Credential

The created S3Credential

Raises

CatalogPermissionDeniedException

If the caller is not allowed to create Credentials

CatalogInvalidArgumentException

If one or more of the given credential parameters are invalid

CatalogException

If the call to the Catalog server fails

sftp_with_key_credential(name: str, description: str | None, username: str, ssh_key: str) → SFTPCredential

Constructs a key-based SFTPCredential

Parameters

name: str

A name for this Credential

description: Optional[str]

An optional description of this Credential

username: str

The username for the target SFTP server

ssh_key: str

The SSH key for the target SFTP server

Returns

SFTPCredential

The created SFTPCredential

Raises

CatalogPermissionDeniedException

If the caller is not allowed to create Credentials

CatalogInvalidArgumentException

If one or more of the given credential parameters are invalid

CatalogException

If the call to the Catalog server fails

sftp_with_password_credential(name: str, description: str | None, username: str, password: str) → SFTPCredential

Constructs a password-based SFTPCredential

Parameters

name: str

A name for this Credential

description: Optional[str]

An optional description of this Credential

username: str

The username for the target SFTP server

password: str

The password for the target SFTP server

Returns

SFTPCredential

The created SFTPCredential

Raises

CatalogPermissionDeniedException

If the caller is not allowed to create Credentials

CatalogInvalidArgumentException

If one or more of the given credential parameters are invalid

CatalogException

If the call to the Catalog server fails

class tdw_catalog.credential.FTPCredential(client, **kwargs)

Bases: Credential

An FTPCredential permits a Source to access data stored on an FTP server.

Attributes

username: str

The username for the target FTP server

password: str

The password for the target FTP server. Can be updated, but not read.

class tdw_catalog.credential.GoogleStorageCredential(client, **kwargs)

Bases: Credential

A GoogleStorageCredential permits a Source to access data stored in a Google Storage (GS) bucket.

Attributes

project: str

The name of the Google Cloud project in which the bucket can be found

region: str

The Google Cloud region in which the bucket can be found (e.g. us-central1)

client_secrets: str

The client secrets for the Google Storage bucket. Can be updated, but not read.

class tdw_catalog.credential.S3Credential(client, **kwargs)

Bases: Credential

An S3Credential permits a Source to access data stored in an AWS S3 (or other S3-compatible) bucket.

Attributes

region: str

The AWS S3 region in which the bucket resides

access_key_id: str

The AWS Access Key for the S3 bucket. Can be updated, but not read.

secret_access_key: str

The AWS Secret Access Key for the S3 bucket. Can be updated but not read.

class tdw_catalog.credential.SFTPCredential(client, **kwargs)

Bases: Credential

An SFTPCredential permits a Source to access data stored on an SFTP server.

Attributes

username: str

The username for the target SFTP server

password: str

The password for the target SFTP server. Can be updated, but not read.

ssh_key: Optional[str]

The ssh key for the target SFTP server. Either ssh_key or password must be set. Can be updated, but not read.
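Because an SFTPCredential needs either an ssh_key or a password, and CredentialFactory offers a separate constructor for each, a caller holding optional secrets has to pick one. sftp_factory_call is a hypothetical helper that returns the documented method name plus matching keyword arguments; preferring the key when both secrets are present is an assumption, not documented behaviour.

```python
from typing import Dict, Optional, Tuple


def sftp_factory_call(
    username: str, password: Optional[str] = None, ssh_key: Optional[str] = None
) -> Tuple[str, Dict[str, str]]:
    """Return a CredentialFactory method name and keyword arguments for an SFTP login.

    Assumption: when both secrets are supplied, the key-based method is preferred.
    """
    if ssh_key:
        return "sftp_with_key_credential", {"username": username, "ssh_key": ssh_key}
    if password:
        return "sftp_with_password_credential", {"username": username, "password": password}
    raise ValueError("an SFTPCredential needs either ssh_key or password")


method, kwargs = sftp_factory_call("ingest-bot", password="s3cret")
print(method)  # sftp_with_password_credential
```

With a CredentialFactory named factory, the pair could then be used as getattr(factory, method)(name="Partner SFTP", description=None, **kwargs).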

dataset

class tdw_catalog.dataset.ConnectedDataset(client, **kwargs)

Bases: Dataset

A ConnectedDataset is identical to a Dataset and inherits all of its fields, but represents a Dataset which is connected to the actual underlying data asset via a Connection. A ConnectedDataset supports queries, export, health monitoring, etc.

Attributes

exports_disabled: bool

A flag which, when true, prevents this Dataset from being exported. Setting it to true does not prevent querying this Dataset. Only relevant if the Dataset is connected to data.

warehouse: str

The underlying data warehouse where the data resides

metrics_last_collected_at: datetime

The last time metrics were collected for this Dataset (virtualized Datasets) or the last time the Dataset was imported (ingested Datasets).

next_scheduled_metrics_collection_time: Optional[datetime]

If this Dataset has an associated connection schedule, the next time this Dataset will collect metrics (virtualized Datasets) or import (ingested Datasets).

last_metrics_collection_failure_time: Optional[datetime]

warehouse_metadata: Optional[List[metadata_field.MetadataField]]

Harvested metadata from virtualized Datasets. None for ingested Datasets.

property advanced_configuration: str

This configuration string is auto-generated during ingestion or, for virtualized data, inferred from the connected data. It can be modified, with caution, to alter how the Catalog perceives and represents the connected data.

Modification of this configuration without support from ThinkData Works is not recommended.

connect() → DatasetConnector

Manage all connection-related aspects of this ConnectedDataset.

There are many ways to connect a Dataset, so a helper object is returned that provides method-based workflows for connecting to data.

Returns

DatasetConnector

A helper object for configuring this Dataset's connection to data.

property connection: IngestionConnection | VirtualizationConnection

The underlying IngestionConnection or VirtualizationConnection which links this Dataset to data

property connection_id: str

The ID of the underlying IngestionConnection or VirtualizationConnection which links this Dataset to data

async export_csv(query: str | None = None) → CSVExport

Async function which returns a CSVExport containing a URL that can be used to stream a CSV-formatted copy of the connected data, optionally filtered by the supplied SQL-like NiQL query. Most standard SQL keywords are supported, but keywords which modify underlying data (e.g. INSERT, UPDATE, DELETE) are not.

To refer to the current dataset in the query, include {this} in the query, such as: "SELECT * FROM {this}".

Unlike ConnectedDataset.query(), there is no limit on exported rows, other than any imposed by the underlying warehouse.
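{this} is written literally in the query string, so when a query is assembled with an f-string (or str.format) the braces must be doubled to survive formatting. build_query and the year column are hypothetical; the int() conversion keeps caller input from injecting arbitrary text into the query.

```python
def build_query(min_year) -> str:
    """Build a NiQL query string that refers to the current dataset via {this}."""
    # Doubled braces in an f-string produce the literal text "{this}".
    return f"SELECT * FROM {{this}} WHERE year >= {int(min_year)}"


print(build_query(2020))  # SELECT * FROM {this} WHERE year >= 2020
```

The result can then be passed as the query argument, e.g. export = await dataset.export_csv(build_query(2020)).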

Parameters

query: Optional[str]

A NiQL query used to filter or reshape the data before exporting

Returns

CSVExport

A CSVExport object containing a signed download URL which can be used to fetch the exported data, either in its entirety or streamed in chunks. CSVExport improves the usability of the CSV data with pandas by including a configuration for read_csv, which can be passed via **export as follows: df = pd.read_csv(export.url, **export). This ensures that the resulting DataFrame has the correct schema for all fields (including dates). Note: it is recommended that export_parquet be used with pandas when the underlying warehouse supports it.
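The pd.read_csv(export.url, **export) idiom works because the export object can be unpacked like a mapping of read_csv keyword arguments. The sketch below reproduces only that mechanism, using hypothetical stand-ins (FakeCSVExport, fake_read_csv) so it runs without pandas or a Catalog connection; the parse_dates key is illustrative, and a real export supplies its own configuration.

```python
from collections.abc import Mapping
from typing import Any, Dict, Iterator


class FakeCSVExport(Mapping):
    """Hypothetical stand-in: a download URL plus read_csv keyword arguments."""

    def __init__(self, url: str, read_csv_kwargs: Dict[str, Any]):
        self.url = url
        self._kwargs = dict(read_csv_kwargs)

    # Implementing the Mapping protocol is what makes `**export` valid.
    def __getitem__(self, key: str) -> Any:
        return self._kwargs[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._kwargs)

    def __len__(self) -> int:
        return len(self._kwargs)


def fake_read_csv(url: str, **kwargs: Any) -> Dict[str, Any]:
    """Stand-in for pandas.read_csv that just echoes what it received."""
    return {"url": url, **kwargs}


export = FakeCSVExport("https://example.com/signed.csv", {"parse_dates": ["created_at"]})
received = fake_read_csv(export.url, **export)
print(received)  # {'url': 'https://example.com/signed.csv', 'parse_dates': ['created_at']}
```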

Raises

CatalogPermissionDeniedException

If the caller is not allowed to export data

CatalogInvalidArgumentException

If the given query is invalid

CatalogException

If the call to the Catalog server fails, or the export process itself fails

async export_parquet(query: str | None = None) → ParquetExport

Async function which returns a ParquetExport containing a URL that can be used to stream a Parquet-formatted copy of the connected data, optionally filtered by the supplied SQL-like NiQL query. Most standard SQL keywords are supported, but keywords which modify underlying data (e.g. INSERT, UPDATE, DELETE) are not.

To refer to the current dataset in the query, include {this} in the query, such as: "SELECT * FROM {this}".

Unlike ConnectedDataset.query(), there is no limit on exported rows, other than any imposed by the underlying warehouse.

Note: Parquet export is not (yet) supported for all underlying warehouse types, but this export method should be preferred when interfacing with pandas whenever possible.

Parameters

query: Optional[str]

A NiQL query used to filter or reshape the data before exporting

Returns

ParquetExport

A ParquetExport object containing a signed download URL which can be used to fetch the exported data, either in its entirety or streamed in chunks. A ParquetExport can be used directly by pandas as follows: df = pd.read_parquet(export.url). Note that pandas requires either pyarrow or fastparquet in order to use read_parquet.

Raises

CatalogPermissionDeniedException

If the caller is not allowed to export data

CatalogInvalidArgumentException

If the given query is invalid, or if Parquet export is not available for this warehouse type

CatalogException

If the call to the Catalog server fails, or the export process itself fails

property health_monitoring_enabled: bool

Whether or not Catalog platform health monitoring is enabled for this ConnectedDataset

property metrics_collection_schedules: List[ConnectionSchedule] | None

Returns all configured schedules for metrics collection, which govern health monitoring intervals and ingestion intervals for ingested Datasets

async query(query: str | None = None) → QueryCursor

Async function which returns a Python DB API-style Cursor object (PEP 249), representing the results of the supplied SQL-like NiQL query executed against the connected data.

Note that NiQL supports most standard SQL keywords, but keywords which modify underlying data (e.g. INSERT, UPDATE, DELETE) may not be used.

Note that the Catalog platform enforces a global limit of 10,000 rows on the results of a single query.

To refer to the current dataset in the query, include {this} in the query, such as: "SELECT * FROM {this}".

Parameters

query: Optional[str]

A NiQL query to execute against the connected data

Returns

QueryCursor

The query results cursor, which can be printed, converted to a pandas DataFrame via pd.DataFrame(res.fetchall()), etc.

Raises

CatalogPermissionDeniedException

If the caller is not allowed to query data

CatalogInvalidArgumentException

If the given query is invalid

CatalogException

If the call to the Catalog server fails, or the query itself fails
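Because QueryCursor follows PEP 249, the standard cursor attributes should apply. Assuming it exposes description as PEP 249 requires, column names can be kept when turning rows into records. The sketch uses Python's built-in sqlite3 module as a stand-in PEP 249 cursor so it runs without a Catalog connection; rows_as_dicts is a hypothetical helper.

```python
import sqlite3
from typing import Any, Dict, List


def rows_as_dicts(cursor) -> List[Dict[str, Any]]:
    """Convert the remaining rows of a PEP 249 cursor into dicts keyed by column name."""
    # description holds one sequence per column; its first item is the column name.
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# sqlite3 stands in for QueryCursor here; both follow PEP 249.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b")])
records = rows_as_dicts(conn.execute("SELECT id, name FROM t ORDER BY id"))
conn.close()
print(records)  # [{'id': 1, 'name': 'a'}, {'id': 2, 'name': 'b'}]
```

With pandas available, pd.DataFrame(res.fetchall(), columns=[c[0] for c in res.description]) would similarly preserve column names, again assuming description is populated.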

async reconnect()

Manually triggers a reimport of ingested data for ingested datasets, and metrics collection (health monitoring, etc.) for virtualized and ingested datasets.

Useful for forcing a metrics collection, or applying changes made to the advanced_configuration.
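export_csv(), export_parquet(), query() and reconnect() are coroutines, so they must be awaited: in a script, wrap the calls in an async function and run it with asyncio.run(); in a Jupyter notebook, which already runs an event loop, await them directly. The sketch below uses a hypothetical FakeConnectedDataset so it runs offline, and asyncio.gather to await several calls concurrently. The stand-in's string return value is purely illustrative; the real reconnect() return value is not documented here.

```python
import asyncio
from typing import List


class FakeConnectedDataset:
    """Hypothetical stand-in for ConnectedDataset; only the async shape matters here."""

    def __init__(self, title: str):
        self.title = title

    async def reconnect(self) -> str:
        await asyncio.sleep(0)  # placeholder for the real request to the Catalog server
        return f"reconnected {self.title}"  # illustrative return value only


async def reconnect_all(datasets: List[FakeConnectedDataset]) -> List[str]:
    # gather() runs the coroutines concurrently and keeps results in input order.
    return await asyncio.gather(*(d.reconnect() for d in datasets))


results = asyncio.run(reconnect_all([FakeConnectedDataset("a"), FakeConnectedDataset("b")]))
print(results)  # ['reconnected a', 'reconnected b']
```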

refresh() → ConnectedDataset

Return a fresh copy of this ConnectedDataset, with up-to-date property values. Useful after performing an update, connection, etc.

property set_advanced_configuration: str

This configuration string is auto-generated during ingestion or, for virtualized data, inferred from the connected data. It can be modified, with caution, to alter how the Catalog perceives and represents the connected data.

Modification of this configuration without support from ThinkData Works is not recommended.

property set_health_monitoring_enabled: bool

Whether or not Catalog platform health monitoring is enabled for this ConnectedDataset

property set_metrics_collection_schedules: List[ConnectionSchedule] | None

Returns all configured schedules for metrics collection, which govern health monitoring intervals and ingestion intervals for ingested Datasets

property warehouse: Warehouse

The Warehouse where the connected data is virtualized from, or ingested to

class tdw_catalog.dataset.Dataset(client, **kwargs)

Bases: EntityBase, _OrganizationRelation, _SourceRelation

A Dataset represents a cataloged data asset within an Organization. It is a container for structured and custom metadata describing the asset, and can optionally be connected to the data asset via an IngestionConnection or VirtualizationConnection to support queries, health monitoring, etc.

Attributes

id: str

The Dataset’s unique ID

title: str

The title of the Dataset

description: Optional[str]

The full description text (supports Markdown) that helps describe this Dataset

uploader_id: str

The unique ID of the User that created this Dataset

source_id: str

The unique ID of the Source associated with this Dataset

source: Source

The Source associated with this Dataset

organization_id: str

The unique ID of the Organization which this Dataset belongs to

organization: Organization

The Organization which this Dataset belongs to

metadata_template: MetadataTemplate

The MetadataTemplate attached to this Dataset, if any

data_dictionary: DataDictionary

The DataDictionary defined within this Dataset, or describing the schema of the connected data if this is a ConnectedDataset

created_at: datetime

The date this Dataset was originally created

updated_at: datetime

The date this Dataset's metadata was last modified

attach_template(template: MetadataTemplate)

Attach a MetadataTemplate to this Dataset. Values may be supplied to templated fields immediately, but the template will only be attached when Dataset.save() is called.

Parameters

template: MetadataTemplate

The MetadataTemplate to be attached to the Dataset

Returns

Dataset

The Dataset with a newly attached MetadataTemplate

classify(topic: Topic) → None

Classify this Dataset with a Topic, linking them semantically

Parameters

topic: Topic

The Topic to classify this Dataset with

Returns

None

Raises

CatalogPermissionDeniedException

If the caller is not allowed to classify Datasets, or if the Topic ID provided does not correspond to an existing Topic

CatalogException

If the call to the Catalog server fails

connect() → DatasetConnector

Converts a Dataset into a ConnectedDataset by accessing data via an IngestionConnection or VirtualizationConnection. A ConnectedDataset can represent ingested data, which is copied into the Catalog platform, or virtualized data, which is accessed remotely by the platform without being copied.

There are many ways to connect a Dataset, so a helper object is returned that provides method-based workflows for connecting to data.

Returns

DatasetConnector

A helper object for configuring this Dataset's connection to data.

property custom_metadata: List[MetadataField]

A list of MetadataFields attached to this Dataset that are not associated with an attached MetadataTemplate

declassify(topic: Topic) → None

Remove a Topic classification from this Dataset

Parameters

topic: Topic

The Topic to be unclassified from this Dataset

Returns

None

Raises

CatalogPermissionDeniedException

If the caller is not allowed to declassify Datasets, or if the Topic ID provided does not correspond to an existing Topic

CatalogException

If the call to the Catalog server fails

delete() → None

Delete this Dataset. The Dataset object should not be used after this method is invoked successfully.

Raises

CatalogPermissionDeniedException

If the caller is not allowed to delete this Dataset

CatalogException

If the call to the Catalog server fails

detach_template()

Remove the attached MetadataTemplate from this Dataset. Any fields from this MetadataTemplate will remain on the Dataset, but as individual MetadataFields. Detachment happens instantly, and calling Dataset.save() is not necessary for the changes to persist.

Parameters

None

Returns

Dataset

The Dataset with no attached MetadataTemplate

classmethod get(client: Catalog, id: str, context_organization: organization.Organization | None = None)

Retrieve a Dataset

Parameters

client: Catalog

The Catalog client to use to get the Dataset

id: str

The unique ID of the Dataset

context_organization: Optional[Organization]

The Organization from which this Dataset is being retrieved. Datasets may be accessible from multiple Organizations, but can have differing metadata within each. This context parameter is necessary to determine which metadata to load.

Returns

Dataset

The Dataset associated with the given ID

Raises

CatalogPermissionDeniedException

If the caller is not allowed to retrieve the given Dataset

CatalogNotFoundException

If the given Dataset ID does not exist

CatalogException

If the call to the Catalog server fails

list_topics(organization_id: str | None = None, filter: Filter | None = None) → List[Topic]

Retrieves the list of all Topics this Dataset is currently classified under, within the given Organization

Parameters

organization_id: Optional[str]

An optional ID for an Organization other than the original Organization the Dataset was created in (e.g. if the Dataset has been shared to another Organization with a different set of Topics)

filter: Optional[Filter]

An optional tdw_catalog.utils.Filter to offset or limit the list of Topics returned

Returns

List[Topic]

The list of Topics that have been classified to this Dataset

Raises

CatalogPermissionDeniedException

If the caller is not allowed to list Topics in this Organization

CatalogException

If the call to the Catalog server fails
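Since the filter parameter can offset or limit the list of Topics returned, long lists can be read page by page. paginate is a generic, hypothetical sketch: it accepts any function that takes an offset and a limit and returns a list, and it stops at the first short page. The attribute names of tdw_catalog.utils.Filter are not shown in this section, so connecting the helper to list_topics is left as an assumption.

```python
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


def paginate(fetch_page: Callable[[int, int], List[T]], page_size: int = 100) -> Iterator[T]:
    """Yield every item by requesting successive (offset, limit) pages.

    Stops after the first page that comes back shorter than page_size.
    """
    if page_size < 1:
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:
            return
        offset += page_size


topics = ["economy", "energy", "health", "housing", "transit"]
print(list(paginate(lambda offset, limit: topics[offset:offset + limit], page_size=2)))
# ['economy', 'energy', 'health', 'housing', 'transit']
```

Against the Catalog, fetch_page might wrap dataset.list_topics(filter=...) with a Filter built from the offset and limit, assuming Filter accepts those values.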

refresh() → Dataset

Return a fresh copy of this Dataset, with up-to-date property values. Useful after performing an update, connection, etc.

save() → None

Update this Dataset, saving all changes to its metadata fields.

Raises

CatalogPermissionDeniedException

If the caller is not allowed to update this Dataset

CatalogException

If the call to the Catalog server fails

property templated_metadata: List[MetadataField]

A list of MetadataFields attached to this Dataset that are associated with an attached MetadataTemplate

update_custom_metadata() MetadataEditor[source]

Provides a MetadataEditor which allows for the addition, removal, and alteration of MetadataFields on this Dataset that are not associated with an attached MetadataTemplate

Parameters

None

Returns

MetadataEditor

An editor for adding, removing, and updating MetadataFields on the Dataset which do not belong to a MetadataTemplate

update_templated_metadata() TemplatedMetadataEditor[source]

Provides a TemplatedMetadataEditor which allows for the alteration of MetadataFields on this Dataset that are associated with an attached MetadataTemplate. This editor cannot add or remove MetadataFields; that must be done on the MetadataTemplate directly.

Parameters

None

Returns

TemplatedMetadataEditor

An editor for updating MetadataFields on the Dataset that are associated with an attached MetadataTemplate

dataset_connector

class tdw_catalog.dataset_connector.DatasetConnector(d: Dataset | ConnectedDataset)[source]

Bases: object

A helper object for configuring a Dataset's connection to data. It can either connect a Dataset for the first time or reconnect an already-connected Dataset to different data.

async ingest_from_file(local_file_path: str, connection: IngestionConnection | None = None, target_warehouse: TargetWarehouse | None = None) ConnectedDataset[source]

Async function which uploads a local file to the Catalog platform and ingests it, connecting this Dataset to that ingested data.

Parameters

local_file_path: str

The path to the file on disk. The file will be streamed from disk, rather than read into memory, to ensure large files upload successfully.

connection: Optional[IngestionConnection]

Optionally specify a file upload-type IngestionConnection to use. This IngestionConnection must reside within the Dataset's Source and must be of the correct type (ConnectionPortalType.IMPORT_LITE). If not provided, the first available file upload Connection within the Dataset's Source will be used, or one will be created if none is available.

target_warehouse: Optional[TargetWarehouse]

Optionally specify a target warehouse to ingest to. If omitted, the TargetWarehouse specified by the IngestionConnection will be used, or the default TargetWarehouse for the Organization if the IngestionConnection does not specify a default TargetWarehouse.

Returns

ConnectedDataset

The newly connected Dataset, if it was not connected previously, or an updated version of the existing ConnectedDataset if it was connected previously. Further Dataset operations should be performed on this returned object.

Raises

FileNotFoundError

If the specified local_file_path does not exist

CatalogPermissionDeniedException

If the caller is not allowed to perform any of the steps involved in ingesting data from a file

CatalogInvalidArgumentException

If the given IngestionConnection cannot be used

CatalogException

If the call to the Catalog server fails, or if the ingestion process itself fails
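Because ingest_from_file is a coroutine, it must be awaited (for example via asyncio.run from synchronous code). The sketch below is illustrative only: connect_from_file is a hypothetical helper, and connector is assumed to be a DatasetConnector built from an existing Dataset as in the class signature above. It checks the path locally before uploading and returns the ConnectedDataset so that later calls use it rather than the original Dataset.

```python
import os
from typing import Any, Optional


async def connect_from_file(connector: Any, path: str, connection: Optional[Any] = None) -> Any:
    """Hypothetical helper around DatasetConnector.ingest_from_file.

    `connector` is assumed to be a tdw_catalog.dataset_connector.DatasetConnector;
    it is typed as Any so this sketch runs without the library installed.
    """
    # Fail fast locally; ingest_from_file documents the same FileNotFoundError.
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    # Keyword names follow the documented signature (local_file_path, connection).
    connected = await connector.ingest_from_file(local_file_path=path, connection=connection)
    # Per the docs, further Dataset operations should use the returned object.
    return connected


# Usage (assumes `dataset` was retrieved earlier, e.g. via Organization.get_dataset):
# import asyncio
# from tdw_catalog.dataset_connector import DatasetConnector
# connected = asyncio.run(connect_from_file(DatasetConnector(dataset), "data/sales.csv"))
```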

data_dictionary

class tdw_catalog.data_dictionary.Column(key: str = None, type: ColumnType = None, name: str | None = None, description: str | None = None)[source]

Bases: object

A single Column within a DataDictionary

Attributes

key: str

The column name for this Column, within the actual Warehouse where the data lives

type: ColumnType

The data type for this Column. Available types can be found in ColumnType.

name: Optional[str]

An optional friendly name for this Column, which is visually used in place of the key throughout the Catalog

description: Optional[str]

An optional description for this Column

apply_glossary_term(glossary_term: glossary_term.GlossaryTerm) None[source]

Apply a GlossaryTerm to this Column. The containing DataDictionary must be saved for the change to take permanent effect.

Parameters

glossary_term: GlossaryTerm

The GlossaryTerm to classify this Column with

Returns

None

Raises

CatalogInvalidArgumentException

If the Organization of the GlossaryTerm does not match the Organization which the Dataset was retrieved from.

list_glossary_terms() List[glossary_term.GlossaryTerm][source]

Return a list of GlossaryTerms that have been applied to this Column

Parameters

None

Returns

List[glossary_term.GlossaryTerm]

The list of GlossaryTerms that have been applied to this Column

Raises

CatalogPermissionDeniedException

If the caller does not have permission to list GlossaryTerms on a Dataset's Columns

CatalogInternalException

If the call to the Catalog server fails

remove_glossary_term(glossary_term: glossary_term.GlossaryTerm) None[source]

Remove a GlossaryTerm from this Column. The containing DataDictionary must be saved for the change to take permanent effect.

Parameters

glossary_term: GlossaryTerm

The GlossaryTerm to be removed from this Column

Returns

None
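Because apply_glossary_term only changes the in-memory Column, one natural pattern is to apply a term to every relevant Column and then save the containing DataDictionary once. The sketch below assumes that pattern; tag_columns is a hypothetical helper, and data_dictionary is assumed to be a DataDictionary supporting the documented has_key(), key lookup (data_dictionary["column_name"]), and save().

```python
from typing import Any, Iterable, List


def tag_columns(data_dictionary: Any, keys: Iterable[str], term: Any) -> List[str]:
    """Apply `term` to each existing Column named in `keys`, then save once.

    Hypothetical helper: `data_dictionary` is assumed to be a
    tdw_catalog.data_dictionary.DataDictionary and `term` a GlossaryTerm.
    Returns the keys that were skipped because no such Column exists.
    """
    missing: List[str] = []
    applied = 0
    for key in keys:
        if not data_dictionary.has_key(key):
            missing.append(key)
            continue
        # Dict-style lookup, as described for DataDictionary.
        data_dictionary[key].apply_glossary_term(term)
        applied += 1
    # Changes only take permanent effect once the DataDictionary is saved.
    if applied:
        data_dictionary.save()
    return missing
```

Saving once after the loop keeps the number of server calls independent of how many Columns are tagged.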

class tdw_catalog.data_dictionary.CurrencyColumn(key: str = None, type: ColumnType = None, name: str | None = None, description: str | None = None, symbol: str | None = None)[source]

Bases: Column

A currency-specific extension of Column, with an added currency symbol (such as $)

Attributes

symbol: Optional[str]

An optional currency symbol (e.g. '$')

class tdw_catalog.data_dictionary.DataDictionary(dataset: Dataset, last_updated_at: datetime, version_id: str | None, columns: List[Column])[source]

Bases: object

A DataDictionary describes the schema of the data represented by a Dataset as a sequence of Columns, each with a key, a type, and an optional name and description.

A DataDictionary behaves like a dict: Columns can be accessed by their key, as in data_dictionary["column_name"].

Attributes

last_updated_at: datetime

The last time this DataDictionary was updated, either by hand (for Datasets which are not connected) or via a scheduled metrics collection (for ConnectedDatasets which are)

columns: List[Column]

The list of Columns which make up this DataDictionary

columns() List[Column][source]

Returns all Columns in this DataDictionary

has_key(key: str) bool[source]

Returns True if and only if a Column with the given key exists in this DataDictionary

property last_updated_at: datetime

Returns the last time this DataDictionary was modified

save()[source]

Update this DataDictionary, saving all changes to its schema

Raises

CatalogPermissionDeniedException

If the caller is not allowed to update this DataDictionary

CatalogException

If the call to the Catalog server fails

class tdw_catalog.data_dictionary.MetadataOnlyColumn(key: str = None, type: ColumnType = None, name: str | None = None, description: str | None = None)[source]

Bases: Column

Identical to Column, but used within a MetadataOnlyDataDictionary attached to a Dataset which is not connected to data. When a Dataset is not connected, all aspects of its data dictionary can be freely modified (including key and type), as there is no underlying data providing or constraining the dictionary.

Attributes

key: str

The column name for this Column, within the actual Warehouse where the data lives

type: ColumnType

The data type for this Column. Available types can be found in ColumnType.

name: Optional[str]

An optional friendly name for this Column, which is visually used in place of the key throughout the Catalog

description: Optional[str]

An optional description for this Column

class tdw_catalog.data_dictionary.MetadataOnlyCurrencyColumn(key: str = None, type: ColumnType = None, name: str | None = None, description: str | None = None, symbol: str | None = None)[source]

Bases: CurrencyColumn, MetadataOnlyColumn

The MetadataOnlyColumn version of CurrencyColumn

Attributes

symbol: Optional[str]

The currency symbol

class tdw_catalog.data_dictionary.MetadataOnlyDataDictionary(dataset: Dataset, last_updated_at: datetime, version_id: str | None, columns: List[Column])[source]

Bases: DataDictionary

A MetadataOnlyDataDictionary is identical to a DataDictionary, but is attached to a Dataset which is not connected to data.

Because the Dataset is not connected, all aspects of the dictionary can be modified freely, including column keys, types, etc. (because they are not constrained by existing underlying data).

A MetadataOnlyDataDictionary behaves like a dict: Columns can be accessed (and overwritten) by their key, as in data_dictionary["column_name"] = ...

Attributes

last_updated_at: datetime

The last time this DataDictionary was updated, either by hand (for Datasets which are not connected) or via a scheduled metrics collection (for ConnectedDatasets which are)

columns: List[MetadataOnlyColumn]

The list of MetadataOnlyColumns which make up this DataDictionary

add(col: Column, index: int | None = None) MetadataOnlyDataDictionary[source]

Appends a specific Column to this MetadataOnlyDataDictionary, or inserts it at a specific index.

Parameters

col: Column

The Column to insert

index: Optional[int]

The optional index to insert the new Column at

Returns

MetadataOnlyDataDictionary

A reference to itself for method chaining

clear() MetadataOnlyDataDictionary[source]

Removes all Columns from this MetadataOnlyDataDictionary

Returns

MetadataOnlyDataDictionary

A reference to itself for method chaining

columns() List[MetadataOnlyColumn][source]

Returns all Columns in this MetadataOnlyDataDictionary

remove(key: str) MetadataOnlyDataDictionary[source]

Removes a specific Column from this MetadataOnlyDataDictionary by key

Parameters

key: str

The key of the Column

Returns

MetadataOnlyDataDictionary

A reference to itself for method chaining
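Since keys can be changed freely on a MetadataOnlyDataDictionary, renaming a Column while keeping its position can be expressed with the documented columns(), remove(), and add(col, index) methods. The sketch below assumes Column objects expose a key attribute and that remove() and add() return the dictionary, as documented, so the calls chain; rename_column and make_column are hypothetical names.

```python
from typing import Any, Callable


def rename_column(dd: Any, old_key: str, new_key: str, make_column: Callable[[Any, str], Any]) -> Any:
    """Replace Column `old_key` with one keyed `new_key`, at the same index.

    `dd` is assumed to be a MetadataOnlyDataDictionary. `make_column(old, new_key)`
    builds the replacement, e.g. a MetadataOnlyColumn copying type, name and
    description from `old`.
    """
    columns = dd.columns()
    keys = [col.key for col in columns]
    if old_key not in keys:
        raise KeyError(old_key)
    if new_key in keys:
        raise ValueError(f"column {new_key!r} already exists")
    index = keys.index(old_key)
    # remove() and add() both return the dictionary, so the calls chain.
    return dd.remove(old_key).add(make_column(columns[index], new_key), index)


# Usage (keyword names taken from the MetadataOnlyColumn signature above):
# rename_column(dd, "amt", "amount",
#               lambda old, k: MetadataOnlyColumn(key=k, type=old.type,
#                                                 name=old.name, description=old.description))
# dd.save()
```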

errors

exception tdw_catalog.errors.CatalogAbortedException(*args, message, meta={})[source]

Bases: CatalogException

The operation was aborted, typically due to a concurrency issue like sequencer check failures, transaction aborts, etc.

exception tdw_catalog.errors.CatalogAlreadyExistsException(*args, message, meta={})[source]

Bases: CatalogException

An attempt to create an entity failed because one already exists.

exception tdw_catalog.errors.CatalogBadRouteException(*args, message, meta={})[source]

Bases: CatalogException

The requested URL path wasn’t routable to a known method. This is returned by generated server code and should not be returned by application code (use “not_found” or “unimplemented” instead).

exception tdw_catalog.errors.CatalogCanceledException(*args, message, meta={})[source]

Bases: CatalogException

The operation was cancelled

exception tdw_catalog.errors.CatalogDataLossException(*args, message, meta={})[source]

Bases: CatalogException

The operation resulted in unrecoverable data loss or corruption.

exception tdw_catalog.errors.CatalogDeadlineExceededException(*args, message, meta={})[source]

Bases: CatalogException

Operation expired before completion. For operations that change the state of the system, this error may be returned even if the operation has completed successfully (timeout).

exception tdw_catalog.errors.CatalogException(*args, code=Errors.Unknown, message='', meta={})[source]

Bases: TwirpServerException

The most generic Catalog platform error

exception tdw_catalog.errors.CatalogFailedPreconditionException(*args, message, meta={})[source]

Bases: CatalogException

The operation was rejected because the system is not in a state required for the operation’s execution. For example, doing an rmdir operation on a directory that is non-empty, or on a non-directory object, or when having conflicting read-modify-write on the same resource.

exception tdw_catalog.errors.CatalogInternalException(*args, message, meta={})[source]

Bases: CatalogException

Some invariants expected by the underlying system have been broken. In other words, something bad happened in the library or backend service. Twirp-specific issues like wire and serialization problems are also reported as “internal” errors.

exception tdw_catalog.errors.CatalogInvalidArgumentException(*args, message, meta={})[source]

Bases: CatalogException

The client specified an invalid argument. This indicates arguments that are invalid regardless of the state of the system (e.g. a malformed file name, a missing required argument, or a number out of range).

exception tdw_catalog.errors.CatalogMalformedException(*args, message, meta={})[source]

Bases: CatalogException

The client sent a message which could not be decoded. This may mean that the message was encoded improperly or that the client and server have incompatible message definitions.

exception tdw_catalog.errors.CatalogNoErrorException(*args, message, meta={})[source]

Bases: CatalogException

exception tdw_catalog.errors.CatalogNotFoundException(*args, message, meta={})[source]

Bases: CatalogException

Some requested entity was not found.

exception tdw_catalog.errors.CatalogOutOfRangeException(*args, message, meta={})[source]

Bases: CatalogException

The operation was attempted past the valid range, for example seeking or reading past the end of a paginated collection. Unlike “invalid_argument”, this error indicates a problem that may be fixed if the system state changes (e.g. more items are added to the collection).

exception tdw_catalog.errors.CatalogPermissionDeniedException(*args, message, meta={})[source]

Bases: CatalogException

The caller does not have permission to execute the specified operation. It must not be used if the caller cannot be identified (use “unauthenticated” instead).

exception tdw_catalog.errors.CatalogResourceExhaustedException(*args, message, meta={})[source]

Bases: CatalogException

Some resource has been exhausted or rate-limited, perhaps a per-user quota, or perhaps the entire file system is out of space.

exception tdw_catalog.errors.CatalogUnauthenticatedException(*args, message, meta={})[source]

Bases: CatalogException

The request does not have valid authentication credentials for the operation.

exception tdw_catalog.errors.CatalogUnavailableException(*args, message, meta={})[source]

Bases: CatalogException

The service is currently unavailable. This is most likely a transient condition and may be corrected by retrying with a backoff.
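Since CatalogUnavailableException is described as likely transient, a caller might retry only that exception with exponential backoff and let everything else (such as CatalogPermissionDeniedException) propagate immediately. The sketch below defines minimal stand-in exception classes so it runs on its own; in real code you would import them from tdw_catalog.errors. call_with_backoff and its parameters are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


# Stand-ins for illustration only; real code should use
# `from tdw_catalog.errors import CatalogException, CatalogUnavailableException`.
class CatalogException(Exception):
    pass


class CatalogUnavailableException(CatalogException):
    pass


def call_with_backoff(
    fn: Callable[[], T],
    retries: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on CatalogUnavailableException with exponential backoff.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts and
    re-raises after `retries` retries. Any other exception propagates at once.
    """
    attempt = 0
    while True:
        try:
            return fn()
        except CatalogUnavailableException:
            if attempt >= retries:
                raise
            sleep(base_delay * (2 ** attempt))
            attempt += 1


# Usage (assumes `catalog` is a tdw_catalog.Catalog client):
# org = call_with_backoff(lambda: catalog.get_organization(org_id))
```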

exception tdw_catalog.errors.CatalogUnimplementedException(*args, message, meta={})[source]

Bases: CatalogException

The operation is not implemented or not supported/enabled in this service.

exception tdw_catalog.errors.CatalogUnknownException(*args, message, meta={})[source]

Bases: CatalogException

An unknown error occurred. For example, this can be used when handling errors raised by APIs that do not return any error information.

export

class tdw_catalog.export.CSVExport[source]

Bases: _Export

CSVExport represents a signed download URL pointing to the CSV-formatted result of a Dataset export_csv() operation, alongside metadata concerning the exported data.

This class is deliberately structured for use with the pandas read_csv function: after e1 = await dataset.export_csv(), the data can be loaded with df = pd.read_csv(e1.url, **e1).

Attributes

query: str

The query statement which was used to create the Export

created_at: datetime

The time this Export was originally created

started_at: datetime

The time this Export was started

finished_at: datetime

The time this Export was completed

url: str

The CSV-formatted export results can be downloaded via this signed URL

dtype: Dict[str, Type]

Metadata describing the schema of the exported data

parse_dates: List[str]

A list of columns within dtype that should be interpreted as dates

true_values: List[str]

A list of values to interpret as “truthy”

false_values: List[str]

A list of values to interpret as “falsey”

compression: Optional[str]

Indicates the compression format of the data, if any

async to_str() str[source]

Downloads the export into an in-memory str

Returns

str

The CSV contents of this export

async to_stream(out: BinaryIO)[source]

Downloads the export into an on-disk file, or other stream

Parameters

out: BinaryIO

The stream to write CSV data to
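When pandas is not available, similar metadata can be applied with the standard library after downloading the CSV text with to_str(). The sketch below is an approximation, not the library's behaviour: parse_export_csv is a hypothetical helper that treats true_values and false_values as exact string matches in any column, parses parse_dates columns with datetime.fromisoformat (so it assumes ISO 8601 values), and leaves every other value as a string.

```python
import csv
import io
from datetime import datetime
from typing import Any, Dict, Iterable, List


def parse_export_csv(
    text: str,
    parse_dates: Iterable[str] = (),
    true_values: Iterable[str] = (),
    false_values: Iterable[str] = (),
) -> List[Dict[str, Any]]:
    """Parse CSV text into row dicts, applying CSVExport-style metadata."""
    date_cols = set(parse_dates)
    trues, falses = set(true_values), set(false_values)
    rows: List[Dict[str, Any]] = []
    for raw in csv.DictReader(io.StringIO(text)):
        row: Dict[str, Any] = {}
        for col, value in raw.items():
            if col in date_cols and value:
                row[col] = datetime.fromisoformat(value)  # assumes ISO 8601
            elif value in trues:
                row[col] = True
            elif value in falses:
                row[col] = False
            else:
                row[col] = value  # empty cells stay as ""
        rows.append(row)
    return rows


# Usage, assuming `export` is a CSVExport returned by `await dataset.export_csv()`:
# rows = parse_export_csv(await export.to_str(), export.parse_dates,
#                         export.true_values, export.false_values)
```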

class tdw_catalog.export.ParquetExport[source]

Bases: _Export

async to_bytes() BinaryIO[source]

Downloads the export into an in-memory buffer

Returns

BinaryIO

The Parquet contents of this export

async to_stream(out: BinaryIO)[source]

Downloads the export into an on-disk file, or other stream

Parameters

out: BinaryIO

The stream to write Parquet data to
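to_stream writes bytes, so a file passed to it should be opened in binary mode. A minimal sketch, assuming export is a ParquetExport (or CSVExport) whose to_stream coroutine behaves as documented; save_export is a hypothetical helper name.

```python
import os
from typing import Any


async def save_export(export: Any, path: str) -> int:
    """Stream an export to `path` and return the number of bytes written."""
    # "wb": to_stream writes bytes, so the file must be opened in binary mode.
    with open(path, "wb") as out:
        await export.to_stream(out)
    return os.path.getsize(path)


# Usage inside other async code:
#     size = await save_export(parquet_export, "dataset.parquet")
# or from synchronous code (with `import asyncio`):
#     size = asyncio.run(save_export(parquet_export, "dataset.parquet"))
```

Streaming straight to disk avoids holding the whole export in memory, which is the main reason to prefer to_stream over to_bytes for large exports.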

glossary_term

class tdw_catalog.glossary_term.GlossaryTerm(client, **kwargs)[source]

Bases: EntityBase, _OrganizationRelation

GlossaryTerms are used to categorize and classify columns within Datasets

Attributes

id: str

The GlossaryTerm's unique ID

organization_id: str

The unique ID of the Organization to which this GlossaryTerm belongs

user_id: str

The unique ID of the User who created this GlossaryTerm

title: str

The title for this GlossaryTerm

description: Optional[str]

An optional description for this GlossaryTerm

created_at: datetime

The datetime at which this GlossaryTerm was created

updated_at: datetime

The datetime at which this GlossaryTerm was last updated

delete() None[source]

Delete this GlossaryTerm. This GlossaryTerm object should not be used after delete() has successfully returned

Parameters

None

Returns

None

Raises

CatalogPermissionDeniedException

If the caller is not allowed to delete this GlossaryTerm

CatalogNotFoundException

If the GlossaryTerm being deleted does not exist

CatalogException

If the call to the Catalog server fails

save() None[source]

Update this GlossaryTerm, saving any changes to its title

Parameters

None

Returns

None

Raises

CatalogPermissionDeniedException

If the caller is not allowed to update this GlossaryTerm, or if the given GlossaryTerm ID does not exist

CatalogException

If the call to the Catalog server fails

list_datasets

class tdw_catalog.list_datasets.DatasetAlias(alias_key: str, alias_values: List[str])[source]

Bases: object

Used to filter the results of list_datasets by specific Dataset aliases

class tdw_catalog.list_datasets.Filter(limit: int = None, offset: int = None, keywords: List[str] | None = None, dataset_ids: List[str] | None = None, dataset_aliases: List[DatasetAlias] | None = None, reference_ids: List[str] | None = None, sources: List[Source] | None = None, topics: List[Topic] | None = None, creators: List[OrganizationMember] | None = None, states: List[ImportState] | None = None, warehouses: List[Warehouse] | None = None, timestamp_range: TimestampRange | None = None, sort: Sort | None = None)[source]

Bases: LegacyFilter

Filters the results of list_datasets on Organization.

Attributes

keywords: Optional[List[str]]

Filters results according to the specified keyword(s) (fuzzy matching is supported)

dataset_ids: Optional[List[str]]

Filters results to the list of given Dataset ID(s)

dataset_aliases: Optional[List[DatasetAlias]]

Filters results to the list of given Dataset alias(es)

sources: Optional[List[Source]]

Filters results to the list of given Source(s)

topics: Optional[List[Topic]]

Filters results to the list of given Topic(s)

creators: Optional[List[OrganizationMember]]

Filters results to Datasets created by the given OrganizationMember(s)

states: Optional[List[ImportState]]

Filters results to the list of given ImportStates. Note that virtualized datasets will always be categorized as IMPORTED.

warehouses: Optional[List[Warehouse]]

Filters results to the list of given Warehouses

timestamp_range: Optional[TimestampRange]

Filters results to those within the given TimestampRange

sort: Optional[Sort]

Sorts filtered results according to the provided Sort structure
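Filter accepts limit and offset (see its signature), which suggests paging through large result sets. The sketch below is a generic pager; it assumes that Organization.list_datasets accepts a filter argument and returns a list, as list_organizations does, which this section does not confirm. iter_all and fetch_page are hypothetical names. If the server raises CatalogOutOfRangeException when the offset passes the end of the collection instead of returning an empty page, that case would also need handling.

```python
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_all(fetch_page: Callable[[int, int], List[T]], page_size: int = 100) -> Iterator[T]:
    """Yield every item by calling fetch_page(limit, offset) until a short page.

    Assumes each call returns at most `limit` items, and fewer than `limit`
    only once the results are exhausted.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        yield from page
        if len(page) < page_size:
            return
        offset += len(page)


# Usage (assumed call shape):
# from tdw_catalog.list_datasets import Filter
# datasets = list(iter_all(
#     lambda limit, offset: org.list_datasets(
#         filter=Filter(limit=limit, offset=offset, keywords=["sales"]))))
```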

class tdw_catalog.list_datasets.Sort(field: SortableField, order: FilterSortOrder | None = FilterSortOrder.ASC)[source]

Bases: object

Used to sort the results of list_datasets on Organization.

enum tdw_catalog.list_datasets.SortableField(value)[source]

Bases: StrEnum

The different fields which list_datasets on Organization can be sorted by

Member Type:

str

Valid values are as follows:

TITLE = <SortableField.TITLE: 'title'>
CREATED_AT = <SortableField.CREATED_AT: 'created_at'>
IMPORTED_AT = <SortableField.IMPORTED_AT: 'imported_at'>
UPDATED_AT = <SortableField.UPDATED_AT: 'updated_at'>
STATE = <SortableField.STATE: 'reference_state'>
NEXT_INGEST = <SortableField.NEXT_INGEST: 'reference_next_ingest'>
FAILED_AT = <SortableField.FAILED_AT: 'reference_failed_at'>
SOURCE_NAME = <SortableField.SOURCE_NAME: 'source_label'>
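Several SortableField member names differ from their underlying values (STATE is 'reference_state' and SOURCE_NAME is 'source_label'), so it helps to be deliberate about looking members up by name or by value. The sketch below recreates the enum as a str-mixin Enum from the values listed above; it is a stand-in for illustration (import the real class from tdw_catalog.list_datasets), and sortable_field is a hypothetical helper.

```python
from enum import Enum


class SortableField(str, Enum):
    """Stand-in mirroring the values listed for tdw_catalog.list_datasets.SortableField."""
    TITLE = "title"
    CREATED_AT = "created_at"
    IMPORTED_AT = "imported_at"
    UPDATED_AT = "updated_at"
    STATE = "reference_state"
    NEXT_INGEST = "reference_next_ingest"
    FAILED_AT = "reference_failed_at"
    SOURCE_NAME = "source_label"


def sortable_field(name_or_value: str) -> SortableField:
    """Accept either a member name ('STATE', 'source_name') or a value ('reference_state')."""
    try:
        return SortableField[name_or_value.upper()]
    except KeyError:
        return SortableField(name_or_value)  # raises ValueError if unknown


# Usage with the real classes (order defaults to FilterSortOrder.ASC):
# Sort(field=sortable_field("source_name"))
```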
enum tdw_catalog.list_datasets.TimestampField(value)[source]

Bases: IntEnum

The different possible fields that can be used to construct a TimestampRange filter for list_datasets on Organization

Member Type:

int

Valid values are as follows:

CREATED_AT = <TimestampField.CREATED_AT: 0>
UPDATED_AT = <TimestampField.UPDATED_AT: 1>
IMPORTED_AT = <TimestampField.IMPORTED_AT: 2>
NEXT_INGEST = <TimestampField.NEXT_INGEST: 3>
FAILED_AT = <TimestampField.FAILED_AT: 5>
class tdw_catalog.list_datasets.TimestampRange(filter_by: TimestampField, start_time: datetime | None, end_time: datetime | None)[source]

Bases: object

Used to construct a temporal filter for list_datasets on Organization, where a filter specifies a TimestampField and a time range
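A TimestampRange pairs a TimestampField with optional start and end datetimes. The sketch below builds a trailing window, such as Datasets that failed in the last seven days. It uses stand-in classes mirroring the documented constructor (filter_by, start_time, end_time) and the listed TimestampField values (note that no member has the value 4); it also assumes timezone-aware UTC datetimes are acceptable, which this section does not state. last_n_days is a hypothetical helper.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import IntEnum
from typing import Optional


class TimestampField(IntEnum):
    """Stand-in mirroring tdw_catalog.list_datasets.TimestampField."""
    CREATED_AT = 0
    UPDATED_AT = 1
    IMPORTED_AT = 2
    NEXT_INGEST = 3
    FAILED_AT = 5


@dataclass
class TimestampRange:
    """Stand-in with the documented constructor arguments."""
    filter_by: TimestampField
    start_time: Optional[datetime]
    end_time: Optional[datetime]


def last_n_days(field: TimestampField, days: int, now: Optional[datetime] = None) -> TimestampRange:
    """Build a range covering the `days` days up to `now` (current UTC time by default)."""
    end = now or datetime.now(timezone.utc)
    return TimestampRange(filter_by=field, start_time=end - timedelta(days=days), end_time=end)


# Usage with the real classes (assumed):
# Filter(timestamp_range=last_n_days(TimestampField.FAILED_AT, 7))
```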

organization

class tdw_catalog.organization.Organization(client, **kwargs)[source]

Bases: EntityBase

Organizations are the primary entry points to a Data Catalog, containing and linking together OrganizationMembers, Teams, Datasets, etc.

Attributes

title: str

The name of the Organization

created_at: datetime

The datetime at which this Organization was created

updated_at: datetime

The datetime at which this Organization was last updated

create_credential() CredentialFactory[source]

Provides a CredentialFactory which is capable of creating Credentials within this Organization.

Parameters

None

Returns

CredentialFactory

A factory for creating specific types of Credentials

create_dataset(source: Source, title: str, description: str | None = None) Dataset[source]

Creates a new Dataset within this Organization. The Dataset will have a title and (optionally) a description, and must be associated with a Source. The Dataset will otherwise be empty and can be subsequently populated with metadata and data.

Parameters

source: source.Source

The Source to associate with the new Dataset

title: str

A title for the new Dataset

description: Optional[str]

An optional description for the new Dataset (markdown supported)

Returns

Dataset

The newly created Dataset

Raises

CatalogPermissionDeniedException

If the caller is not allowed to create Datasets in this Organization

CatalogInvalidArgumentException

If title is an empty string, or if the Source belongs to a different Organization than this one

CatalogException

If the call to the Catalog server fails

create_glossary_term(title: str, description: str | None = None) GlossaryTerm[source]

Create a GlossaryTerm within this Organization

Parameters

title: str

The name of the new GlossaryTerm

description: Optional[str]

The description of the new GlossaryTerm

Returns

GlossaryTerm

The newly created GlossaryTerm

Raises

CatalogPermissionDeniedException

If the caller is not allowed to create GlossaryTerms in this Organization

CatalogInvalidArgumentException

If title is an empty string

CatalogAlreadyExistsException

If a GlossaryTerm with the provided title already exists in this Organization

CatalogException

If the call to the Catalog server fails

create_lineage(upstream_dataset: Dataset, downstream_dataset: Dataset, label: str, description: str | None = None, column_lineage: List[tuple[Union[str, List[str]], Union[str, List[str]]]] = []) DatasetLineageRelationship[source]

Create a DatasetLineageRelationship within this Organization. Each relationship describes a single source and destination Dataset (a single “edge” in the lineage graph), with optional column-level lineage.

Branching (or many-to-many) relationships can be modelled by decomposing them into their individual edges.

Parameters

upstream_dataset: Dataset

The source dataset involved in this DatasetLineageRelationship

downstream_dataset: Dataset

The destination dataset involved in this DatasetLineageRelationship

label: str

A label describing this DatasetLineageRelationship

description: Optional[str]

An optional description providing further details about this DatasetLineageRelationship

column_lineage: List[tuple[Union[str, List[str]], Union[str, List[str]]]]

An optional list of column-level associations between the two Datasets, specified as tuples. Each tuple is a single column-level relationship between upstream column(s) and downstream column(s); each side may be a single column name or a list of names. This argument defaults to an empty list if not supplied. Example: [("address", ["street_number","street_name","city"])]

Returns

DatasetLineageRelationship

The newly created DatasetLineageRelationship

Raises

CatalogInvalidArgumentException

If any column name specified in column_lineage does not exist in the corresponding Dataset

CatalogPermissionDeniedException

If the caller is not allowed to define lineage in this Organization, or if they do not have access to one of the involved Datasets

CatalogException

If the call to the Catalog server fails
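Each side of a column_lineage tuple may be a single column name or a list of names. The sketch below normalizes both sides to lists and checks the names against known column keys before calling the server, mirroring the documented CatalogInvalidArgumentException condition locally. Passing lists on both sides also satisfies the narrower List[tuple[List[str], List[str]]] annotation shown on create_or_replace_lineage. normalize_column_lineage is a hypothetical helper, and obtaining the column keys (for example from each Dataset's DataDictionary) is left to the caller.

```python
from typing import Iterable, List, Sequence, Tuple, Union

Side = Union[str, Sequence[str]]


def normalize_column_lineage(
    pairs: Iterable[Tuple[Side, Side]],
    upstream_keys: Iterable[str],
    downstream_keys: Iterable[str],
) -> List[Tuple[List[str], List[str]]]:
    """Convert each (upstream, downstream) side to a list and validate column names."""
    up_known, down_known = set(upstream_keys), set(downstream_keys)

    def as_list(side: Side) -> List[str]:
        # A bare string is one column name, not a sequence of characters.
        return [side] if isinstance(side, str) else list(side)

    result: List[Tuple[List[str], List[str]]] = []
    for upstream, downstream in pairs:
        up, down = as_list(upstream), as_list(downstream)
        unknown = [c for c in up if c not in up_known] + [c for c in down if c not in down_known]
        if unknown:
            raise ValueError(f"unknown column(s): {unknown}")
        result.append((up, down))
    return result


# Usage (assumed): org.create_lineage(upstream, downstream, "address split",
#     column_lineage=normalize_column_lineage(pairs, up_keys, down_keys))
```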

create_metadata_template(title: str, description: str | None = None) MetadataTemplateCreationBuilder[source]

Provides a MetadataTemplateCreationBuilder which is capable of creating MetadataTemplates within this Organization.

Parameters

title: str

The title for the MetadataTemplate

description: Optional[str]

An optional description for the MetadataTemplate

Returns

MetadataTemplateCreationBuilder

A factory for creating new MetadataTemplates

create_or_replace_lineage(upstream_dataset: Dataset, downstream_dataset: Dataset, label: str, description: str | None = None, column_lineage: List[tuple[List[str], List[str]]] = []) DatasetLineageRelationship[source]

Create a DatasetLineageRelationship within this Organization. Each relationship describes a single source and destination Dataset (a single “edge” in the lineage graph), with optional column-level lineage.

Branching (or many-to-many) relationships can be modelled by decomposing them into their individual edges.

If no relationships between the given Datasets exist, one will be created. Unlike create_lineage, any pre-existing relationships between the given Datasets will be cleared and replaced by this one, facilitating easy one-way syncs from an external lineage metadata source to the Catalog platform.

Parameters

upstream_dataset: Dataset

The source dataset involved in this DatasetLineageRelationship

downstream_dataset: Dataset

The destination dataset involved in this DatasetLineageRelationship

label: str

A label describing this DatasetLineageRelationship

description: Optional[str]

An optional description providing further details about this DatasetLineageRelationship

column_lineage: List[tuple[Union[str, List[str]], Union[str, List[str]]]]

An optional list of column-level associations between the two Datasets, specified as tuples. Each tuple is a single column-level relationship between upstream column(s) and downstream column(s); each side may be a single column name or a list of names. This argument defaults to an empty list if not supplied. Example: [("address", ["street_number","street_name","city"])]

Returns

DatasetLineageRelationship

The newly created DatasetLineageRelationship

Raises

CatalogInvalidArgumentException

If any column name specified in column_lineage does not exist in the corresponding Dataset

CatalogPermissionDeniedException

If the caller is not allowed to define lineage in this Organization, or if they do not have access to one of the involved Datasets

CatalogException

If the call to the Catalog server fails

create_source(label: str, description: str | None = None) Source[source]

Create a Source within this Organization

Parameters

label: str

A descriptive label for the Source

description: Optional[str]

The description of the Source

Returns

Source

The newly created Source

Raises

CatalogInternalException

If the call to the Catalog server fails

create_team(title: str) Team[source]

Create a Team within this Organization

Parameters

title: str

The name of the new Team

Returns

Team

The newly created Team

Raises

CatalogPermissionDeniedException

If the caller is not allowed to create Teams in this Organization

CatalogException

If the call to the Catalog server fails

create_topic(title: str) Topic[source]

Create a Topic within this Organization

Parameters

title: str

The name of the new Topic

Returns

Topic

The newly created Topic

Raises

CatalogPermissionDeniedException

If the caller is not allowed to create Topics in this Organization

CatalogException

If the call to the Catalog server fails

delete() None[source]

Delete this Organization. This Organization object should not be used after delete() has successfully returned, as the Catalog organization it represents will no longer exist.

Parameters

None

Returns

None

Raises

CatalogPermissionDeniedException

If the caller is not allowed to delete this Organization

CatalogException

If the call to the Catalog server fails

get_connection(id: str) IngestionConnection | VirtualizationConnection[source]

Retrieve the given IngestionConnection or VirtualizationConnection from this Organization

Parameters

id: str

The unique ID of the Connection

Returns

Union[IngestionConnection, VirtualizationConnection]

The Connection with the given ID

Raises

CatalogPermissionDeniedException

If the caller is not allowed to retrieve Connections from this Organization

CatalogNotFoundException

If the given Connection ID does not exist

CatalogException

If the call to the Catalog server fails

get_credential(credential_id: str) Credential[source]

Retrieve a Credential belonging to this Organization

Parameters

credential_id: str

The unique ID of the Credential

Returns

Credential

The Credential associated with the given ID

Raises

CatalogPermissionDeniedException

If the caller is not allowed to retrieve Credentials

CatalogNotFoundException

If the given Credential ID does not exist

CatalogException

If the call to the Catalog server fails

get_dataset(id: str) Dataset | ConnectedDataset[source]

Retrieve the given Dataset from this Organization

Parameters

id: str

The unique ID of the Dataset

Returns

Union[Dataset, ConnectedDataset]

The Dataset with the given ID

Raises

CatalogPermissionDeniedException

If the caller is not allowed to retrieve the given Dataset, or if the Dataset does not exist

CatalogInvalidArgumentException

If the given Dataset ID is not a valid v4 UUID

CatalogException

If the call to the Catalog server fails
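Because get_dataset rejects IDs that are not valid v4 UUIDs with CatalogInvalidArgumentException, user-supplied IDs can be screened locally before any request is made. A minimal sketch using the standard uuid module; is_uuid4 is a hypothetical helper, and requiring the canonical hyphenated lowercase form is an extra strictness assumed here, not a documented server rule.

```python
import uuid


def is_uuid4(value: str) -> bool:
    """Return True if `value` is a canonical (hyphenated, lowercase) version-4 UUID."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    # uuid.UUID also accepts braces, "urn:uuid:" prefixes and unhyphenated hex,
    # so compare against the canonical string form as well as the version.
    return parsed.version == 4 and str(parsed) == value


# Usage (assumes `org` is an Organization):
# if is_uuid4(dataset_id):
#     dataset = org.get_dataset(dataset_id)
```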

get_glossary_term(id: str) GlossaryTerm[source]

Retrieve the given GlossaryTerm from this Organization

Parameters

id: str

The unique ID of the GlossaryTerm

Returns

GlossaryTerm

The GlossaryTerm with the given ID

Raises

CatalogPermissionDeniedException

If the caller is not allowed to retrieve GlossaryTerms from this Organization, or if the given GlossaryTerm ID does not exist

CatalogException

If the call to the Catalog server fails

get_lineage(id: str) DatasetLineageRelationship[source]

Retrieve the given DatasetLineageRelationship from this Organization

Parameters

id: str

The unique ID of the DatasetLineageRelationship

Returns

DatasetLineageRelationship

The DatasetLineageRelationship with the given ID

Raises

CatalogPermissionDeniedException

If the caller is not allowed to retrieve DatasetLineageRelationships from this Organization

CatalogInvalidArgumentException

If the given DatasetLineageRelationship ID does not exist

CatalogException

If the call to the Catalog server fails

get_member(user_id: str) OrganizationMember[source]

Retrieve a specific member (User) of this Organization

Parameters

user_id: str

The unique User ID of the OrganizationMember

Returns

OrganizationMember

The OrganizationMember with the given User ID

Raises

CatalogPermissionDeniedException

If the caller is not allowed to fetch OrganizationMembers

CatalogInvalidArgumentException

If the given User ID does not exist or is not a member of this Organization

CatalogException

If a call to the Catalog server fails

get_metadata_template(id: str) → MetadataTemplate

Retrieve a MetadataTemplate belonging to this Organization

Parameters

id: str

The unique ID of the MetadataTemplate

Returns

MetadataTemplate

The MetadataTemplate associated with the given ID

Raises

CatalogPermissionDeniedException

If the caller is not allowed to retrieve MetadataTemplates

CatalogNotFoundException

If the given MetadataTemplate ID does not exist

CatalogException

If a call to the Catalog server fails

get_source(id: str) → Source

Retrieve a Source belonging to this Organization

Parameters

id: str

The unique ID of the Source

Returns

Source

The Source associated with the given ID

Raises

CatalogInternalException

If a call to the Catalog server fails

CatalogNotFoundException

If the Source with the supplied ID could not be found

CatalogPermissionDeniedException

If the caller is not allowed to retrieve the given Source

get_team(team_id: str) → Team

Retrieve the given Team from this Organization

Parameters

team_id: str

The unique ID of the Team

Returns

Team

The Team with the given ID

Raises

CatalogPermissionDeniedException

If the caller is not allowed to retrieve Teams from this Organization

CatalogInvalidArgumentException

If the given Team ID does not exist

CatalogException

If a call to the Catalog server fails

get_topic(id: str) → Topic

Retrieve the given Topic from this Organization

Parameters

id: str

The unique ID of the Topic

Returns

Topic

The Topic with the given ID

Raises

CatalogPermissionDeniedException

If the caller is not allowed to retrieve Topics from this Organization, or if the given Topic ID does not exist

CatalogException

If a call to the Catalog server fails

invite_member(user_id: str, roles: OrganizationMemberRoles = None) → OrganizationMember

Invite the given User to be an OrganizationMember of this Organization

Parameters

user_id: str

The unique User ID of the invitee

roles: organization_member.OrganizationMemberRoles

The membership roles for the User. All roles default to False.

Returns

OrganizationMember

The newly created OrganizationMember

Raises

CatalogPermissionDeniedException

If the caller is not allowed to invite OrganizationMembers

CatalogAlreadyExistsException

If the caller is inviting a User who is already an OrganizationMember of this Organization

CatalogInvalidArgumentException

If the given User ID does not exist

CatalogException

If a call to the Catalog server fails

invite_members(emails: List[str], invite_message: str | None = '', raise_on_failure: bool | None = False, roles: OrganizationMemberRoles | None = None) → InviteMembersResponse

Invite the given User(s) to become OrganizationMembers of this Organization. If a given email does not correspond to an existing User, an invitation to the Catalog platform will be sent via email.

Parameters

emails: List[str]

The list of email addresses of the invitees.

invite_message: Optional[str]

The message to include in the invitation sent to each invitee

raise_on_failure: Optional[bool]

Whether to raise an exception if any single invitation fails (False by default)

roles: Optional[OrganizationMemberRoles]

The roles granted to the new members once they are invited

Returns

InviteMembersResponse

This contains a summary of the successful and failed invitations.

Raises

CatalogPermissionDeniedException

If the caller is not allowed to invite OrganizationMembers

CatalogException

If a call to the Catalog server fails

list_connections(filter: ListConnectionsFilter | None = None) → List[IngestionConnection | VirtualizationConnection]

List all VirtualizationConnections and IngestionConnections in this Organization

Parameters

filter: Optional[ListConnectionsFilter]

An optional ListConnectionsFilter on the returned Connection list, useful for pagination of results. Note that the organization_id property will be set automatically to this Organization.

Returns

List[Union[IngestionConnection,VirtualizationConnection]]

The list of Connections in this Organization

Raises

CatalogPermissionDeniedException

If the caller is not allowed to list Connections in this Organization

CatalogException

If a call to the Catalog server fails

list_credentials(filter: LegacyFilter = None) → List[Credential]

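List Credentials which belong to this Organization

Parameters

filter: Optional[LegacyFilter]

An optional filter on the returned Credential list, useful for pagination of results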

Returns

List[Credential]

Credentials created under this Organization

Raises

CatalogPermissionDeniedException

If the caller is not allowed to list Credentials

CatalogException

If a call to the Catalog server fails

list_datasets(filter: Filter | None = None) → Iterator[Dataset]

Retrieve the list of Datasets which belong to this Organization. The number of results returned is limited; use the filter to paginate through additional results.

Parameters

filter: Optional[list_datasets.Filter]

An optional filter on the returned Datasets (None by default)

Returns

Iterator[Dataset]

An Iterator of Datasets belonging to this Organization, which are lazily fetched as the Iterator is iterated.

Raises

CatalogPermissionDeniedException

If the caller does not have permission to list Datasets

CatalogException

If there is an issue communicating with the Catalog server, or an issue with the server itself
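The pagination note for list_datasets implies a limit/offset loop. The sketch below shows one, assuming only the limit and offset attributes documented for Filter; fetch_page is a hypothetical callable standing in for a call such as organization.list_datasets(filter=Filter(limit=..., offset=...)), and the in-memory fake_fetch replaces the server so the code runs on its own. Whether the returned Iterator already fetches further pages for you is not settled by this description, so check that before layering this loop on top.

```python
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


def paginate(fetch_page: Callable[[int, int], List[T]], page_size: int = 100) -> Iterator[T]:
    """Yield every result by requesting successive pages of size page_size.

    fetch_page(limit, offset) is a hypothetical stand-in for an SDK call such as
    organization.list_datasets(filter=Filter(limit=limit, offset=offset)).
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        page = list(fetch_page(page_size, offset))
        yield from page
        # A short (or empty) page means there are no further results.
        if len(page) < page_size:
            return
        offset += page_size


# Usage with an in-memory stand-in for the server.
fake_results = [f"dataset-{i}" for i in range(7)]


def fake_fetch(limit: int, offset: int) -> List[str]:
    return fake_results[offset:offset + limit]


all_ids = list(paginate(fake_fetch, page_size=3))
print(all_ids)
```

Stopping on a short page saves a request in most cases; when the total is an exact multiple of page_size, one final empty page is still fetched.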

list_external_warehouses() → List[ExternalWarehouse]

Retrieve the list of known ExternalWarehouses available to this Organization

Parameters

None

Returns

List[ExternalWarehouse]

ExternalWarehouses that are available to this Organization

Raises

CatalogPermissionDeniedException

If the caller is not allowed to list ExternalWarehouses (the caller must be an Organization admin, or have Dataset creation privileges)

CatalogException

If a call to the Catalog server fails

list_glossary_terms(organization_ids: List[str] | None = None, filter: ListGlossaryTermsFilter | None = None) → List[GlossaryTerm]

List all GlossaryTerms in this Organization

Parameters

organization_ids: Optional[List[str]]

An optional list of Organization IDs, used to list GlossaryTerms from multiple Organizations

filter: Optional[ListGlossaryTermsFilter]

An optional ListGlossaryTermsFilter on the returned GlossaryTerm list, useful for pagination of results

Returns

List[GlossaryTerm]

The list of GlossaryTerms in this Organization

Raises

CatalogPermissionDeniedException

If the caller is not allowed to list GlossaryTerms in this Organization

CatalogException

If a call to the Catalog server fails

list_members(filter: LegacyFilter | None = None) → List[OrganizationMember]

Retrieve all OrganizationMembers of this Organization

Parameters

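filter: Optional[LegacyFilter]

An optional filter on the returned OrganizationMember list, useful for pagination of results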

Returns

List[OrganizationMember]

The OrganizationMembers of this Organization

Raises

CatalogPermissionDeniedException

If the caller is not allowed to list OrganizationMembers

CatalogException

If a call to the Catalog server fails

list_metadata_templates() → List[MetadataTemplate]

List all MetadataTemplates which belong to this Organization

Returns

List[MetadataTemplate]

MetadataTemplates created under this Organization

Raises

CatalogPermissionDeniedException

If the caller is not allowed to list MetadataTemplates

CatalogException

If a call to the Catalog server fails

list_sources(filter: ListSourcesFilter | None = None) → List[Source]

List Sources which belong to this Organization

Parameters

filter: Optional[ListSourcesFilter]

An optional ListSourcesFilter used to narrow the returned Sources

Returns

List[Source]

Sources created under this Organization

Raises

CatalogException

If a call to the Catalog server fails

list_target_warehouses() → List[TargetWarehouse]

Retrieve the list of known TargetWarehouses available to this Organization

Parameters

None

Returns

List[TargetWarehouse]

TargetWarehouses that are available to this Organization

Raises

CatalogPermissionDeniedException

If the caller is not allowed to list TargetWarehouses (the caller must be an Organization admin, or have Dataset creation privileges)

CatalogException

If a call to the Catalog server fails

list_teams(organization_ids=None, filter: LegacyFilter | None = None) → List[Team]

List all Teams in this Organization

Parameters

filter: Optional[LegacyFilter]

An optional filter on the returned Team list, useful for pagination of results

Returns

List[Team]

The list of Teams in this Organization

Raises

CatalogPermissionDeniedException

If the caller is not allowed to list Teams in this Organization

CatalogException

If a call to the Catalog server fails

list_topics(filter: LegacyFilter = None) → List[Topic]

List all Topics in this Organization

Parameters

filter: Optional[LegacyFilter]

An optional filter on the returned Topic list, useful for pagination of results

Returns

List[Topic]

The list of Topics in this Organization

Raises

CatalogPermissionDeniedException

If the caller is not allowed to list Topics in this Organization

CatalogException

If a call to the Catalog server fails

save() → None

Update this Organization, saving any changes to its title

Parameters

None

Returns

None

Raises

CatalogPermissionDeniedException

If the caller is not allowed to update this Organization

CatalogException

If a call to the Catalog server fails

organization_member

class tdw_catalog.organization_member.OrganizationMember(client, **kwargs)

Bases: User, _OrganizationRelation

An OrganizationMember reflects a relationship between User and Organization, where the User has been invited to the Organization and has been granted specific privileges within the Organization.

Attributes

user_id: str

The unique user ID of the OrganizationMember

organization: organization.Organization

The Organization object that relates to the organization_id of this model

organization_id: str

The unique ID of the Organization to which this OrganizationMember belongs

roles: OrganizationMemberRoles

The roles this Member has within their Organization

created_at: datetime

The datetime at which this OrganizationMember was added to the Organization

updated_at: datetime

The datetime at which this OrganizationMember was last updated

delete() → None

Remove this OrganizationMember from the Organization. This OrganizationMember object should not be used after delete() returns successfully.

Parameters

None

Returns

None

Raises

CatalogPermissionDeniedException

If the caller is not allowed to delete this OrganizationMember, or if the caller is attempting to delete themselves

CatalogException

If a call to the Catalog server fails

classmethod get(client: Catalog, organization_id: str, id: str)

Retrieve an OrganizationMember belonging to the given Organization

Parameters

client: Catalog

The Catalog client of the Organization containing the OrganizationMember

organization_id: str

The unique ID of the Organization

id: str

The unique ID of the OrganizationMember

Returns

OrganizationMember

The OrganizationMember associated with the given ID

Raises

CatalogInternalException

If a call to the Catalog server fails

CatalogNotFoundException

If no OrganizationMember is found matching the provided ID

CatalogPermissionDeniedException

If the caller is not allowed to retrieve this OrganizationMember

get_teams(filter: LegacyFilter | None = None) → List[Team]

Retrieve the Teams to which this OrganizationMember belongs

Parameters

filter: Optional[LegacyFilter]

An optional filter on the returned Team list, useful for pagination of results

Returns

List[Team]

The list of Teams to which this OrganizationMember belongs

Raises

CatalogPermissionDeniedException

If the caller is not allowed to retrieve Teams from this Organization

CatalogException

If a call to the Catalog server fails

save() → None

Update this OrganizationMember, saving any changes to its roles

Parameters

None

Returns

None

Raises

CatalogPermissionDeniedException

If the caller is not allowed to update this OrganizationMember

CatalogAlreadyExistsException

If the caller is attempting to invite a User who is already a member of the Organization

CatalogException

If a call to the Catalog server fails

class tdw_catalog.organization_member.OrganizationMemberRoles(role_data_uploader: bool = False, role_data_viewer: bool = False, role_data_editor: bool = False, role_member_manager: bool = False, role_organization_manager: bool = False, role_admin: bool = False, role_topic_manager: bool = False, role_field_template_manager: bool = False)

Bases: object

OrganizationMemberRoles defines the roles which an OrganizationMember has within an Organization

Attributes

role_data_uploader: bool

Whether this OrganizationMember is allowed to upload to the Organization

role_data_viewer: bool

Whether this OrganizationMember is allowed to view Datasets within the Organization

role_data_editor: bool

Whether this OrganizationMember is allowed to modify Datasets within the Organization

role_member_manager: bool

Whether this OrganizationMember is allowed to manage members within the Organization

role_organization_manager: bool

Whether this OrganizationMember is allowed to manage the Organization

role_admin: bool

Whether this OrganizationMember is an Organization administrator

role_topic_manager: bool

Whether this OrganizationMember is allowed to manage Topics within the Organization

role_field_template_manager: bool

Whether this OrganizationMember is allowed to manage MetadataTemplates
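Since every flag defaults to False, a role set is built by switching on only what a member needs. The sketch below mirrors the documented constructor with a local dataclass so it runs without the SDK; with the package installed you would import OrganizationMemberRoles from tdw_catalog.organization_member instead and pass the object as roles to invite_member or invite_members. The granted_roles helper is hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class OrganizationMemberRoles:
    # Local stand-in mirroring the documented keyword arguments; all default to False.
    role_data_uploader: bool = False
    role_data_viewer: bool = False
    role_data_editor: bool = False
    role_member_manager: bool = False
    role_organization_manager: bool = False
    role_admin: bool = False
    role_topic_manager: bool = False
    role_field_template_manager: bool = False


def granted_roles(roles: OrganizationMemberRoles) -> List[str]:
    """Hypothetical helper: names of the role flags that are switched on."""
    return [f.name for f in fields(roles) if getattr(roles, f.name)]


# A member who can view and edit Datasets but manage nothing else.
analyst = OrganizationMemberRoles(role_data_viewer=True, role_data_editor=True)
print(granted_roles(analyst))  # ['role_data_viewer', 'role_data_editor']
```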

organization_utils

class tdw_catalog.organization_utils.InviteMembersResponse(failed_invitations: List[InviteMembersResponseFailedInvitation], successful_invitations: List[OrganizationMember])

Bases: object

InviteMembersResponse contains the successfully invited members and summarizes any failed invitations.

Attributes

failed_invitations: List[InviteMembersResponseFailedInvitation]

List of email addresses and error message summaries of the failed invitations.

successful_invitations: List[organization_member.OrganizationMember]

List of members which were successfully invited to the Organization.

class tdw_catalog.organization_utils.InviteMembersResponseFailedInvitation(email: str, error_message: str)

Bases: object

InviteMembersResponseFailedInvitation is a container for a single failed invitation, providing information about why that invitation failed to send.

Attributes

email: str

The email address of the invitee

error_message: str

A message indicating why the invitation failed to send.
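With raise_on_failure left at its default of False, failures are presumably reported in failed_invitations rather than raised, so it is worth inspecting the response after calling invite_members. The sketch below mirrors the two documented response classes with local dataclasses so it runs without the SDK; summarize_invites is a hypothetical helper, the email addresses and error text are made-up example values, and a plain string stands in for an OrganizationMember.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class InviteMembersResponseFailedInvitation:
    # Local stand-in mirroring the documented attributes.
    email: str
    error_message: str


@dataclass
class InviteMembersResponse:
    # Local stand-in; successful_invitations would hold OrganizationMember objects.
    failed_invitations: List[InviteMembersResponseFailedInvitation] = field(default_factory=list)
    successful_invitations: List[Any] = field(default_factory=list)


def summarize_invites(response: InviteMembersResponse) -> str:
    """Hypothetical helper: a count line plus one line per failed invitation."""
    lines = [
        f"{len(response.successful_invitations)} invited, "
        f"{len(response.failed_invitations)} failed"
    ]
    for failure in response.failed_invitations:
        lines.append(f"  {failure.email}: {failure.error_message}")
    return "\n".join(lines)


response = InviteMembersResponse(
    failed_invitations=[
        InviteMembersResponseFailedInvitation("bob@example.com", "already a member"),
    ],
    successful_invitations=["alice@example.com"],
)
print(summarize_invites(response))
```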

query

class tdw_catalog.query.QueryCursor(res: Dict[str, any])

Bases: object

QueryCursor is a Python DB API-style Cursor object (PEP 249) for query results from the Catalog.

Attributes

arraysize: int

Read/write attribute that controls the number of rows returned by fetchmany(). The default value is 1, which means a single row is fetched per call.

description: List[tuple]

Read-only attribute that provides the column names of the last query. To remain compatible with the Python DB API, it returns a 7-tuple for each column where the last five items of each tuple are None.
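Because description uses the PEP 249 layout, a column's name is the first item of its 7-tuple. The sketch below demonstrates this with the standard-library sqlite3 module, whose cursors implement the same attribute; the assumption is that a QueryCursor can be handed to the hypothetical column_names helper unchanged.

```python
import sqlite3
from typing import List


def column_names(cursor) -> List[str]:
    """Return the column names from any PEP 249 cursor's description attribute."""
    # PEP 249: description is None before execute() or for statements returning no rows.
    if cursor.description is None:
        return []
    return [column[0] for column in cursor.description]


conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("SELECT 1 AS id, 'Toronto' AS city")
names = column_names(cur)
print(names)  # ['id', 'city']
conn.close()
```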

close()

A no-op, included to increase compatibility with the Python DB API

fetchall() → List[tuple]

Return all (remaining) rows of a query result as a list. Return an empty list if no rows are available.

fetchmany(size=None) → List[tuple]

Return the next set of rows of a query result as a list. Return an empty list if no more rows are available.

The number of rows to fetch per call is specified by the size parameter. If size is not given, arraysize determines the number of rows to be fetched. If fewer than size rows are available, as many rows as are available are returned.

Note there are performance considerations involved with the size parameter. For optimal performance, it is usually best to use the arraysize attribute. If the size parameter is used, then it is best for it to retain the same value from one fetchmany() call to the next.
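Following that advice, a typical pattern sets arraysize once and then calls fetchmany() with no argument until it returns an empty list. The sketch below uses an sqlite3 cursor as a stand-in PEP 249 cursor; iter_batches is a hypothetical helper, and the assumption is that a QueryCursor, with the read/write arraysize and fetchmany() documented here, behaves the same way.

```python
import sqlite3
from typing import Iterator, List


def iter_batches(cursor, batch_size: int = 500) -> Iterator[List[tuple]]:
    """Yield lists of rows from a PEP 249 cursor, batch_size rows at a time."""
    cursor.arraysize = batch_size  # set once so every fetchmany() call uses the same size
    while True:
        rows = cursor.fetchmany()
        if not rows:  # an empty list means the result set is exhausted
            return
        yield rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(5)])
cur = conn.execute("SELECT n FROM t ORDER BY n")
batch_sizes = [len(batch) for batch in iter_batches(cur, batch_size=2)]
print(batch_sizes)  # [2, 2, 1]
conn.close()
```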

fetchone() → tuple | None

Return the next row of a query result set as a tuple, or None if no more data is available.

source

class tdw_catalog.source.Source(client, **kwargs)

Bases: EntityBase, _OrganizationRelation

A Source is used to semantically group a set of related Datasets. Users are free to label a Source in a descriptive way to best understand the meaning behind this grouping.

Attributes

id: str

Source’s unique id

organization: Organization

The Organization associated with this Source. Either an Organization or an organization_id can be provided, but not both.

organization_id: str

The unique ID of the Organization to which this Source belongs

user_id: str

The unique user ID of the OrganizationMember who created this Source

label: str

A descriptive label for this Source

description: Optional[str] = None

An optional extended description for this Source

created_at: datetime

The datetime at which this Source was created

updated_at: datetime

The datetime at which this Source was last updated

create_ingestion_connection(label: str, portal: ConnectionPortalType, url: str | None = None, description: str | None = None, warehouse: Warehouse | None = None, credential: Credential | None = None, ingest_schedules: List[ConnectionSchedule] | None = None) → IngestionConnection

Create an IngestionConnection within this Source

Parameters

label: str

The descriptive label for this IngestionConnection

portal: ConnectionPortalType

The method of data access employed by this IngestionConnection

url: Optional[str]

A canonical URL that points to the location of data resources within the portal

description: Optional[str] = None

An optional extended description for this IngestionConnection

warehouse: Optional[Warehouse]

Datasets created using this IngestionConnection will ingest to this Warehouse by default (this can be overridden at ingest time).

credential: Optional[Credential]

The Credential associated with this IngestionConnection.

ingest_schedules: Optional[List[ConnectionSchedule]]

Optional ConnectionSchedules which, when specified, indicate the frequency with which to reingest ingested data. Specific Datasets using this IngestionConnection may override this set of Schedules.

Returns

IngestionConnection

The newly created IngestionConnection

Raises

CatalogPermissionDeniedException

If the caller is not allowed to create IngestionConnections in this Organization

CatalogException

If a call to the Catalog server fails

delete() → None

Delete this Source. This Source object should not be used after delete() has successfully returned

Raises

CatalogPermissionDeniedException

If the caller is not allowed to delete this Source

CatalogException

If a call to the Catalog server fails

classmethod get(client, organization_id: str, id: str)

Retrieve a Source belonging to the given Organization

Parameters

client: Catalog

The Catalog client to use to get the Source

organization_id: str

The ID of the Organization to which the Source belongs

id: str

The unique ID of the Source

Returns

Source

The Source associated with the given ID

Raises

CatalogInternalException

If a call to the Catalog server fails

CatalogNotFoundException

If the Source with the supplied ID could not be found

CatalogPermissionDeniedException

If the caller is not allowed to retrieve this Source

list_connections(filter: ListConnectionsFilter | None = None) → List[IngestionConnection | VirtualizationConnection]

List all IngestionConnections and VirtualizationConnections belonging to this Source

Parameters

filter: Optional[ListConnectionsFilter]

An optional filter on the returned Connection list, useful for pagination of results. Note that the organization_id and source_ids properties will be set automatically to this Organization and Source.

Returns

List[Connection]

The list of Connections in this Source

Raises

CatalogPermissionDeniedException

If the caller is not allowed to list Connections in this Organization

CatalogException

If a call to the Catalog server fails

save() → None

Update this Source, saving any changes to its fields

Raises

CatalogPermissionDeniedException

If the caller is not allowed to update this Source

CatalogException

If a call to the Catalog server fails

team

class tdw_catalog.team.Team(client, **kwargs)

Bases: EntityBase, _OrganizationRelation

Teams are sets of OrganizationMembers, with which Datasets can be shared.

Attributes

id: str

The unique ID of this Team

organization: organization.Organization

The Organization that relates to the organization_id on the model

organization_id: str

The unique ID of the Organization to which this Team belongs

title: str

The name of this Team

created_at: datetime

The datetime at which this Team was created

updated_at: datetime

The datetime at which this Team was last updated

add_member(user_id: str, permission: TeamMemberPermissionLevel) → TeamMember

Add a User to the Team as a TeamMember. The User in question must already be a member of the containing Organization.

Parameters

user_id: str

The unique User ID of the invitee

permission: TeamMemberPermissionLevel

The permission level the User will have within the Team

Returns

TeamMember

The newly created TeamMember

Raises

CatalogPermissionDeniedException

If the caller is not allowed to invite Team members

CatalogAlreadyExistsException

If the caller is inviting a User who is already a TeamMember of this Team

CatalogInvalidArgumentException

If the given User ID does not exist

CatalogException

If a call to the Catalog server fails

delete() → None

Delete this Team. This Team object should not be used after delete() has successfully returned

Parameters

None

Returns

None

Raises

CatalogPermissionDeniedException

If the caller is not allowed to delete this Team

CatalogException

If a call to the Catalog server fails

classmethod get(client: Catalog, organization_id: str, id: str)

Retrieve a Team belonging to the given Organization

Parameters

client: Catalog

A Catalog client

organization_id: str

The unique ID of the Organization

id: str

The unique ID of the Team

Returns

Team

The Team associated with the given ID

Raises

CatalogInternalException

If a call to the Catalog server fails

CatalogNotFoundException

If no Team is found matching the provided ID

CatalogPermissionDeniedException

If the caller is not allowed to retrieve this Team

get_member(user_id: str) → TeamMember

Retrieve a specific member (User) of this Team

Parameters

user_id: str

The unique User ID of the TeamMember

Returns

TeamMember

The TeamMember with the given User ID

Raises

CatalogPermissionDeniedException

If the caller is not allowed to fetch Team members

CatalogNotFoundException

If the given User ID does not exist or is not a member of this Team

CatalogInvalidArgumentException

If the given User ID is malformed

CatalogException

If a call to the Catalog server fails

list_members(filter: LegacyFilter | None = None) → List[TeamMember]

Retrieve all TeamMembers of this Team

Parameters

filter: Optional[LegacyFilter]

An optional filter on the returned TeamMember list, useful for pagination of results

Returns

List[TeamMember]

The TeamMembers of this Team

Raises

CatalogPermissionDeniedException

If the caller is not allowed to list TeamMembers

CatalogException

If a call to the Catalog server fails

save() → None

Update this Team, saving any changes to its title

Parameters

None

Returns

None

Raises

CatalogPermissionDeniedException

If the caller is not allowed to update this Team

CatalogException

If a call to the Catalog server fails

enum tdw_catalog.team.TeamMemberPermissionLevel(value)

Bases: IntEnum

Member Type:

int

Valid values are as follows:

VIEW = <TeamMemberPermissionLevel.VIEW: 1>
MANAGE = <TeamMemberPermissionLevel.MANAGE: 2>
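As an IntEnum, TeamMemberPermissionLevel compares and converts like a plain int, which helps when a level arrives as a raw number, for example from a configuration file. The sketch redeclares the enum locally with the documented values so it runs without the SDK (the real class lives in tdw_catalog.team). The at_least helper is hypothetical, and treating a higher value as granting more access is an assumption, not something this reference states.

```python
from enum import IntEnum


class TeamMemberPermissionLevel(IntEnum):
    # Local stand-in mirroring the documented values.
    VIEW = 1
    MANAGE = 2


def at_least(level: TeamMemberPermissionLevel, required: TeamMemberPermissionLevel) -> bool:
    """Hypothetical check that assumes higher values grant more access."""
    return level >= required


raw = 2  # e.g. a value read from a configuration file
level = TeamMemberPermissionLevel(raw)  # looks the member up by value
print(level.name, int(level))  # MANAGE 2
print(at_least(level, TeamMemberPermissionLevel.VIEW))  # True
```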

team_member

class tdw_catalog.team_member.TeamMember(client, **kwargs)

Bases: User

A TeamMember reflects a relationship between User and Team, where the User has been invited to the Team and has been granted specific privileges within the Team.

Attributes

team: team.Team

The Team that relates to the team_id of the model

team_id: str

The unique ID of the Team to which this TeamMember belongs

permission: TeamMemberPermissionLevel

The permission level the User has within the Team

created_at: datetime

The datetime at which this TeamMember was added to the Team

updated_at: datetime

The datetime at which this TeamMember’s permission was last changed

delete() → None

Remove this TeamMember from the Team. This TeamMember object should not be used after delete() returns successfully.

Parameters

None

Returns

None

Raises

CatalogPermissionDeniedException

If the caller is not allowed to delete this TeamMember, or if the caller is attempting to delete themselves

CatalogException

If a call to the Catalog server fails

classmethod get(client: Catalog, team_id: str, id: str)

Retrieve a TeamMember

Parameters

client: Catalog

The Catalog client

team_id: str

The unique ID of the Team

id: str

The unique ID of the TeamMember

Returns

TeamMember

The TeamMember associated with the given ID

Raises

CatalogInternalException

If a call to the Catalog server fails

CatalogNotFoundException

If no TeamMember is found matching the provided ID

CatalogPermissionDeniedException

If the caller is not allowed to retrieve this TeamMember

save() → None

Update this TeamMember, saving any changes to its permission level

Parameters

None

Returns

None

Raises

CatalogPermissionDeniedException

If the caller is not allowed to update this TeamMember

CatalogInvalidArgumentException

If the caller supplies an invalid permission level before saving this TeamMember

CatalogException

If a call to the Catalog server fails

topic

class tdw_catalog.topic.Topic(client, **kwargs)

Bases: EntityBase, _OrganizationRelation

Topics are used to classify Datasets within an Organization. Classification can be used as a means to apply a grouping label to one or more Datasets.

Attributes

id: str

Topic’s unique id

organization_id: str

The unique ID of the Organization to which this Topic belongs

created_by: str

The unique user ID of the user who created this Topic

title: str

The title for this Topic

created_at: datetime

The datetime at which this Topic was created

updated_at: datetime

The datetime at which this Topic was last updated

delete() → None

Delete this Topic. This Topic object should not be used after delete() has successfully returned

Parameters

None

Returns

None

Raises

CatalogPermissionDeniedException

If the caller is not allowed to delete this Topic, or if the given Topic ID does not exist

CatalogException

If a call to the Catalog server fails

classmethod get(client: Catalog, organization_id: str, id: str)

Retrieve a Topic

Parameters

client: Catalog

The Catalog client to use to get the Topic

organization_id: str

The ID of the Organization in which this Topic exists

id: str

The unique ID of the Topic

Returns

Topic

The Topic associated with the given ID

Raises

CatalogPermissionDeniedException

If the caller is not allowed to retrieve the given Topic

CatalogNotFoundException

If the given Topic ID does not exist

CatalogException

If a call to the Catalog server fails

save() → None

Update this Topic, saving any changes to its title

Parameters

None

Returns

None

Raises

CatalogPermissionDeniedException

If the caller is not allowed to update this Topic, or if the given Topic ID does not exist

CatalogException

If a call to the Catalog server fails

user

class tdw_catalog.user.User(client, **kwargs)

Bases: EntityBase

A User is a registered user of the ThinkData Catalog. Currently Users can only be created through the Catalog user interface, and cannot be created through the API. Users can be added as members of Organizations and Teams, and have Datasets shared with them.

Attributes

user_id: str

The unique ID of a User on the Catalog server

email: str

The User’s registered email address

name: str

The User’s full name

warehouse

class tdw_catalog.warehouse.ExternalWarehouse(client, **kwargs)

Bases: Warehouse

An ExternalWarehouse is a Warehouse which is configured for Data Virtualization. New data cannot be written to an ExternalWarehouse, but virtualized Datasets can be created which read from it.

Attributes

database_name: Optional[str]

If set, the database to virtualize tables and views from

schema: Optional[str]

If set, the schema to virtualize tables and views from

class tdw_catalog.warehouse.TargetWarehouse(client, **kwargs)

Bases: Warehouse

A TargetWarehouse is a Warehouse which is configured for data ingestion. New data can be written to a TargetWarehouse.

class tdw_catalog.warehouse.Warehouse(client, **kwargs)

Bases: EntityBase

A Warehouse is a place where Datasets are stored. Currently, Warehouses are configured at the deployment-level and cannot be modified through this SDK.

Attributes

name: str

The unique name of the warehouse in the system. This name will never change for the life of the Warehouse.

display_name: str

The descriptive name of the Warehouse.

warehouse_type: str

The type of Warehouse this represents.

external: Optional[bool]

True if this Warehouse is virtualized within the Catalog

classmethod get(client: Catalog, name: str, organization_id: str)

Retrieve a Warehouse by name

Parameters

client: Catalog

The Catalog client to use to get the Warehouse

name: str

The unique name of the Warehouse

organization_id: str

The ID of the Organization in which the Warehouse is configured

Returns

Warehouse

The Warehouse associated with the given name

Raises

CatalogPermissionDeniedException

If the caller is not allowed to retrieve the given Warehouse

CatalogNotFoundException

If the given Warehouse does not exist

CatalogException

If a call to the Catalog server fails

utils

enum tdw_catalog.utils.ColumnType(value)

Bases: StrEnum

The different possible data types for Columns within a DataDictionary

Member Type:

str

Valid values are as follows:

BOOLEAN = <ColumnType.BOOLEAN: 'boolean'>
DATE = <ColumnType.DATE: 'date'>
DATETIME = <ColumnType.DATETIME: 'datetime'>
INTEGER = <ColumnType.INTEGER: 'integer'>
DECIMAL = <ColumnType.DECIMAL: 'decimal'>
PERCENT = <ColumnType.PERCENT: 'percent'>
CURRENCY = <ColumnType.CURRENCY: 'currency'>
STRING = <ColumnType.STRING: 'string'>
TEXT = <ColumnType.TEXT: 'text'>
GEOMETRY = <ColumnType.GEOMETRY: 'geometry'>
GEOJSON = <ColumnType.GEOJSON: 'geojson'>
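Because ColumnType values are lowercase strings, a type name from an external schema can be mapped to a member by value, and members compare equal to their string values. The sketch redeclares the enum locally as a str-mixin Enum, which behaves like StrEnum for these two operations and also works before Python 3.11; parse_column_type is a hypothetical helper that returns None for names the enum does not know.

```python
from enum import Enum
from typing import Optional


class ColumnType(str, Enum):
    # Local stand-in mirroring the documented members and values.
    BOOLEAN = "boolean"
    DATE = "date"
    DATETIME = "datetime"
    INTEGER = "integer"
    DECIMAL = "decimal"
    PERCENT = "percent"
    CURRENCY = "currency"
    STRING = "string"
    TEXT = "text"
    GEOMETRY = "geometry"
    GEOJSON = "geojson"


def parse_column_type(name: str) -> Optional[ColumnType]:
    """Hypothetical helper: map a type name to a ColumnType, or None if unknown."""
    try:
        return ColumnType(name.strip().lower())
    except ValueError:
        return None


print(parse_column_type("Integer") is ColumnType.INTEGER)  # True
print(parse_column_type("uuid"))  # None
print(ColumnType.GEOJSON == "geojson")  # True
```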

enum tdw_catalog.utils.ConnectionPortalType(value)

Bases: StrEnum

Member Type:

str

Valid values are as follows:

GS = <ConnectionPortalType.GS: 'Gs'>
S3 = <ConnectionPortalType.S3: 'S3'>
UNITY = <ConnectionPortalType.UNITY: 'Unity'>
FTP = <ConnectionPortalType.FTP: 'Ftp'>
SFTP = <ConnectionPortalType.SFTP: 'Sftp'>
EXTERNAL = <ConnectionPortalType.EXTERNAL: 'External'>
NULL = <ConnectionPortalType.NULL: 'Null'>
IMPORT_LITE = <ConnectionPortalType.IMPORT_LITE: 'ImportLite'>
HTTP = <ConnectionPortalType.HTTP: 'Http'>
CATALOG = <ConnectionPortalType.CATALOG: 'Namara'>

class tdw_catalog.utils.CurrencyFieldValue(value: float, currency: str)

Bases: object

CurrencyFieldValue models the value of a currency field

Attributes

value: float

The currency value

currency: str

The specific currency to which the value belongs

class tdw_catalog.utils.Filter(limit: int = None, offset: int = None)

Bases: LegacyFilter

Filter describes the ways in which results should be filtered and/or paginated. It is serialized differently from LegacyFilter.

Attributes

limit: int, optional

Limits the number of results. Useful for pagination. (None by default)

offset: int, optional

Offsets the result list by the given number of results. Useful for pagination. (None by default)

class tdw_catalog.utils.FilterSort(field: str, order: FilterSortOrder = FilterSortOrder.ASC)

Bases: object

FilterSort describes a desired sort field and order for results.

Attributes

field: str

The field to sort by

order: FilterSortOrder, optional

The order to sort in (FilterSortOrder.ASC by default)

enum tdw_catalog.utils.FilterSortOrder(value)

Bases: Enum

Valid values are as follows:

ASC = <FilterSortOrder.ASC: 1>
DESC = <FilterSortOrder.DESC: 2>
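A FilterSort pairs a field name with a FilterSortOrder, and ASC is the default. To show what ASC and DESC request, the sketch below uses stand-ins for both types and emulates the ordering on in-memory dicts. In practice, the Catalog server does the sorting. This section does not say which fields can be sorted, so 'title' is an assumed example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, List

# Stand-ins mirroring tdw_catalog.utils.FilterSortOrder and FilterSort.
class FilterSortOrder(Enum):
    ASC = 1
    DESC = 2

@dataclass
class FilterSort:
    field: str
    order: FilterSortOrder = FilterSortOrder.ASC

def apply_sort(rows: List[Dict[str, Any]], sort: FilterSort) -> List[Dict[str, Any]]:
    """Client-side illustration of the ordering a FilterSort requests."""
    return sorted(
        rows,
        key=lambda row: row[sort.field],
        reverse=sort.order is FilterSortOrder.DESC,
    )

rows = [{"title": "b"}, {"title": "c"}, {"title": "a"}]
print([r["title"] for r in apply_sort(rows, FilterSort("title"))])  # ['a', 'b', 'c']
print([r["title"] for r in apply_sort(rows, FilterSort("title", FilterSortOrder.DESC))])  # ['c', 'b', 'a']
```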
enum tdw_catalog.utils.ImportState(value)[source]

Bases: StrEnum

The possible states of an imported dataset. Virtualized datasets always report the state IMPORTED.

Member Type:

str

Valid values are as follows:

IMPORTED = <ImportState.IMPORTED: 'imported'>
IMPORTING = <ImportState.IMPORTING: 'importing'>
QUEUED = <ImportState.QUEUED: 'queued'>
FAILED = <ImportState.FAILED: 'failed'>
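The member names suggest that IMPORTED and FAILED are final states, while QUEUED and IMPORTING mean the import is still in progress. This is an interpretation of the names, not something this reference states. Under that assumption, a caller waiting on an import can poll until one of the final states appears. The sketch below uses a stand-in enum and a hypothetical get_state callable that returns the current ImportState, because this section does not cover how to read a dataset's state. The poll interval and timeout are arbitrary choices.

```python
import time
from enum import Enum
from typing import Callable

# Stand-in mirroring tdw_catalog.utils.ImportState.
class ImportState(str, Enum):
    IMPORTED = "imported"
    IMPORTING = "importing"
    QUEUED = "queued"
    FAILED = "failed"

# Assumption: these two states are final; QUEUED and IMPORTING are in progress.
TERMINAL_STATES = {ImportState.IMPORTED, ImportState.FAILED}

def wait_for_import(
    get_state: Callable[[], ImportState],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
) -> ImportState:
    """Poll a hypothetical get_state() until the import finishes or times out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"import still {state.value} after {timeout_seconds}s")
        time.sleep(poll_seconds)

# Usage with a fake sequence of states standing in for server responses.
states = iter([ImportState.QUEUED, ImportState.IMPORTING, ImportState.IMPORTED])
result = wait_for_import(lambda: next(states), poll_seconds=0.0)
print(result is ImportState.IMPORTED)  # True
```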
class tdw_catalog.utils.LegacyFilter(limit: int = None, offset: int = None)[source]

Bases: object

LegacyFilter describes the ways in which results should be filtered and/or paginated

Attributes

limit : int, optional

Limits the number of results. Useful for pagination. (None by default)

offset : int, optional

Offsets the result list by the given number of results. Useful for pagination. (None by default)

class tdw_catalog.utils.ListConnectionsFilter(limit: int = None, offset: int = None, organization_id: str | None = None, source_ids: List[str] | None = None, portals: List[ConnectionPortalType] | None = None)[source]

Bases: LegacyFilter

ListConnectionsFilter filters results according to Connection fields

Attributes

organization_id : Optional[str]

Filters results by organization_id

source_ids : Optional[List[str]]

Filters results to the given source_id(s)

portals : Optional[List[ConnectionPortalType]]

Filters results to the given ConnectionPortalType(s)

class tdw_catalog.utils.ListGlossaryTermsFilter(limit: int = None, offset: int = None, glossary_term_ids: List[str] | None = None)[source]

Bases: Filter

ListGlossaryTermsFilter filters results according to GlossaryTerm ids

Attributes

glossary_term_ids : Optional[List[str]]

Filters results to the given glossary_term_id(s)

class tdw_catalog.utils.ListOrganizationsFilter(limit: int = None, offset: int = None, organization_ids: List[str] | None = None)[source]

Bases: LegacyFilter

ListOrganizationsFilter filters Organization results according to a set of provided ids

Attributes

organization_ids : Optional[List[str]]

Filters results according to a set of provided ids

class tdw_catalog.utils.ListSourcesFilter(limit: int = None, offset: int = None, labels: str | None = None)[source]

Bases: LegacyFilter

ListSourcesFilter filters results according to Source fields

Attributes

labels : Optional[str]

Filters results by label. This will match label substrings.
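Because labels matches substrings, a short value can select several sources. The sketch below emulates that matching locally on hypothetical label strings to show the effect. This reference does not say whether the server's match is case-sensitive, so the sketch assumes it is.

```python
from typing import List

def match_labels(labels: List[str], query: str) -> List[str]:
    """Illustrate substring matching: keep every label that contains `query`.

    Assumption: matching is case-sensitive; the real server behaviour may differ.
    """
    return [label for label in labels if query in label]

# Hypothetical source labels.
source_labels = ["Statistics Canada", "Canada Post", "City of Toronto"]
print(match_labels(source_labels, "Canada"))  # ['Statistics Canada', 'Canada Post']
```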

enum tdw_catalog.utils.MetadataFieldType(value)[source]

Bases: IntEnum

The possible data types for values stored in MetadataFields and for default values stored in MetadataTemplateFields.

Member Type:

int

Valid values are as follows:

FT_STRING = <MetadataFieldType.FT_STRING: 0>
FT_INTEGER = <MetadataFieldType.FT_INTEGER: 1>
FT_DECIMAL = <MetadataFieldType.FT_DECIMAL: 2>
FT_DATE = <MetadataFieldType.FT_DATE: 3>
FT_DATETIME = <MetadataFieldType.FT_DATETIME: 4>
FT_DATASET = <MetadataFieldType.FT_DATASET: 5>
FT_URL = <MetadataFieldType.FT_URL: 6>
FT_USER = <MetadataFieldType.FT_USER: 7>
FT_ATTACHMENT = <MetadataFieldType.FT_ATTACHMENT: 8>
FT_LIST = <MetadataFieldType.FT_LIST: 9>
FT_CURRENCY = <MetadataFieldType.FT_CURRENCY: 10>
FT_TEAM = <MetadataFieldType.FT_TEAM: 11>
FT_ALIAS = <MetadataFieldType.FT_ALIAS: 12>
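MetadataFieldType is an IntEnum, so its members compare equal to plain integers. A raw integer can be converted back into a member with MetadataFieldType(n), and an integer outside the documented range raises ValueError. The sketch below uses a stand-in that reproduces the documented values so it runs without tdw_catalog installed.

```python
from enum import IntEnum

# Stand-in mirroring the documented values of tdw_catalog.utils.MetadataFieldType.
class MetadataFieldType(IntEnum):
    FT_STRING = 0
    FT_INTEGER = 1
    FT_DECIMAL = 2
    FT_DATE = 3
    FT_DATETIME = 4
    FT_DATASET = 5
    FT_URL = 6
    FT_USER = 7
    FT_ATTACHMENT = 8
    FT_LIST = 9
    FT_CURRENCY = 10
    FT_TEAM = 11
    FT_ALIAS = 12

# Convert a raw integer back into a member.
print(MetadataFieldType(10).name)  # FT_CURRENCY

# IntEnum members compare equal to plain ints.
print(MetadataFieldType.FT_DATE == 3)  # True

# Integers without a matching member are rejected.
try:
    MetadataFieldType(99)
except ValueError:
    print("unknown field type")
```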
class tdw_catalog.utils.QueryFilter(limit: int = None, offset: int = None, sort: FilterSort = None, query: str | None = None)[source]

Bases: SortableFilter

QueryFilter filters results according to a NiQL query

Attributes

query : str, optional

Filters results according to a NiQL query

class tdw_catalog.utils.SortableFilter(limit: int = None, offset: int = None, sort: FilterSort = None)[source]

Bases: LegacyFilter

SortableFilter describes the ways in which results should be filtered, paginated and/or sorted.

Attributes

limit : int, optional

Limits the number of results. Useful for pagination. (None by default)

offset : int, optional

Offsets the result list by the given number of results. Useful for pagination. (None by default)

sort : FilterSort, optional

Specifies a desired sort field and order for results (None by default).
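These filter classes form a small hierarchy. SortableFilter adds sort to the limit and offset of LegacyFilter, and QueryFilter adds query on top of that, so one QueryFilter can carry paging, sorting and a NiQL query together. The sketch below reproduces that hierarchy with stand-in dataclasses and builds a filter for the third page of 25 results. The query string is a placeholder because NiQL syntax is not described in this section, and 'title' is an assumed sort field.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

# Stand-ins mirroring the documented tdw_catalog.utils classes.
class FilterSortOrder(Enum):
    ASC = 1
    DESC = 2

@dataclass
class FilterSort:
    field: str
    order: FilterSortOrder = FilterSortOrder.ASC

@dataclass
class LegacyFilter:
    limit: Optional[int] = None
    offset: Optional[int] = None

@dataclass
class SortableFilter(LegacyFilter):
    sort: Optional[FilterSort] = None

@dataclass
class QueryFilter(SortableFilter):
    query: Optional[str] = None

page_size, page_number = 25, 3  # page_number counts from 1
f = QueryFilter(
    limit=page_size,
    offset=(page_number - 1) * page_size,  # skip the first two pages
    sort=FilterSort("title", FilterSortOrder.DESC),  # "title" is an assumed field
    query="<NiQL query>",  # placeholder; NiQL syntax is not shown here
)
print(f.offset)  # 50
print(isinstance(f, LegacyFilter))  # True
```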