When BigQuery is used for replication, additional staging information is required because BigQuery is supported only as a target data connection. In addition, a project can no longer be edited after the BigQuery data connection used in its pipeline is deleted. Keep this limitation in mind when managing projects and pipelines that involve BigQuery to avoid disruptions.
Required privileges for BigQuery
Google Cloud Storage roles:
The following table describes roles that are associated with cloud storage and lists the permissions that are contained in each role.
| Roles | Permissions |
|---|---|
| Storage Object Creator | storage.multipartUploads.create<br>storage.multipartUploads.abort<br>storage.objects.create |
| Storage Object User | storage.objects.get<br>storage.objects.list<br>storage.objects.delete |
| Storage Insights Collector Service | storage.buckets.get |
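The role-to-permission mapping above can be expressed in code, for example to check whether a set of granted permissions covers what a role requires. The role and permission names come from the table; the helper function itself is an illustrative sketch, not part of any Google Cloud API.

```python
# Cloud Storage permissions required per role, as listed in the table above.
REQUIRED_STORAGE_PERMISSIONS = {
    "Storage Object Creator": {
        "storage.multipartUploads.create",
        "storage.multipartUploads.abort",
        "storage.objects.create",
    },
    "Storage Object User": {
        "storage.objects.get",
        "storage.objects.list",
        "storage.objects.delete",
    },
    "Storage Insights Collector Service": {
        "storage.buckets.get",
    },
}

def missing_permissions(role: str, granted: set[str]) -> set[str]:
    """Return the permissions a role still needs, given what is already granted."""
    return REQUIRED_STORAGE_PERMISSIONS[role] - granted
```

For example, `missing_permissions("Storage Object User", {"storage.objects.get"})` reports that `storage.objects.list` and `storage.objects.delete` are still missing.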
BigQuery roles
These roles help define the level of access and actions users can perform in BigQuery, ensuring that appropriate permissions are granted based on their needs.
- BigQuery User
- BigQuery Data Editor
- BigQuery Jobs User
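As an illustration, the three roles can be granted to a service account with `gcloud projects add-iam-policy-binding`. The sketch below only builds the command strings; the project ID and service account email are placeholders, and the IAM role IDs are the standard identifiers for the roles named above.

```python
# Standard IAM role IDs for the BigQuery roles listed above.
BQ_ROLE_IDS = [
    "roles/bigquery.user",        # BigQuery User
    "roles/bigquery.dataEditor",  # BigQuery Data Editor
    "roles/bigquery.jobUser",     # BigQuery Jobs User
]

def grant_commands(project_id: str, sa_email: str) -> list[str]:
    """Build (but do not run) the gcloud commands that grant each role."""
    return [
        f"gcloud projects add-iam-policy-binding {project_id} "
        f"--member=serviceAccount:{sa_email} --role={role}"
        for role in BQ_ROLE_IDS
    ]

for cmd in grant_commands("my-project", "replicator@my-project.iam.gserviceaccount.com"):
    print(cmd)
```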
To configure BigQuery as a connection for a replication pipeline, configure the following fields:

- Service account to use: Select one of the following options:
  - Service account specified in the connection: Contains a key pair for authentication. The permissions are based on what the user has access to.
  - Specify: Choose this option to create the service account on Google Cloud.
- Service Account JSON: Create a service account key and paste the object text in the Service Account JSON field.
- Google cloud storage directory: Name of the directory where you can store any amount of data that can be retrieved later. For example, /<foldername>.
- Staging schema: Specifies the schema where the staging table should be created. If you do not specify a staging schema, the staging table uses the same schema as the one specified in the target.
- Batch Apply Threshold: Specify the apply threshold criteria. Records are applied in batches when either the duration or the number of records is met, whichever comes first:
  - Duration: The time replication waits before applying changed data to the target in a batch.
    - Specify (the default): When selected, enter the number of seconds, minutes, or hours before changes are applied. The default is 60 seconds.
    - No Limit: When selected, no time limit is used and the batch apply threshold is based on the number of records collected.
  - Records: The number of records collected before applying changed data to the target in a batch.
    - No Limit (the default): When selected, no record limit is used and the batch apply threshold is based on the Duration time (specified above).
    - Specify: When selected, enter the number of records collected before changes are applied. Specify records in hundreds, thousands, or millions. The default is 10 thousand records.

Note: Duration and Records cannot both be set to No Limit.
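The batch-apply rule described above can be sketched as a small class: a batch is flushed when either the duration limit or the record limit is reached, whichever comes first, and per the note, Duration and Records cannot both be No Limit. The class and method names are hypothetical; the defaults (60 seconds, records No Limit) follow the text.

```python
class BatchApplyThreshold:
    """Minimal sketch of the Batch Apply Threshold logic. None means No Limit."""

    def __init__(self, duration_seconds=60, max_records=None):
        # Per the documentation note, both limits cannot be No Limit at once.
        if duration_seconds is None and max_records is None:
            raise ValueError("Duration and Records cannot both be set to No Limit")
        self.duration_seconds = duration_seconds
        self.max_records = max_records

    def should_apply(self, elapsed_seconds: float, records_collected: int) -> bool:
        """True when either threshold is reached, whichever comes first."""
        if self.duration_seconds is not None and elapsed_seconds >= self.duration_seconds:
            return True
        if self.max_records is not None and records_collected >= self.max_records:
            return True
        return False
```

With the defaults, a batch waits up to 60 seconds regardless of how few records have accumulated; setting `max_records=10_000` and `duration_seconds=None` flips the behavior to a pure record-count trigger.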
Compatibility matrix for IBMi to BigQuery
A compatibility matrix for IBMi to BigQuery provides an overview of the interoperability between IBMi systems and Google's BigQuery cloud data warehouse. It specifies the compatibility of data transfer mechanisms, including options like batch uploads via Google Cloud Storage or direct streaming through APIs. The matrix also outlines supported data formats such as DB2 databases, flat files, and JSON, detailing how these can be ingested into BigQuery for analysis.
| Source Type \ BigQuery Type -> | BigNumeric / Bigdecimal | Bytes | Date | Datetime | Float64 | Int64 | Numeric / Decimal | String | Time | Timestamp |
|---|---|---|---|---|---|---|---|---|---|---|
| INTEGER | | | | | | ✓ | | | | |
| SMALLINT | | | | | | ✓ | | | | |
| BIGINT | | | | | | ✓ | | | | |
| DECIMAL^4,5 | ✓^4 | | | | | | ✓^5 | | | |
| NUMERIC^4,5 | ✓^4 | | | | | | ✓^5 | | | |
| FLOAT | | | | | ✓ | | | | | |
| REAL | | | | | ✓ | | | | | |
| DOUBLE | | | | | ✓ | | | | | |
| DATE | | | ✓ | | | | | | | |
| TIME | | | | | | | | | ✓ | |
| TIMESTAMP | | | | ✓ | | | | | | ✓ |
| CHAR | | | | | | | | ✓ | | |
| VARCHAR | | | | | | | | ✓ | | |
| LONG VARCHAR^2 | | | | | | | | ✓ | | |
| CHAR FOR BIT DATA^1 | | ✓ | | | | | | ✓ | | |
| VARCHAR FOR BIT DATA^1 | | ✓ | | | | | | ✓ | | |
| LONG VARCHAR FOR BIT DATA^2 | | | | | | | | | | |
| BINARY^2 | | ✓ | | | | | | | | |
| VARBINARY^2 | | | | | | | | | | |
| BLOB | | ✓ | | | | | | | | |
| CLOB | | | | | | | | ✓ | | |
| DBCLOB | | | | | | | | ✓ | | |
| GRAPHIC^3 | | | | | | | | ✓ | | |
| VARGRAPHIC^3 | | | | | | | | ✓ | | |
- Source types with ^1:
- If the TRANSLATE_BINARY flag is 'Y', then treat the FOR BIT DATA as string.
- If the TRANSLATE_BINARY flag is 'N', then treat the FOR BIT DATA as bytes.
- Source types with ^2 are not supported as a datasource.
- Source types with ^3 (GRAPHIC and VARGRAPHIC) are supported in SaaS and CDC Director, and are supported by default in CDC versions 5.7 and later.
- Source types with ^4 are supported for decimal/numeric source columns with maximum Precision of 76 and maximum scale of 38.
- Source types with ^5 are supported for decimal/numeric source columns with maximum Precision of 38 and maximum scale of 38.
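The matrix and its notes can be read as a lookup table. The sketch below encodes the default target type for a few representative IBMi source types, plus the TRANSLATE_BINARY rule from note ^1 for the FOR BIT DATA types. This mapping is an illustrative reading of the table above, not an exhaustive or authoritative list.

```python
# Default BigQuery target type per IBMi source type (subset of the matrix).
IBMI_TO_BIGQUERY = {
    "INTEGER": "Int64",
    "SMALLINT": "Int64",
    "BIGINT": "Int64",
    "FLOAT": "Float64",
    "DATE": "Date",
    "TIME": "Time",
    "CHAR": "String",
    "BLOB": "Bytes",
    "CLOB": "String",
    "GRAPHIC": "String",
}

def map_for_bit_data(translate_binary: str) -> str:
    """Note ^1: TRANSLATE_BINARY 'Y' -> treat FOR BIT DATA as String,
    'N' -> treat it as Bytes."""
    return "String" if translate_binary == "Y" else "Bytes"
```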
Compatibility matrix for Db2i to BigQuery
A compatibility matrix for Db2i to BigQuery explains how well IBM Db2 for i databases can work with Google's BigQuery cloud platform. It details how data can be transferred from Db2i databases to BigQuery for analysis, covering methods, data formats, tools, and any limitations that may affect the process.
| Source Type \ BigQuery Type -> | BigNumeric / Bigdecimal | Bytes | Date | Datetime | Float64 | Int64 | Numeric / Decimal | String | Time | Timestamp |
|---|---|---|---|---|---|---|---|---|---|---|
| INTEGER | ✓ | | | | ✓ | ✓ | ✓^5 | snl+1^6 | | |
| SMALLINT | ✓ | | | | ✓ | ✓ | ✓^5 | snl+1^6 | | |
| BIGINT | ✓ | | | | ✓ | ✓ | ✓^5 | snl+1^6 | | |
| DECIMAL^4,5 | ✓^4 | | | | ✓ | ✓ | ✓^5 | snl+2^6 | | |
| NUMERIC^4,5 | ✓^4 | | | | ✓ | | ✓^5 | snl+2^6 | | |
| FLOAT | | | | | ✓ | | | 27^7 | | |
| REAL | | | | | ✓ | | | 27^7 | | |
| DOUBLE | | | | | ✓ | | | 27^7 | | |
| DATE | | | ✓ | | | | | ✓ | | |
| TIME | | | | | | | | ✓ | ✓ | |
| TIMESTAMP | | | | ✓ | | | | ✓ | | ✓ |
| CHAR | | | | | | | | ✓ | | |
| VARCHAR | | | | | | | | ✓ | | |
| LONG VARCHAR^2 | | | | | | | | | | |
| CHAR FOR BIT DATA^1 | | ✓ | | | | | | ✓ | | |
| VARCHAR FOR BIT DATA^1 | | ✓ | | | | | | ✓ | | |
| LONG VARCHAR FOR BIT DATA^2 | | | | | | | | | | |
| BINARY^2 | | | | | | | | | | |
| VARBINARY^2 | | | | | | | | | | |
| BLOB | | ✓ | | | | | | | | |
| CLOB | | | | | | | | ✓ | | |
| DBCLOB | | | | | | | | ✓ | | |
| GRAPHIC^3 | | | | | | | | ✓ | | |
| VARGRAPHIC^3 | | | | | | | | ✓ | | |
- Source types with ^1:
- If the charForBit flag is 'Y', then treat the FOR BIT DATA as string.
- If the charForBit flag is 'N', then treat the FOR BIT DATA as bytes.
- Source types with ^2 are not supported as a datasource.
- Source types with ^3 (GRAPHIC and VARGRAPHIC) are supported in SaaS and CDC Director, and are supported by default in CDC versions 5.7 and later.
- Source types with ^4 are supported for decimal/numeric source columns with maximum Precision of 76 and maximum scale of 38.
- Source types with ^5 are supported for decimal/numeric source columns with maximum Precision of 38 and maximum scale of 38.
- Values marked with ^6 or ^7 indicate the maximum resulting string length after conversion.
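Notes ^4 and ^5 above describe precision-based rules for decimal/numeric source columns: BigNumeric accepts up to precision 76 and scale 38, while Numeric accepts up to precision 38 and scale 38. A sketch of that rule, with a hypothetical function name:

```python
def decimal_targets(precision: int, scale: int) -> list[str]:
    """Return the BigQuery decimal target types a decimal/numeric source
    column can map to, per notes ^4 and ^5 above."""
    targets = []
    if precision <= 38 and scale <= 38:
        targets.append("Numeric")      # note ^5: precision <= 38, scale <= 38
    if precision <= 76 and scale <= 38:
        targets.append("BigNumeric")   # note ^4: precision <= 76, scale <= 38
    return targets
```

For example, a DECIMAL(50, 20) column exceeds Numeric's precision limit but still fits BigNumeric.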