Connecting to Sources

We will be working with Pagila, a sample database that represents a DVD rental store. The database contains information about films, actors, customers, and stores. Postgres is used as the database engine. To quickly setup postgres, you can use docker-compose.

version: '3.8'
services:
  postgres:
    container_name: dozer-pagila-postgres
    image: debezium/postgres:13
    volumes:
      - ./data/init.sql:/docker-entrypoint-initdb.d/init.sql
    environment:
      POSTGRES_DB: pagila
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      ALLOW_IP_RANGE: 0.0.0.0/0
    ports:
      - '5433:5432'
    healthcheck:
      test: [ "CMD-SHELL", "pg_isready -U postgres -d pagila" ]
      interval: 5s
      timeout: 5s
      retries: 5

Don't forget to download data using the following command:

#!/bin/bash

# https://github.com/devrimgunduz/pagila
mkdir -p data

curl https://raw.githubusercontent.com/devrimgunduz/pagila/726c724df9f86406577c47790d6f8e6f2be06186/pagila-schema.sql \
     https://raw.githubusercontent.com/devrimgunduz/pagila/726c724df9f86406577c47790d6f8e6f2be06186/pagila-data.sql \
     | cat > data/init.sql

Sourcing trips data from Postgres

Open a new file dozer-config.yaml and add the following configuration section:

cache_max_map_size: 10737418240
version: 1
app_name: simple-pg-sample
connections:
  - name: pagila_conn
    config: !Postgres
      user: postgres
      password: postgres
      host: localhost
      port: 5433
      database: pagila
  

sources:
  - name: actors
    table_name: actor
    connection: pagila_conn

sinks:
  - table_name: actors
    config: !Dummy

Start Dozer with the command:

dozer run --enable-progress

You should be seeing a screen like the following

 INFO Initializing app: simple-pg-sample    
 INFO Data directory: ./.dozer    
 INFO [API] Configuration
+------+----+------+
| Type | IP | Port |
+------+----+------+
| REST | -  | -    |
+------+----+------+
| GRPC | -  | -    |
+------+----+------+
    
 INFO [API] Endpoints
+------+------+
| Path | Name |
+------+------+
    
 INFO Source: Initializing input schema: actor
+-------------+-----------+----------+-------+
| Field       | Type      | Nullable | PK    |
+-------------+-----------+----------+-------+
| actor_id    | Int       | false    | true  |
+-------------+-----------+----------+-------+
| first_name  | String    | false    | false |
+-------------+-----------+----------+-------+
| last_name   | String    | false    | false |
+-------------+-----------+----------+-------+
| last_update | Timestamp | false    | false |
+-------------+-----------+----------+-------+
    
 INFO [pipeline] Validation completed    
 INFO Source: Initializing input schema: actor
+-------------+-----------+----------+-------+
| Field       | Type      | Nullable | PK    |
+-------------+-----------+----------+-------+
| actor_id    | Int       | false    | true  |
+-------------+-----------+----------+-------+
| first_name  | String    | false    | false |
+-------------+-----------+----------+-------+
| last_name   | String    | false    | false |
+-------------+-----------+----------+-------+
| last_update | Timestamp | false    | false |
+-------------+-----------+----------+-------+
    
 INFO [pipeline] Validation completed    
 INFO Starting Internal Server on 0.0.0.0:50053    
 INFO Starting Rest Api Server on http://0.0.0.0:8080 with security: None    
▹▹▹▹▹ actor: 0: 0/s                                                                                                 
INFO Snapshotting for connection pagila_conn took 5.471011ms                                                      

This means Dozer is succesfully running. Now we can add one more connection to our dozer-config.yaml.

Adding an additonal source

We will now be adding a second connection to a PostgreSQL database. For simplicity, we will use Supabase, a fully managed PostgreSQL database service.

Head over to Supabase and create a new project. Create a new table named zones and import the data from this CSV file. To do that, click on the button Import data via spreadsheet when creating a new table in Supabase. Upload the CSV file and select LocationID as primary key.

Once created, head over to your project settings in Supabase, click on Database and take note of the Connection Info details. These will be used in the Dozer configuration to connect to this database instance.

Now edit your dozer-config.yaml and add a new PostgreSQL connection, a new source and a new endpoint. The new dozer-config.yaml file should look like this:

cache_max_map_size: 10737418240
version: 1
app_name: simple-pg-sample
connections:
  - name: pagila_conn
    config: !Postgres
      user: postgres
      password: postgres
      host: localhost
      port: 5433
      database: pagila
  
  - config: !Postgres
      user: ** Your Supabase user ** (generally postgres)
      password: ** Your Supabase password **
      host: ** Your Supabase host **
      port: ** Your Supabase port ** (generally 5432)
      database: postgres
    name: pg

sources:
  - name: actors
    table_name: actor
    connection: pagila_conn

  - name: films
    table_name: film
    connection: pagila_conn
    columns:
      - film_id
      - title

  - name: zones
    table_name: zones
    connection: pg

sinks:
  - table_name: actors
    config: !Dummy

  - table_name: zones
    config: !Dummy

Now you can restart Dozer. You will notice that a new collection called zones is available. In the next section we will be adding SQL transformations to our data sources.

In the next section we will be connecting to a real sink instead of dummy for the replication pipeline.

Connecting to Sources

Sourcing trips data from Postgres​

Adding an additonal source​

Sourcing trips data from Postgres

Adding an additonal source