The DUPLICATES command identifies all rows in the target table or cursor whose key also appears in other rows.

Syntax

DUPLICATES /*+ … */ target WITH KEY (col1,...)

Results

The result set is all the rows that are duplicates.

Arguments

  • /*+ … */ - Insert optional Cirro Hint. Hints are added to generated SELECT statements.

    • For a single table, there is no need for a data movement so many hints are irrelevant.

    • For a query, the hints can also go in the query.

  • source - source datasource in form system.schema.database

  • target - target datasource in form system.schema.database

  • WITH KEY

    • Cirro cannot determine the primary keys of remote tables, so the KEY clause will be required when using DUPLICATES table.

    • If the KEY is not given, DUPLICATES looks for the table primary key.

Additional Information

  • Restriction on queries: attempting to execute DUPLICATES on a query like:
SELECT TABLE_TYPE AS TABLE_TYPE FROM (VALUES ('TABLE'),('SYNONYM'),('SYSTEM TABLE'),('VIEW')) AS TYPEINFO(TABLE_TYPE)

This query will fail, with error:

The number of columns in the derived column list must match the number of columns in table 'A'.

Examples

Example tables are drawn from the imdb database, for instance:

  • cast_info (id, person_id, movie_id, person_role_id, note, nr_order, role_id)

  • movie_keyword (id, movie_id, keyword_id)

  • company_name(id, name, country_code, imdb_id, name_pcode_nf, name_pcode_sf)

DUPLICATES {target | subQuery} [WITH KEY { (col1[,…]) | (*) }]

This is essentially syntax sugar for the query:

SELECT t.col1,... FROM <target or query> t
JOIN (SELECT d.<keyColumnNames> FROM <target or query> d GROUP BY d.<keyColumnNames> HAVING COUNT(*) > 1) k
ON (t.col1 = k.col1 OR (t.col1 IS NULL AND k.col1 IS NULL)) ... for all keyColumnNames
ORDER BY t.key1, t.key2..., t.col1, t.col2,...

This uses the primary key of cast_info, id, so this should never return any records but may if constraints have been relaxed during a data migration.

DUPLICATES lim.imdb.public.cast_info ;

This will return records for people more than one job (role_id) on a movie.

DUPLICATES lim.imdb.public.cast_info WITH KEY (person_id, movie_id) ;

We could expect (movie_id, keyword_id) to be a unique key but it is not.

DUPLICATES lim.imdb.public.movie_keyword WITH KEY (movie_id, keyword_id) ;

Role ids 1 and 2 are actor and actress, so this will return records for people playing more than one character in a movie.

DUPLICATES (SELECT * FROM lim.imdb.public.cast_info WHERE role_id IN (1, 2)) WITH KEY (person_id, movie_id) ;

This will use all columns as the key, and so return the identical rows.

DUPLICATES lim.imdb.public.role_type WITH KEY (*) ;