information_schema support for ANSI SQL Support

simon_aubert_bd · January 2024

Hello,

According to wikipedia :
"In relational databases, the information schema (information_schema) is an ANSI-standard set of read-only views which provide information about all of the tables, views, columns, and procedures in a database"

I think such a feature would help the support of Vertica by many more tools and it's never a bad thing when you follow the SQL norm. Standard, normalization and interoperability matter. Especially when your competition has it (these last years, it has been implemented in Hive or Monet DB as an example)

Best regards,

Simon

mosheg · January 2024

Vertica provides the information schema as read only system tables and views that let you monitor your database and evaluate settings of its objects. You can query these tables just as you do other tables, depending on privilege requirements.
See: https://docs.vertica.com/24.1.x/en/sql-reference/system-tables/

simon_aubert_bd · January 2024

Hello @mosheg . Correct me if I'm wrong but it's
V_CATALOG schema
V_MONITOR schema

even if it's the same usage, it's clearly not the same name

mosheg · January 2024

The system tables in v_catalog schema provide information about the objects in a database like tables, constraints, users, projections, and so on. This is what the "information schema" is all about.
For example: V_CATALOG.columns provides all tables columns information.
See: https://docs.vertica.com/24.1.x/en/sql-reference/system-tables/v-catalog-schema/columns/

VValdar · January 2024

Hi all,

yes @mosheg you're right we can find the same relevant information, I think @simon_aubert_bd's question is about the official ANSI specification of this schema that we don't support.
You would need to manually implement it.

simon_aubert_bd · January 2024

Hello @VValdar Yes, that's my point. But it's more a feature request that a question I think respecting the ANSI specification could be benefit for Vertica users and sales.

According to chatGPT (I'm really too honest)

Respecting ANSI SQL Information Schema is important for several reasons in the context of relational database management systems (RDBMS):

Portability: ANSI SQL (Structured Query Language) is a standard for interacting with relational databases. The Information Schema, part of the SQL standard, provides a consistent way to query metadata about database objects such as tables, columns, and constraints. Respecting the ANSI SQL Information Schema ensures that your SQL queries for metadata are more likely to work across different database systems, promoting portability of your code.

Compatibility: Following the ANSI SQL standard helps ensure compatibility with various database vendors. Although different database systems may have their own extensions and specific features, a common subset of SQL and the Information Schema is widely supported. This compatibility simplifies the process of migrating databases from one vendor to another or using different databases in the same application.

Consistency: The Information Schema provides a uniform and standardized way to retrieve metadata. This consistency simplifies the development and maintenance of applications by allowing developers to write queries that work across different database systems without significant modifications.

Future-proofing: Relying on the ANSI SQL Information Schema can make your code more resilient to changes in the database system or upgrades. If a database vendor updates its system to comply more closely with the ANSI SQL standard, your queries based on the Information Schema are less likely to break.

Documentation and Understanding: The Information Schema serves as a standardized way to document the structure and details of database objects. This can be invaluable for developers, database administrators, and anyone interacting with the database. It provides a clear and systematic way to understand the database schema, which can lead to better database design and maintenance practices.

Security: In some cases, relying on the Information Schema can contribute to security by limiting direct access to system tables. Instead of querying system tables, developers can use the ANSI SQL standard views provided by the Information Schema, reducing the risk of unintentional changes to critical system data.

Best regards,

Simon

marcothesane · January 2024

Well, we are in very good company.
Last time I checked

Oracle did not have it
IBM DB2 (Version 10.x) did not have it
SQL Server did not have it
Teradata did not have it

They all have a system view/table named "columns" or similar, one named "tables" or similar, etc. in a differently named schema.
Vertica has the "v_catalog" schema.

All or nearly all DBMS-s offer a way to pre-set a default schema.
In PostgreSQL and Vertica it's SET SEARCH_PATH TO schema1,schema2,schema3; its effect is that, if you go SELECT * FROM foo, foo is searched in schema1, if not found in schema2, etc. until it's found. And v_catalog is in the schema by defaul. The default search path is <_current_user_>, public, v_catalog, v_monitor, v_internal, v_func.

So -
SELECT table_name FROM tables; , for example, will always work.

We never felt the urgency to explicitly create an alias for our v_catalog schema.

As I already stated a long time ago in this discussion thread - feel free to create your own schema.
run the script below as dbadmin from vsql:

CREATE SCHEMA IF NOT EXISTS information_schema;

\a
\t
\o crviews.sql
SELECT
  'CREATE VIEW  information_schema.'||table_name||' AS SELECT * FROM v_catalog.'||table_name||';'
FROM system_tables WHERE table_schema='v_catalog'
;
\o
\I crviews.sql

GRANT SELECT ON ALL VIEWS IN information_schema TO public;

simon_aubert_bd · January 2024

@marcothesane Indeed you're not alone. But allow me to add a few words about it :
-Oracle and Microsoft are leaders in db for several decades, they just don't care about SQL norm.
-DB2 is legacy for 8 years now
-All are "declining" according to https://db-engines.com/en/ranking_trend (Well, it's relative and subject to interpretation but it looks like market share is also declining)
-on the opposite side, the most successful db these last years, like Snowflake, respect the information _schema.
-you're supposed to ambition to do better than competitors

marcothesane · January 2024

The clean way to deal with metadata in a relational database management system, in my eyes, is to use a standard supported by all of them as of now:

It's the collection of functions like SQLTables(), SQLColumns(), SQLPrimaryKeys(), SQLForeignKeys() , as supported by the JDBC and ODBC standard.

Every RDBMS has a JDBC and an ODBC driver supporting them.

And, as Vertica has no indexes, and no materialised views, we have no way of supporting the standard anyway...

simon_aubert_bd · January 2024

Hello @marcothesane
I will be humble : I don't know what is the cleanest way to deal with metadata. On the other side what I know is that not all software are developped with jdbc/odbc specifications in mind, even not always jdbc/odbc connectivity at all.

If you take Grafana, it uses a go plugin. In python, you have dedicated libraries, in Alteryx, there are several SQL queries send to database in order to determine where the table catalog is located :
Mar 01 11:37:21.318 INFO 5264 HardyDataEngine::Prepare: Incoming SQL: select USER(), APPLICATION_ID() from system.iota
Mar 01 11:37:22.863 INFO 5264 HardyDataEngine::Prepare: Incoming SQL: select USER as USER_NAME from SYSIBM.SYSDUMMY1
Mar 01 11:37:23.454 INFO 5264 HardyDataEngine::Prepare: Incoming SQL: select * from rdb$relations
Mar 01 11:37:23.546 INFO 5264 HardyDataEngine::Prepare: Incoming SQL: select first 1 dbinfo('version', 'full') from systables
Mar 01 11:37:23.707 INFO 5264 HardyDataEngine::Prepare: Incoming SQL: select #01/01/01# as AccessDate
Mar 01 11:37:23.868 INFO 5264 HardyDataEngine::Prepare: Incoming SQL: exec sp_server_info 1
Mar 01 11:37:24.093 INFO 5264 HardyDataEngine::Prepare: Incoming SQL: select top (0) * from INFORMATION_SCHEMA.INDEXES
Mar 01 11:37:24.219 INFO 5264 HardyDataEngine::Prepare: Incoming SQL: SELECT SERVERPROPERTY('edition')
Mar 01 11:37:24.423 INFO 5264 HardyDataEngine::Prepare: Incoming SQL: select DATABASE() as database, VERSION() as version
Mar 01 11:37:24.635 INFO 5264 HardyDataEngine::Prepare: Incoming SQL: select * from sys.V_$VERSION at where RowNum<2
Mar 01 11:37:25.230 INFO 5264 HardyDataEngine::Prepare: Incoming SQL: select cast(version() as char(10)), (select 1 from pg_catalog.pg_class) as t
Mar 01 11:37:25.415 INFO 5264 HardyDataEngine::Prepare: Incoming SQL: select NAME from sqlite_master

So, yes, you can argue it's dirty (and in Alteryx case, I will totally agree) but that's the current situation.

But there's more : a lot of developers (like myself) only have to right to write sql query, we can't pass SQLTables() etc... in a Qlik Sense/Tableau/Alteryx/other bi tools. So when I write a Qlik Sense to retrieve all the tables, I have to find the table catalog. And if I have several RDBMS, I will have to maintain, duplicate, code while, in my dream, world, I would use a standard.

"And, as Vertica has no indexes, and no materialised views, we have no way of supporting the standard anyway..."

A dummy table with 0 rows can totally do the job

Best regards,

Simon

We're Moving!

Create My New Community Account Now

information_schema support for ANSI SQL Support

Comments

Leave a Comment