New line characters removed from request column in v_monitor.query_requests

It appears newline characters are converted to space characters in the request column of v_monitor.query_requests. This is a problem because one can no longer determine where single-line comments end. For example:

Original Query:
SELECT 1 -- revenue
FROM transactions;

Logged Query:
SELECT 1 -- revenue FROM transactions;
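
A rough way to gauge how widespread this is in the log (a sketch only; it flags logged requests that contain a "--" comment but no surviving newline, with CHR(10) as the newline byte):

SELECT request_id, transaction_id, request
FROM v_monitor.query_requests
WHERE request LIKE '%--%'
  AND request NOT LIKE '%' || CHR(10) || '%';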

Comments

  • Hi!

    What Vertica version? I agree with you that it's not good, but I can't reproduce it; all comments are truncated:
    node_name | v_dev_node0001
    user_name | daniel
    session_id | synapse-2677:0x1f0
    request_id | 6
    transaction_id | 45035996273725090
    statement_id | 6
    request_type | QUERY
    request | SELECT 1 FROM transactions;
    request_label |
    search_path | "$user", public, v_catalog, v_monitor, v_internal
    memory_acquired_mb |
    success | t
    error_count |
    start_timestamp | 2015-01-07 11:44:18.832581+02
    end_timestamp | 2015-01-07 11:44:19.133265+02
    request_duration_ms | 301
    is_executing | f

    Let's try with a label:
    SELECT /*+label(comment)*/ 1 -- revenue
    FROM transactions;
    Now extract from the QUERY_REQUESTS table:
    daniel=>  select * from query_requests where request_label = 'comment';
    -[ RECORD 1 ]-------+--------------------------------------------------
    node_name | v_dev_node0001
    user_name | daniel
    session_id | synapse-2677:0x1f0
    request_id | 9
    transaction_id | 45035996273725090
    statement_id | 9
    request_type | QUERY
    request | SELECT /*+label(comment)*/ 1 FROM transactions;
    request_label | comment
    search_path | "$user", public, v_catalog, v_monitor, v_internal
    memory_acquired_mb |
    success | t
    error_count |
    start_timestamp | 2015-01-07 11:47:19.820904+02
    end_timestamp | 2015-01-07 11:47:20.127018+02
    request_duration_ms | 307
    is_executing | f
    Comments are truncated.
    daniel=> select version();
    -[ RECORD 1 ]-------------------------------
    version | Vertica Analytic Database v7.1.1-2

  • I also verified with '\r' as the newline character; the comment is still truncated. (The '\r' itself seems to survive in the logged request, which is why the record output below wraps over the field name.)
    daniel=>  select * from query_requests where request_label = 'XXX';
    -[ RECORD 1 ]-------+--------------------------------------------------
    node_name | v_dev_node0001
    user_name | daniel
    session_id | synapse-2677:0x389
    request_id | 1
    transaction_id | 45035996273725138
    statement_id | 1
    request_type | QUERY
    FROM transactions; | SELECT /*+label(XXX)*/ 1
    request_label | XXX
    search_path | "$user", public, v_catalog, v_monitor, v_internal
    memory_acquired_mb |
    success | t
    error_count |
    start_timestamp | 2015-01-07 11:53:24.671485+02
    end_timestamp | 2015-01-07 11:53:24.986265+02
    request_duration_ms | 315
    is_executing | f

  • Hi,

    I'm using v7.0.2-1. I also tried from the vsql shell and got the same result as you. We noticed the problem on queries sent through other clients. I'm guessing that the vsql shell strips comments before sending them and some clients do not. I can reproduce the issue when queries are sent from JDBC.

    Thanks for your help!
  • Hi!

    Indeed, ODBC:
    daniel=> select * from query_requests where request_label = 'pyODBC';
    -[ RECORD 1 ]-------+---------------------------------------------------------
    node_name | v_dev_node0001
    user_name | daniel
    session_id | synapse-4633:0xfe9
    request_id | 5
    transaction_id | 45035996273726071
    statement_id | 8
    request_type | QUERY
    request | SELECT /*+label(pyODBC)*/ 1 -- revenue FROM transactions
    request_label | pyODBC
    search_path | "$user", public, v_catalog, v_monitor, v_internal
    memory_acquired_mb |
    success | t
    error_count |
    start_timestamp | 2015-01-07 23:46:59.288301+02
    end_timestamp | 2015-01-07 23:46:59.725608+02
    request_duration_ms | 437
    is_executing | f
    daniel=> select request from query_requests where request_label = 'pyODBC';
    -[ RECORD 1 ]-----------------------------------------------------
    request | SELECT /*+label(pyODBC)*/ 1 -- revenue FROM transactions

    And so we found a workaround/solution: strip comments in your app before sending a query to the Vertica server.

    Why should comment stripping be done on the client side?
    Although Vertica 7 supports up to 32MB for a single value, system table fields are limited to 64,000 bytes. You can confirm this by creating a dummy query longer than 64,000 bytes (create 65KB of comments: take some Shakespeare sonnets and comment them out). You will see that the REQUEST field in the QUERY_REQUESTS table is limited to 64,000 bytes.

    I will not post a query, but:
    $ du -b /tmp/qlen.sql        # print query size in bytes
    177244 /tmp/qlen.sql


    $ vsql -c "select audit_length(request) from query_requests where request_label = 'qlen';"
    audit_length
    --------------
    64000 -- odbc
    64000 -- jdbc
    3851 -- vsql
    (3 rows)

    So you can't fully rely on the REQUEST field in any case (think about the case where the query alone, without comments, takes more than 64KB).
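
    To see how close logged requests get to that 64,000-byte cap, a rough check (a sketch only; OCTET_LENGTH counts bytes):

    SELECT request_label,
           OCTET_LENGTH(request) AS request_bytes
    FROM v_monitor.query_requests
    ORDER BY request_bytes DESC
    LIMIT 10;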
  • Stripping comments on the client side is not feasible. We don't control what clients analysts use to connect to Vertica. Additionally, the comments can be helpful in understanding queries in the log, so we'd prefer not to remove them. The 64KB cutoff is not too much of a concern; in practice, queries of interest are rarely that long.

    I've discovered that the table v_internal.dc_requests_issued also has a request column, which does not suffer from the same issue. I can't find documentation on this table though. Is it OK practice to use it? And if we do use it and want to join to v_monitor.query_requests, is the correct join on transaction_id, request_id, and session_id?

    Thanks!
  • Hi Jake!

    > We don't control what clients analysts use to connect to Vertica.

    Proxy? :p

    > Additionally, the comments can be helpful in understanding queries in the log, so we'd prefer not to remove them.

    Agree. I think Vertica should increase this field to the max size, 32,768,000 bytes. Just my opinion.

    > I've discovered that the table v_internal.dc_requests_issued also has a request column

    I would say yes, but at your own risk. As you pointed out, it's undocumented and Vertica is free to change the API, but in this case it belongs to the Data Collector, and the DC is documented well enough.
    daniel=> select table_name from v_monitor.DATA_COLLECTOR where table_name ilike '%request%';
    table_name
    -----------------------
    dc_lock_requests
    dc_requests_completed
    dc_requests_issued
    dc_requests_retried
    (4 rows)
    and
    daniel=> select data_collector_help();
    data_collector_help

    Usage Data Collector
    The data collector retains history of important system activities.
    This data can be used as a reference of what actions have been taken by users,
    but it can also be used to locate performance bottlenecks, or identify
    potential improvements to the Vertica configuration.
    This data is queryable via Vertica system tables.

    The list of data collector components, and some statistics, can be found using:
    SELECT * FROM v_monitor.data_collector;

    The amount of data retained by size and time can be controlled with several functions.
    To just set the size amount:
    set_data_collector_policy(<component>,
    <memory retention (KB)>,
    <disk retention (KB)>);

    To set both the size and time amounts (the smaller one will dominate):
    set_data_collector_policy(<component>,
    <memory retention (KB)>,
    <disk retention (KB)>,
    <interval>);

    To set just the time amount:
    set_data_collector_time_policy(<component>,
    <interval>);
    To set just the time amount for all tables:
    set_data_collector_time_policy(<interval>);

    The current retention policy for a component can be queried with:
    get_data_collector_policy(<component>);

    Data on disk is kept in the "DataCollector" directory under the Vertica
    catalog path. This directory also contains instructions on how to load
    the monitoring data into another Vertica database.

    To move the data collector logs and instructions to other storage locations,
    create labeled storage locations using add_location and then use:
    set_data_collector_storage_location(<storage_label>);

    Additional commands can be used to affect the data collection logs.
    The log can be cleared with:
    clear_data_collector([<optional component>]);
    The log can be synchronized with the disk storage using:
    flush_data_collector([<optional component>]);


    (1 row)
    daniel=>
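
    As for joining dc_requests_issued back to v_monitor.query_requests, something like this should work. A sketch only: I'm assuming the key columns carry the same names in both relations, so verify the column list on your version first:

    SELECT qr.start_timestamp,
           qr.request_duration_ms,
           ri.request AS raw_request      -- line breaks are preserved here
    FROM v_monitor.query_requests qr
    JOIN v_internal.dc_requests_issued ri
      ON qr.node_name      = ri.node_name
     AND qr.session_id     = ri.session_id
     AND qr.transaction_id = ri.transaction_id
     AND qr.request_id     = ri.request_id;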

  • jbondurant_gsn (Vertica Customer)

    FYI: v_internal.dc_requests_issued isn't broken like v_monitor.query_requests is.

  • LenoyJ (Employee)
    edited May 2020

    @jbondurant_gsn, this isn't really broken; it works as intended. v_monitor.query_requests is basically a view over v_internal.dc_requests_issued, and if you look at the view definition, it intentionally removes all line breaks. If line breaks are required then, as you suggested, use dc_requests_issued, which has the raw data.
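
    If you do query dc_requests_issued directly, also keep an eye on its Data Collector retention policy (the functions shown in data_collector_help() above), so the raw requests stick around long enough. A minimal sketch, assuming the component behind dc_requests_issued is called RequestsIssued (confirm the name in v_monitor.data_collector first):

    SELECT DISTINCT component, table_name
    FROM v_monitor.data_collector
    WHERE table_name = 'dc_requests_issued';       -- confirm the component name

    SELECT get_data_collector_policy('RequestsIssued');                           -- current retention
    SELECT set_data_collector_time_policy('RequestsIssued', '1 week'::INTERVAL);  -- keep a week of history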
