"where" when selecting from external table(HDFS), not working with FLOAT, BIGINT e.t.c.
Hello,
- I created a table in Hadoop (Cloudera QuickStart 5.13) in Parquet format, with 4 columns: TICKET_OID (BIGINT),
NOMER_ZAYAVKI (STRING),
LAT (DOUBLE),
LON (DOUBLE).
I filled it with data, and this select works without problems (via Impala):
select * from obj_ticket_pq where LON > 37.555848 and LAT < 55.892728 and LON < 38.203597 and LAT > 55.568758;
It returns 100 records.
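For reference, the Impala table was created with a statement along these lines (a sketch reconstructed from the column list above, not the exact original):
-- Sketch (reconstructed): Parquet-backed source table in Impala
CREATE TABLE obj_ticket_pq (
    ticket_oid BIGINT,
    nomer_zayavki STRING,
    lat DOUBLE,
    lon DOUBLE
)
STORED AS PARQUET;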
- I created an external table in Vertica (8.1.1) from HDFS:
CREATE EXTERNAL TABLE test (TICKET_OID BIGINT, NOMER_ZAYAVKI VARCHAR(256), LAT DOUBLE PRECISION, LON DOUBLE PRECISION) AS COPY FROM 'hdfs:///data/OBJ_TICKET/pq/*' ON ANY NODE PARQUET;
select * from test; -- works!
select * from test where LON > 37.555848; -- does not work!!! Returns 0 rows.
The same happens if I put a "where" condition on TICKET_OID (BIGINT): greater than, less than, and equals all return 0 records.
But if I put conditions (equals, like, etc.) on NOMER_ZAYAVKI (STRING), everything works correctly.
How do I make "where" work with numbers?
Comments
Can you show the output of your SELECT * FROM test; in Vertica? Maybe with a LIMIT of 5?
My simple test seems to work fine, although I am reading the Parquet data locally as opposed to from hdfs.
dbadmin=> select TICKET_OID, NOMER_ZAYAVKI, LON,LAT from test limit 5;
TICKET_OID | NOMER_ZAYAVKI | LON | LAT
------------+---------------+-----------+-----------
150529 | TT8397782 | 99.423919 | 56.076368
111766 | TT8163533 | |
115562 | TT8184930 | 65.328574 | 55.515793
115563 | TT8184944 | 37.55664 | 55.43101
109896 | TT8152791 | 34.290248 | 53.325734
(5 rows)
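For completeness, my external table definition looked roughly like this (a sketch; the local path is illustrative):
-- Sketch: same table definition, but reading Parquet from a local path
-- instead of hdfs (the path below is illustrative)
CREATE EXTERNAL TABLE test (
    TICKET_OID BIGINT,
    NOMER_ZAYAVKI VARCHAR(256),
    LAT DOUBLE PRECISION,
    LON DOUBLE PRECISION
) AS COPY FROM '/home/dbadmin/obj_ticket/*.parquet' PARQUET;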
hm...
I repeated your test and everything works fine. The table was created in Vertica, exported to HDFS in Parquet format, and then connected back to Vertica as an external table.
"select...where" works correctly.
Perhaps in my case the Parquet format is wrong? I'm creating the "bad" table on HDFS from Impala.
I will try to compare the table files created in Impala and in Vertica to understand the difference.
Thanks for the tip
I conducted an experiment with creating data on HDFS in Parquet format.
I created tables on HDFS from Vertica, Hive, and Impala with the same data,
then defined external tables in Vertica over all 3 of them (DDL sketch below) and ran these selects:
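The external tables followed the same pattern as before; a sketch of the Impala one (reconstructed, not the exact original; the pq_hive and pq_vertica tables are analogous, paths match the parquet-tools output below):
-- Sketch: Vertica external table over the Impala-written Parquet files
CREATE EXTERNAL TABLE test_parquet_from_impala (
    TICKET_OID BIGINT,
    NOMER_ZAYAVKI VARCHAR(256),
    LAT DOUBLE PRECISION,
    LON DOUBLE PRECISION
) AS COPY FROM 'hdfs:///data/test/pq_impala/*' ON ANY NODE PARQUET;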
dbadmin=> select * from test_parquet_from_vertica;
TICKET_OID | NOMER_ZAYAVKI | LAT | LON
------------+---------------+----------+----------
315131 | TT8950718 | 55.74702 | 37.54171
(1 row)
dbadmin=> select * from test_parquet_from_hive;
ticket_oid | nomer_zayavki | lat | lon
------------+---------------+----------+----------
315131 | tt8950718 | 55.74702 | 37.54171
(1 row)
dbadmin=> select * from test_parquet_from_impala;
ticket_oid | nomer_zayavki | lat | lon
------------+---------------+----------+----------
315131 | TT8950718 | 55.74702 | 37.54171
(1 row)
dbadmin=> select * from test_parquet_from_vertica where lat > 10;
TICKET_OID | NOMER_ZAYAVKI | LAT | LON
------------+---------------+----------+----------
315131 | TT8950718 | 55.74702 | 37.54171
(1 row)
dbadmin=> select * from test_parquet_from_hive where lat > 10;
ticket_oid | nomer_zayavki | lat | lon
------------+---------------+----------+----------
315131 | tt8950718 | 55.74702 | 37.54171
(1 row)
dbadmin=> select * from test_parquet_from_impala where lat > 10;
ticket_oid | nomer_zayavki | lat | lon
------------+---------------+-----+-----
(0 rows)
Info from parquet schema:
[root@quickstart lib]# sudo -u hdfs parquet-tools schema hdfs://localhost/data/test/pq_vertica
message schema {
optional int64 TICKET_OID;
optional binary NOMER_ZAYAVKI (UTF8);
optional double LAT;
optional double LON;
}
[root@quickstart lib]# sudo -u hdfs parquet-tools schema hdfs://localhost/data/test/pq_hive
message hive_schema {
optional int64 ticket_oid;
optional binary nomer_zayavki (UTF8);
optional double lat;
optional double lon;
}
[root@quickstart lib]# sudo -u hdfs parquet-tools schema hdfs://localhost/data/test/pq_impala
message schema {
optional int64 ticket_oid;
optional binary nomer_zayavki;
optional double lat;
optional double lon;
}
The only difference is nomer_zayavki (UTF8).
Does Impala version 2.10 create a Parquet file that is not compatible with Vertica?
Hi,
I wonder if this is a data type incompatibility/casting issue?
For the Impala table, can you try these SELECT statements?
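(The exact statements are not preserved in the thread; judging from the follow-up note about the INT cast, they were along these lines. Treat this as a reconstruction:)
-- Reconstructed: cast the DOUBLE column explicitly in the predicate
select * from test_parquet_from_impala where lat::float > 10;
-- INT variant for comparison only; see the note about rounding below
select * from test_parquet_from_impala where lat::int > 10;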
Note:
Impala 2.10 was introduced in CDH 5.13.
According to the Vertica 8.1.1 doc, Vertica was tested on CDH 5.7 and 5.8.
See:
https://www.vertica.com/docs/8.1.x/HTML/index.htm#Authoring/SupportedPlatforms/IntegrationsForHadoop.htm
And Vertica 9.2 has been tested on CDH 5.11, 5.12 and 5.14.
See:
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SupportedPlatforms/IntegrationsForHadoop.htm
Yes!!!
Both work.
Thank you very much for the help!
Cool! Note that I used the INT example just as an example. Don't use it for casting a double: it will ROUND your number.
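For example (a quick illustration, relying on the rounding behavior described above):
-- Casting to INT rounds, so the value becomes 56
SELECT 55.74702::INT;   -- 56
-- Casting to FLOAT preserves the value
SELECT 55.74702::FLOAT; -- 55.74702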
Of course I understand
Thanks again