pandas_to_vertica() ingestion issue, VerticaPy

fabricio5000fabricio5000 Community Edition User

Hello, good afternoon.
I'm having problems with the VerticaPy function "pandas_to_vertica()". It turns out that before I could create a DataFrame with pandas, from SQL queries to a PostgreSQL database, and with pandas_to_vertica() it let me correctly insert it into an already existing table.
However now I am getting the following error: " FileNotFoundError: [WinError 2] The system cannot find the file specified: 'billing.csv' ". It left me baffled since this problem with direct ingestion of DataFrames had not thrown me before, however now that I want to ingest a much larger DataFrame (5000 records) it throws me this error.
I share my code to see when I use this function and the error it throws me.
From already thank you very much.

Code:

Assembly of the data frame for Argentina

def dataframe_billing_AR(window_size):
ultimo_id_AR = vp.readSQL("select NVL(max(billing_id),0) LAST_BILLING_ID from POCDATA.billing where SERVER='AR'")
query = f"SELECT 'AR' as SERVER, * FROM hca.billing WHERE billing_id BETWEEN {ultimo_id_AR['LAST_BILLING_ID'][0]} AND {ultimo_id_AR['LAST_BILLING_ID'][0]} + {window_size}"
frame_ARG = pd.read_sql(query, connection_db_postgre_AR())
pd.set_option('display.expand_frame_repr', False)
print(frame_ARG)
return frame_ARG

Data ingestion in Vertica

def ingest_frame(frame):
vp.pandas_to_vertica(df = frame, name = "billing", schema = "POCDATA", insert = True)
return

Full function for Argentina

def data_passage_ARG(window_size):
try:
connection_db_postgre_AR()
connection_vertica()
dataframe_billing_AR(window_size)
ingest_frame(dataframe_billing_AR(window_size))
finally:
connection_db_postgre_AR().close()
close_connection()

Implementation:

data_passage_ARG(5000)

(The assembled data frame is shown correct, but the error is in the part of the ingestion function I guess)

Error:

KeyError Traceback (most recent call last)
File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\verticapy\utilities.py:1122, in pandas_to_vertica(df, name, schema, dtype, parse_nrows, temp_path, insert)
1120 for c in df.columns:
1121 if df[c].dtype == object and isinstance(
-> 1122 df[c].loc[df[c].first_valid_index()], str
1123 ):
1124 str_cols += [c]

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pandas\core\indexing.py:967, in _LocationIndexer.getitem(self, key)
966 maybe_callable = com.apply_if_callable(key, self.obj)
--> 967 return self._getitem_axis(maybe_callable, axis=axis)

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pandas\core\indexing.py:1205, in _LocIndexer._getitem_axis(self, key, axis)
1204 self._validate_key(key, axis)
-> 1205 return self._get_label(key, axis=axis)

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pandas\core\indexing.py:1153, in _LocIndexer._get_label(self, label, axis)
1151 def _get_label(self, label, axis: int):
1152 # GH#5667 this will fail if the label is not present in the axis.
-> 1153 return self.obj.xs(label, axis=axis)

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pandas\core\generic.py:3864, in NDFrame.xs(self, key, axis, level, drop_level)
3863 else:
-> 3864 loc = index.get_loc(key)
...
-> 1187 os.remove(path)
1188 if clear:
1189 del tmp_df

FileNotFoundError: [WinError 2] El sistema no puede encontrar el archivo especificado: 'billing.csv'

Answers

  • SruthiASruthiA Administrator

    did you verify if the file billing.csv is being created by pandas_to_vertica function? does the file exist?

  • fabricio5000fabricio5000 Community Edition User

    @SruthiA said:
    did you verify if the file billing.csv is being created by pandas_to_vertica function? does the file exist?

    That's what I don't quite understand, why is it asking me if billing.csv exists, if in fact the pandas_to_vertica function ingests a dataframe directly without transforming it to csv, right? Because before when I used the function for the same reason, I didn't get that error and neither did any .csv file that was created appear in my folder

  • SruthiASruthiA Administrator

    Yes.. it does create CSV in the intermediate state before loading to vertica. Please find the documentation link below

    https://www.vertica.com/python/documentation_last/utilities/pandas_to_vertica/

    "Ingests a pandas DataFrame into the Vertica database by creating a CSV file and then using flex tables."

  • fabricio5000fabricio5000 Community Edition User

    @SruthiA said:
    Yes.. it does create CSV in the intermediate state before loading to vertica. Please find the documentation link below

    https://www.vertica.com/python/documentation_last/utilities/pandas_to_vertica/

    "Ingests a pandas DataFrame into the Vertica database by creating a CSV file and then using flex tables."

    It's true, I read the official documentation well there, but if so, why didn't I get that error in previous ingestions? Nor did I see that an armed csv was ever created from calling the pandas_to_vertica function with my DataFrame... I don't know what to do

  • SruthiASruthiA Administrator

    Is it possible to share table DDL and output for the below query? I will try to reproduce the issue locally

    select NVL(max(billing_id),0) LAST_BILLING_ID from POCDATA.billing where SERVER='AR'

  • fabricio5000fabricio5000 Community Edition User

    @SruthiA said:
    Is it possible to share table DDL and output for the below query? I will try to reproduce the issue locally

    select NVL(max(billing_id),0) LAST_BILLING_ID from POCDATA.billing where SERVER='AR'

    table DDL:

    CREATE TABLE POCDATA.billing
    (
    SERVER varchar(2) NOT NULL,
    billing_id numeric(10,0) NOT NULL,
    patient_id varchar(100),
    event_id varchar(100),
    hos_id numeric(10,0),
    msp_id numeric(10,0),
    doc_id numeric(10,0),
    heq_id numeric(10,0),
    appointment_time timestamp,
    action varchar(1),
    transaction_time timestamp,
    hos_name varchar(40),
    patient_full_name varchar(80),
    patient_num varchar(20),
    hin_name varchar(100),
    msp_name varchar(60),
    doc_full_name varchar(80),
    dni varchar(10)
    );

    ALTER TABLE POCDATA.billing ADD CONSTRAINT PK_BILLING PRIMARY KEY (SERVER, billing_id) ENABLED;

    The output of the " select NVL(max(billing_id),0) LAST_BILLING_ID from POCDATA.billing where SERVER='AR' " is empty because that database does not currently have data yet

  • SruthiASruthiA Administrator

    if you don't have data, why are you trying to load to vertica?

  • fabricio5000fabricio5000 Community Edition User

    @SruthiA said:
    if you don't have data, why are you trying to load to vertica?

    Because I have to extract data from postgreSQL databases and load them into a new Vertica database. To do this, I build a DataFrame from an SQL query to get the data I need to load, and then use VerticaPy functions, such as "pandas_to_vertica" to ingest that DataFrame directly into the Vertica database.
    What happens is that the database in Vertica is empty because I haven't done any ingestion yet, precisely because of the error that I'm currently getting...

  • SruthiASruthiA Administrator

    yes it is possible. choose a table which as data and try running your code. if you still get the error, please share me the information. I will review it

  • fabricio5000fabricio5000 Community Edition User

    @SruthiA said:
    yes it is possible. choose a table which as data and try running your code. if you still get the error, please share me the information. I will review it

    I've tried everything but still can't get it to work

  • VValdarVValdar Vertica Employee Employee

    Hi fabricio5000,

    By any chance did you see the latest release of VerticaPy 0.12?

    You can set up a DBLink from Vertica to Postgresql then query it in a lively manner.
    Please check here: https://www.vertica.com/python/workshop/full_stack/dblink_integration/index.php

    I'm sure you'll be positively amazed! well, I am

  • badroualibadrouali Vertica Employee

    Hi @fabricio5000,

    Can you please send me a reproducer so I can identify the problem?

    Badr

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file