We're Moving!

The Vertica Forum is moving to a new OpenText Analytics Database (Vertica) Community.

Join us there to post discussion topics, learn about

product releases, share tips, access the blog, and much more.

Create My New Community Account Now


pandas_to_vertica() ingestion issue, VerticaPy — Vertica Forum

pandas_to_vertica() ingestion issue, VerticaPy

fabricio5000fabricio5000 Community Edition User

Hello, good afternoon.
I'm having problems with the VerticaPy function "pandas_to_vertica()". It turns out that before I could create a DataFrame with pandas, from SQL queries to a PostgreSQL database, and with pandas_to_vertica() it let me correctly insert it into an already existing table.
However now I am getting the following error: " FileNotFoundError: [WinError 2] The system cannot find the file specified: 'billing.csv' ". It left me baffled since this problem with direct ingestion of DataFrames had not thrown me before, however now that I want to ingest a much larger DataFrame (5000 records) it throws me this error.
I share my code to see when I use this function and the error it throws me.
From already thank you very much.

Code:

Assembly of the data frame for Argentina

def dataframe_billing_AR(window_size):
ultimo_id_AR = vp.readSQL("select NVL(max(billing_id),0) LAST_BILLING_ID from POCDATA.billing where SERVER='AR'")
query = f"SELECT 'AR' as SERVER, * FROM hca.billing WHERE billing_id BETWEEN {ultimo_id_AR['LAST_BILLING_ID'][0]} AND {ultimo_id_AR['LAST_BILLING_ID'][0]} + {window_size}"
frame_ARG = pd.read_sql(query, connection_db_postgre_AR())
pd.set_option('display.expand_frame_repr', False)
print(frame_ARG)
return frame_ARG

Data ingestion in Vertica

def ingest_frame(frame):
vp.pandas_to_vertica(df = frame, name = "billing", schema = "POCDATA", insert = True)
return

Full function for Argentina

def data_passage_ARG(window_size):
try:
connection_db_postgre_AR()
connection_vertica()
dataframe_billing_AR(window_size)
ingest_frame(dataframe_billing_AR(window_size))
finally:
connection_db_postgre_AR().close()
close_connection()

Implementation:

data_passage_ARG(5000)

(The assembled data frame is shown correct, but the error is in the part of the ingestion function I guess)

Error:

KeyError Traceback (most recent call last)
File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\verticapy\utilities.py:1122, in pandas_to_vertica(df, name, schema, dtype, parse_nrows, temp_path, insert)
1120 for c in df.columns:
1121 if df[c].dtype == object and isinstance(
-> 1122 df[c].loc[df[c].first_valid_index()], str
1123 ):
1124 str_cols += [c]

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pandas\core\indexing.py:967, in _LocationIndexer.getitem(self, key)
966 maybe_callable = com.apply_if_callable(key, self.obj)
--> 967 return self._getitem_axis(maybe_callable, axis=axis)

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pandas\core\indexing.py:1205, in _LocIndexer._getitem_axis(self, key, axis)
1204 self._validate_key(key, axis)
-> 1205 return self._get_label(key, axis=axis)

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pandas\core\indexing.py:1153, in _LocIndexer._get_label(self, label, axis)
1151 def _get_label(self, label, axis: int):
1152 # GH#5667 this will fail if the label is not present in the axis.
-> 1153 return self.obj.xs(label, axis=axis)

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pandas\core\generic.py:3864, in NDFrame.xs(self, key, axis, level, drop_level)
3863 else:
-> 3864 loc = index.get_loc(key)
...
-> 1187 os.remove(path)
1188 if clear:
1189 del tmp_df

FileNotFoundError: [WinError 2] El sistema no puede encontrar el archivo especificado: 'billing.csv'

Answers

  • SruthiASruthiA Administrator

    did you verify if the file billing.csv is being created by pandas_to_vertica function? does the file exist?

  • fabricio5000fabricio5000 Community Edition User

    @SruthiA said:
    did you verify if the file billing.csv is being created by pandas_to_vertica function? does the file exist?

    That's what I don't quite understand, why is it asking me if billing.csv exists, if in fact the pandas_to_vertica function ingests a dataframe directly without transforming it to csv, right? Because before when I used the function for the same reason, I didn't get that error and neither did any .csv file that was created appear in my folder

  • SruthiASruthiA Administrator

    Yes.. it does create CSV in the intermediate state before loading to vertica. Please find the documentation link below

    https://www.vertica.com/python/documentation_last/utilities/pandas_to_vertica/

    "Ingests a pandas DataFrame into the Vertica database by creating a CSV file and then using flex tables."

  • fabricio5000fabricio5000 Community Edition User

    @SruthiA said:
    Yes.. it does create CSV in the intermediate state before loading to vertica. Please find the documentation link below

    https://www.vertica.com/python/documentation_last/utilities/pandas_to_vertica/

    "Ingests a pandas DataFrame into the Vertica database by creating a CSV file and then using flex tables."

    It's true, I read the official documentation well there, but if so, why didn't I get that error in previous ingestions? Nor did I see that an armed csv was ever created from calling the pandas_to_vertica function with my DataFrame... I don't know what to do

  • SruthiASruthiA Administrator

    Is it possible to share table DDL and output for the below query? I will try to reproduce the issue locally

    select NVL(max(billing_id),0) LAST_BILLING_ID from POCDATA.billing where SERVER='AR'

  • fabricio5000fabricio5000 Community Edition User

    @SruthiA said:
    Is it possible to share table DDL and output for the below query? I will try to reproduce the issue locally

    select NVL(max(billing_id),0) LAST_BILLING_ID from POCDATA.billing where SERVER='AR'

    table DDL:

    CREATE TABLE POCDATA.billing
    (
    SERVER varchar(2) NOT NULL,
    billing_id numeric(10,0) NOT NULL,
    patient_id varchar(100),
    event_id varchar(100),
    hos_id numeric(10,0),
    msp_id numeric(10,0),
    doc_id numeric(10,0),
    heq_id numeric(10,0),
    appointment_time timestamp,
    action varchar(1),
    transaction_time timestamp,
    hos_name varchar(40),
    patient_full_name varchar(80),
    patient_num varchar(20),
    hin_name varchar(100),
    msp_name varchar(60),
    doc_full_name varchar(80),
    dni varchar(10)
    );

    ALTER TABLE POCDATA.billing ADD CONSTRAINT PK_BILLING PRIMARY KEY (SERVER, billing_id) ENABLED;

    The output of the " select NVL(max(billing_id),0) LAST_BILLING_ID from POCDATA.billing where SERVER='AR' " is empty because that database does not currently have data yet

  • SruthiASruthiA Administrator

    if you don't have data, why are you trying to load to vertica?

  • fabricio5000fabricio5000 Community Edition User

    @SruthiA said:
    if you don't have data, why are you trying to load to vertica?

    Because I have to extract data from postgreSQL databases and load them into a new Vertica database. To do this, I build a DataFrame from an SQL query to get the data I need to load, and then use VerticaPy functions, such as "pandas_to_vertica" to ingest that DataFrame directly into the Vertica database.
    What happens is that the database in Vertica is empty because I haven't done any ingestion yet, precisely because of the error that I'm currently getting...

  • SruthiASruthiA Administrator

    yes it is possible. choose a table which as data and try running your code. if you still get the error, please share me the information. I will review it

  • fabricio5000fabricio5000 Community Edition User

    @SruthiA said:
    yes it is possible. choose a table which as data and try running your code. if you still get the error, please share me the information. I will review it

    I've tried everything but still can't get it to work

  • VValdarVValdar Vertica Employee Employee

    Hi fabricio5000,

    By any chance did you see the latest release of VerticaPy 0.12?

    You can set up a DBLink from Vertica to Postgresql then query it in a lively manner.
    Please check here: https://www.vertica.com/python/workshop/full_stack/dblink_integration/index.php

    I'm sure you'll be positively amazed! well, I am

  • badroualibadrouali Vertica Employee

    Hi @fabricio5000,

    Can you please send me a reproducer so I can identify the problem?

    Badr

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file