pandas_to_vertica() ingestion issue, VerticaPy
Hello, good afternoon.
I'm having problems with the VerticaPy function "pandas_to_vertica()". It turns out that before I could create a DataFrame with pandas, from SQL queries to a PostgreSQL database, and with pandas_to_vertica() it let me correctly insert it into an already existing table.
However now I am getting the following error: " FileNotFoundError: [WinError 2] The system cannot find the file specified: 'billing.csv' ". It left me baffled since this problem with direct ingestion of DataFrames had not thrown me before, however now that I want to ingest a much larger DataFrame (5000 records) it throws me this error.
I share my code to see when I use this function and the error it throws me.
From already thank you very much.
Code:
Assembly of the data frame for Argentina
def dataframe_billing_AR(window_size):
ultimo_id_AR = vp.readSQL("select NVL(max(billing_id),0) LAST_BILLING_ID from POCDATA.billing where SERVER='AR'")
query = f"SELECT 'AR' as SERVER, * FROM hca.billing WHERE billing_id BETWEEN {ultimo_id_AR['LAST_BILLING_ID'][0]} AND {ultimo_id_AR['LAST_BILLING_ID'][0]} + {window_size}"
frame_ARG = pd.read_sql(query, connection_db_postgre_AR())
pd.set_option('display.expand_frame_repr', False)
print(frame_ARG)
return frame_ARG
Data ingestion in Vertica
def ingest_frame(frame):
vp.pandas_to_vertica(df = frame, name = "billing", schema = "POCDATA", insert = True)
return
Full function for Argentina
def data_passage_ARG(window_size):
try:
connection_db_postgre_AR()
connection_vertica()
dataframe_billing_AR(window_size)
ingest_frame(dataframe_billing_AR(window_size))
finally:
connection_db_postgre_AR().close()
close_connection()
Implementation:
data_passage_ARG(5000)
(The assembled data frame is shown correct, but the error is in the part of the ingestion function I guess)
Error:
KeyError Traceback (most recent call last)
File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\verticapy\utilities.py:1122, in pandas_to_vertica(df, name, schema, dtype, parse_nrows, temp_path, insert)
1120 for c in df.columns:
1121 if df[c].dtype == object and isinstance(
-> 1122 df[c].loc[df[c].first_valid_index()], str
1123 ):
1124 str_cols += [c]
File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pandas\core\indexing.py:967, in _LocationIndexer.getitem(self, key)
966 maybe_callable = com.apply_if_callable(key, self.obj)
--> 967 return self._getitem_axis(maybe_callable, axis=axis)
File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pandas\core\indexing.py:1205, in _LocIndexer._getitem_axis(self, key, axis)
1204 self._validate_key(key, axis)
-> 1205 return self._get_label(key, axis=axis)
File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pandas\core\indexing.py:1153, in _LocIndexer._get_label(self, label, axis)
1151 def _get_label(self, label, axis: int):
1152 # GH#5667 this will fail if the label is not present in the axis.
-> 1153 return self.obj.xs(label, axis=axis)
File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\pandas\core\generic.py:3864, in NDFrame.xs(self, key, axis, level, drop_level)
3863 else:
-> 3864 loc = index.get_loc(key)
...
-> 1187 os.remove(path)
1188 if clear:
1189 del tmp_df
FileNotFoundError: [WinError 2] El sistema no puede encontrar el archivo especificado: 'billing.csv'
Answers
did you verify if the file billing.csv is being created by pandas_to_vertica function? does the file exist?
That's what I don't quite understand, why is it asking me if billing.csv exists, if in fact the pandas_to_vertica function ingests a dataframe directly without transforming it to csv, right? Because before when I used the function for the same reason, I didn't get that error and neither did any .csv file that was created appear in my folder
Yes.. it does create CSV in the intermediate state before loading to vertica. Please find the documentation link below
https://www.vertica.com/python/documentation_last/utilities/pandas_to_vertica/
"Ingests a pandas DataFrame into the Vertica database by creating a CSV file and then using flex tables."
It's true, I read the official documentation well there, but if so, why didn't I get that error in previous ingestions? Nor did I see that an armed csv was ever created from calling the pandas_to_vertica function with my DataFrame... I don't know what to do
Is it possible to share table DDL and output for the below query? I will try to reproduce the issue locally
select NVL(max(billing_id),0) LAST_BILLING_ID from POCDATA.billing where SERVER='AR'
table DDL:
CREATE TABLE POCDATA.billing
(
SERVER varchar(2) NOT NULL,
billing_id numeric(10,0) NOT NULL,
patient_id varchar(100),
event_id varchar(100),
hos_id numeric(10,0),
msp_id numeric(10,0),
doc_id numeric(10,0),
heq_id numeric(10,0),
appointment_time timestamp,
action varchar(1),
transaction_time timestamp,
hos_name varchar(40),
patient_full_name varchar(80),
patient_num varchar(20),
hin_name varchar(100),
msp_name varchar(60),
doc_full_name varchar(80),
dni varchar(10)
);
ALTER TABLE POCDATA.billing ADD CONSTRAINT PK_BILLING PRIMARY KEY (SERVER, billing_id) ENABLED;
The output of the " select NVL(max(billing_id),0) LAST_BILLING_ID from POCDATA.billing where SERVER='AR' " is empty because that database does not currently have data yet
if you don't have data, why are you trying to load to vertica?
Because I have to extract data from postgreSQL databases and load them into a new Vertica database. To do this, I build a DataFrame from an SQL query to get the data I need to load, and then use VerticaPy functions, such as "pandas_to_vertica" to ingest that DataFrame directly into the Vertica database.
What happens is that the database in Vertica is empty because I haven't done any ingestion yet, precisely because of the error that I'm currently getting...
yes it is possible. choose a table which as data and try running your code. if you still get the error, please share me the information. I will review it
I've tried everything but still can't get it to work
Hi fabricio5000,
By any chance did you see the latest release of VerticaPy 0.12?
You can set up a DBLink from Vertica to Postgresql then query it in a lively manner.
Please check here: https://www.vertica.com/python/workshop/full_stack/dblink_integration/index.php
I'm sure you'll be positively amazed! well, I am
Hi @fabricio5000,
Can you please send me a reproducer so I can identify the problem?
Badr