Using Vertica to store files
We're thinking of storing extracted files in Vertica. We'll be extracting a huge number of files and plan to store them as file chunks in a Vertica table. Each chunk can be stored in a BLOB column, and a file may span multiple rows if it doesn't fit in a single row. We're not sure how Vertica will handle this. Would it be a bad idea to use Vertica for this use case?
Do you know if anyone has used Vertica in such a way?
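The chunked layout described in the question can be sketched in Python as follows. This is a minimal illustration, not a Vertica API: the function names `split_into_chunks` and `reassemble`, the `(file_id, chunk_no, data)` row shape, and the 1,000,000-byte default chunk size are hypothetical. In Vertica the chunk column would typically be VARBINARY or LONG VARBINARY, so check the maximum length of the type you choose in your Vertica version before settling on a chunk size.

```python
from typing import Iterable, List, Tuple

# Hypothetical row shape: (file_id, chunk_no, chunk_bytes)
ChunkRow = Tuple[int, int, bytes]

# Assumed chunk size; it must not exceed the maximum length of the
# binary column (e.g. VARBINARY or LONG VARBINARY) you actually define.
DEFAULT_CHUNK_SIZE = 1_000_000


def split_into_chunks(file_id: int, data: bytes,
                      chunk_size: int = DEFAULT_CHUNK_SIZE) -> List[ChunkRow]:
    """Split one file's bytes into numbered rows, one row per chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Walk the byte offsets in steps of chunk_size; the last chunk may be shorter.
    return [
        (file_id, chunk_no, data[offset:offset + chunk_size])
        for chunk_no, offset in enumerate(range(0, len(data), chunk_size))
    ]


def reassemble(rows: Iterable[ChunkRow]) -> bytes:
    """Rebuild a single file from its rows, ordering them by chunk number."""
    ordered = sorted(rows, key=lambda row: row[1])
    return b"".join(row[2] for row in ordered)
```

For example, a 2,500,000-byte file yields three rows with the default chunk size (two full chunks and one of 500,000 bytes). The explicit `chunk_no` matters because a table has no inherent row order; reading a file back means selecting its rows ordered by chunk number and concatenating them.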
Best Answer
marcothesane - Select Field - Administrator
In that case I would counsel against it. Vertica's strength is that it is a relational database platform built for huge amounts of data.
You don't filter, group by, join on, or take the sum of, the average of, or the standard deviation of a PDF file or a DOC file, let alone an executable. In a relational database, what you do is filter, group by, join on, and take the sum of, the average of, and the standard deviation of columns of tables. Storing the files themselves would dramatically increase your license size for data that you only store, retrieve, and then process in a front-end application. Metadata about the files, yes, store that in Vertica. The file data itself I would store outside of Vertica. My two Swiss Francs ...
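A minimal sketch of the approach suggested above, in Python: the file bytes go to a directory (standing in for a file server or object store), and only a metadata row is produced for loading into Vertica. The function name `store_outside_db`, the content-addressed layout (files named by their SHA-256 digest), and the metadata fields are hypothetical choices for illustration, not something the answer prescribes.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, Union


def store_outside_db(root: Path, data: bytes,
                     mime_type: str) -> Dict[str, Union[str, int]]:
    """Write the file's bytes under root and return a metadata row.

    Hypothetical layout: root/<first two hex chars>/<sha256 hex digest>.
    Only the returned dict is meant to be loaded into a Vertica table.
    """
    digest = hashlib.sha256(data).hexdigest()
    path = root / digest[:2] / digest
    path.parent.mkdir(parents=True, exist_ok=True)
    # Identical content maps to the same path, so duplicates are written once.
    if not path.exists():
        path.write_bytes(data)
    return {
        "sha256": digest,
        "size_bytes": len(data),
        "mime_type": mime_type,
        "storage_path": str(path),
    }


if __name__ == "__main__":
    # Demo: store one small file in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        row = store_outside_db(Path(tmp), b"%PDF-1.4 example", "application/pdf")
        print(row["size_bytes"], row["mime_type"])
```

In this design a question such as "all PDFs larger than 10 MB" is answered by the metadata table in Vertica, and the application reads the bytes from `storage_path` only when it actually needs them.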
Answers
Have you looked at Flex tables? https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/FlexTables/FlexTableHandbook.htm
For an appropriate answer, I think we could do with more information:
Can you elaborate on what kind of files you are referring to? Textual files, like log files? Textual data files? Binary files? Unstructured files, like videos, images, or audio files? Free text?
What are you planning to do with them?
The files could be of any type: EXE, PDF, DOC, etc. We capture network traffic and want to store a few selected MIME types in the database. The number of files could be huge and individual files could be large, so we are planning to split each file into chunks and store each chunk in Vertica together with the identifier of the file it belongs to. This might be a bad idea, but I wanted to know whether Vertica handles such use cases.
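With the details above (captured network traffic, a few selected MIME types, large files split into chunks), a rough way to gauge how big the chunk table would get is to count the rows each kept file needs. A minimal sketch, assuming a hypothetical allow-list `WANTED_MIME_TYPES` (the MIME strings are only examples) and an assumed chunk size of 1,000,000 bytes; `rows_needed` and `estimate_rows` are hypothetical helper names.

```python
from typing import Iterable, Tuple

# Hypothetical allow-list of MIME types worth keeping (examples only).
WANTED_MIME_TYPES = frozenset({
    "application/pdf",
    "application/msword",
    "application/x-msdownload",
})

CHUNK_SIZE = 1_000_000  # assumed bytes per chunk row


def rows_needed(size_bytes: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Number of chunk rows one file of size_bytes would occupy."""
    # Integer ceiling division, exact even for very large sizes.
    return (size_bytes + chunk_size - 1) // chunk_size


def estimate_rows(files: Iterable[Tuple[str, int]],
                  chunk_size: int = CHUNK_SIZE) -> int:
    """Total chunk rows for the (mime_type, size_bytes) pairs on the allow-list."""
    return sum(
        rows_needed(size, chunk_size)
        for mime, size in files
        if mime in WANTED_MIME_TYPES
    )
```

For instance, `estimate_rows([("application/pdf", 2_500_000), ("image/png", 10_000_000), ("application/msword", 1_000_000)])` returns 4: the PNG is filtered out, the PDF needs three rows and the DOC needs one. The row count grows linearly with the captured volume, which is where the licensing concern in the best answer comes in.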
Thanks for the clarification.