Using Vertica to store files

rajatpaliwal86 Vertica Customer

We're thinking of storing extracted files in Vertica. We'll be extracting a huge number of files and plan to store file chunks in a Vertica table. Each chunk can be stored in a BLOB data type, and a file may span multiple rows if it can't fit in a single row. We're not sure how Vertica will handle this. Would it be a bad idea to use Vertica for this use case?
Do you know if anyone has used Vertica in such a way?

Answers

  • marcothesane - Administrator

    For an appropriate answer, I think we could do with more information:

    Can you elaborate on what kind of files you are referring to? Textual files, like log files? Textual data files? Binary files? Unstructured files, like videos, images, or audio files? Free text?

    What are you planning to do with them?

  • rajatpaliwal86 Vertica Customer

    @marcothesane said:
    For an appropriate answer, I think we could do with more information:

    Can you elaborate on what kind of files you are referring to? Textual files, like log files? Textual data files? Binary files? Unstructured files, like videos, images, or audio files? Free text?

    What are you planning to do with them?

    The files could be of any type: exe, pdf, doc, etc. We capture network traffic and want to store a few selected MIME types in the database. The number of files could be huge and the files themselves could be large, so we are planning to divide each file into chunks and store each chunk in Vertica along with the identity of the file it belongs to. This might be a bad idea, but I just wanted to know if Vertica handles such use cases.
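
For readers following along, the chunking scheme described above could be sketched roughly as follows. This is only an illustration, not code from the thread: the `FileChunk` record, the `split_into_chunks` and `reassemble` helpers, the 1 MiB chunk size, and the use of a SHA-256 digest as the file identity are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, Iterator

# Hypothetical chunk size; in practice it would be chosen to fit
# whatever binary column type ends up holding the chunk.
CHUNK_SIZE = 1024 * 1024


@dataclass(frozen=True)
class FileChunk:
    """One row of a hypothetical chunk table."""
    file_id: str      # identity of the file this chunk belongs to (assumed: SHA-256 hex digest)
    chunk_index: int  # position of the chunk within the file, starting at 0
    data: bytes       # raw bytes of this chunk


def split_into_chunks(payload: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[FileChunk]:
    """Yield fixed-size chunks of payload, each tagged with the file identity."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    file_id = hashlib.sha256(payload).hexdigest()
    for index, offset in enumerate(range(0, len(payload), chunk_size)):
        yield FileChunk(file_id, index, payload[offset:offset + chunk_size])


def reassemble(chunks: Iterable[FileChunk]) -> bytes:
    """Rebuild the original bytes from chunks supplied in any order."""
    ordered = sorted(chunks, key=lambda chunk: chunk.chunk_index)
    return b"".join(chunk.data for chunk in ordered)
```

Each `FileChunk` would correspond to one table row, and reading a file back means fetching every row with the same `file_id` and concatenating the chunks in `chunk_index` order.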

  • rajatpaliwal86 Vertica Customer

    @marcothesane said:
    In that case I would counsel against it. Vertica's strength is that it is a relational database platform built for huge amounts of data.
    You don't filter, group by, join on, or compute the sum, average, or standard deviation of a PDF file or a doc file, let alone an executable. In a relational database, you do those things to columns of tables. Storing the files would only increase your license size dramatically, for data that you merely store, retrieve, and then process in a front-end application. Any metadata about the files, yes. The data itself I would store outside of Vertica. My two Swiss Francs.

    Thanks for the clarification.
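
The advice quoted above (file metadata in the database, file contents elsewhere) could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `FileMetadata` fields, the `store_file` helper, and the choice of a local directory tree addressed by SHA-256 digest as the external store. An object store or a distributed file system would play the same role.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class FileMetadata:
    """Hypothetical metadata row; only this would be loaded into the database."""
    file_id: str       # assumed identity: SHA-256 hex digest of the contents
    mime_type: str     # MIME type detected during traffic capture
    size_bytes: int    # length of the file contents
    storage_path: str  # where the contents live outside the database


def store_file(payload: bytes, mime_type: str, root: Path) -> FileMetadata:
    """Write the contents under root and return the metadata describing them."""
    file_id = hashlib.sha256(payload).hexdigest()
    # Fan out by the first two hex characters so no single directory grows too large.
    target = root / file_id[:2] / file_id
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():  # identical contents are stored only once
        target.write_bytes(payload)
    return FileMetadata(file_id, mime_type, len(payload), str(target))
```

The returned record holds only small, queryable columns, which is the kind of data the filtering, grouping, and aggregation mentioned above apply to.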
