is there any difference in terms of performance regarding Insert into and Insert /*+direct*/ into

insert into dws.test select * from dws.test_src limit 2000000                                      ---> 36s
insert /*+direct*/ into dws.test select * from dws.test_src limit 2000000                     ---> 44s

per the document, I expect to some performance improvement by using the /*+direct*/ keyword, but apparently not in my test case, any suggestion?

Comments

  • Navin_CNavin_C Vertica Customer
    Hi XiangLi,

    It all depends on your data.
    Direct keyword uses ROS (Disk) for data storage while loading
    Without Direct keyword the insert statement uses WOS (Memory) while loading data.

    If Rows loaded is less and kind of trickle loading, then you are good to use insert statements without DIRECT keyword.

    If Rows loaded are more and its bulk loading (twice or thrice a day) then use Direct keyword, as it will minimize the background task time consumption , since there will be no moveout required.

    Hope this helps.
    NC

  • yes, using direct keyword does not require moveout operation, but it seems that it is still slower when comparing with the one without it. why does that happen? does it relate to the amount/size of the data? my one is around 20m data rows, less than 10gb in size in flatfile with utf-8 encoding.
  • One possible reason would be that while using insert without direct, the data gets stored in WOS as it is, it is neither sorted, encoded/compressed etc. But if you use DIRECT with insert, then that data needs to be encoded/compressed and sorted before moving to ROS.

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file