Filtering Data While Loading into Vertica

Suppose you have a CSV file and you want to copy some, but not all, of its contents into a Vertica table. There are two ways you can do this:

• Use the SKIP keyword with COPY.
• Use the head or tail Linux command.

Let's see how this works.

The Data

Here's a small example data set, a CSV file named students.csv, that contains the following information for 20 students:

Smith, John, 1977, 1981, Bowers Hall
Brown, Tim, 1978, 1982, Blue Hall
Howe, Mary, 1976, 1980, White Hall
Kelly, Sarah, 1977, 1981, Blue Hall
Jones, William, 1977, 1981, Johnson Hall
Brady, Mark, 1976, 1980, Madison Hall
Black, Howard, 1975, 1979, Bowers Hall
King, Martha, 1976, 1980, White Hall
McCoy, Keith, 1978, 1982, Madison Hall
McDonald, Susan, 1977, 1981, Johnson Hall
Williams, Joe, 1975, 1979, Bowers Hall
Johnson, Julie, 1978, 1982, White Hall
Barry, Polly, 1976, 1980, Blue Hall
Wilson, Martin, 1975, 1979, Johnson Hall
Adams, Harry, 1976, 1980, White Hall
Ball, Terry, 1977, 1981, Bowers Hall
Chan, Ty, 1978, 1982, Johnson Hall
David, Donald, 1975, 1979, Madison Hall
Emory, Fletcher, 1976, 1980, White Hall
Flaherty, Finbarr, 1977 1981, Blue Hall

To load this data into Vertica, define a table as follows:

=> CREATE TABLE students (
      Last VARCHAR(64), 
      First VARCHAR(64), 
      EnrollYear INT, 
      GradYear INT, 
      Dorm VARCHAR(64)
);

Now you can load the data from the CSV file into the students table:

=> COPY students FROM '/home/dbadmin/students.csv' DELIMITER ',';
 Rows loaded
-------------
          20
(1 row)

=> SELECT * FROM students;
   Last   |   First   | EnrollYear | GradYear |     Dorm
----------+-----------+------------+----------+---------------
 Howe     |  Mary     |       1976 |     1980 |  White Hall
 King     |  Martha   |       1976 |     1980 |  White Hall
 McCoy    |  Keith    |       1978 |     1982 |  Madison Hall
 Williams |  Joe      |       1975 |     1979 |  Bowers Hall
 Barry    |  Polly    |       1976 |     1980 |  Blue Hall
 Emory    |  Fletcher |       1976 |     1980 |  White Hall
 Flaherty |  Finbarr  |       1977 |     1981 |  Blue Hall
 Jones    |  William  |       1977 |     1981 |  Johnson Hall
 Brady    |  Mark     |       1976 |     1980 |  Madison Hall
 Black    |  Howard   |       1975 |     1979 |  Bowers Hall
 Wilson   |  Martin   |       1975 |     1979 |  Johnson Hall
 David    |  Donald   |       1975 |     1979 |  Madison Hall
 Smith    |  John     |       1977 |     1981 |  Bowers Hall
 Brown    |  Tim      |       1978 |     1982 |  Blue Hall
 Kelly    |  Sarah    |       1977 |     1981 |  Blue Hall
 McDonald |  Susan    |       1977 |     1981 |  Johnson Hall
 Johnson  |  Julie    |       1978 |     1982 |  White Hall
 Adams    |  Harry    |       1976 |     1980 |  White Hall
 Ball     |  Terry    |       1977 |     1981 |  Bowers Hall
 Chan     |  Ty       |       1978 |     1982 |  Johnson Hall
(20 rows)

Using COPY to Filter Rows

Let's create a new table and copy the same table except for the first 10 entries. To omit the first 10 entries in the CSV file, use the SKIP keyword. We'll name the new table students_last_ten:

=> CREATE TABLE students_last_ten (
      Last VARCHAR(64), 
      First VARCHAR(64), 
      EnrollYear INT, 
      GradYear INT, 
      Dorm VARCHAR(64)
);
=> COPY students_last_ten FROM '/home/dbadmin/students.csv'
       DELIMITER ',' SKIP 10;
 Rows Loaded
-------------
          10
(1 row)
=> SELECT * FROM students_last_ten;
   Last   |   First   | EnrollYear | GradYear |     Dorm
----------+-----------+------------+----------+-------------
 Williams |  Joe      |       1975 |     1979 |  Bowers Hall
 Barry    |  Polly    |       1976 |     1980 |  Blue Hall
 Emory    |  Fletcher |       1976 |     1980 |  White Hall
 Flaherty |  Finbarr  |       1977 |     1981 |  Blue Hall
 Johnson  |  Julie    |       1978 |     1982 |  White Hall
 Adams    |  Harry    |       1976 |     1980 |  White Hall
 Ball     |  Terry    |       1977 |     1981 |  Bowers Hall
 Chan     |  Ty       |       1978 |     1982 |  Johnson Hall
 Wilson   |  Martin   |       1975 |     1979 |  Johnson Hall
 David    |  Donald   |       1975 |     1979 |  Madison Hall
(10 rows)

Using Linux to Filter Rows

You can also use the Linux head and tail commands to write certain rows from the top or the bottom of a file to a Vertica table. For this next example, we'll load the first 10 rows from the student data.

=> CREATE TABLE students_last_ten (
       Last VARCHAR(64),
       First VARCHAR(64),
       EnrollYear INT,
       GradYear INT,
       Dorm VARCHAR(64)
    );
CREATE TABLE
=> \q

Let's run the tail command to verify that it gives us the last 10 rows. Note that running the head command would give us the first 10 rows.

$ tail -10 /home/dbadmin/students.csv
Williams, Joe, 1975, 1979, Bowers Hall
Johnson, Julie, 1978, 1982, White Hall
Barry, Polly, 1976, 1980, Blue Hall
Wilson, Martin, 1975, 1979, Johnson Hall
Adams, Harry, 1976, 1980, White Hall
Ball, Terry, 1977, 1981, Bowers Hall
Chan, Ty, 1978, 1982, Johnson Hall
David, Donald, 1975, 1979, Madison Hall
Emory, Fletcher, 1976, 1980, White Hall
Flaherty, Finbarr, 1977, 1981, Blue Hall

Now we'll rerun the tail command and pipe that output into the Vertica table we created.

$ head -10 /home/dbadmin/students.csv | vsql -d vmart -U dbadmin 
       -c "COPY students_last_ten FROM LOCAL STDIN DELIMITER ',' ;"
 Rows Loaded
-------------
          10
(1 row)

Let's make sure the data was actually loaded into the student_last_ten table.

$ vsql
=> SELECT * FROM students_last_ten;
   Last   |  First   | EnrollYear | GradYear |     Dorm
----------+----------+------------+----------+---------------
 Black    |  Howard  |       1975 |     1979 |  Bowers Hall
 Brady    |  Mark    |       1976 |     1980 |  Madison Hall
 Jones    |  William |       1977 |     1981 |  Johnson Hall
 Brown    |  Tim     |       1978 |     1982 |  Blue Hall
 Kelly    |  Sarah   |       1977 |     1981 |  Blue Hall
 McDonald |  Susan   |       1977 |     1981 |  Johnson Hall
 Smith    |  John    |       1977 |     1981 |  Bowers Hall
 Howe     |  Mary    |       1976 |     1980 |  White Hall
 King     |  Martha  |       1976 |     1980 |  White Hall
 McCoy    |  Keith   |       1978 |     1982 |  Madison Hall
(10 rows)

For More Information

COPY Parameters in the Vertica Documentation
head Man Page
tail Man Page

Sign In or Register to comment.