Streaming data in one line!
Remember that game show, "Name That Tune", where contestants were challenged to name a tune in as few notes as possible? Today's tip convolutes that for Vertica Big Data, showing how we can ingest streaming data in just one line!
A simple way to stream data is to write CSV rows to a network socket. This is the method used by dump1090, a software-defined radio application for monitoring ADS-B air traffic.
But how do we capture and transform those emitted rows into queryable data? Here's a sample from dump1090:
MSG,3,1,1,ABB007,1,2019/07/23,13:45:38.487,2019/07/23,13:45:38.541,,7275,,,40.66585,-73.40684,,,0,,0,0 MSG,3,1,1,A2C5DB,1,2019/07/23,13:45:38.487,2019/07/23,13:45:38.541,,9075,,,40.70375,-73.01025,,,0,,0,0 MSG,8,1,1,AB1257,1,2019/07/23,13:45:38.489,2019/07/23,13:45:38.542,,,,,,,,,,,,0 MSG,7,1,1,A34E97,1,2019/07/23,13:45:38.493,2019/07/23,13:45:38.543,,4100,,,,,,,,,, MSG,1,1,1,A44E24,1,2019/07/23,13:45:38.497,2019/07/23,13:45:38.543,DAL1200 ,,,,,,,,,,,0
These messages are published on TCP port 30003. At the other end, we can use "netcat" (nc) to listen to TCP port 30003 on the source machine and pipe the data to Vertica:
nc -i 60 remote 30003 | split -u --filter="/opt/vertica/bin/vsql -U dbadmin -w PASS -i -c \"COPY dump1090csv FROM LOCAL STDIN DELIMITER ','\"" --lines=2000 - 1090
That is one complicated line! Let's break it out:
- Netcat (nc) connects to TCP port 30003 on "remote" and writes the stream to stdout (-i 60 times out if no data received in 60 seconds);
- Split reads the data from stdin and divides into 2000-line chunks passed to filter "vsql"
- vsql reads comma-delimited text from stdin and writes to table "dump1090csv" (we could read CSV or JSON data into a flex table also)
This doesn't have the security, replay, error handling and so on of a proper solution like Kafka, but it gets the job done! Check out below for a link into a broader discussion of ADS-B data streaming into Vertica using a Kafka implementation.
References:
COPY LOCAL data to Vertica: https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Statements/COPY/COPYLOCAL.htm
Robust streaming with Kafka: https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/KafkaIntegrationGuide/HowKafkaAndVerticaWorkTogether.htm
Blog series on Vertica and ADS-B air traffic monitoring: https://www.vertica.com/blog/blog-post-series-using-vertica-to-track-commercial-aircraft-in-near-real-time-part-7/
Have fun!