Using awk and SQLite

I recently had the opportunity to use awk and SQLite on a project I'm working on.

My first thought was that awk should be able to handle the CSV parsing, and I found some code that someone else had written to parse quote-and-comma-delimited files.

I started off by installing SQLite on Cygwin.

After getting SQLite installed, I needed some test data to work with, so I did a search and found this site:

https://www.briandunning.com/sample-data/

I grabbed the free file for testing.

wget https://www.briandunning.com/sample-data/us-500.zip

The file is a comma-delimited file with these fields:

  • First Name
  • Last Name
  • Company
  • Address
  • City
  • County (where applicable)
  • State/Province (where applicable)
  • ZIP/Postal Code
  • Phone 1
  • Phone 2
  • Email
  • Web

Here I used the SQLite command line to create the table:

sqlite> CREATE TABLE us500 (
   ...> first_name TEXT NOT NULL,
   ...> last_name TEXT NOT NULL,
   ...> company TEXT NOT NULL,
   ...> address TEXT NOT NULL,
   ...> city TEXT NOT NULL,
   ...> county TEXT NOT NULL,
   ...> state TEXT NOT NULL,
   ...> zipcode TEXT NOT NULL,
   ...> phone1 TEXT NOT NULL,
   ...> phone2 TEXT NOT NULL,
   ...> email TEXT NOT NULL,
   ...> web TEXT NOT NULL);
sqlite> .tables
us500
sqlite>

I attempted to import the CSV file but got an error.

sqlite> .mode csv
sqlite> .import us-500lf.csv us500
Error: us-500lf.csv line 2: expected 12 columns of data but found 13

I can tell that the records came from a classic Macintosh system because the file uses a carriage return (hex 0d) as the record delimiter.

So I wanted to view a hex dump of the file. I used cat, tr, head, and hexdump.

You can see in the hex dump below that, after translating the carriage returns, the file now has hex 0a line feeds.

$ cat us-500.csv | tr "\r" "\n" | head -n 3 |hexdump -C
00000000  22 66 69 72 73 74 5f 6e  61 6d 65 22 2c 22 6c 61  |"first_name","la|
00000010  73 74 5f 6e 61 6d 65 22  2c 22 63 6f 6d 70 61 6e  |st_name","compan|
00000020  79 5f 6e 61 6d 65 22 2c  22 61 64 64 72 65 73 73  |y_name","address|
00000030  22 2c 22 63 69 74 79 22  2c 22 63 6f 75 6e 74 79  |","city","county|
00000040  22 2c 22 73 74 61 74 65  22 2c 22 7a 69 70 22 2c  |","state","zip",|
00000050  22 70 68 6f 6e 65 31 22  2c 22 70 68 6f 6e 65 32  |"phone1","phone2|
00000060  22 2c 22 65 6d 61 69 6c  22 2c 22 77 65 62 22 0a  |","email","web".|
00000070  22 4a 61 6d 65 73 22 2c  22 42 75 74 74 22 2c 22  |"James","Butt","|
00000080  42 65 6e 74 6f 6e 2c 20  4a 6f 68 6e 20 42 20 4a  |Benton, John B J|
00000090  72 22 2c 22 36 36 34 39  20 4e 20 42 6c 75 65 20  |r","6649 N Blue |
000000a0  47 75 6d 20 53 74 22 2c  22 4e 65 77 20 4f 72 6c  |Gum St","New Orl|
000000b0  65 61 6e 73 22 2c 22 4f  72 6c 65 61 6e 73 22 2c  |eans","Orleans",|
000000c0  22 4c 41 22 2c 37 30 31  31 36 2c 22 35 30 34 2d  |"LA",70116,"504-|
000000d0  36 32 31 2d 38 39 32 37  22 2c 22 35 30 34 2d 38  |621-8927","504-8|
000000e0  34 35 2d 31 34 32 37 22  2c 22 6a 62 75 74 74 40  |45-1427","jbutt@|
000000f0  67 6d 61 69 6c 2e 63 6f  6d 22 2c 22 68 74 74 70  |gmail.com","http|
00000100  3a 2f 2f 77 77 77 2e 62  65 6e 74 6f 6e 6a 6f 68  |://www.bentonjoh|
00000110  6e 62 6a 72 2e 63 6f 6d  22 0a 22 4a 6f 73 65 70  |nbjr.com"."Josep|
00000120  68 69 6e 65 22 2c 22 44  61 72 61 6b 6a 79 22 2c  |hine","Darakjy",|
00000130  22 43 68 61 6e 61 79 2c  20 4a 65 66 66 72 65 79  |"Chanay, Jeffrey|
00000140  20 41 20 45 73 71 22 2c  22 34 20 42 20 42 6c 75  | A Esq","4 B Blu|
00000150  65 20 52 69 64 67 65 20  42 6c 76 64 22 2c 22 42  |e Ridge Blvd","B|
00000160  72 69 67 68 74 6f 6e 22  2c 22 4c 69 76 69 6e 67  |righton","Living|
00000170  73 74 6f 6e 22 2c 22 4d  49 22 2c 34 38 31 31 36  |ston","MI",48116|
00000180  2c 22 38 31 30 2d 32 39  32 2d 39 33 38 38 22 2c  |,"810-292-9388",|
00000190  22 38 31 30 2d 33 37 34  2d 39 38 34 30 22 2c 22  |"810-374-9840","|
000001a0  6a 6f 73 65 70 68 69 6e  65 5f 64 61 72 61 6b 6a  |josephine_darakj|
000001b0  79 40 64 61 72 61 6b 6a  79 2e 6f 72 67 22 2c 22  |y@darakjy.org","|
000001c0  68 74 74 70 3a 2f 2f 77  77 77 2e 63 68 61 6e 61  |http://www.chana|
000001d0  79 6a 65 66 66 72 65 79  61 65 73 71 2e 63 6f 6d  |yjeffreyaesq.com|
000001e0  22 0a                                             |".|
000001e2
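The us-500lf.csv file used in the earlier import attempt was presumably produced by the same tr conversion, written out to a file. A sketch of that step, which didn't make it into my session capture:

```shell
# Convert classic-Mac CR record delimiters to LF line feeds and save
# the result as the file name used in the .import attempt.
tr "\r" "\n" < us-500.csv > us-500lf.csv
```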

The problem is how to handle quoted, comma-delimited files. I did a search to see if someone had already solved that and found this:

AWK CSV Parser

The code contains a function called parse_csv(). You can read about its parameters at the link above. The important part is how to call the function.

{
    num_fields = parse_csv($0, csv, ",", "\"", "\"", "\n", 1);
    if (num_fields < 0) {
        printf "ERROR: %s (%d) -> %s\n", csverr, num_fields, $0;
    } else {
        #printf "%s -> ", $0;
        #printf "%s", num_fields;
        for (i = 0; i < num_fields; i++) {
            fld = csv[i];
            gsub(/\"/, "", fld);
            if (i < num_fields - 1)
                printf "%s\t", fld;
            else
                printf "%s\n", fld;
        }
    }
}
cat us-500.csv | tr "\r" "\n" | awk -f parsecsv.awk > test1.pip

I'm using awk to convert the CSV file into a delimited file. Despite the .pip extension, the script actually emits tab (hex 09) delimiters, as the hex dump below shows.

cat test1.pip | head -n 3 | hexdump -C
00000000  66 69 72 73 74 5f 6e 61  6d 65 09 6c 61 73 74 5f  |first_name.last_|
00000010  6e 61 6d 65 09 63 6f 6d  70 61 6e 79 5f 6e 61 6d  |name.company_nam|
00000020  65 09 61 64 64 72 65 73  73 09 63 69 74 79 09 63  |e.address.city.c|
00000030  6f 75 6e 74 79 09 73 74  61 74 65 09 7a 69 70 09  |ounty.state.zip.|
00000040  70 68 6f 6e 65 31 09 70  68 6f 6e 65 32 09 65 6d  |phone1.phone2.em|
00000050  61 69 6c 09 77 65 62 0a  4a 61 6d 65 73 09 42 75  |ail.web.James.Bu|
00000060  74 74 09 42 65 6e 74 6f  6e 2c 20 4a 6f 68 6e 20  |tt.Benton, John |
00000070  42 20 4a 72 09 36 36 34  39 20 4e 20 42 6c 75 65  |B Jr.6649 N Blue|
00000080  20 47 75 6d 20 53 74 09  4e 65 77 20 4f 72 6c 65  | Gum St.New Orle|
00000090  61 6e 73 09 4f 72 6c 65  61 6e 73 09 4c 41 09 37  |ans.Orleans.LA.7|
000000a0  30 31 31 36 09 35 30 34  2d 36 32 31 2d 38 39 32  |0116.504-621-892|
000000b0  37 09 35 30 34 2d 38 34  35 2d 31 34 32 37 09 6a  |7.504-845-1427.j|
000000c0  62 75 74 74 40 67 6d 61  69 6c 2e 63 6f 6d 09 68  |butt@gmail.com.h|
000000d0  74 74 70 3a 2f 2f 77 77  77 2e 62 65 6e 74 6f 6e  |ttp://www.benton|
000000e0  6a 6f 68 6e 62 6a 72 2e  63 6f 6d 0a 4a 6f 73 65  |johnbjr.com.Jose|
000000f0  70 68 69 6e 65 09 44 61  72 61 6b 6a 79 09 43 68  |phine.Darakjy.Ch|
00000100  61 6e 61 79 2c 20 4a 65  66 66 72 65 79 20 41 20  |anay, Jeffrey A |
00000110  45 73 71 09 34 20 42 20  42 6c 75 65 20 52 69 64  |Esq.4 B Blue Rid|
00000120  67 65 20 42 6c 76 64 09  42 72 69 67 68 74 6f 6e  |ge Blvd.Brighton|
00000130  09 4c 69 76 69 6e 67 73  74 6f 6e 09 4d 49 09 34  |.Livingston.MI.4|
00000140  38 31 31 36 09 38 31 30  2d 32 39 32 2d 39 33 38  |8116.810-292-938|
00000150  38 09 38 31 30 2d 33 37  34 2d 39 38 34 30 09 6a  |8.810-374-9840.j|
00000160  6f 73 65 70 68 69 6e 65  5f 64 61 72 61 6b 6a 79  |osephine_darakjy|
00000170  40 64 61 72 61 6b 6a 79  2e 6f 72 67 09 68 74 74  |@darakjy.org.htt|
00000180  70 3a 2f 2f 77 77 77 2e  63 68 61 6e 61 79 6a 65  |p://www.chanayje|
00000190  66 66 72 65 79 61 65 73  71 2e 63 6f 6d 0a        |ffreyaesq.com.|

I then imported the data into the database.
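The import command itself didn't make it into my notes; it would have looked something like this (the test.db file name is an assumption):

```shell
# Sketch: set the separator to tab, then import the converted file.
sqlite3 test.db <<'EOF'
.mode tabs
.import test1.pip us500
EOF
```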

sqlite> select first_name,last_name from us500 limit 5;
first_name|last_name
James|Butt
Josephine|Darakjy
Art|Venere
Lenna|Paprocki

I’m seeing the first row has the column titles. I’ll look into how to import without that line later.
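For later reference, one common way to drop that header line is tail -n +2, which prints from the second line onward:

```shell
# Same pipeline as before, but skip the header row before parsing.
cat us-500.csv | tr "\r" "\n" | tail -n +2 | awk -f parsecsv.awk > test1.pip
```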
