Category: Data Storage and Retrieval

SQLite3 Database Quick Create with Bash

In this post, I will document a couple of cases where one can quickly create an SQLite3 database by issuing commands in a Bash shell (the default terminal shell in Linux and Mac OS X).

Case 1: Create a database with a table for images containing rows/tuples with the image url and a caption for the image. The images are in a directory located in resources/img/gallery, and the database will be stored in a directory at resources/sql.

Creating the database

touch resources/sql/img.db.sql

The touch command is used to change (update) the given file’s timestamp, but if the file (in this case resources/sql/img.db.sql) does not exist it will be created. Now, to make things easier to fit and read on a single command line, I’ll create a Bash variable the stores the command to access the SQLite3 database.

db='sqlite3 resources/sql/img.db.sql'

Creating and populating the img table

echo "CREATE TABLE img (url TEXT, caption TEXT);" | $db

for x in `ls resources/img/gallery`; do
echo "INSERT INTO img (url,caption) VALUES ('resources/img/gallery/$x','Insert Caption...');" | $db
done

Now the table should be created and populated with the desired content (captions to be inserted at a later time).

echo "SELECT * FROM img;" | $db
resources/img/gallery/0001.jpg|Insert Caption...
resources/img/gallery/0002.jpg|Insert Caption...
resources/img/gallery/0004.jpg|Insert Caption...
resources/img/gallery/0006.jpg|Insert Caption...
... remaining output omitted ...

More cases to come…

SQL Aggregation, Grouping, and Having

Aggregating

In SQL, the standard aggregation operators are SUM, MIN, MAX, and COUNT and we can apply them to the attributes of relations. For example we have a schema employee(eid, name, dob, hired_date, salary), which holds records on each of a company’s employees. We can COUNT and see how many employees the company has with the following query:

SELECT COUNT(*)
FROM employee;

The * is an argument to the count operator and means to count all tuples in the relation. With the DISTINCT operator we eliminate duplicates for the query result, so if we wanted to see if any of the employees had the same name we could compare the previous query with the following:

SELECT COUNT(DISTINCT name)
FROM employee;

If the first count is larger than the second, then the company has at least two employees with the same name. COUNT also does not count tuple attributes with NULL value, and AVG and SUM don’t factor them in when calculating either. We can use AVG to find the average salary of our employees and SUM to calculate the total amount paid for employee salaries with these queries:

SELECT AVG(salary)
FROM (employee);

SELECT SUM(salary)
FROM (employee);

Grouping

A grouping query is executed on a relation with the GROUP BY clause. GROUP BY is followed by an attribute list and the resulting tuples are grouped successively according to the list. To see how grouping works, let’s say that the company from our earlier examples is a shipping company and owns trucks. The company keeps track of truck purchases and their usage with the following relations: truck(tid, make, model, year, purch_year, purch_quarter, cost). In a simple example we could query the database and ask which years the company purchased trucks:

SELECT purch_year
FROM truck
GROUP BY purch_year;

GROUP BY would come after a WHERE clause, but in this case we didn’t need one so it was omitted. The result of this query would be a single attribute relation (a single column table) with values reflecting the years in which the company purchased at least one truck. If the company purchased more that one truck in that year that purchase would be aggregated with all others in that year.

Only the attributes that are listed in the GROUP BY clause can appear unaggregated in the SELECT clause. The next two queries would tell the company which years and in which quarters they purchased trucks and the total amount spent in each quarter.

SELECT purch_year, quarter
FROM truck
GROUP BY year, quarter
ORDER BY quarter, year;

SELECT purch_year, SUM(cost) as annualcost
FROM truck
GROUP BY purch_year;

The first query would return a relation of 2-tuples with the first element being purch_year and the second quarter (the result is also first sorted by quarter, then by year). The second query would also return a 2-tuple but the elements would be purch_year and annualcost which is the aggregated (by simple addition in this case) cost of trucks for each year.  Because cost is not in the GROUP BY clause, it must be aggregated.

The HAVING Clause

In our last set of queries, we asked for the costs of trucks purchased on an annual basis. Suppose for some reason the company wanted to find the average cost spent in a year only in years where every truck purchased in that year cost over $15,000. To ask this question we can use the HAVING clause.

SELECT purch_year, AVG(cost) as avgcost
FROM truck
GROUP BY purch_year
HAVING MIN(cost) > 15000;

The HAVING clause applies to each group. In this case, MIN is calculated on the attribute cost for each purchase year (purch_year) then compared to 15,000. If the minimum value for all costs in a purchase year are greater that 15,000 then the purch_year shows up in the query result (a 2-tuple (purch_year, avgcost)).