We store data in databases because we want to retrieve that data later on. Well, of course we do! But what happens when we want to get that data out? Do we just open up a file and all that data goodness just displays itself for us? No, it's not quite that easy, nor would that really be useful.

We need a way to ask the database for the data we want, because we are probably not going to want everything all at once and we'll probably want specific slices of data that are relative to what we are doing at that moment.

SQL is exactly what we need. SQL stands for Structured Query Language, and it's the "programming" language used by most databases to provide a way of managing that data.

That's enough of that kind of talk for now. Lets jump right in and setup a database.

Downloading a Database Server

For this article I'm going to use PostgreSQL because it's popular and free.

I'm not going to provide a walk through of installing the database, but don't fret. It's really easy! Download the right package for your operating system from the link below, then come back and I'll get started by opening up the database and populating a couple of tables.

PostgreSQL - http://www.postgresql.org/download/


Setting up the Database

When you first run PostgreSQL you should see a window that looks like the image below:

Welcome to Postgres

This window provides you with a way to access the documentation, preferences and the console. The console gives us a gateway into the database server, albeit a not very pretty gateway. Who cares about pretty though when you can get down and dirty!

The console, incase you are unable to follow along, is just a text based application.

Postgres Console

Now that I have a gateway into my database server, I'm going to create a database and a table or two with some data so that I can test out some queries.


Creating a database

Creating a database in the psql console is really easy. Just type create database <name> where <name> is the name of the database. I'm going to call mine beastly.

Create Database Partial Statement

Notice that we didn't get any type of acknowledgement back from the server. That's because I forgot to end my SQL statement with a <;>. I'll do that now so we can have the server tell us how proud it is that we sent it a valid command.

Create Database Full Statement

That's better. The server responds this time with CREATE DATABASE. At first this seems like an odd response, but really it's just the server stating the job that it just performed. Create Database is exactly what we told it to do!

I now need to tell psql that I want to use the newly created database instead of the default one that is created for me when Postgres is installed. To do that I will run the \connect command.

Postgres connect command

Notice how the prompt changed from Jeramy to beastly. This is an indicator that we are connected to a specific database. In this case, we have connected to the recently created beastly database.

Now that our new database is ready-to-go it is time to create some tables.


Creating tables

To create a table I'll need to send an SQL statement to Postgres telling it to create a table with a specific table structure. Meaning I need to tell Postgres to create a table with columns that hold certain types of data and have certain constraints.

Here is a list of some different types of data that a column can contain:

  • Integers
  • Characters
  • Dates
  • Money

The list goes on and on, and each different database system (i.e. SQL Server, Oracle Database, MySQL, etc) has different data types that you can use. For the most part they are pretty similar, but may have small differences like using Char or VarChar. For a list of the different data types available in Postgres checkout the documentation.

Back to my tables. Since my database name is beastly, I'm going to keep data on things that can be considered beastly! My first table will contain different beasts with no specific data except for the name and type of beast. I will name it beasts, and the columns will be named name and species.

The beasts table:
beasts table

The second table will hold information specific to the dwelling locations of the different types of beasts and it will be named beastlocations. The columns for this table will be type and location.

beasts location table

Taking a look at the graphical representations of my tables above, I can see that I'm really only saving one type of data, character data of variable lengths. In Postgres, a character type is represented by one of three different keywords1:

  • varchar(n) - a variable-length entry where n is the limit
  • char(n) - a fixed-length entry where n is the fixed-length. If the entry is smaller than n then it is padded with blank spaces until it is of size n
  • text - variable unlimited length entry

Now that I know what data type is needed for my columns I can go ahead and create my tables. Again, like when I created the database, I will use the create command, but this time I will specify that I am creating a table and what I want that table to look like.

CREATE TABLE beasts (  
    name    varchar(50),
    species varchar(50)
);

Table Creation

I use varchar(50) because I don't plan on any entering anything longer than fifty characters. According to the Postgres manual on data types, there isn't a performance hit for using varchar or even text, but you may possibly use more space when using those two data types over a fixed-length char. In this case, I don't think it's such a big deal.

Now that we have two finely crafted tables we need to insert some data into them.


Inserting data

Databases and tables are worthless without a way to push and pull data to and from the database. That's where the INSERT command comes to play.

In my visualization of the beasts table I had four different beasts listed that I want the database to contain. I'm going to use the insert command to put the four beasts into the database.

INSERT into beasts VALUES  
  ('Great White Shark', 'Caracharodon'),
  ('Hyena','Crocuta'),
  ('Panther','Cat'),
  ('Wolf','Canis');

Insert known order

Because this was a simple insert statement I was able to write it without specifying which columns the data needed to go in. I wrote the statement so that each entry is inserted in the order that it is sent. So 'Great White Shark' is inserted into the first column and 'Caracharodon' is inserted into the second column. If my tables were larger, or I wasn't going to fill all the columns in with data I could have specified where each piece of data was to be inserted. I will use the second table as an example of a more specific statement.

INSERT into beasts (name, species) VALUES  
  ('Great White Shark', 'Caracharodon'),
  ('Hyena','Crocuta'),
  ('Panther','Cat'),
  ('Wolf','Canis');

Insert specified columns

Notice that the outcome was exactly the same. I could have mixed the name and species column list and the input to match, and it would have worked as well.

I will skip the beastlocations table and leave it as practice for you.

Now that we have data inside our tables we need a way to extract that data. Enter the SELECT command.


Select

The SELECT command is used anytime you want to have the database collect and send data. Select is an actual query and not just a statement, and the data returned from a SELECT query is called a Result Set.

The most simple way to get data from a specific table is to use the SELECT * FROM table query. This query returns everything from the specified table. This may or may not be what you want. Let's test it out on our beasts table.

SELECT * FROM beasts;  

SELECT all

As expected, SELECT * returns everything in our database. I can even return specific columns. For example, if I wanted a list of every beast in the database, but didn't care what species they were I could query the database with the following SELECT statement:

SELECT name from beasts;  

SELECT column

What if I only wanted to know what species a Hyena is, or wanted to retrieve a list of all Canis species? For more complicated needs we have the WHERE clause.


Where Clause

A WHERE clause is a statement that we can add to our SELECT query to specify what data we want included in or even left out of the returned data.

Take for instance, if I really only wanted to know what species a hyena was. I could write my SELECT query like:

SELECT * FROM beasts  
  WHERE name = 'Hyena';

Postgres WHERE

Now we know that a Hyena is of the Crocuta species!


Joins

But wait, there's more! I have two tables in our database don't I? How can I combine the two tables and get back data that is related?

There are several types of Joins and I'm going to cover most of them, but specifically leave out CROSS joins. A cross join combines two or more tables and returns a result set of the combined tables.

Inner Join

The INNER JOIN takes two or more tables and makes a result set out of the rows that meet a specific criteria. For example, I want to know if I'm safer from Hyena's on land or in the ocean. To get that answer I will need to match the Hyena's species from the beasts table to a species in the beastlocations table and return the intersecting data.

Lets take a look at what that SELECT query will look like.

SELECT b.name, l.location  
 FROM beasts b INNER JOIN beastlocations l
 ON b.name = 'Hyena' AND b.species = l.species;

INNER JOIN

In that query I introduced several new concepts.

The first is that we can provide a variable, or alias, for table. I gave beasts a variable of b so that it could be referenced throughout the query without having to constantly reference beasts.

The second is the INNER JOIN statement. In the FROM statement I conducted an INNER JOIN of the beasts table and the beastlocations table. This tells the database that there is information in both of these tables that I want back.

The third concept introduced is the ON statement. While the INNER JOIN statement tells the database that I am combining data from two or more tables, the ON statement tells the database under what conditions to join that data.

The fourth concept, and probably the least noticed is that there can be more than one condition that the database must adhere to. In my ON statement I told the database that it must JOIN the two tables only on the rows where the beasts species is the same as the species of the Hyena.

To prove that this wasn't just a fluke, as it might well be because we have so little test data, I'm going to add another Cat to the database. I will then change the query slightly so that the database only joins the rows where the species of the beasts are cats.

INSERT into beasts VALUES  
 ('Tiger', 'Cat'),
 ('Lion', 'Cat');

INNER JOIN Cat

As predicted, I got back three different beasts which are all of the species Cat.

Outer Joins

This is where SQL begins to get a little confusing. Outer Joins tend to return result sets of rows even if there are no matches from the ON clause. Which table has all the rows included depends on the type of outer join.2

Left Outer Join

I will use the LEFT OUTER JOIN on the beastly database as an example. Before running the LEFT OUTER JOIN query I will first add a bear species, called Ursus, to the beastlocations table and a Killer Whale to the beasts table without adding them to the other tables. This will make the actions of the outer join evident when we see the result set.

INSERT into beasts VALUES ('Killer Whale', 'Orcinus');  
INSERT into beastlocations VALUES ('Ursus', 'Land');  
SELECT b.name, b.species, l.species, l.location  
 FROM beasts b LEFT OUTER JOIN beastlocations l
 ON b.species = l.species;

LEFT OUTER JOIN

Notice that the seventh row is missing a correlating species and location, and that the Ursus species did not get returned at all. This is because there was nothing in the left table for the Ursus species.

Right Outer Join

If we were to run a RIGHT OUTER JOIN instead, the Ursus species would be returned in the result set.

SELECT b.name, b.species, l.species, l.location  
 FROM beasts b RIGHT OUTER JOIN beastlocations l
 ON b.species = l.species;

RIGHT OUTER JOIN

This time the bear species, Ursus, was returned along with it's location, but the Killer Whale is nowhere to be found.

Full Outer Join

So what if I wanted to see everything from both tables to see where I'm missing data? A FULL OUTER JOIN will return a result set with all rows from both tables and null values in place of non matched rows.

SELECT b.name, b.species, l.species, l.location  
 FROM beasts b FULL OUTER JOIN beastlocations l
 ON b.species = l.species;

FULL OUTER JOIN

This time both Ursus and the Killer Whale were included in the result set.

Conclusion

Using SQL to interact with a database seemed pretty daunting to me at first, but now that I understand some of the basics of the query language it makes future interactions seem more approachable. There is a lot more to SQL than the few statements and commands that I explored in this article, but at the very least I know how to make simple requests from the database server.

References