Preventing SQL Injection with Parametrized Queries

Still using magic quotes, or using addslashes to add slashes all $_POST and $_GET variables to prevent SQL Injection? Using magic quotes is considered not a good idea, and using addslashes will also certainly drive you insane.

Parameterized statements

To protect against SQL injection, user input must not directly be embedded in SQL statements. Instead, parameterized statements must be used (preferred), or user input must be carefully escaped or filtered. With most development platforms, parameterized statements can be used that work with parameters (sometimes called placeholders or bind variables) instead of embedding user input in the statement. In many cases, the SQL statement is fixed. The user input is then assigned (bound) to a parameter.
From: Wikipedia

MySQLi / PDO-only!?

PHP has built-in support for parameterized statements using the MySQL Improved Extension (MySQLi) or PHP Data Objects. So no build-in support if you're stuck with the old MySQL extension. As I was using the old MySQL extension in most of my projects and I really wanted to use parameterized statements, I found a way to implement them myself.

As placeholders I used the question mark (?), as MySQLi also uses the same syntax. Say I want to fetch a user from the database by its field `id`, one could use the following code:

$result = mysql_query("SELECT * FROM `user` WHERE `id` = {$_GET['id']}");

As obvious to most readers, this makes you vulnerable to SQL Injection as $_GET['id'] is used without any escaping or checking. Using a parametrized query, the same query would read:

SELECT * FROM `user` WHERE `id` = ?

I implemented the function 'parametrized_query' which takes a query and multiple substitution values. The query above is executed in the code like this:

$result = parametrized_query("SELECT * FROM `user` WHERE `id` = ?",
   $_GET['id']);

sprintf and mysql_real_escape_string

How does parametrized_query work? First, I escape all the string values send to the function using mysql_real_escape_string and put them between a single apostrophe: 'string'. This makes strings safe from SQL injection.

Then, using sprintf, the parameters are injected in the query. As sprintf uses %s instead of ? for placeholders, I use str_replace to substitute ? for %s.

The constructed query then looks like this (remember, $_GET only returns strings):

SELECT * FROM `user` WHERE `id` = '1'

I could force the parameter into an integer by first casting $_GET['id'] into an int:

$result = parametrized_query("SELECT * FROM `user` WHERE `id` = ?", (int) $_GET['id']);

The query then looks like this:

SELECT * FROM `user` WHERE `id` = 1

The code

Combining everything together, you end up with the following:

function parametrized_query()
{
    $query = func_get_args();
   $query[0] = str_replace("?", "%s", $query[0]);

    for($i = 1; $i < count($query); $i++) {
        if(is_string($query[$i])) {
            $query[$i] =
               "'".mysql_real_escape_string($query[$i])."'";
        } else if(is_null($query[$i])) {
            $query[$i] = 'NULL';
        }
    }

    $sql = call_user_func_array("sprintf", $query);

    return mysql_query($sql);
}

Optimizing SELECT * FROM with mysql_field_table()

Creating your own data access layer, can be quite some work. After some time, you will tackle the basics of querying the database and parsing the results into objects. Each query only retrieves a set of data from a single table, so you push the code to production like I did.

Then, after using the code for some time you want to be able to execute slightly more complex queries (like for example JOIN or sub queries). Then that piece of neat code, once running smoothly, started to degrade very fast.

Database structure

Our sample database contains two message entries written by two different authors.

Table: user

id name
1 Bouke
2 Tim

Table: post

id userId name
1 1 This is Bouke's message
2 2 This is Tim's message

Objects

So, let's talk some code. The system contains posts written by some user. The objective is to list all the messages and the user who posted the message. The following objects are given.

class Post
{
    public $id;
    public $userId;
    public $message;

    protected $_user;

    public function setUser(User $user)
    {
        $this->_user = $user;
    }

    public function getUser()
    {
        if(!isset($this->_user)) {
            $result = mysql_query("SELECT * FROM `user`
               WHERE `id` = {$this->userId}");
            $this->setUser(mysql_fetch_object($result, "User"));
        }

        return $this->_user;
    }
}

class User
{
    public $id;
    public $name;
}

Initially: 3 queries

This one is simple and straight forward. First, you fetch all the posts. Then for each post, you fetch the person who posted it. Note, there is initially a single query for requesting all the posts, but each post results in a query for the user.

$result = mysql_query("SELECT * FROM `post`");

while($post = mysql_fetch_object($result, "Post")) {
   echo "{$post->getUser()->name}: {$post->message}", "<br/>\n";
}

Improvement with a JOIN

The solution seems simple, using a JOIN only performs a single query:

SELECT * FROM `post` JOIN `user` ON (`post`.`userId` = `user`.`id`)

But this has also its affects on the code issuing this query. The code needs to build two objects from each row it receives. Using a simple associative array will not help you here, as both tables have a column called `id`.

Aha, you might think, use an alias and case solved! But what if you are a bit lazy, like me. You would not want to specify all the columns with their aliasses, but still use SELECT *. This is where mysql_field_table() appears on the stage:

string mysql_field_table ( resource $result , int $field_offset )
Returns the name of the table that the specified field is in.

From: PHP Manual

So, first we query the database using a join:

$result = mysql_query("SELECT * FROM `post`
   JOIN `user` ON (`post`.`userId` = `user`.`id`)");

Then, we build the mapping of column index => {tablename, fieldname}

$columns = array();
for($i = 0; $i < mysql_num_fields($result); $i++) {
    $columns[] = array(
        mysql_field_table($result, $i),
        mysql_field_name($result, $i));
}

// Result:
// [["post","id"],
//  ["post","userId"],
//  ["post","message"],
//  ["user","id"],
//  ["user","name"]]

Now we know which column relates to which tablename and fieldname, we iterate over all rows and create a two-dimensional array:

$rows = array();
while($row = mysql_fetch_row($result)) {
    foreach($row as $i => $value) {
        $row[$columns[$i][0]][$columns[$i][1]] = $value;
        unset($row[$i]);
    }
    $rows[] = $row;
}

// Result:
// [{   "post":{"id":"1","userId":"1","message":"This is Bouke's message"},
//  "user":{"id":"1","name":"Bouke"}},
//  {   "post":{"id":"2","userId":"2","message":"This is Tim's message"},
//  "user":{"id":"2","name":"Tim"}}]

Then the last step concerns transforming the two-dimensional array into object:

class Factory
{
    public static function build($object, array $data)
    {
        foreach($data as $property => $value) {
            $object->$property = $value;
        }
        return $object;
    }
}

foreach($rows as $row) {
    $post = Factory::build(new Post(), $row["post"]);
    $post->setUser(Factory::build(new User(), $row["user"]));

    echo "{$post->getUser()->name}: {$post->message}", "<br/>\n";
}

And you're done; all down to a single query! You might wander, why would you rewrite four lines of code to a whoppin' 32 lines? Only for the benefit of using lazy SELECT * querying?

Well, those 32 lines decompose into 25 lines for the framework and only a stunning 7 lines for the fetching and object building. Is the method worth rewriting your code?

The method explained has not yet been taken into production, as I first want to perform an speed impact vs ease of code analysis. For now, the code provided is only a proof of concept.

Except where otherwise noted, this work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.