Lubos Rendek

The only solution is determination.

GNU-R MySQL Database Connection Examples With RMySQL Interface

| Comments

Here are few tips on how to get started with GNU-R and MySQL database. Before with can start using MySQL database with GNU-R first we need to make sure that we have a proper MySQL database interface installed. On Debian Linux this is easy as:

1
# apt-get install r-cran-rmysql

The above will install RMySQL nad MySQL database interface for GNU-R. Start R and confirm the installation:

1
2
3
4
> require("RMySQL")
Loading required package: RMySQL
Loading required package: DBI
> 

In the next step we need to create a database handle. Our database has name “mydata”, is located on “localhost” and the access to this database is given to user “lubos” with password “pass”.

1
dbh <- dbConnect(MySQL(), user="lubos", password="pass", dbname="mydata", host="localhost")

Our sample database contains only a single table “cost” with 3 columns:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
mysql> select * from cost;
+----+---------+---------+---------+
| id | column1 | column2 | column3 |
+----+---------+---------+---------+
|  1 |      54 |      68 |      74 |
|  2 |      54 |      67 |      77 |
|  3 |      54 |      67 |      78 |
|  4 |      54 |      67 |      76 |
|  5 |      54 |      67 |      79 |
|  6 |      54 |      67 |      74 |
|  7 |      54 |      67 |      75 |
|  8 |      54 |      67 |      68 |
|  9 |      54 |      67 |      63 |
| 10 |      54 |      65 |      63 |
+----+---------+---------+---------+
10 rows in set (0.01 sec)

To get more information about our database using GNU-R and RMySQL we can can use our previously created database handle “dbh”. First we can list tables:

1
2
> dbListTables(dbh)
[1] "cost"

Next we can also list all fields within our table:

1
2
> dbListFields(dbh, "cost")
[1] "id"      "column1" "column2" "column3"

Now that we know how our database looks like we can start to run some SQL queries. Let’s fetch all data from our “cost” table:

1
2
3
4
5
6
7
8
9
10
11
12
13
dbGetQuery(dbh, "SELECT * FROM cost;")
   id column1 column2 column3
1   1      54      68      74
2   2      54      67      77
3   3      54      67      78
4   4      54      67      76
5   5      54      67      79
6   6      54      67      74
7   7      54      67      75
8   8      54      67      68
9   9      54      67      63
10 10      54      65      63
>

In the next example we are going to make a sum of all values in column2:

1
2
> sum(dbGetQuery(dbh, "SELECT column2 FROM cost;"))
[1] 669

The above is the same as storing results in a matrix and perform sum only on column2:

1
2
3
4
5
6
> out = dbGetQuery(dbh, "SELECT * FROM cost;")
> sum(out$column2)
[1] 669
> sum(out$column3)
[1] 727
>

For a small data set like the one above dbGetQuery function is satisfactory. However, for a large data sets we may want to use dbSendQuery function. In contrast to dbGetQuery the dbSendQuery allows us to retrieve data in batches. Here is how dbSendQuery works. First we send mysql query:

1
> out = dbSendQuery(dbh, "SELECT * FROM cost;")

Now retrieve three separate batches of information. batchOne and batchTwo will hold first 6 results and batchThree we store rest of the results ( -1 ).

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
> batchOne = fetch(out, n = 3)
> batchTwo = fetch(out, n = 3)
> batchThree = fetch(out, n = -1)
> batchOne
  id column1 column2 column3
1  1      54      68      74
2  2      54      67      77
3  3      54      67      78
> batchTwo
  id column1 column2 column3
4  4      54      67      76
5  5      54      67      79
6  6      54      67      74
> batchThree
   id column1 column2 column3
7   7      54      67      75
8   8      54      67      68
9   9      54      67      63
10 10      54      65      63
> 

To see how many rows have been fetched we can use dbGetRowCount function:

1
2
> dbGetRowCount(out)
[1] 10

To know whether query has completed use dbHasCompleted:

1
2
> dbHasCompleted(out)
[1] TRUE

In case that we no longer require our “out” results we can clear them with:

1
2
> dbClearResult(out)
[1] TRUE

Once we have finished with MySQL database we can disconnect from it using:

1
2
> dbDisconnect(dbh)
[1] TRUE

Using GNU R in scripts

If from some reason you need to run the above as a GNU R script in your shell you may want to install “little” the GNU R scripting front-end. On Linux the installation is simple as:

1
# apt-get install littler

At the point we are ready to put the above MySQL and GNU R lines into a script. Create a new file with some arbitrary name like gnurscript.R and include a following content:

1
2
3
4
5
#!/usr/bin/env r

require("RMySQL")
dbh <- dbConnect(MySQL(), user="lubos", password="pass", dbname="mydata", host="localhost")
print(dbGetQuery(dbh, "SELECT * FROM cost;"))

The first line will set your interpreter to GNU R using littler scripting front-end. Apart of the print() function the rest of the GNU R code is same as above. Make the script executable:

1
$ chmod +x gnurscript.R 

and finally run your script:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
$ ./gnurscript.R
Loading required package: RMySQL
Loading required package: methods
Loading required package: DBI
Loading required package: utils
   id column1 column2 column3
1   1      54      68      74
2   2      54      67      77
3   3      54      67      78
4   4      54      67      76
5   5      54      67      79
6   6      54      67      74
7   7      54      67      75
8   8      54      67      68
9   9      54      67      63
10 10      54      65      63

If you wish to suppress the messages upon loading GNU R libraries, packages etc. simply change the “require” line to:

1
suppressMessages(require("RMySQL"))

Comments