
CB1, Inc.

CB1, INC. is a Minneapolis-based software and consulting company specializing in custom development and systems integration.

High Availability with DRBD and Heartbeat Presentation

Tue, 06/10/2008 - 16:51

Here's the presentation I gave on June 9, 2008, at the Twin Cities MySQL and PHP User Group about my highly available cluster using DRBD and Heartbeat.


I added a few slides and cleaned things up a bit. The presentation went well and we had a lot of good questions.

The MySQL and PHP User Group will be taking some time off over the summer. There will be another meetup mid-summer to come up with some ideas for future meetings.

Installing DRBD 8.2.5 on Ubuntu 8.04 Hardy Heron

Sun, 05/18/2008 - 17:24
Updated! 6/6/2008

I posted the amd64 version of the compiled DRBD 8.2.5 driver at the end of this post!

DRBD is a block device driver for Linux that allows you to mirror a partition between two servers.

I had a single application server, but whenever it failed, my websites, Subversion repository, and e-mail would go down. To make these services highly available, I added a second server that can take over in the event of a failure.

               app1.cb1inc.com                              app2.cb1inc.com
CPU            AMD Opteron 180 2.4GHz dual-core             AMD Sempron 2800+
RAM            4GB DDR                                      3GB DDR
Hard drives    2 x 250GB SATA 7,200 RPM (software RAID 1)   1 x 250GB IDE 7,200 RPM
Network        2 x Gigabit network cards                    2 x Gigabit network cards

I don't have a ton of load, so the second server doesn't have to be super powerful.

Each machine has 2 network cards: one for public and one for private traffic. The public interfaces connect to my main switch on the 192.168.0.X network. The private interfaces are connected via a crossover cable on the 10.26.210.X network.

Once you have installed Ubuntu 8.04 Server, you need to install some build tools. I don't know specifically which build packages you need, so I just install a bunch of them and it should work. :)

apt-get install build-essential binutils cpp gcc autoconf automake1.9 libtool \
autotools-dev g++ make flex

In theory, that should get all of the development tools installed that you'll need.

Next we need to get the entire kernel source code. Just the kernel headers won't cut it. We need to compile the kernel so that it builds some of the scripts needed to compile the DRBD driver. We'll also install the ncurses library so that menuconfig works.

apt-get install libncurses5-dev linux-source-2.6.24

Then extract the kernel source:

cd /usr/src
tar -xvf linux-source-2.6.24.tar.bz2
cd /usr/src/linux-source-2.6.24

Next, let's clean out any leftover build files (there shouldn't be any the first time):

make mrproper

Before you can build the kernel, you need to copy your existing kernel build configuration into the kernel source directory:

cp /boot/config-2.6.24-16-server /usr/src/linux-source-2.6.24/.config

Now run menuconfig, which reads in our kernel build configuration and generates some version files. As soon as the menu appears, just exit. You don't have to change any of the settings.

make menuconfig

Finally we need to prepare the kernel and compile it. This will take quite some time.

make prepare
make

Now that we have the kernel source compiled and ready to go, let's get the DRBD source.

cd /root
wget http://oss.linbit.com/drbd/8.2/drbd-8.2.5.tar.gz
tar -xvf drbd-8.2.5.tar.gz
cd /root/drbd-8.2.5

We need to build the DRBD driver and specify the path to the kernel source, then install the driver in the /lib path:

make KDIR=/usr/src/linux-source-2.6.24
make install

make install puts the module under /lib/modules/2.6.24.3 instead of the running kernel's /lib/modules/2.6.24-16-server, so we need to move it to the appropriate lib directory:

mv /lib/modules/2.6.24.3/kernel/drivers/block/drbd.ko \
    /lib/modules/2.6.24-16-server/kernel/drivers/block

Next, rebuild the module dependency list so modprobe can find the moved driver, load the driver, and tell Linux to load it the next time it boots:

depmod -a
modprobe drbd
echo 'drbd' >> /etc/modules
update-rc.d drbd defaults

Now that everything is installed, verify the driver is loaded:

lsmod | grep drbd

It might be a good idea to reboot and make sure it loads.

At this point, you should set up /etc/drbd.conf. Mine looks like this:

global {
  usage-count no;
}

common {
  protocol C;

  syncer {
    rate 30M;
    al-extents 1801;
  }

  startup {
    wfc-timeout  0;
    degr-wfc-timeout 15;
  }

  disk {

    on-io-error   detach;
    # fencing resource-and-stonith;
  }

  net {
    sndbuf-size 512k;
    timeout        60;   #  6 seconds  (unit = 0.1 seconds)
    connect-int    10;   # 10 seconds  (unit = 1 second)
    ping-int       10;   # 10 seconds  (unit = 1 second)
    ping-timeout   5;    # 500 ms (unit = 0.1 seconds)
    max-buffers    8000;
    max-epoch-size 8000;
    cram-hmac-alg  "sha1";
    shared-secret  "secret";
  }
}

resource r0 {
  on app1 {
    disk       /dev/md2;
    address    10.26.210.10:7788;
    device     /dev/drbd0;
    meta-disk  internal;
  }

  on app2 {
    disk       /dev/sda3;
    address    10.26.210.11:7788;
    device     /dev/drbd0;
    meta-disk  internal;
  }
}

Notice the disk is different for each machine. The first machine uses software RAID (md), while the second uses a single drive (sda). Your setup will most likely be /dev/sdaX if you are using SATA or a RAID card, but it could be /dev/hdaX. You can use cfdisk to quickly check your partitions. Also note that the names in the "on" sections (app1 and app2) must match each machine's hostname as reported by uname -n.

With the configuration set, you need to restart/reload DRBD, create the meta disk, and bring the drive up:

/etc/init.d/drbd restart
drbdadm create-md r0
drbdadm up r0

At this point you can view the DRBD status:

chris@app1:~$ cat /proc/drbd
version: 8.0.11 (api:86/proto:86)
GIT-hash: b3fe2bdfd3b9f7c2f923186883eb9e2a0d3a5b1b build by phil@mescal, 2008-02-12
 0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
    ns:221871972 nr:7160 dw:3856764 dr:227396211 al:763 bm:17504 lo:0 pe:0 ua:0 ap:0
	resync: used:0/31 hits:13841287 misses:13628 starving:0 dirty:0 changed:13628
	act_log: used:0/1801 hits:961638 misses:790 starving:0 dirty:27 changed:763

chris@app1:~$ /etc/init.d/drbd status
drbd driver loaded OK; device status:
version: 8.0.11 (api:86/proto:86)
GIT-hash: b3fe2bdfd3b9f7c2f923186883eb9e2a0d3a5b1b build by phil@mescal, 2008-02-12
m:res  cs         st                 ds                 p  mounted    fstype
0:r0   Connected  Primary/Secondary  UpToDate/UpToDate  C  /mnt/data  ext3

You're now ready to mount /dev/drbd0 and put data on it. There are still a couple of other things you need to do if you plan on using Heartbeat for failover monitoring.
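If you want a quick health check to feed into monitoring, the device lines in /proc/drbd are easy to parse. Here's a minimal Python sketch, assuming the DRBD 8.x format shown above (key:value fields such as cs:, st:, and ds:). The function names are my own, and newer DRBD releases format /proc/drbd differently.

```python
import re

# key:value fields on a DRBD 8.x device line, e.g. "cs:Connected"
FIELD = re.compile(r"(\w+):(\S+)")
# a device line starts with its minor number, e.g. " 0: cs:Connected ..."
DEVICE = re.compile(r"\s*(\d+):\s+cs:")


def parse_device_line(line):
    """Return the key:value fields (cs, st, ds, ...) of one device line."""
    return dict(FIELD.findall(line))


def device_status(proc_text):
    """Map each DRBD minor number to its parsed fields."""
    status = {}
    for line in proc_text.splitlines():
        match = DEVICE.match(line)
        if match:
            status[int(match.group(1))] = parse_device_line(line)
    return status


def is_healthy(fields):
    """Healthy here means connected to the peer with both disks UpToDate."""
    return (fields.get("cs") == "Connected"
            and fields.get("ds") == "UpToDate/UpToDate")


# Usage on the server:
#   with open("/proc/drbd") as f:
#       for minor, fields in device_status(f.read()).items():
#           print(minor, fields["st"], "OK" if is_healthy(fields) else "DEGRADED")
```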

Download the Driver

If you are running an amd64 architecture, you can download the already compiled driver that was built with the steps above. Just to be clear, this was compiled for Ubuntu 8.04 with the 2.6.24-16-server kernel.

Just put the file in the drivers folder, don't forget to set the proper owner, and rebuild the module dependency list so modprobe can find it.

wget http://www.cb1inc.com/sites/default/files/drbd.ko
chown root:root drbd.ko
mv drbd.ko /lib/modules/2.6.24-16-server/kernel/drivers/block
depmod -a

I hope this is of some help and good luck!

2008 MySQL Conference Recap Presentation

Tue, 05/13/2008 - 21:17

Here's the presentation I gave on May 12, 2008, at the Twin Cities MySQL and PHP User Group about my experience at the 2008 MySQL Conference and Expo.


Thanks to all of those who came. I had a great time!

Memcached and MySQL Presentation

Sat, 05/10/2008 - 14:16

Here's my presentation on Memcached and MySQL:


You can download the sample files here:

MinneBar 2008 This Weekend!

Thu, 05/08/2008 - 04:12

This Saturday, May 10th, is MinneBar, Minnesota's BarCamp. MinneBar is described as an "(un)Conference" which means it's a free, ad-hoc gathering of technology folks where everyone is encouraged to contribute.

MinneBar

There are a lot of great sessions this year. I'll be giving a presentation titled "Memcached & MySQL Sitting in a Tree." The talk is about the new Memcached Functions for MySQL. I'll talk a bit about the what, why, and how of this set of awesome UDFs.

I'm not sure what time I present and I think I have 50 minutes, but I don't know for sure. I'm trying something new this time around; I'll be publishing my presentation on SlideShare.

We are still 3 days away, and there are currently 356 people signed up, which is right around how many people were signed up last year. If you are in the Minneapolis/St. Paul area, you should come participate and learn!

To register, visit their website, click the "login" link in the top right, use the password "c4mp" to log in, then edit the main page and add yourself to the bottom. Registration starts at 8:00am, so remember to set an alarm. :)

Hope to see you there!

Meet the RotatorContainer

Fri, 04/25/2008 - 23:27

I've been sitting on this code for a while and decided to clean it up and submit it to the Dojo JavaScript Toolkit.

The RotatorContainer cycles through dijit.layout.ContentPanes and provides navigation in the form of tabs or a pager. There are a number of timing settings you can adjust, as well as the layout of the controls.

Give it a try! You can also see it in action on the front page of http://www.cb1inc.com.

I submitted this widget, so it may or may not be accepted. In the meantime, you can download the RotatorContainer here: RotatorContainer.tar.gz [9KB] To install it, extract it into your dijit's parent directory.

Here's how you can use it:

<script type="text/javascript">
dojo.require("dojo.parser");
dojo.require("dijit.layout.ContentPane");
dojo.require("dojox.layout.RotatorContainer");
</script>

<div dojoType="dojox.layout.RotatorContainer" id="myRotator" showTabs="true"
      autoStart="true" transitionDelay="5000">
    <div dojoType="dijit.layout.ContentPane" title="1">
        Pane 1!
    </div>
    <div dojoType="dijit.layout.ContentPane" title="2">
        Pane 2!
    </div>
    <div dojoType="dijit.layout.ContentPane" title="3">
        Pane 3!
    </div>
</div>

NOTE: There is some CSS needed to make the RotatorContainer look correct. Include or copy the contents of the dijit/layout/resources/RotatorContainer.css into your CSS file.

The magic happens once you have some content. It takes a bit of time to tweak things, but you can pretty much do anything you can imagine.

The RotatorContainer can be controlled by a pager which includes a play/pause button, next and previous button, and the current and total panes. You can also have as many pagers as you'd like. The pager code looks like this:

<script type="text/javascript">
dojo.require("dijit.form.Button");
</script>

<div dojoType="dojox.layout.RotatorPager" rotatorId="myRotator">
    <button dojoType="dijit.form.Button" dojoAttachPoint="previous">Prev</button>
    <button dojoType="dijit.form.ToggleButton" dojoAttachPoint="playPause"></button>
    <button dojoType="dijit.form.Button" dojoAttachPoint="next">Next</button>
    <span dojoAttachPoint="current"></span> / <span dojoAttachPoint="total"></span>
</div>

One thing I should note: if this widget finds its way into dojox, it's possible the name or functionality will change.

If time permitted, I would have liked to add additional transitions such as a left-to-right wipe. Oh well.

Enjoy!

mod_dbd MySQL Driver Woes With Ubuntu 7.04

Tue, 04/22/2008 - 16:58

Apache has a neat module called mod_dbd that allows your Apache modules to connect to a database. mod_dbd interfaces with apr_dbd, an Apache Portable Runtime (APR) abstraction layer around database specific drivers.

Back when Ubuntu 7.04 (feisty) was released, a MySQL driver was not bundled with Apache because of licensing concerns. So, in order to use mod_dbd to connect to a MySQL database, you need to get the MySQL driver source code from WebThing (apr_dbd_mysql.c) and manually re-compile apr-utils.

You also need the source code for Apache 2.2.3 (which includes apr-utils 1.2.7) from the Ubuntu 7.04 repositories, then copy the apr_dbd_mysql.c file into the Apache source apr-utils/dbd directory. The Ubuntu guys made a nice INSTALL.MySQL file in the apr-utils with some basic instructions.

What they don't tell you is you need to install the MySQL source. To make matters worse, once you install it, the apr-utils 1.2.7 configure script can't find it, even if you tell it where it is.

<snip>
configure: checking for mysql in /usr/src/mysql-dfsg-5.0-5.0.38/include
checking mysql.h usability... no
checking mysql.h presence... no
checking for mysql.h... no
<snip>

This apparently was a known issue and was fixed in apr-utils 1.2.8.

Starting with apr-utils 1.2.11, the MySQL driver is bundled with it. Unfortunately, even Ubuntu 7.10 (gutsy) still ships with apr-utils 1.2.7. So, you are forced to download the source and compile.

Or you can wait a couple of days for Ubuntu 8.04 (hardy), which ships with Apache 2.2.8 and apr-utils 1.2.11. In theory, the MySQL driver will work out of the box.

As for me, I'll be compiling Apache, PHP, MySQL, memcached, and <insert essential infrastructure software> from source like I should have done in the beginning.

MySQL Conference Day 4 Thoughts

Thu, 04/17/2008 - 18:59
Scaling out MySQL: Hardware Today and Tomorrow

Jeremy Cole and Eric Bergen over at Proven Scaling LLC gave a talk about the hardware side of MySQL. They covered pretty much every aspect of hardware.

For starters, Jeremy said to go with 64-bit hardware and a 64-bit operating system. For CPU, faster is better. The current versions of MySQL and InnoDB don't take full advantage of 8-core servers, so unless you have the budget, Jeremy recommended a single quad-core or a dual dual-core setup. He recommended getting as much RAM as possible. RAM is cheap, so go for 32GB, or at least 16GB.

For storage, Jeremy discussed the many options including direct attached storage (DAS), SAN, NAS, and the various hard drive interfaces. From what I gathered, they prefer configuring each DB server with RAID 10. If the RAID controller has a battery-backed cache, then you should use "write back"; otherwise, use "write through". Write back offers faster performance since it caches the data and doesn't make the system wait for the data to be written to disk. The battery-backed cache means that you won't lose data that is pending to be written if the system loses power.

There was a brief discussion of SATA vs. SAS. SAS offers faster drives (15,000 RPM), and the drives have processors to handle commands just as SCSI drives do, which improves performance. SAS has another interesting feature: a single drive can be hooked up to two separate SAS controllers in case one controller becomes unavailable.

They buy all of their gear from Dell, but HP, Sun, and IBM are good too. Dell just happens to be significantly cheaper, especially when you go through a sales rep. They mentioned some of the smaller guys including SuperMicro and Silicon Mechanics. I personally really like SuperMicro's 6015T server because it has 2 nodes in a 1U chassis. This is actually denser than any blade server solution I've ever seen. Each node is capable of two quad-core processors and 32GB of RAM. The only downside is you can only have 2 hard drives and both nodes share a single non-redundant power supply. So this would make a decent slave, but you would need to architect your application so it could quickly pick another slave if/when it goes down or use MySQL Proxy.

For databases using InnoDB, they said the InnoDB buffer pool should be 2GB less than the total system memory, so 14GB on a 16GB system. Jeremy mentioned special hardware to speed things up, specifically Kickfire and Violin Memory. Kickfire is a SQL appliance that includes a special SQL chip to speed up operations significantly. Violin Memory's 1010 memory appliance is sweet. For only $170k, you can add 512GB of DRAM in 2U to your database server over a PCI-Express bus. It holds 84 x 6GB chips that can be hot swapped. You can lose 2 sticks before you're screwed.
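As a worked example of that rule of thumb, here's a tiny Python helper that turns total RAM into an innodb_buffer_pool_size line for my.cnf. The 2GB reserve is the figure from the talk; treat it as a starting point, since the right number depends on what else runs on the box.

```python
def buffer_pool_gb(total_ram_gb, reserve_gb=2):
    """Rule of thumb from the talk: leave about 2GB for the OS and everything else."""
    if total_ram_gb <= reserve_gb:
        raise ValueError("need more than %dGB of RAM" % reserve_gb)
    return total_ram_gb - reserve_gb


def my_cnf_line(total_ram_gb):
    """Format the setting for the [mysqld] section of my.cnf."""
    return "innodb_buffer_pool_size = %dG" % buffer_pool_gb(total_ram_gb)


print(my_cnf_line(16))  # innodb_buffer_pool_size = 14G
```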

Jeremy concluded with high-speed interconnects including InfiniBand and Dolphin Interconnect. InfiniBand is fast, and you can hook everything into a switch. Dolphin's interconnects are also fast and are chained together in a loop similar to external SCSI devices, but you need to make sure they have a driver for your hardware.

I talked to Jeremy after his talk and asked him about diskless slaves which would basically have a RAM drive for the data. While it would be fast, it would take memory that would otherwise be used by MySQL and would be a pain to manage when they come online. So scratch that idea.

Helping InnoDB Scale on Servers with Many CPU Cores and Disks

One of the more popular talks was by Mark Callaghan at Google, who talked about ways they managed to get InnoDB to take advantage of systems with more than 4 cores and many disks. The primary change they made was to InnoDB's mutex code used to control concurrent reads and writes to pages.

They replaced the existing pthreads mutex code with more efficient, platform-specific compare-and-swap (CAS) CPU instructions. They managed to get much better performance. He said they are hoping to get a patch out by the end of the year with their changes. They don't want to release it until they know it is rock solid.

Scaling Heavy Concurrent Writes In Real Time

Dathan Pattishall, formerly with Flickr and now with RockYou.com, talked about an analytics system he helped build for Flickr. Flickr keeps track of each photo's stats, including external links. Whenever someone directly embeds a picture from a Flickr Pro user, they record that information, then make those stats available in near real time.

The old design basically involved inserting records as they came in, but it was killing the servers, especially since those servers were also handling reads for people viewing the stats. Their solution was to create a separate Java daemon that queues up pending inserts. This means only a single thread is used on the MySQL server and it doesn't block the web servers from serving up the information.
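Flickr's daemon was written in Java, and I don't know its internals. Here's a minimal Python sketch of the same idea, assuming the actual write is done by a hypothetical write_batch callback (for example, one multi-row INSERT): request handlers enqueue rows and return immediately, and a single background thread drains the queue and writes in batches.

```python
import queue
import threading


class InsertQueue:
    """Request handlers enqueue rows; one background thread writes them."""

    def __init__(self, write_batch, max_batch=500):
        self._queue = queue.Queue()
        self._write_batch = write_batch  # hypothetical: e.g. one multi-row INSERT
        self._max_batch = max_batch
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def put(self, row):
        """Called from request handlers; never waits on the database."""
        self._queue.put(row)

    def close(self):
        """Flush everything queued so far, then stop the writer thread."""
        self._queue.put(None)  # sentinel
        self._thread.join()

    def _run(self):
        while True:
            row = self._queue.get()  # block until there is work
            if row is None:
                return
            batch, stop = [row], False
            # Grab whatever else is already waiting, up to max_batch rows.
            while len(batch) < self._max_batch:
                try:
                    row = self._queue.get_nowait()
                except queue.Empty:
                    break
                if row is None:
                    stop = True
                    break
                batch.append(row)
            self._write_batch(batch)  # the only code that touches MySQL
            if stop:
                return
```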

They are inserting the stats into 3 tables, one each for daily, weekly, and monthly stats. To keep things in order, they tried a VARCHAR of the URL as the primary key, but ran into major performance issues. So instead they decided to store a hash as a BIGINT:

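// php: first 16 hex digits (64 bits) of the URL's MD5, converted to base 10
// base_convert() may lose precision on numbers this large (it uses floats internally)
$id = base_convert(substr(md5($url), 0, 16), 16, 10);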

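This code takes the 32-character hex MD5 of the URL, keeps the first 16 characters (64 bits), and converts them from a hexadecimal string to a base-10 number. The result fits in a BIGINT column, but it needs to be BIGINT UNSIGNED, since roughly half of all 64-bit values exceed the signed maximum of 2^63 - 1.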
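For comparison, here's the same calculation in Python, which has arbitrary-precision integers and so avoids the float rounding that PHP's base_convert() can introduce on 64-bit values. The function name is my own.

```python
import hashlib


def url_id(url):
    """First 16 hex digits (64 bits) of the URL's MD5 as an integer.

    The result is in [0, 2**64), so it needs a BIGINT UNSIGNED column.
    """
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()  # 32 hex characters
    return int(digest[:16], 16)


print(url_id("http://www.flickr.com/"))
```

As a quick sanity check, the MD5 of an empty string starts with d41d8cd98f00b204, which is already larger than the signed BIGINT maximum; that's why the column needs to be UNSIGNED.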

He also mentioned using ibbackup for backing up the databases, but it is not a free solution.

Geo Distance Search with MySQL

Ever since the Google Maps API was released, I've had an interest in playing around with it. Alexander Rubin of MySQL talked about ways of querying for locations within a given distance of a lat/lng. He first abstracts the distance math into a user-defined function (UDF), then calls the UDF from within the query.
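I don't know exactly which formula his UDF used; a common choice is the haversine great-circle formula. Here's a minimal Python sketch of that math, assuming a spherical Earth with a mean radius of 6,371 km, plus a hypothetical within() helper that filters a dict of places the way the query's WHERE clause would.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; the Earth is treated as a sphere


def distance_km(lat1, lng1, lat2, lng2):
    """Great-circle distance between two lat/lng points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within(origin, places, radius_km):
    """Names of places (name -> (lat, lng)) within radius_km of origin."""
    lat, lng = origin
    return [name for name, (p_lat, p_lng) in places.items()
            if distance_km(lat, lng, p_lat, p_lng) <= radius_km]
```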

I've already played with geo searching before, so it was mostly review. He didn't go into much depth on topics such as MySQL's spatial extensions.

Dinner at the Tied House

We had a great turnout of around 18 people at the Tied House in Mountain View. We had a number of people from places including MySQL, PrimeBase, and Facebook.

Thanks to the PrimeBase guys! They have a neat transactional storage engine that supports streaming blob data. Normal blobs in MySQL are held in memory during the transaction. The PBXT Storage Engine is designed to stream blob data in and out very efficiently.

I'd like to give a special thanks to Jay Pipes for getting me to come to the conference this year. I truly had a great time. Thanks!

MySQL Conference Day 3 Thoughts

Wed, 04/16/2008 - 17:54
Keynotes

The conference committee managed to get Rick Falkvinge of the Swedish Pirate Party to speak. I heard him speak at OSCON 2007. What I took away from his talk is that copyright is evil. Copyright is the excuse industries (e.g., the music industry) are using to justify monitoring all of your communications. Not only do they want to monitor you, but they also want to prohibit certain kinds of communications. What it comes down to is your privacy vs. copyright. It's scary stuff.

The second part of the keynote was a panel consisting of representatives from MySQL, Sun, Flickr, FotoLog, Wikipedia, Facebook, and YouTube. They were discussing scaling at each of their sites. It was a great discussion: informative and funny. Paul Tuckfield of YouTube had a great saying: "Replication is the answer. You just need to rephrase the question." Farhan “Frank” Mashraqi of FotoLog made an interesting observation: Sun Sparc Niagara 1 servers make great master servers due to their high speed, and Sun Sparc Niagara 2 servers make great slave servers due to their large concurrency.

Grand Tour of the information_schema

The information_schema database is a built-in database that contains metadata about data including tables, partitions, privileges, character sets, constraints, indexes, server settings, server status, and routines. This database is an alternative to MySQL's proprietary SHOW commands.

I see real utility in being able to query the information_schema database to check server status. Another interesting use is to auto-generate schema documentation. I'm curious what kind of user metadata you can associate with objects.

Applied Partitioning and Scaling Your Database System

Phil Hildebrand gave his talk about the different ways of partitioning your data. The types are range, hash, key, and list. You can read more about partitioning types in MySQL's documentation.
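To make the difference between the types concrete, here's a rough Python model of how a row's partitioning value picks a partition under RANGE, LIST, and HASH. It's a simplification for illustration: it assumes a non-negative integer expression for HASH (MySQL uses MOD(expr, number of partitions) there), and it leaves out KEY partitioning, which uses MySQL's own internal hashing function.

```python
import bisect
import math


def range_partition(value, less_than):
    """RANGE: first partition whose VALUES LESS THAN bound is greater than value.

    less_than must be ascending; use math.inf to model MAXVALUE.
    """
    index = bisect.bisect_right(less_than, value)
    if index == len(less_than):
        raise ValueError("no partition holds %r" % value)
    return index


def list_partition(value, value_lists):
    """LIST: the partition whose VALUES IN (...) list contains value."""
    for index, values in enumerate(value_lists):
        if value in values:
            return index
    raise ValueError("no partition holds %r" % value)


def hash_partition(value, num_partitions):
    """HASH: MOD(expr, num_partitions), for a non-negative integer expr."""
    return value % num_partitions


# PARTITION BY RANGE (YEAR(hired)): p0 < 1990, p1 < 2000, p2 < MAXVALUE
print(range_partition(1995, [1990, 2000, math.inf]))  # 1
```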

MySQL Performance Under a Microscope: The Tobias and Jay Show

This was an entertaining talk by MySQL's Tobias Asplund and Jay Pipes. They showed the results of a few benchmarks comparing multiple ways to do something.

In the first test, they wanted to see what was the fastest way for getting the total count of records. They tried a handful of ways, but COUNT(*) when query caching was enabled was the fastest.

One of the other interesting tests they did was DATETIME vs. INT UNSIGNED for storing a date. The best method was to use an INT UNSIGNED and do the date-to-int conversion in the application tier. In PHP, use the strtotime() function.
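Outside PHP, the conversion is just as short. Here's a Python sketch that turns a "YYYY-MM-DD HH:MM:SS" string into seconds since the epoch for an INT UNSIGNED column, and back again. It assumes the timestamps are UTC; PHP's strtotime() instead interprets them in the configured default time zone.

```python
import calendar
import datetime
import time

FORMAT = "%Y-%m-%d %H:%M:%S"


def to_unix(text):
    """Parse a UTC 'YYYY-MM-DD HH:MM:SS' string into seconds since 1970."""
    return calendar.timegm(time.strptime(text, FORMAT))


def from_unix(seconds):
    """Format seconds since 1970 back into a UTC 'YYYY-MM-DD HH:MM:SS' string."""
    moment = datetime.datetime.fromtimestamp(seconds, datetime.timezone.utc)
    return moment.strftime(FORMAT)


print(to_unix("1970-01-02 00:00:00"))  # 86400
```

An INT UNSIGNED tops out at 4,294,967,295 seconds, which lands in the year 2106, so range isn't a practical concern.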

The MySQL Query Cache

Query caching can bring huge performance gains to your web application. Baron Schwartz of Percona gave a talk describing why query cache rocks.

MySQL caches query results, not execution plans. It stores the results in a big hash table where the key is the query. The key is case-sensitive and whitespace-sensitive. Only SELECT statement results are cached, since it doesn't make a whole lot of sense to cache INSERT or UPDATE results. Only deterministic queries are cached. If the query contains a non-deterministic function call, such as a function that returns the current time, then it cannot cache the results.
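Here's a toy Python model of those rules, just to make them concrete; it is not how MySQL implements the cache. The run_query callback is a hypothetical stand-in for the database, the list of non-deterministic functions is a small sample, and any non-SELECT statement simply empties the whole cache, which is coarser than MySQL invalidating the cached results for the tables that changed.

```python
import re

# A small sample of non-deterministic functions that make a result uncacheable.
NON_DETERMINISTIC = re.compile(r"\b(NOW|RAND|UUID|CURDATE)\s*\(", re.IGNORECASE)


class ToyQueryCache:
    def __init__(self, run_query):
        self._run_query = run_query  # hypothetical: executes SQL, returns a result
        self._results = {}           # exact query text -> cached result
        self.hits = 0

    def execute(self, sql):
        if not sql.lstrip().upper().startswith("SELECT"):
            self._results.clear()    # a write invalidates cached results
            return self._run_query(sql)
        cacheable = not NON_DETERMINISTIC.search(sql)
        if cacheable and sql in self._results:  # case- and whitespace-sensitive key
            self.hits += 1
            return self._results[sql]
        result = self._run_query(sql)
        if cacheable:
            self._results[sql] = result
        return result
```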

You can display the query cache information by executing the following:

SHOW GLOBAL STATUS LIKE 'qcache%';
SHOW GLOBAL STATUS LIKE 'query_cache%';

The way the query cache memory is allocated can potentially cause fragmentation. You can get a feel for how bad it is by comparing the number of free blocks to the number of total blocks. If you are running out of free blocks, you either have filled your cache or you have bad fragmentation.

Grazr: Lessons Learned Building a Web 2.0 Application Using MySQL

The talk about Grazr was given by Patrick Galbraith and Michael Kowalchik. Patrick is one of the fellows who showed off some awesome memcached stuff at the tutorial and the BOF. Grazr filters feeds down to only the information it thinks you'd be interested in. This was a pretty general discussion, and they managed to get through their slides pretty quickly. Since the talk was winding down early, I headed over to Eli's talk.

Help, My Website has been Hacked! Now What?

If you have a popular site, you are going to get people attempting to hack it. Eli White of digg talked about some of the ways your site can get hacked.

One thing he pointed out that I hadn't thought about is that you can't just block someone's IP address. If there is a proxy between the user and the web server, then the IP address you get is the proxy's, not the user's. You need to check the X-Forwarded-For HTTP header. If more than one proxy is involved, X-Forwarded-For will contain a comma-separated list of addresses.
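Here's a minimal Python sketch of picking the address to block or log. One caution: X-Forwarded-For is just a request header, so a client can put anything in it. Assuming you know the addresses of your own proxies (the trusted_proxies set is hypothetical), this sketch only believes the header when the request came through those proxies. It walks the list from right to left and returns the first address that isn't one of them.

```python
def client_ip(remote_addr, x_forwarded_for="", trusted_proxies=frozenset()):
    """Best guess at the real client address behind our own proxies.

    remote_addr is the address of the TCP peer; x_forwarded_for is the raw
    header value, e.g. "198.51.100.7, 10.0.0.1".
    """
    hops = [hop.strip() for hop in x_forwarded_for.split(",") if hop.strip()]
    chain = hops + [remote_addr]  # earliest hop first, direct peer last
    # Walk back from the direct peer, skipping proxies we operate ourselves.
    for addr in reversed(chain):
        if addr not in trusted_proxies:
            return addr
    return chain[0]  # every hop was trusted; use the earliest one
```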

I talked to Eli after his session, and he recommended blocking the IPs on the firewall instead of in the PHP code. This means less load on the app server, but unless you have a fancy firewall, you can't easily see how often a particular IP is trying to attack you, and I would be curious to know that.

Performing MySQL Backups Using LVM Snapshots

The last session of the day was by Lenz Grimmer of MySQL. LVM snapshots can be a great way to backup your databases, especially InnoDB. The basic procedure is:

  • flush tables
  • flush tables with read lock
  • lvcreate -s
  • show master/slave status
  • unlock tables
  • mount snapshot, perform backup
  • unmount and discard the snapshot

InnoDB ignores the "flush tables with read lock" step, but if you have any MyISAM tables, you'll still need to do it. Flushing the tables does impact performance, especially while the snapshot is active. As soon as you mount the LVM partition snapshot, you can back it up and then unmount and discard the snapshot.
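The steps above can be written down as a dry-run plan. This Python sketch only builds the list of commands; it doesn't run anything. The volume group, logical volume, snapshot size, mount point, and backup path are hypothetical placeholders. In a real script, the SQL statements have to be issued from one client session that stays open while lvcreate runs, because MySQL releases the read lock when that session ends.

```python
def lvm_backup_plan(vg, lv, snap_size="1G", snap_name="mysql_snap",
                    mount_point="/mnt/mysql_snap", archive="/backup/mysql.tar.gz"):
    """Return the backup steps as ("sql" | "sh", command) pairs, in order."""
    origin = "/dev/%s/%s" % (vg, lv)
    snapshot = "/dev/%s/%s" % (vg, snap_name)
    return [
        ("sql", "FLUSH TABLES"),
        ("sql", "FLUSH TABLES WITH READ LOCK"),
        ("sh", "lvcreate -s -L %s -n %s %s" % (snap_size, snap_name, origin)),
        # Record the binlog position (use SHOW SLAVE STATUS on a slave).
        ("sql", "SHOW MASTER STATUS"),
        ("sql", "UNLOCK TABLES"),
        ("sh", "mount -o ro %s %s" % (snapshot, mount_point)),
        ("sh", "tar -czf %s -C %s ." % (archive, mount_point)),
        ("sh", "umount %s" % mount_point),
        ("sh", "lvremove -f %s" % snapshot),
    ]


for kind, command in lvm_backup_plan("vg0", "mysql"):
    print("%-3s %s" % (kind, command))
```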

There is a Perl script called mylvmbackup which can help with these procedures.

An alternative to LVM snapshots for backups is to replicate to a slave server, stop the replication, perform the backup on the slave, then start replication again. The downside is it requires an extra machine as the slave in which MySQL can be stopped so that InnoDB tables can be properly flushed.

MySQL Quiz Show and Sun party

The quiz show is an absolute blast. The show is moderated by the infamous Jay Pipes. Facebook was kind enough to sponsor the quiz show this year. There was plenty of beer and popcorn to go around. People won a ton of books, and Sheeri Kritzer Cabral won the grand prize: an Apple iPhone. Lucky!

Everybody ended up coming out of the woodwork for the Sun after-party. It was nice to finally get to meet Baron Schwartz. Everybody should go buy his book, High Performance MySQL!

I also had a chat with Brian Moon of dealnews.com. He claims PHP can be made to work with the Apache worker MPM. Hmm, looks like I have a new project!

MySQL Conference Day 2 Thoughts

Wed, 04/16/2008 - 15:20
Keynotes

The keynotes were kicked off by Marten Mickos. If you've never met Marten, he is, on a personal note, one of the greatest CEOs I've ever met. The keynotes were especially interesting for me because it was the first time I'd had the opportunity to listen to Jonathan Schwartz, the CEO of Sun Microsystems. Jonathan seems like a great guy who gives the impression he "gets it".

The last keynote was by Werner Vogels of Amazon. His talk covered Amazon's growth and the new services they offer, including EC2. He announced that EC2 now supports persistent storage, which is a huge improvement, but doesn't quite solve all of the problems.

Testing PHP/MySQL Applications with PHPUnit/DbUnit

I've never been big into testing, but I'm trying to change that. Sebastian Bergmann, the author of PHPUnit Pocket Reference (free online version), talked about PHPUnit and DbUnit and why I should use them. Installing PHPUnit is extremely simple if you have PEAR installed:

pear channel-discover pear.phpunit.de
pear install phpunit/PHPUnit

Once installed, just require PHPUnit:

require_once 'PHPUnit/Framework.php';

He just scratched the surface on writing unit tests. One thing he pointed out was using CruiseControl for automated testing. What's really cool is you can fire off CruiseControl from Subversion commit hooks. If the testing fails, CruiseControl can send an email with the results and who is to blame.

Practical MySQL for Web Applications

Domas Mituzas of MySQL and Wikipedia fame gave a good talk that covered practical design of web applications. The talk covered simple stuff, so I didn't learn a whole lot. Nevertheless, Domas sometimes says some funny things that make the talk enjoyable.

EXPLAIN Demystified

Baron Schwartz gave a talk about the EXPLAIN statement. You use it by prepending the word EXPLAIN to a SELECT statement; it only works on SELECT statements. Instead of the query's results, MySQL returns the execution plan it would use for the query.

After running through the output of the EXPLAIN statement, he showed us mk-visual-explain which is one of the tools in Maatkit. It is a neato command line tool that takes the EXPLAIN output and reformats it as a tree structure. It's a great way to visualize the execution plan. Now if only there was a GUI version...

Upgrading to Elegant Versatile Database Architecture using PHP5 Data Objects

This talk was given by Sigurd Magnusson of SilverStripe and covered PDO. I already researched and used PDO, so it was mostly review.

After talking to some of the other people at the conference, I've been seriously thinking of moving away from PDO and using MySQL specific functions because they expose some *really* cool debugging and profiling information.

Exploring Amazon EC2 for Scale-out Applications

The idea of EC2 sounds really cool. The ability to create a server instance and host your stuff on it within minutes is sweet. Need more servers? No problem; add another instance. The speakers, Morgan Tocker of MySQL and Carl Mercier of Defensio, talked about their experience with EC2.

There are some serious data and management issues. Until the other day, there wasn't any kind of persistent storage, meaning when the server went offline, you lost all your data. Now you can mount a drive that persists across restarts. But one issue for critical business transactions is how and when data is written to disk. Is the data written immediately to disk, or is it buffered in the kernel or in some RAID card's cache?

Another issue they ran into is that when a new machine is created, there are remnants of the previous instance's data. So they need to zero out the drive, which takes 5 hours on a single instance.

What I took away from the talk is that EC2 is great if your app is simple and relies on 3rd party services (e.g., Facebook, Google, etc.) that are more reliable than EC2.

Service Oriented Architecture with PHP and MySQL

Joe Stump, a PHP hacker at Digg, gave a talk about SOA. It wasn't as much about "web services" as it was about managing tasks and processing them asynchronously.

When I talked to Joe afterward, he highly recommended Gearman to manage tasks. From the Gearman site: "Gearman is a system to farm out work to other machines, dispatching function calls to machines that are better suited to do work, to do work in parallel, to load balance lots of function calls, or to call functions between languages."

So, if a user uploads an image, you can add the task of resizing the image to a backend processing mechanism. This allows for a responsive front-end for the user.

Joe, along with Chris Goffinet, are working on netgearman which is a PEAR package for interfacing with Gearman.

Memcached Hackathon BOF

This was a birds of a feather session where a bunch of people informally got together to discuss all things memcached. Patrick Galbraith of Grazr showed off Memcached Functions for MySQL. This is super cool. It allows you to set and get data in memcached within your SQL code via user-defined functions.

So instead of pulling data from the DB to the app, then pushing it to memcached, you can just have a trigger or stored procedure store the value directly to memcached. One caveat is when you rollback a transaction, it won't unset the value from memcached.
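The rollback caveat is easy to reproduce with any function that has a side effect outside the database. This Python sketch uses SQLite (not MySQL) and a dict standing in for memcached; cache_set is a hypothetical stand-in for a memcached UDF. The INSERT is rolled back, but the cache write is not.

```python
import sqlite3


def rollback_demo():
    cache = {}  # stands in for memcached

    def cache_set(key, value):
        # Like a memcached UDF, this takes effect immediately and is not
        # part of the database transaction.
        cache[key] = value
        return 1

    conn = sqlite3.connect(":memory:")
    conn.create_function("cache_set", 2, cache_set)
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.commit()

    conn.execute("INSERT INTO users VALUES (1, 'chris')")  # opens a transaction
    conn.execute("SELECT cache_set('user:1', 'chris')").fetchone()
    conn.rollback()

    rows = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    conn.close()
    return rows, cache


print(rollback_demo())  # (0, {'user:1': 'chris'})
```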

There was some discussion about the memcached MySQL storage engine. After listening to them discuss it, I have to wonder if it is really worth it. It acts like a distributed memory table, except when a server in a cluster goes down, it will affect all the other servers.

Memcached and MySQL

Mon, 04/14/2008 - 16:10

The second and last tutorial of Monday was Memcached and MySQL: Everything you need to know by Brian Aker and Alan Kasindorf.

The talk was mainly about memcached and libmemcached and less about MySQL. That's OK, since I have been meaning to learn more about memcached's internals.

Alan and Brian discussed the slab allocator, the protocol, the internal hash table, the LRU (least recently used) eviction, and threading. The slab allocator is memcached's memory allocator: it divides memory into slab classes of fixed-size chunks. Each slab class keeps its own LRU list, so when memory runs out, the least recently used items are evicted first. Keys are spread across servers by the client using a consistent hashing algorithm, which lets you add new servers to the pool while remapping only a fraction of the keys. One thing that is cool about the hashing is that it is pluggable, so you can choose a different algorithm to meet your needs.
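Here is a minimal sketch of a consistent hash ring, to show why adding a server only moves some of the keys. It is a generic ring with virtual nodes built on MD5, not libmemcached's actual implementation, and the server addresses are made up.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # First 4 bytes of MD5 as an unsigned 32-bit position on the ring.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")

class HashRing:
    def __init__(self, servers, replicas=100):
        self.replicas = replicas   # virtual nodes per server, for a more even spread
        self._points = []          # sorted ring positions
        self._owners = {}          # ring position -> server
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        for i in range(self.replicas):
            point = _hash(f"{server}-{i}")
            if point not in self._owners:  # skip the rare collision
                self._owners[point] = server
                bisect.insort(self._points, point)

    def lookup(self, key: str) -> str:
        # Walk clockwise to the first server point after the key, wrapping around.
        i = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owners[self._points[i]]

keys = [f"user:{n}" for n in range(10000)]
ring = HashRing(["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"])
before = {k: ring.lookup(k) for k in keys}
ring.add("10.0.0.4:11211")
moved = [k for k in keys if ring.lookup(k) != before[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved")  # ideally about 1/4 with 4 servers
```

Every key that moves goes to the new server; with naive `hash(key) % number_of_servers`, most keys would move instead.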

memcached's protocol is text-based (ASCII), much like HTTP; however, they are finishing up a binary protocol that should have less overhead and better performance.
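To show what "ASCII-based" means in practice, here is a small sketch that builds `set`/`get` commands and parses a `get` reply in memcached's text protocol. It only handles bytes in memory (no network), and the helper names are hypothetical.

```python
def encode_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    # "set <key> <flags> <exptime> <bytes>\r\n<data block>\r\n"
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"

def encode_get(key: str) -> bytes:
    return f"get {key}\r\n".encode("ascii")

def parse_get_response(resp: bytes) -> dict:
    """Parse 'VALUE <key> <flags> <bytes>\\r\\n<data>\\r\\n ... END\\r\\n' into {key: data}."""
    items = {}
    pos = 0
    while True:
        eol = resp.index(b"\r\n", pos)
        line = resp[pos:eol].decode("ascii")
        if line == "END":
            return items
        parts = line.split()
        if parts[0] != "VALUE":
            raise ValueError(f"unexpected line: {line!r}")
        key, nbytes = parts[1], int(parts[3])
        start = eol + 2
        items[key] = resp[start:start + nbytes]
        pos = start + nbytes + 2  # skip the data block and its trailing \r\n

print(encode_set("greeting", b"hello"))  # b'set greeting 0 0 5\r\nhello\r\n'
print(parse_get_response(b"VALUE greeting 0 5\r\nhello\r\nEND\r\n"))  # {'greeting': b'hello'}
```

Every request and response line is human-readable text, which is easy to debug with telnet but costs parsing overhead; that is what the binary protocol aims to cut.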

The current architecture is threaded and scales reasonably well on machines with 4-8 cores, but it won't scale as well on the 16-32 core servers that are coming. Future versions will improve threading support so memcached scales better on larger machines.

Brian also talked a bit about libmemcached, a memcached client library written in C. The API looks pretty easy to use, since its concepts and naming are very similar to those of other higher-level client APIs.

One thing Brian said that stood out is that we should move away from synchronous actions and toward asynchronous events. It was a very interesting talk, and Brian definitely knows what he's talking about.

High-availability MySQL and DRBD

Mon, 04/14/2008 - 15:52

The day of tutorials started out with All Bases Covered: A Hands-on Introduction to High-availability MySQL and DRBD by Florian Haas and Philipp Reisner.

After a brief introduction to DRBD, they started discussing the configuration file. There were a couple of settings that I had set incorrectly on my servers.

Since my two servers are connected via a gigabit crossover cable, I had my synchronization rate set to 125M (125 MB/s, the theoretical maximum of gigabit Ethernet). They recommended using approximately 1/3 of your available network and disk I/O bandwidth so that your applications don't freeze up during synchronization. Their test system used 30M, so I'll give that a try too.
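The rule of thumb boils down to a bit of arithmetic: take the slower of the network and the disk, then use about a third of it. A minimal sketch follows; the 90 MB/s disk figure is an assumed example, not a measurement from my servers, and the output uses the DRBD 8 `syncer { rate ...; }` syntax.

```python
def recommended_sync_rate(network_mb_s: float, disk_mb_s: float, divisor: int = 3) -> int:
    """Roughly 1/3 of the bottleneck (the slower of network and disk), in MB/s."""
    bottleneck = min(network_mb_s, disk_mb_s)
    return int(bottleneck / divisor)

# Gigabit Ethernet is about 125 MB/s in theory; the disk speed here is an assumption.
rate = recommended_sync_rate(network_mb_s=125, disk_mb_s=90)
print(f"syncer {{ rate {rate}M; }}")  # syncer { rate 30M; }
```

If the disk turns out faster than the network, the network becomes the bottleneck and the same rule gives about 41M.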

Another setting they had different was the activity log extents (al-extents). All of the references I looked at said to set al-extents to 257. Actually, there's an equation for this value: E = (R x t) / 4, where E is the number of al-extents, R is the synchronization rate (in MB/s), t is the target synchronization time (in seconds), and the 4 comes from each extent covering 4MB of disk. If the sync rate is 30 MB/s and the target sync time is 240 seconds, then E = (30 x 240) / 4 = 1800, which rounded to the nearest prime is 1801.
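Here is the same calculation as a small sketch, including the "round to the nearest prime" step. The function names are my own, and when two primes are equally close it picks the smaller one (an arbitrary choice).

```python
from itertools import count

def is_prime(n: int) -> bool:
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def al_extents(rate_mb_s: int, sync_time_s: int) -> int:
    """E = (R x t) / 4, rounded to the nearest prime (ties go to the smaller prime)."""
    e = rate_mb_s * sync_time_s // 4
    for d in count():
        if is_prime(e - d):
            return e - d
        if is_prime(e + d):
            return e + d

print(al_extents(30, 240))  # 1801
```

Here 1799 is not prime (7 x 257), so the nearest prime to 1800 is 1801.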

Heartbeat is the cluster manager that detects when a node is unavailable. You should have at least two heartbeat connections between the two nodes, so that a single failed link isn't mistaken for a dead node. If eth0 is your public network and eth1 is your private network, configure Heartbeat to send heartbeats over the public network using multicast and over the private network using broadcast.

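# /etc/ha.d/ha.cf
# Private link (crossover cable): broadcast heartbeats on eth1
bcast eth1
# Public link: multicast on eth0, group 239.0.0.42, UDP port 694, TTL 1, loopback off
mcast eth0 239.0.0.42 694 1 0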
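Why two links? A minimal sketch of the failure-detection logic, assuming a rule where the peer is declared dead only when every link has been silent longer than a deadtime. This is a hypothetical illustration of the idea, not Heartbeat's actual code; the class and method names are made up.

```python
import time
from typing import Dict, Optional

class PeerMonitor:
    """Declare the peer dead only when *every* heartbeat link has been silent past deadtime."""

    def __init__(self, links, deadtime: float):
        self.deadtime = deadtime
        now = time.monotonic()
        self.last_seen: Dict[str, float] = {link: now for link in links}

    def heartbeat(self, link: str, at: Optional[float] = None) -> None:
        # Record a heartbeat received on one link (explicit timestamps make the demo repeatable).
        self.last_seen[link] = time.monotonic() if at is None else at

    def peer_dead(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        return all(now - t > self.deadtime for t in self.last_seen.values())

mon = PeerMonitor(["eth0", "eth1"], deadtime=30)
start = mon.last_seen["eth0"]
mon.heartbeat("eth0", at=start + 40)  # public link still talking; crossover cable has gone quiet
print(mon.peer_dead(now=start + 45))  # False: eth0 was heard 5 s ago
print(mon.peer_dead(now=start + 75))  # True: both links silent for more than 30 s
```

With a single link, a pulled crossover cable alone would look like a dead node and could trigger an unwanted failover.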

They demonstrated Heartbeat v2. I use the older v1, which isn't as powerful but is much simpler to configure. It was also the first time I had seen the Heartbeat GUI. The GUI makes managing Heartbeat resources a piece of cake and offers some monitoring. You can tell it was written by a developer, though, since its usability could be greatly improved.

I specifically asked whether DRBD has any issues with partitions larger than 2TB, and Florian basically said that if you can create the partition (meaning the driver supports it), then DRBD supports it. He mentioned something about SCSI devices using 32-bit integers for addressing, which limits you to 2TB. This was news to me, since my SATA RAID card is technically seen by Linux as a SCSI device. I'm not sure this is 100% accurate, but either way there is an easy solution: if you have 4TB of space, you can split it into two 2TB partitions and then combine them with either software RAID 0 (striping) or LVM (linear or striped mapping).
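Where does 2TB come from? Assuming 512-byte sectors and 32-bit sector addresses, the largest addressable device is 2^32 x 512 bytes = 2 TiB. The sketch below does that arithmetic and works out how many partitions a 4TB array needs to stay under the limit; it only illustrates the numbers and doesn't confirm which devices actually have the limit.

```python
SECTOR_BYTES = 512  # assumed traditional logical sector size

def max_device_bytes(lba_bits: int = 32, sector_bytes: int = SECTOR_BYTES) -> int:
    """Largest device addressable with lba_bits-wide sector numbers."""
    return (2 ** lba_bits) * sector_bytes

def pieces_needed(total_bytes: int, limit_bytes: int) -> int:
    """Smallest number of equal partitions that keeps each one under the limit (ceiling division)."""
    return -(-total_bytes // limit_bytes)

limit = max_device_bytes()
print(limit, limit / 2**40)               # 2199023255552 2.0  (2 TiB)
print(pieces_needed(4 * 10**12, limit))   # 2
```

Two partitions of about 2TB each fit under the limit, and RAID 0 or LVM then presents them as one large device.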

I can't wait to build my next HA cluster, this time with Heartbeat v2.

2008 MySQL Conference and Expo

Mon, 04/14/2008 - 03:08

It's that time of the year again and I'm in Santa Clara, CA, for the 2008 MySQL Conference & Expo. This year, the conference includes a lot more Web 2.0 topics.


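The conference starts out with a full day of tutorials. The two tutorials I signed up for are:

All Bases Covered: A Hands-on Introduction to High-availability MySQL and DRBD by Florian Haas and Philipp Reisner
Memcached and MySQL: Everything you need to know by Brian Aker and Alan Kasindorf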

This time I've made a point of attending sessions I wouldn't normally go to, including ones about testing, benchmarking, and MySQL's information schema.

I'll be blogging about some of the sessions that I'll be attending, so stay tuned!