Tuesday, July 31, 2018

High Availability with MySQL Cluster, Setup From Command Line (II)

In the first installment of this series, written for those who want to understand the basics of MySQL Cluster "by examples", I wrote about installing MySQL Cluster with a Python utility called ndb_setup.py, which offers a nice web graphical interface to define and start our brand new cluster.

In this post I will share an example of doing everything from scratch and manually instead. Doing things manually is always the best way to learn about the life cycle of the processes through their:
  • Initialization
  • Administration (start/stop/reconfigure)
  • Monitoring (logs/counters/status)
  • Troubleshooting 
The resulting topology I'd like to set up is composed of 2 data nodes, 1 management node and 1 SQL node, as in the following picture:

Let's get down to business and deal today with process initialization; I will walk through setting up a MySQL Cluster environment from scratch, step by step, on a Linux machine.
  1. Download generic binaries release for Linux
  2. File system preparation
  3. Initialize and start the management node
  4. Initialize and start the data nodes
  5. Initialize and start the SQL node
  6. Monitor the status 
In order to run a MySQL Cluster deployment, I will need to create:
  • A cluster configuration file, usually named config.ini. It is shared by all cluster processes.
  • The SQL node configuration file, usually named my.cnf. There must be one per SQL node.

1. Download generic binaries release for Linux

My favorite MySQL Cluster install method is using the generic binaries: for the purpose of learning how to install and administer a cluster, there is no need to install RPMs and their related scripts to start/stop a process as a service. Hence I recommend downloading the binaries by choosing "Linux - Generic" on the related download page.

2. File system preparation

Processes need to store things on disk: not only to achieve durability, but also to store logs and other support files necessary to perform their duties. So every process will have a data directory. In addition, every process must define a nodeid, an integer used to identify it. Therefore, for simplicity, I recommend creating 4 folders under the desired path and naming them after the nodeid I will set later in the configuration file (see the commands right after this list). Important note: to make things easier, I will set up the whole cluster on my machine; this is not the recommended topology for a real production environment, where data nodes and the management node should all run on dedicated hosts.
  • 49. This folder will store the data directory for the management node
  • 1. This folder will store the data directory for data node 1
  • 2. This folder will store the data directory for data node 2 (replica of data node 1)
  • 50. This will store the data directory for the SQL node
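As a minimal sketch, these are the commands I use, with the same base path as the configuration files later in this post (adjust to your environment):

mkdir -p /home/mortensi/cluster/49   # management node data directory
mkdir -p /home/mortensi/cluster/1    # data node 1 data directory
mkdir -p /home/mortensi/cluster/2    # data node 2 data directory
mkdir -p /home/mortensi/cluster/50   # SQL node data directory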
Now that the four directories are created, let the fun begin!

3. Initialize and start the management node

Before starting the management node for the first time, I need to set up a config.ini file. Let's use this bare-bones one:

[NDB_MGMD]
HostName=127.0.0.1
DataDir=/home/mortensi/cluster/49
NodeId=49

[NDBD]
NodeId=1
HostName=127.0.0.1
DataDir=/home/mortensi/cluster/1

[NDBD]
NodeId=2
HostName=127.0.0.1
DataDir=/home/mortensi/cluster/2

[MYSQLD]
HostName=127.0.0.1
NodeId=50

This configuration file, the simplest possible (all other configuration parameters take their default values), specifies that:
  • Node ids are: 49 for the management node, 1 and 2 for the data nodes, and 50 for the SQL node (the mysqld instance)
  • We're setting up a topology where all processes are co-located on the same localhost machine
  • We will use the folders we created as data directories.
Let's start the management node now:

ndb_mgmd --config-file=/home/mortensi/cluster/config.ini --configdir=/home/mortensi/cluster/49 --initial &

If no error is reported, we can check whether the management process is running with the ndb_mgm command-line client, like this:

ndb_mgm -e show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=1 (not connected, accepting connect from 127.0.0.1)
id=2 (not connected, accepting connect from 127.0.0.1)

[ndb_mgmd(MGM)] 1 node(s)
id=49 @127.0.0.1 (mysql-5.7.22 ndb-7.6.6)

[mysqld(API)]   1 node(s)
id=50 (not connected, accepting connect from 127.0.0.1)

This output tells us that the 2 data nodes defined in the config.ini have not been started, but the management node was started successfully. Finally, the SQL node is not connected yet.

4. Initialize and start the data nodes

Now it's time to start the core processes of any cluster setup: the data nodes! Data nodes can only be started after the management node, as they need to connect to it to retrieve the cluster configuration. Run the following:

ndbmtd --connect-string=localhost --initial --ndb-nodeid=1 &
ndbmtd --connect-string=localhost --initial --ndb-nodeid=2 &

Check the status again with ndb_mgm:

ndb_mgm -e show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=1 @127.0.0.1 (mysql-5.7.22 ndb-7.6.6, Nodegroup: 0, *)
id=2 @127.0.0.1 (mysql-5.7.22 ndb-7.6.6, Nodegroup: 0)

[ndb_mgmd(MGM)] 1 node(s)
id=49 @127.0.0.1 (mysql-5.7.22 ndb-7.6.6)

[mysqld(API)] 1 node(s)
id=50 (not connected, accepting connect from 127.0.0.1)

We're done! Now that the cluster is up and running, we only need to connect through the SQL node and play! Remember that future data node restarts must be done without the --initial option, otherwise the data node data directories will be erased!
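For reference, a later (non-initial) restart of data node 1 is simply the same command minus that option:

ndbmtd --connect-string=localhost --ndb-nodeid=1 &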

5. Initialize and start the SQL node

To finalize this cluster installation, we need to initialize the SQL node data directory as follows (you must indicate the path to the binary release and to the brand new datadir candidate folder):

mysqld --initialize-insecure --datadir="/home/mortensi/cluster/50" --basedir="/export/home/mortensi/software/mysql-cluster-gpl-7.6.6-linux-glibc2.12-x86_64/" --user=mortensi

Now that the SQL node has been initialized, let's use this simple my.cnf configuration file to start it:

[mysqld]
ndbcluster=on
ndb_nodeid=50
datadir=/home/mortensi/cluster/50
basedir=/export/home/mortensi/software/mysql-cluster-gpl-7.6.6-linux-glibc2.12-x86_64/
socket=/home/mortensi/cluster/50/mysqld.sock
log_error=/home/mortensi/cluster/50/mysqld.log

Save the file, and let's go:

mysqld --defaults-file=/home/mortensi/cluster/my.cnf &

You will now be able to connect:

mysql -h127.0.0.1 -uroot

And check your brand new cluster status!

mysql> select * from ndbinfo.nodes;
+---------+--------+---------+-------------+-------------------+
| node_id | uptime | status  | start_phase | config_generation |
+---------+--------+---------+-------------+-------------------+
|       1 |   1691 | STARTED |           0 |                 1 |
|       2 |   1690 | STARTED |           0 |                 1 |
+---------+--------+---------+-------------+-------------------+
2 rows in set (0.02 sec)

Cool, isn't it?
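As a quick smoke test (database and table names here are just examples), you can create an NDB table from the SQL node; any table created with the NDBCLUSTER engine is stored in the data nodes:

mysql> CREATE DATABASE IF NOT EXISTS clusterdb;
mysql> CREATE TABLE clusterdb.t1 (id INT PRIMARY KEY, payload VARCHAR(64)) ENGINE=NDBCLUSTER;
mysql> INSERT INTO clusterdb.t1 VALUES (1, 'hello cluster');
mysql> SELECT * FROM clusterdb.t1;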

6. Monitor status

To complete this mini tutorial, let's check the cluster status once more, as we did before:

ndb_mgm -e show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=1 @127.0.0.1 (mysql-5.7.22 ndb-7.6.6, Nodegroup: 0, *)
id=2 @127.0.0.1 (mysql-5.7.22 ndb-7.6.6, Nodegroup: 0)

[ndb_mgmd(MGM)] 1 node(s)
id=49 @127.0.0.1 (mysql-5.7.22 ndb-7.6.6)

[mysqld(API)] 1 node(s)
id=50 @127.0.0.1 (mysql-5.7.22 ndb-7.6.6)

All processes declared as belonging to the cluster (the topology was defined in the config.ini file) are now up and running. Mission accomplished!
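When you are done experimenting, a clean way to stop everything is the following (the SQL node is stopped separately from the NDB processes; ndb_mgm -e shutdown stops the data nodes and the management node in one go):

mysqladmin -h127.0.0.1 -uroot shutdown
ndb_mgm -e shutdown

Remember to restart the data nodes without --initial afterwards.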

Wednesday, July 25, 2018

Top 5 Things to Consider to Get Started with MySQL Performance Tuning

Today I'll share a few slides I prepared last year for a presentation delivered at Oracle Open World. This is a quick and easy hands-on lab for fresh MySQL Server DBAs. I chose five of the most relevant topics for tuning and scaling a MySQL Server using InnoDB tables.

In particular, in this hands-on, I will talk about the following (a few quick inspection queries follow the list):
  • Scaling connections
  • The threads model
  • InnoDB REDO log
  • InnoDB Buffer Pool
  • The Execution plan
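To get a quick baseline for these areas on your own server before digging into the slides, a few standard inspection statements (illustrative only; the values are whatever your instance reports):

mysql> SHOW VARIABLES LIKE 'max_connections';
mysql> SHOW GLOBAL STATUS LIKE 'Threads_%';
mysql> SELECT @@innodb_log_file_size, @@innodb_flush_log_at_trx_commit;
mysql> SELECT @@innodb_buffer_pool_size / 1024 / 1024 AS buffer_pool_mb;
mysql> EXPLAIN SELECT * FROM mysql.user WHERE User = 'root';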


MySQL Performance Tuning 101 from Mirko Ortensi

The rate at which MySQL delivers new features and improvements is impressive; MySQL Server 8.0 boosts performance in many areas, especially InnoDB REDO logging. Hence, while the rest of the recommendations still apply to MySQL Server 8.0, tuning the REDO log flushing strategy is no longer mandatory to achieve improved throughput starting from MySQL 8.0. Improvements in this area are described in this blog post by Dimitri Kravtchuk, MySQL Performance Architect at Oracle MySQL (Twitter).
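For readers on MySQL 5.7, the REDO-related knobs discussed in the slides translate into my.cnf settings along these lines (values are illustrative only and must be sized to your workload; in particular, innodb_flush_log_at_trx_commit=2 trades some durability for throughput):

[mysqld]
innodb_log_file_size           = 2G   # larger REDO log, fewer aggressive checkpoints
innodb_flush_log_at_trx_commit = 2    # flush the REDO log roughly once per second
innodb_buffer_pool_size        = 8G   # keep the working set in memory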

For an overview of optimization techniques, I recommend having a look at the official documentation.

Tuesday, July 24, 2018

High Availability with MySQL Cluster, a Quick How-To Guide for Dummies (I)

I have been playing with MySQL Cluster for some years now, and today I'd like to start writing a bit about it: how to set it up, configure it, back it up, and also how to use it, as there are plenty of ways to drive operations towards the cluster for raw speed and concurrency. MySQL Cluster, the open source in-memory database from Oracle MySQL, is available for free from the MySQL Cluster download page (the Community version has a GPL license).

But before starting with an overview of installation and setup, if you're new to MySQL Cluster, I would strongly recommend having a look at this video.



What I find most interesting about MySQL Cluster is that it is possible to have a setup running on commodity hardware (the bare laptop), as it can be configured to have a minimal footprint in terms of memory and storage requirements. As for high availability, MySQL Cluster offers unique features to resist any single failure (it has no single point of failure: any process, communication link or hardware component may crash or fail and the cluster will still be available) while offering a consistent view of the data (synchronous replication between data nodes, a unique feature among MySQL databases).

But.. what is MySQL Cluster?

MySQL Cluster is a cluster of processes of three kinds:
  1. ndb_mgmd, the management process. A MySQL Cluster setup must have at least one (it can have more, for redundancy). It is recommended to execute it on a host where ndbmtd processes are not running, though it does not need a dedicated host. It consumes few resources, acts as the arbitrator to rule out eventual split-brain situations, and serves a number of other functions.
  2. ndbmtd, the data node process. The default configuration includes 2 data node processes; they implement the NDB (Network Database) storage engine and store data in memory. The recommendation is to execute each ndbmtd on its own host. Data nodes are replicated synchronously, hence any of them can crash and the cluster will survive and remain serviceable.
  3. mysqld instances, aka SQL nodes. These are the classical MySQL Servers delivered with the MySQL Cluster distribution (they differ from the standard standalone distribution in that they are compiled to make use of the NDB storage engine). They can be used to retrieve MySQL Cluster metrics, to execute SQL queries against the cluster (as we'll see, this is not the only way to fetch data from it), and also to implement geographical replication (among other uses). SQL nodes can run on dedicated hosts but can also share a host with data nodes or the ndb_mgmd node. The best option is always away from a host running ndbmtd data nodes: the resources needed by SQL nodes can vary depending on operations, so it's better not to add variable load to a host where it may affect data node performance (the data node footprint, at least, is quite stable over time).
A full, detailed summary is available on the official documentation page.

How to install MySQL Cluster?

There are two main distributions: the RPM/DEB packages and the compressed binary package. I usually prefer the binary one, as it is more flexible and also easier to manage for new users. With the binary distribution, everything is contained within a single folder.

To get started quickly with MySQL Cluster, I would recommend deploying a cluster on a single host using the ndb_setup.py utility, available in the binaries folder (bin). This command will open a graphical interface in a browser, served by a minimalistic Python web server. It is a really simple tool and better suited to deploying all cluster processes on localhost (not the recommended production topology, but a good way to start up quickly). This video shows a basic installation on multiple hosts (which can be virtual machines as well).
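For example, with the 7.6.6 Linux generic tarball downloaded to a folder of your choice (file and folder names below are just an example), unpacking it and launching the utility looks like this:

tar xzf mysql-cluster-gpl-7.6.6-linux-glibc2.12-x86_64.tar.gz
cd mysql-cluster-gpl-7.6.6-linux-glibc2.12-x86_64/bin
./ndb_setup.py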



MySQL Cluster documentation is pretty good at explaining the basics about the database, its specifications and tuning options, but I would also like to point to a couple of books:
  • Pro MySQL NDB Cluster, written by Jesper Wisborg Krogh and Mikiya Okuno, Support Engineers at Oracle MySQL. The book covers the basics and much more needed to get started with and administer a MySQL Cluster database.
  • MySQL Cluster 7.5 Inside and Out, written by Mikael Ronstrom, NDB creator and Senior Architect at Oracle MySQL. It offers an in-depth description of MySQL Cluster's internals, together with the story of the product.
In the next chapter, I will share some basic scripts to start the cluster from command line. Stay tuned!


Thursday, August 3, 2017

Raspberry Surveillance System, The Complete Solution


I have shown how to build a basic system based on a Raspberry Pi to capture a picture through a PiCamera and share it on a web page. Now I'd like to share a bit more of my experiments by describing an end-to-end approach to detect movements in the monitored area and alert about a possible intrusion. To that purpose, I'll share what I set up, built from basic pieces of code found around the internet. This minimal system is made of three simple elements:
  1. Telegram bot
  2. Motion detection algorithm
  3. Alert system

The Telegram Bot

A Telegram Bot is a nice and free manner to:
  • Send commands to your surveillance system from your personal Telegram account: start/stop services, reconfigure them... Indeed, you could still use an SSH connection, but Telegram is quick, visual, and immediate. Last but not least, it will allow you to receive alerts on your smartphone.
  • Receive messages coming from your surveillance system. When an alert is generated you will be informed by a message from your bot with the information and format you configure.
So how do you create a Telegram bot? That's easier to do than to explain. Just add the BotFather to your contact list and start messaging! Send "/start" for the list of functionalities. To create a bot you'll have to send "/newbot". Check the example:

Mirko Ortensi, [Jul 20, 2017, 9:06 PM]:
/newbot
BotFather, [Jul 20, 2017, 9:06 PM]:
Alright, a new bot. How are we going to call it? Please choose a name for your bot.
Mirko Ortensi, [Jul 20, 2017, 9:06 PM]:
mybotname
BotFather, [Jul 20, 2017, 9:06 PM]:
Good. Now let's choose a username for your bot. It must end in `bot`. Like this, for example: TetrisBot or tetris_bot.
Mirko Ortensi, [Jul 20, 2017, 9:07 PM]:
mybotnameBot
BotFather, [Jul 20, 2017, 9:07 PM]:
Done! Congratulations on your new bot. You will find it at t.me/mybotnameBot. You can now add a description, about section and profile picture for your bot, see /help for a list of commands. By the way, when you've finished creating your cool bot, ping our Bot Support if you want a better username for it. Just make sure the bot is fully operational before you do this.
Use this token to access the HTTP API:
447633501:AAH_wZZupcBIGQEC_pPV7[...]
For a description of the Bot API, see this page: https://core.telegram.org/bots/api

Nice! Now you have your bot entity ready to talk to your surveillance system! Write down the API token, which you'll use to authenticate to your bot as bot administrator and send/receive messages.

As the system is written in Python, you'll manage your Telegram bot with python-telegram-bot, so be sure to install it on your Raspberry. You can accomplish that by running:

python -m pip install python-telegram-bot

Motion Detection

This is the core of every surveillance system: the capability to detect any change in the monitored area and take a decision (which is usually to inform the administrator). I started from a very basic Google search and hit this lightweight motion detection Python algorithm, which is pretty intuitive: few lines, easy to read, quick and, most importantly, it works pretty well. This piece of code will analyze frames captured with raspistill and detect changes with respect to the previously buffered frame. If a change is detected, the captured frame is written to disk. The algorithm allows configuration of:
  • Threshold (how much a pixel has to change by to be marked as "changed")
  • Sensitivity (how many changed pixels before capturing an image)
  • ForceCapture (whether to force an image to be captured every forceCaptureTime seconds)
  • Disk space to reserve: once hit, no more frames will be written to disk
I have been trying it for some weeks and it works pretty well. Here is the code with the thresholds I set:

#!/usr/bin/python
import StringIO
import subprocess
import os
import time
from datetime import datetime
from PIL import Image

# Motion detection settings:
# Threshold (how much a pixel has to change by to be marked as "changed")
# Sensitivity (how many changed pixels before capturing an image)
# ForceCapture (whether to force an image to be captured every forceCaptureTime seconds)
threshold = 30
sensitivity = 60
forceCapture = True
forceCaptureTime = 60 * 60 # Once an hour

# File settings
saveWidth = 1280
saveHeight = 960
diskSpaceToReserve = 400 * 1024 * 1024 # Keep 400 mb free on disk

# Capture a small test image (for motion detection)
def captureTestImage():
    print("capture..")
    command = "raspistill -w %s -h %s -t 3000 -e bmp -o -" % (100, 75)
    imageData = StringIO.StringIO()
    imageData.write(subprocess.check_output(command, shell=True))
    imageData.seek(0)
    im = Image.open(imageData)
    buffer = im.load()
    imageData.close()
    return im, buffer

# Save a full size image to disk
def saveImage(width, height, diskSpaceToReserve):
    keepDiskSpaceFree(diskSpaceToReserve)
    now = datetime.now()  # avoid shadowing the time module
    filename = "/tmp/capture-%04d%02d%02d-%02d%02d%02d.jpg" % (now.year, now.month, now.day, now.hour, now.minute, now.second)
    subprocess.call("raspistill -w %d -h %d -t 3000 -e jpg -q 15 -o %s" % (width, height, filename), shell=True)
    print "Captured %s" % filename

# Keep free space above given level (clean up old captures in /tmp, where they are written)
def keepDiskSpaceFree(bytesToReserve):
    if (getFreeSpace() < bytesToReserve):
        for filename in sorted(os.listdir("/tmp")):
            if filename.startswith("capture") and filename.endswith(".jpg"):
                os.remove(os.path.join("/tmp", filename))
                print "Deleted %s to avoid filling disk" % filename
                if (getFreeSpace() > bytesToReserve):
                    return

# Get available disk space on the partition holding the captures
def getFreeSpace():
    st = os.statvfs("/tmp")
    du = st.f_bavail * st.f_frsize
    return du
     
# Get first image
image1, buffer1 = captureTestImage()

# Reset last capture time
lastCapture = time.time()

while (True):
    # Get comparison image
    image2, buffer2 = captureTestImage()

    # Count changed pixels
    changedPixels = 0
    for x in xrange(0, 100):
        for y in xrange(0, 75):
            # Just check green channel as it's the highest quality channel
            pixdiff = abs(buffer1[x,y][1] - buffer2[x,y][1])
            if pixdiff > threshold:
                changedPixels += 1

    # Check force capture
    if forceCapture:
        if time.time() - lastCapture > forceCaptureTime:
            changedPixels = sensitivity + 1
             
    # Save an image if pixels changed
    if changedPixels > sensitivity:
        lastCapture = time.time()
        saveImage(saveWidth, saveHeight, diskSpaceToReserve)
 
    # Swap comparison buffers
    image1 = image2
    buffer1 = buffer2

Alert System

Now for the final piece of code: we have to inform the user whenever motion is detected. I wrote a few lines of code to send a Telegram message from the bot. This simple algorithm counts the frames captured in the /tmp folder (every 5 seconds, let's say); whenever this number increases, the user receives a message. Something like this:

#!/usr/bin/python

import time
import os
import fnmatch

import telegram

bot = telegram.Bot(token='447633501:AAH_wZZupcBIGQEC_pPV7[...]')

elems = len(fnmatch.filter(os.listdir("/tmp"), '*.jpg'))
print elems

while (True):
    current = len(fnmatch.filter(os.listdir("/tmp"), '*.jpg'))
    if (current > elems):
       # A new capture appeared on disk: notify via Telegram
       bot.sendMessage(chat_id=<id_from_userinfobot>, text="Motion!")
    elems = current
    time.sleep(5)

Don't forget to:
  1. Create the bot object by using the token provided by BotFather
  2. Send the message to the user id to be notified. In order to find out your user id, send the "/start" command to userinfobot (or use the snippet below).
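If you prefer not to rely on userinfobot, here is a small sketch that reads your chat id from the bot's own updates (send any message to your bot first; it assumes the same python-telegram-bot version used in the rest of this post, and '<your_bot_token>' is a placeholder):

#!/usr/bin/python
import telegram

bot = telegram.Bot(token='<your_bot_token>')
updates = bot.getUpdates()
if updates:
    # Print the chat id of the latest message sent to the bot
    print updates[-1].message.chat_id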

Wrap everything up

Now that you have the Telegram bot created, the motion detection loop and a way to send alerts, you'll have to wrap it all up. This simple architecture is composed of just three daemons:
  1. Telegram bot daemon to receive commands from Telegram user and perform actions
  2. Motion detection daemon to parse frames looking for movements
  3. Alert system daemon which, based on the motion detection daemon's output, sends a Telegram notification
These three daemons can be run as start/stop/status services in /etc/init.d as usual, while the Telegram bot code could be something like:

#!/usr/bin/python

from telegram.ext import Updater, CommandHandler
import subprocess

def status(bot, update):
    command = "sudo ps ax | grep ".py" | grep -v grep | grep -v sudo"
    try:
        p = subprocess.Popen(command, stdout=subprocess.PIPE, shell=True)
        (output, err) = p.communicate()
        p_status = p.wait()
        update.message.reply_text(output)
    except Exception as e:
        output = str(e)
        update.message.reply_text(output)

def startm(bot, update):
    command = "sudo /etc/init.d/motiond start"
    try:
        p = subprocess.Popen(command, stdout=subprocess.PIPE, shell=True)
        (output, err) = p.communicate()
        p_status = p.wait()
        update.message.reply_text("Motion Detection Enabled")
    except Exception as e:
        output = str(e)
        update.message.reply_text(output)

def stopm(bot, update):
    command = "sudo /etc/init.d/motiond stop"
    try:
        p = subprocess.Popen(command, stdout=subprocess.PIPE, shell=True)
        (output, err) = p.communicate()
        p_status = p.wait()
        update.message.reply_text("Motion Detection Disabled")
    except Exception as e:
        output = str(e)
        update.message.reply_text(output)

def hello(bot, update):
    update.message.reply_text(
        'Hello {}'.format(update.message.from_user.first_name))

updater = Updater('447633501:AAH_wZZupcBIGQEC_pPV7[...]')

updater.dispatcher.add_handler(CommandHandler('startm', startm))
updater.dispatcher.add_handler(CommandHandler('stopm', stopm))
updater.dispatcher.add_handler(CommandHandler('status', status))

updater.start_polling()
updater.idle()

Here I implemented three basic commands:

  • Start motion detection
  • Stop motion detection
  • Check which Python processes are running, so as to be sure all the daemons are alive
You can add more controls, like starting/stopping the alerting system...
This is demonstration code; you could think of adding a watchdog to make sure everything is running, or joining the three daemons into one.
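For completeness, a minimal sketch of what the /etc/init.d wrapper for the motion detection daemon could look like (the service name motiond and the path /home/pi/motion.py are just examples for this post):

#!/bin/sh
### BEGIN INIT INFO
# Provides:          motiond
# Required-Start:    $local_fs $network
# Required-Stop:     $local_fs $network
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Motion detection daemon
### END INIT INFO

SCRIPT=/home/pi/motion.py      # example path to the motion detection script
PIDFILE=/var/run/motiond.pid

case "$1" in
  start)
    start-stop-daemon --start --background --make-pidfile --pidfile $PIDFILE --exec /usr/bin/python -- $SCRIPT
    ;;
  stop)
    start-stop-daemon --stop --pidfile $PIDFILE && rm -f $PIDFILE
    ;;
  status)
    if [ -f $PIDFILE ] && kill -0 "$(cat $PIDFILE)" 2>/dev/null; then
      echo "motiond is running"
    else
      echo "motiond is not running"
    fi
    ;;
  *)
    echo "Usage: $0 {start|stop|status}"
    exit 1
    ;;
esac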

Wednesday, July 26, 2017

Raspberry surveillance system, your house goes live on the internet

After leaving my brand new Raspberry Pi 3 on the shelf for months (I admit being an impulsive buyer), I decided to give it some use by setting up a surveillance system with it. The Raspberry Pi 3 has more than enough power for easy image processing, and the camera designed for it offers nice quality at a very accessible price.



So, in order to start with this project, I got this PiCamera v2. I got the one with the InfraRed (IR) filter, as I don't plan to take night shots with added IR lighting (maybe in the next prototype). It is really easy to connect its flex cable to the Raspberry (though unfortunately I'll have to leave the original case open and remove the top lid, as the case does not have a slot for the flex cable).

Once the camera is connected (be sure to shut the system down before any connect/disconnect), I upgraded the whole Raspbian system with:

sudo apt-get update
sudo apt-get upgrade
sudo rpi-update

Now it's time to enable the PiCamera module; this can be done by launching:

sudo raspi-config

The related option to toggle is under "Interfacing Options".

Now that the camera is ready to be used, it's time to start playing with it. It is easy to take snapshots with the official tool, raspistill.

You can take a snapshot with the bare:

raspistill -o pic.jpg

Ok, now we have everything to start playing seriously!

The next step is to install a bare-minimum NGINX web server (the install command is shown right after the checklist below). The purpose of our prototype is to take a picture every n minutes and show it on a web page hosted locally on your Raspberry and served by NGINX.

Before doing that, remember to:
  1. Sign up with a dynamic DNS service. I tried Dynu, which is free and easy to set up. Everything is clearly explained; you'll just have to remember to schedule the "~/dynudns/dynu.sh" script with cron, which will update the dynamic DNS with your home IP.
  2. Set a static IP for your Raspberry in your local network (e.g. 192.168.x.x)
  3. Log into your router configuration panel and set port forwarding to your Raspberry (port 8080 should reach the static IP set in the previous step)
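With those prerequisites in place, installing NGINX on Raspbian is a one-liner (assuming the stock Raspbian repositories):

sudo apt-get install nginx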
Once done, you'll probably be able to reach your chosen third-level domain (e.g. "yourhouse.freeddns.org") and see the welcome page. Let's now replace the basic HTML code under /var/www/html with something like:

<!DOCTYPE html>
<html>
<head>
<title>Welcome to my house!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<img src="my.jpg" alt="myhouse" style="width:100%;">
</body>
</html>


So this snippet is basically serving one picture, captured by your PiCamera and stored locally at "/var/www/html/my.jpg", referenced from the index.html file.

The last step is setting up a cron job (e.g. with crontab -e) to capture a periodic snapshot; it would be something like:

*/10 * * * * sudo raspistill -q 15 -w 1024 -h 768 -o /var/www/html/my.jpg

This will capture a medium-quality picture every 10 minutes, quick to serve over the internet. Tune it to your preferred quality and capture rate!