MTU

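MTU (maximum transmission unit) is the largest packet that can be transmitted on a network link without fragmentation. It is a property of the NIC and of the link layer protocol. For Ethernet the default is 1500 bytes, which counts the payload of a frame and not the frame header.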
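
As a quick illustration, the MTU currently configured on each interface can be read from sysfs. The following is a minimal sketch, not part of the original post: the function name list_mtus and its optional root argument are assumptions made here so that it can also be pointed at a test directory.

```shell
# Print "<interface> <mtu>" for every interface found under a sysfs-style tree.
# The optional first argument (an assumption for this sketch) overrides the
# default location, /sys/class/net.
list_mtus() {
    local root="${1:-/sys/class/net}" dir
    for dir in "$root"/*/; do
        # Skip entries that do not expose a readable mtu attribute.
        [ -r "${dir}mtu" ] || continue
        printf '%s %s\n' "$(basename "$dir")" "$(cat "${dir}mtu")"
    done
}
```

Calling list_mtus with no argument reads the live interfaces of the host.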

Page cache and page writeback

Today I was dealing with an interesting case that required some knowledge of Linux kernel memory management, specifically page I/O. A customer was complaining that when they scp an 8 GB file to their Oracle Enterprise Linux 5.4 box, it would shortly become unavailable and they wouldn’t be able to ssh into it. The only information I had was the output from vmstat and iostat. Luckily, this was sufficient to conclude what was happening.

How I registered wr.gl

I always wanted to create my own URL shortening service, but the problem is that most of the short and attractive domain names are already taken. For that reason, I decided to write a bash script to help me find a simple 2-letter domain name that sounds catchy and is easy to remember. This is how I registered wr.gl (Wriggle!).

R and ggplot2

I would like to learn R and ggplot2 to create statistical data visualizations, including graphs such as box plots, which are useful for identifying outliers and comparing distributions.

Query the probable implications of your data

Here’s an interesting project. Currently in alpha, BayesDB is a Bayesian database table that lets users query the probable implications of their data using a SQL-like language (BQL). I guess it is primarily aimed at people who lack statistics expertise but have tabular data and questions they’d like to answer. It would be nice to see some MySQL/PostgreSQL connectors.

collectl

collectl is a performance monitoring tool written in Perl, similar to sar.

Display errors for all NICs with sysfs

Recently, while I was investigating a network issue on one of the Xen hosts in our fleet, I was curious if I could quickly gather any error or drop counters for the host’s physical NICs and numerous domU VIFs.

I could have run netstat -i or looked at /proc/net/dev, but filtering that output for just the information I was interested in is not that straightforward: selecting only TX or RX, only errors or drops, and counting columns for awk quickly becomes cumbersome. Therefore, I crafted a one-liner to query sysfs for this purpose, and the beauty of the Unix philosophy of simple tools shimmered once again:

$ find /sys/class/net/*/statistics/ -type f -regex '.*\(errors\|dropped\)' -exec grep -vH "^0$" {} \;

This will display every *errors or *dropped file whose content is non-zero, for all the interfaces, with each value prefixed by its file name. This is easy to grep further. Stylish, no?
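
For repeated use, the one-liner can be wrapped in a small function. This is a sketch under a few assumptions rather than the original command: the function name nonzero_net_counters and its optional root argument are made up here, -name tests replace the GNU-style -regex, and the output is sorted so that successive runs are easy to compare.

```shell
# Print "<path>:<value>" for every *errors or *dropped counter that is not 0.
# The optional first argument (an assumption for this sketch) lets the
# function read a test tree instead of the live /sys/class/net.
nonzero_net_counters() {
    local root="${1:-/sys/class/net}"
    find "$root"/*/statistics -type f \
        \( -name '*errors' -o -name '*dropped' \) \
        -exec grep -vH '^0$' {} + 2>/dev/null | sort
}
```

Because each line starts with the path, appending something like grep '/vif' to the function call narrows the result to the domU VIFs only.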

ps and top report different CPU utilization

Today I was checking named on one of the servers and spotted something surprising: top was reporting 100% CPU utilization, while ps was reporting only 0.5%. Checking the documentation, it seems that there are the following differences:

  • ps reports the CPU time used divided by the time the process has been running, that is, an average over the entire lifetime of the process (the man page itself admits this is not ideal)
  • top reports utilization since the last screen refresh

It is logical, actually, but not so self-evident until you have a process that has been executing fine for a long time and then suddenly starts misbehaving.
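
The arithmetic behind the two figures can be sketched as follows. The numbers are hypothetical and were chosen only to reproduce the 0.5% and 100% readings; cpu_pct is a helper name made up for this illustration.

```shell
# cpu_pct CPU_SECONDS WALL_SECONDS
# Prints CPU utilization as a percentage with one decimal place.
cpu_pct() {
    awk -v cpu="$1" -v wall="$2" 'BEGIN { printf "%.1f\n", 100 * cpu / wall }'
}

# Hypothetical daemon: 4320 s of CPU time over 10 days (864000 s) of uptime.
cpu_pct 4320 864000   # lifetime average, the ps view: prints 0.5

# The same daemon now spinning for a whole 3-second refresh interval.
cpu_pct 3 3           # last interval, the top view: prints 100.0
```

A long, quiet history dilutes the lifetime average so much that even hours of spinning barely move the figure reported by ps.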

By the way, which of those two is your monitoring tool taking into account?

High load during rsync or scp

Essentially, large backup operations are well known to degrade performance on the receiving end, because they can saturate any of the resources that the transfer depends on (CPU, network, I/O). A slowdown of the system during the backup window is expected, and is usually accompanied by a higher load average and %iowait.

You may want to try performing the backup in the reverse direction, initiating it on the server and pulling from the client, so that you can use nice and ionice to lower the priority and reduce the load a bit. For example:

$ nice -n 19 ionice -c2 -n7 scp -ri ~/.ssh/your-private-key.pem user@client:/client/folder/to/backup /server/destination/folder

Furthermore, you can try limiting the transfer rate of scp and rsync with the appropriate switches (-l and --bwlimit, respectively).
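
One thing to watch out for is that the two switches use different units: scp -l takes Kbit/s, while rsync --bwlimit takes kilobytes per second. The sketch below converts one to the other and only prints the resulting commands. The helper name kbytes_to_kbits, the 1024 KB/s cap, and the host and paths are placeholders chosen for illustration.

```shell
# Convert kilobytes per second (rsync's unit) to kilobits per second (scp's unit).
kbytes_to_kbits() {
    echo $(( $1 * 8 ))
}

rate=1024   # hypothetical cap of roughly 1 MB/s

# The commands are printed rather than executed; host and paths are placeholders.
echo "scp -l $(kbytes_to_kbits "$rate") -r user@client:/client/folder /server/destination"
echo "rsync -a --bwlimit=$rate user@client:/client/folder/ /server/destination/"
```

Both limits can be combined with nice and ionice, since they throttle the network transfer rather than the priority of the process.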

‘watch’ the differences

I’ve used the watch command to repeatedly execute a command and monitor its output, but today I found out that it supports a useful -d switch, which highlights the changes between successive updates. For example, to watch for increasing errors or packet drops on a network interface:

$ watch -tdn1 cat /proc/net/dev