XenServer 5.6 FP1 : File System on Control Domain Full

The issue I describe in this post comes from my own XenServer, my lab at home. Be very careful with the modifications I explain if you plan to make these changes in your own environment… As always, try them in a test environment first, please!

After my XenServer @ home crashed last week, I chose to install the latest XenServer version, 5.6 FP1 (more info here), to try out some new features: IntelliCache, Open vSwitch, VM Protection and Recovery, and the self-service portal. The installation was as smooth as usual, with just one more question than in the previous version, about the installation of IntelliCache.

I was able to reinstall all my VMs, configure everything and check all these new features. The only issue was that my file system was running out of space very, very quickly…

I didn't understand why at first, because I'm not that skilled with the XenServer command line, but I needed to find out what was happening.

The first place to check when something is going wrong is the log files. You can find the XenServer log files in /var/log, and mine were huge! In less than 24 hours I had 3 log rotations on the kern.log and messages files… To buy some time and keep XenCenter able to connect to the XenServer, I made some changes in the log rotation configuration file /etc/logrotate.conf :

# see "man logrotate" for details
# rotate log files when they are bigger than 1MB.
# logrotate is broken so it cannot parse the size parameter.
# however the default is "size 1M".
#size 1M
 
# keep 20MB of logfiles
rotate 5
 
# create new (empty) log files after rotating old ones
create
 
# uncomment this if you want your log files compressed
compress
 
# RPM packages drop log rotation information into this directory
include /etc/logrotate.d
 
# no packages own wtmp -- we'll rotate them here
/var/log/wtmp {
    monthly
    minsize 1M
    create 0664 root utmp
    rotate 1
}
 
# system-specific logs may be also be configured here.

I made two changes: the rotate value was 20 and I lowered it to 5, and I enabled compression of the rotated log files. It did give me some more time to work on this weird issue, but not enough: in less than 12 hours my file system was full again… XenCenter wasn't able to connect and xsconsole wasn't working either. The only way to recover the interface was to free as much space as possible first, then kill the xapi process and restart it with the xe-toolstack-restart command. After that I was able to use XenCenter and xsconsole again.
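The recovery steps above can be sketched as a small script. This is only a rough sketch under my own assumptions: the function names fs_usage_percent and recover_full_dom0 are made up for illustration, the 90% threshold is arbitrary, and forcing a rotation with logrotate -f is just one possible way to free space (I did not note down exactly which files I cleaned by hand).

```shell
#!/bin/bash
# Print the percentage of space used on the filesystem that holds a given path.
# df -P guarantees one line per filesystem, so the value is on line 2, column 5.
fs_usage_percent() {
    df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical recovery routine; nothing calls it automatically.
# Assumption: forcing a log rotation is an acceptable way to free space.
recover_full_dom0() {
    local threshold="${1:-90}"
    if [ "$(fs_usage_percent /)" -ge "$threshold" ]; then
        logrotate -f /etc/logrotate.conf   # rotate (and compress) logs right now
        xe-toolstack-restart               # restart xapi and the related services
    fi
}
```

On the XenServer host you would run recover_full_dom0 by hand once the control domain is nearly full; fs_usage_percent / on its own is a quick way to keep an eye on the root file system.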

Even after all this troubleshooting, I still had no clue why my log files were getting so big so quickly… Of course I looked inside the log files, and I found the message below :

Jan 26 04:02:02 suomixen kernel: WARNING: at net/core/dev.c:1594 skb_gso_segment+0x1a1/0x250()
Jan 26 04:02:02 suomixen kernel: Hardware name: System Product Name
Jan 26 04:02:02 suomixen kernel: netbk: caps=(0x10801, 0x191829) len=2960 data_len=2910 ip_summed=0
Jan 26 04:02:02 suomixen kernel: Modules linked in: usb_storage usb_libusual tun cifs lockd sunrpc openvswitch_mod llc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp iptable_filter ip_tables x_tables binfmt_misc nls_utf8 isofs dm_mirror video output sbs sbshc fan container battery ac parport_pc lp parport nvram pata_jmicron pata_acpi sg thermal evdev ohci1394 jmicron ieee1394 processor ata_generic button thermal_sys r8169 e1000e mii serio_raw rtc_cmos rtc_core rtc_lib tpm_tis i2c_i801 tpm tpm_bios i2c_core pcspkr dm_region_hash dm_log dm_mod ide_gd_mod ahci libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd usbcore fbcon font tileblit bitblit softcursor last unloaded: microcode
Jan 26 04:02:02 suomixen kernel: Pid: 1188, comm: netback/1 Tainted: G W 2.6.32.12-0.7.1.xs5.6.100.307.170586xen #1
Jan 26 04:02:02 suomixen kernel: Call Trace:
Jan 26 04:02:02 suomixen kernel: ? skb_gso_segment+0x1a1/0x250
Jan 26 04:02:02 suomixen kernel: ? skb_gso_segment+0x1a1/0x250
Jan 26 04:02:02 suomixen kernel: warn_slowpath_common+0x7c/0xa0
Jan 26 04:02:02 suomixen kernel: ? skb_gso_segment+0x1a1/0x250
Jan 26 04:02:02 suomixen kernel: warn_slowpath_fmt+0x26/0x30
Jan 26 04:02:02 suomixen kernel: skb_gso_segment+0x1a1/0x250
Jan 26 04:02:02 suomixen kernel: dev_hard_start_xmit+0x173/0x390
Jan 26 04:02:02 suomixen kernel: sch_direct_xmit+0x16d/0x1f0
Jan 26 04:02:02 suomixen kernel: ? kmap_atomic_prot+0x41/0x1c0
Jan 26 04:02:02 suomixen kernel: dev_queue_xmit+0x291/0x4b0
Jan 26 04:02:02 suomixen kernel: netdev_send+0x29/0x40 openvswitch_mod
Jan 26 04:02:02 suomixen kernel: vport_send+0x49/0x100 openvswitch_mod
Jan 26 04:02:02 suomixen kernel: do_output+0x1a/0x30 openvswitch_mod
Jan 26 04:02:02 suomixen kernel: execute_actions+0x32b/0x9f0 openvswitch_mod
Jan 26 04:02:02 suomixen kernel: ? _spin_unlock_bh+0x23/0x30
Jan 26 04:02:02 suomixen kernel: dp_process_received_packet+0xe5/0x190 openvswitch_mod
Jan 26 04:02:02 suomixen kernel: vport_receive+0x38/0xa0 openvswitch_mod
Jan 26 04:02:02 suomixen kernel: internal_dev_xmit+0x49/0x90 openvswitch_mod
Jan 26 04:02:02 suomixen kernel: dev_hard_start_xmit+0x247/0x390
Jan 26 04:02:02 suomixen kernel: dev_queue_xmit+0x3cc/0x4b0
Jan 26 04:02:02 suomixen kernel: ? ip_finish_output+0x0/0x2c0
Jan 26 04:02:02 suomixen kernel: ip_finish_output+0x138/0x2c0
Jan 26 04:02:02 suomixen kernel: ? ip_finish_output+0x0/0x2c0
Jan 26 04:02:02 suomixen kernel: ip_output+0x65/0xb0
Jan 26 04:02:02 suomixen kernel: ? ip_finish_output+0x0/0x2c0
Jan 26 04:02:02 suomixen kernel: ip_local_out+0x18/0x20
Jan 26 04:02:02 suomixen kernel: ip_queue_xmit+0x197/0x390
Jan 26 04:02:02 suomixen kernel: ? tcp_sacktag_write_queue+0x316/0x910
Jan 26 04:02:02 suomixen kernel: ? __skb_clone+0x22/0xd0
Jan 26 04:02:02 suomixen kernel: tcp_transmit_skb+0x376/0x690
Jan 26 04:02:02 suomixen kernel: tcp_write_xmit+0x33c/0x900
Jan 26 04:02:02 suomixen kernel: ? __kfree_skb+0x34/0x80
Jan 26 04:02:02 suomixen kernel: ? tcp_current_mss+0x3d/0x60
Jan 26 04:02:02 suomixen kernel: __tcp_push_pending_frames+0x31/0x90
Jan 26 04:02:02 suomixen kernel: tcp_rcv_established+0xf7/0x590
Jan 26 04:02:02 suomixen kernel: tcp_v4_do_rcv+0xa3/0x1e0
Jan 26 04:02:02 suomixen kernel: ? security_sock_rcv_skb+0xf/0x20
Jan 26 04:02:02 suomixen kernel: tcp_v4_rcv+0x5fd/0x7b0
Jan 26 04:02:02 suomixen kernel: ? nf_iterate+0x53/0x80
Jan 26 04:02:02 suomixen kernel: ? ip_local_deliver_finish+0x0/0x1c0
Jan 26 04:02:02 suomixen kernel: ip_local_deliver_finish+0x7e/0x1c0
Jan 26 04:02:02 suomixen kernel: ip_local_deliver+0x2d/0x90
Jan 26 04:02:02 suomixen kernel: ? ip_local_deliver_finish+0x0/0x1c0
Jan 26 04:02:02 suomixen kernel: ip_rcv_finish+0x12f/0x330
Jan 26 04:02:02 suomixen kernel: ip_rcv+0x1e2/0x2c0
Jan 26 04:02:02 suomixen kernel: ? ip_rcv_finish+0x0/0x330
Jan 26 04:02:02 suomixen kernel: ? ip_rcv+0x0/0x2c0
Jan 26 04:02:02 suomixen kernel: netif_receive_skb+0x401/0x660
Jan 26 04:02:02 suomixen kernel: process_backlog+0x92/0xe0
Jan 26 04:02:02 suomixen kernel: net_rx_action+0x159/0x230
Jan 26 04:02:02 suomixen kernel: __do_softirq+0xba/0x180
Jan 26 04:02:02 suomixen kernel: ? __kmalloc+0x59/0x180
Jan 26 04:02:02 suomixen kernel: do_softirq+0x75/0x80
Jan 26 04:02:02 suomixen kernel: netif_rx_ni+0x1a/0x20
Jan 26 04:02:02 suomixen kernel: net_tx_action+0xe92/0x1520
Jan 26 04:02:02 suomixen kernel: ? sched_clock+0xa/0x10
Jan 26 04:02:02 suomixen kernel: ? update_curr+0x72/0xf0
Jan 26 04:02:02 suomixen kernel: ? __dequeue_entity+0x21/0x40
Jan 26 04:02:02 suomixen kernel: ? set_next_entity+0x1f/0x50
Jan 26 04:02:02 suomixen kernel: ? schedule+0x2e9/0x970
Jan 26 04:02:02 suomixen kernel: netbk_action_thread+0x7f/0x160
Jan 26 04:02:02 suomixen kernel: ? autoremove_wake_function+0x0/0x50
Jan 26 04:02:02 suomixen kernel: ? netbk_action_thread+0x0/0x160
Jan 26 04:02:02 suomixen kernel: kthread+0x74/0x80
Jan 26 04:02:02 suomixen kernel: ? kthread+0x0/0x80
Jan 26 04:02:02 suomixen kernel: kernel_thread_helper+0x7/0x10
Jan 26 04:02:02 suomixen kernel: --- end trace 7e36fc89830d15c4 ---
Jan 26 04:02:02 suomixen kernel:

OK… I didn't catch everything. I knew it was something about the network, but I didn't know what exactly… So I posted it on the Citrix forums, and Radoslaw Smigielski was kind enough to give me a clue. To make it short, it appears that the network driver triggers these kernel warnings because of changes made around network drivers in the main Linux kernel…
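To get a feel for how fast these warnings pile up, something like the following could help. It is a minimal sketch: the function names are hypothetical, and it assumes the log lines look exactly like the extract above (syslog timestamp in the first three fields, and a header line containing "WARNING: at net/core/dev.c").

```shell
#!/bin/bash
# Count the kernel WARNING headers coming from net/core/dev.c in a log file.
count_gso_warnings() {
    grep -cF 'WARNING: at net/core/dev.c' "$1"
}

# Break the same warnings down per hour, e.g. "Jan 26 04h 120".
# Fields 1-3 of a syslog line are month, day and HH:MM:SS.
warnings_per_hour() {
    grep -F 'WARNING: at net/core/dev.c' "$1" |
        awk '{ split($3, t, ":"); n[$1 " " $2 " " t[1] "h"]++ }
             END { for (k in n) print k, n[k] }' | sort
}
```

For example, count_gso_warnings /var/log/kern.log gives the total for the current log file, and warnings_per_hour /var/log/kern.log shows whether the flood is constant or tied to network activity.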

It was working fine on XenServer 5, 5.5, 5.6 and even the 5.6 FP1 beta, but not anymore with the final version. To be clear, it still works, but the log files get out of control… The clue Radek gave me to get rid of this issue was the right one: I had to try disabling TCP offload on the physical network adapter (PIF). I searched the internet and found a cool website, Will-Bloggs-too, with a ready-made script to disable checksum offloading on the PIFs and VIFs:

#!/bin/bash
# Disable NIC offload features, either directly with ethtool (--local / -l)
# or by storing the settings in xapi other-config keys (default).

# ethtool feature names: rx/tx checksumming, scatter-gather, TCP segmentation,
# UDP fragmentation and generic segmentation offload
if_modes="rx tx sg tso ufo gso"

if [[ "$1" == "--local" || "$1" == "-l" ]]; then
    echo -n "disabling checksum offloading for local devices... "
    for iface in $(ifconfig | awk '$0 ~ /Ethernet/ { print $1 }'); do
        for if_mode in ${if_modes}; do
            # errors are discarded: not every device supports every feature
            ethtool -K "$iface" "$if_mode" off 2>/dev/null
        done
    done
    echo "done."
else
    echo -n "disabling checksum offloading in xapi settings... "
    # --minimal returns a comma-separated list of UUIDs
    for VIF in $(xe vif-list --minimal | sed -e 's/,/ /g')
    do
        ###xe vif-param-clear uuid=$VIF param-name=other-config
        for if_mode in ${if_modes}; do
            xe vif-param-set uuid="$VIF" other-config:ethtool-${if_mode}="off"
        done
    done
    for PIF in $(xe pif-list --minimal | sed -e 's/,/ /g')
    do
        ###xe pif-param-clear uuid=$PIF param-name=other-config
        for if_mode in ${if_modes}; do
            xe pif-param-set uuid="$PIF" other-config:ethtool-${if_mode}="off"
        done
    done
    echo "done."
fi

After a reboot, everything went back to normal in my log files. That's nice! But what does it mean exactly?

Disabling TCP offload and checksum offload on physical and virtual network devices means that from now on the main CPU handles all the work that was previously handled directly by the network card. On my test lab this is not a big deal, but in a production environment the impact can be critical!
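To verify that the offloads are really disabled after the reboot, ethtool -k (lowercase k) prints the current settings of an interface. The small filter below is a sketch of how to list what is still enabled; the function name offloads_still_on is made up, and I am assuming the usual "feature: on/off" output format, which varies a little between ethtool versions.

```shell
#!/bin/bash
# Read `ethtool -k <iface>` output on stdin and print every feature
# that is still reported as "on" (including "on [fixed]").
offloads_still_on() {
    awk -F': ' '$2 ~ /^on( |$)/ { sub(/^[ \t]+/, "", $1); print $1 }'
}
```

On the host this would be used as ethtool -k eth0 | offloads_still_on, where eth0 stands for whichever interface you want to check. An empty result means none of the listed features is still offloaded to the card.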

Over the last few days I learned a bit more about XenServer and Linux in general, and I used some very useful commands that the administrators I meet during my visits to customers should remember :

If you want to check how much disk space you have left, use df -h. If you're looking for files bigger than a given size, you can use this command line : find {/path/to/directory/} -type f -size +{size-in-kb}k -exec ls -lh {} \; | awk '{ print $9 ": " $5 }' . All you need to specify is the path and the size in KB (50000 for about 50 MB, for example).
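A variation on the find command above, which also sorts the results so the biggest offenders come first, could look like this. It is only a sketch: files_larger_than is a name I made up, and it relies on the -printf option of GNU find, which I am assuming is the find shipped in the control domain.

```shell
#!/bin/bash
# Print "size_in_bytes path" for every file under a directory that is
# bigger than a threshold given in KB, largest first.
files_larger_than() {
    local dir="$1" min_kb="$2"
    find "$dir" -type f -size +"${min_kb}k" -printf '%s %p\n' 2>/dev/null | sort -rn
}
```

For example, files_larger_than /var/log 50000 lists the log files above roughly 50 MB. Unlike the awk version, a file name containing spaces is printed in full.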
