Finding out what's using disk

ingo2 · Post by **ingo2** » 09 Feb 2012, 15:23

Now that the idle delay for the HD has been set to 5 minutes (from 8 seconds before), I am looking for the processes which keep the HD awake all the time. A good starting point is

Code: Select all

lsof +D <directory>

So far I have dug out 3 jobs in '/etc/cron.d/' which are run every 5 minutes:

Code: Select all

bubba-horde
bubba-notify
dovecot-timefix

They all write to syslog, so no wonder that the HD is kept busy all the time. Maybe I should try setting those to every 10 minutes and check? Is that safe? Or what about redirecting the output to /dev/null?

A similar thing happens with '/var/log/auth.log' which is written every 5 minutes as well.

Probably the best solution would be to put the whole directory '/var' on a USB stick and mount it at boot time?

Kind regards,
Ingo

RandomUsername · Post by **RandomUsername** » 09 Feb 2012, 15:45

I don't understand what you're trying to achieve. My (limited) understanding is that the drive's firmware has been changed so you won't find something in the OS that's keeping the drive active.

Nrde · Post by **Nrde** » 10 Feb 2012, 09:33

Related to Ingo's findings: there's a cron job that runs some dovecot service from time to time even though I don't use any of the email services of the Bubba. Should they still run? Seems like a waste of resources.

ingo2 · Post by **ingo2** » 11 Feb 2012, 07:07

So, I'm a bit further right now:

1. Since the upgrade to 2.4 the firmware of the WD HD has been set to an idle delay of 5 minutes (previously 8 sec.). Because the disk is accessed at least every 5 minutes, the heads are no longer parked at all and power consumption stays continuously at 9.8 watts.

2. The first step was to change the timing of the 3 cron jobs mentioned above from 5 min. to 10 min. by editing the 3 crontabs (bubba-horde, bubba-notify, dovecot-timefix). The result is that the HD now becomes active every 10 minutes, drawing 9.8 watts. After 5 minutes the HD enters idle and parks its heads, and the whole box reduces power consumption to 7.8 watts: a saving of 2 watts (at least 50% of the time)!

3. I then checked what those 3 jobs are writing to the disk. First I checked logging and found that they flood the logfiles in /var/log: "syslog" gets 3 lines per 5 min., "auth.log" gets 6 lines per 5 min. Besides keeping the disk busy, this is unusual and makes it difficult to examine/search the logfiles for really important entries. First I tried to redirect all output from those 3 scripts to /dev/null, but amazingly without success (I don't understand why). So the next step was to filter out those useless messages, and the place to do that is /etc/rsyslog.conf.

I inserted the following lines as the first entries in the "RULES" section, as shown here:

Code: Select all

###############
#### RULES ####
###############

#
# First some standard log files.  Log by facility.
#
# Added by Ingo to avoid flooding of syslog and auth.log, 10. Feb. 2012
# Discard the bubba cron logging every 5 minutes to keep syslog clean
if $syslogfacility-text == 'cron' and $msg contains 'alarms.php'    then ~
if $syslogfacility-text == 'cron' and $msg contains 'dovecot'       then ~
if $syslogfacility-text == 'cron' and $msg contains 'web-admin'     then ~

# Discard the bubba cron logging every 5 minutes to keep auth.log clean
if $syslogfacility-text == 'authpriv' and $msg contains 'pam_unix(cron:session)' then ~
#
auth,authpriv.*			/var/log/auth.log
......

To activate the new configuration you just execute:

Code: Select all

/etc/init.d/rsyslog restart

Important: 'rsyslog reload' does NOT activate the new configuration; I don't know why.
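To see whether the filter takes effect, it can help to count how many of the noisy lines still land in the logfile before and after the restart. The following is only a minimal sketch: the function name count_cron_noise is made up for illustration, the keywords are the three used in the rules above, and /var/log/syslog is assumed to be the log location. Note that it counts every line mentioning these words, not only those from the cron facility.

```shell
#!/bin/sh
# Count log lines that mention the three 5-minute cron jobs.
# The keywords are the ones used in the rsyslog rules above.
# usage: count_cron_noise <logfile>
count_cron_noise() {
    grep -cE 'alarms\.php|dovecot|web-admin' "$1"
}

# Only run the check when the assumed default logfile is readable
if [ -r /var/log/syslog ]; then
    echo "noisy lines in syslog: $(count_cron_noise /var/log/syslog)"
fi
```

If the number stops growing after the restart, the discard rules are active.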

So this is my current status. The 3 scripts that keep the disk busy have been identified. Flooding of the logfiles has been solved, and the 5 min. idle delay of the WD Caviar Green has been confirmed. With the scripts now running every 10 min., the maximum LCC rate that can accumulate is 6 per hour, or roughly 50,000 per year, which is safe and within specs for about 6 years.

The power saving is not yet what I expected, so I have to examine the 3 scripts and what else they do:

Code: Select all

/usr/share/horde3/scripts/alarms.php
/usr/lib/web-admin/notify-dispatcher.pl
/etc/init.d/dovecot status >/dev/null 2>&1 || /etc/init.d/dovecot restart

In particular I wonder whether it is necessary to check every 5 minutes whether dovecot has crashed.
That this is done even though I have all mail functionality disabled should not happen by design.

With kind regards,
Ingo

ingo2 · Post by **ingo2** » 12 Feb 2012, 08:14

UPDATE:

I have now got so far that my HD logs only 58 load cycles and 35 start/stop cycles per day!
Most of the time the HD is spun down, drawing only 5.6 watts (7.9 watts while idle with heads parked and 9.8 watts when the HD is active). Already with these settings the specification limits of the HD will be reached only after 10 years of continuous operation.

For this I had to disable the scripts which are run every 5 minutes. Apparently this only affects the PIM notification of the Horde groupware and whatever "dovecot-timefix" means. I have not seen any impact so far (but I don't have e-mail configured here).

I tried to set the "spin down delay" of the HD with hdparm, but apparently this does not work on my WD5000AADS. Instead this value seems to be fixed internally at 5 minutes after parking the heads.

I am right now optimizing the remaining cron jobs to further reduce wake-ups, but it takes some time to obtain results. If you are interested, stay tuned. As soon as it is fully tested I will publish the recipe for the modifications here.

With kind regards,
Ingo

EDIT: what is amazing: if you shut down via the rear button, the LED goes off, but power consumption here is still 6.1 watts (compared to 5.6 watts when the HD is spun down). Any ideas?

ingo2 · Post by **ingo2** » 13 Feb 2012, 11:46

So, now I have done everything to adjust the scheduled jobs in a way that minimizes disk spin-ups. I only tuned the jobs which are run once or more a day (I did not care about weekly and monthly jobs). As the result of a 26 hour test, covering more than a full day, I achieved (as reported by smartctl):

Code: Select all

12.02.2012 13:30h
	  4 Start_Stop_Count        0x0032   100   100   000    Old_age          54
	  9 Power_On_Hours          0x0032   100   100   000    Old_age         100
	193 Load_Cycle_Count        0x0032   200   200   000    Old_age        2065

13.02.2012 15:30h
	  4 Start_Stop_Count        0x0032   100   100   000    Old_age          80
	  9 Power_On_Hours          0x0032   100   100   000    Old_age         126
	193 Load_Cycle_Count        0x0032   200   200   000    Old_age        2104

This means, assuming a specification of 300,000 LCCs (load cycle counts) and 100,000 SSCs (start/stop counts):
26 SSCs/day -> (spec. = 100,000) -> 10 years of operation
39 LCCs/day -> (spec. = 300,000) -> 21 years of operation
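The lifetime estimate can be reproduced with a few lines of shell. This is just a sketch of the arithmetic: the function name years_to_limit is invented for illustration, the counter differences come from the two smartctl snapshots above, and like the figures above it treats the 26 hour test as one day.

```shell
#!/bin/sh
# Estimate the years until a rated cycle limit is reached.
# usage: years_to_limit <rated_cycles> <cycles_per_day>
years_to_limit() {
    LC_ALL=C awk -v spec="$1" -v per_day="$2" \
        'BEGIN { printf "%.1f\n", spec / per_day / 365 }'
}

# Differences between the two smartctl snapshots (taken 26 hours apart)
ssc_per_day=$((80 - 54))        # Start_Stop_Count
lcc_per_day=$((2104 - 2065))    # Load_Cycle_Count

echo "SSC: $ssc_per_day per day -> $(years_to_limit 100000 "$ssc_per_day") years"
echo "LCC: $lcc_per_day per day -> $(years_to_limit 300000 "$lcc_per_day") years"
```

This gives 10.5 and 21.1 years, in line with the rounded values of 10 and 21 years.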

Power consumption most of the time, with the disk sleeping, is only 5.6 watts, compared to 9.8 watts when always active (which is the case without my modifications). The box also runs noticeably cooler!

And now, what you have to do:
In bold letters I give the filename you have to edit (use nano). Below it, in the code field, I show both the original line, commented out with a hash mark (#), and my substitute with modified values. Below the code field I give a short explanation in italic text.

*** /etc/crontab ***

Code: Select all

#17 *   * * *   root    cd / && run-parts --report /etc/cron.hourly
# Disabled by Ingo because there are no jobs in /etc/cron.hourly

This line is obsolete because there are no hourly tasks. Disabling it avoids an hourly log entry.

*** /etc/cron.d/bubba-horde ***

Code: Select all

# Horde Alarms
# */5 * * * *   root    test -x /usr/bin/php && /usr/bin/php /usr/share/horde3/scripts/alarms.php
# Disabled by Ingo

# Temp Cleanup
# 0 23 * * *    root    test -x /usr/share/horde3/scripts/temp-cleanup.cron && /usr/share/horde3/scripts/temp-cleanup.cron
23 6 * * *      root    test -x /usr/share/horde3/scripts/temp-cleanup.cron && /usr/share/horde3/scripts/temp-cleanup.cron
# Changed by Ingo to adapt schedule to cron.daily

# Kronolith reminders
# 0 2 * * *     root    test -x /usr/bin/php && /usr/bin/php -q /usr/share/horde3/kronolith/scripts/reminders.php > /dev/null 2>&1
20 6 * * *       root    test -x /usr/bin/php && /usr/bin/php -q /usr/share/horde3/kronolith/scripts/reminders.php > /dev/null 2>&1
# Changed by Ingo to adapt schedule to cron.daily

Horde alarms are used for PIM notifications every 5 minutes; they won't work anymore. The other jobs are just shifted in time.

*** /etc/cron.d/bubba-notify ***

Code: Select all

# */5 * * * * root test -x /usr/lib/web-admin/notify-dispatcher.pl && /usr/lib/web-admin/notify-dispatcher.pl
# disabled by Ingo

Notifications in the web interface are also scheduled every 5 minutes; they won't work anymore.

*** /etc/cron.d/dovecot-timefix ***

Code: Select all

# */5 * * * * root /etc/init.d/dovecot status >/dev/null 2>&1 || /etc/init.d/dovecot restart
# disabled by Ingo

Checks every 5 minutes whether dovecot is still running. In my personal opinion this is a dirty hack to work around the dovecot "Time moved backwards" error, see here: http://wiki.dovecot.org/TimeMovedBackwards. However, this should no longer be necessary, as the B3 has the 'ntp' package installed and configured to synchronise time with a high-precision server via the internet. That works great; check with "ntptime" as root.

*** /etc/cron.d/php5 ***

Code: Select all

# 09,39 *     * * *     root   [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -delete
18 6     * * *     root   [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -delete
# Changed by Ingo to adapt schedule to cron.daily

This is just housekeeping to remove old php-session cookies. Twice an hour may be necessary on highly loaded web-servers. If you don't collect thousands of cookies per hour, clean-up once a day is more than enough.

*** /etc/cron.d/mdadm ***
Does not need any modification, only runs on 1st Sunday of each month.

So, only these 5 files containing crontabs have been slightly modified. PIM alarms and web-admin notifications have been sacrificed; they are the only trade-offs. Dovecot should no longer need the dirty hack, because we have ntp running by default.

I would be happy to receive your comments once you have applied these modifications, or if you observe any other oddities/side effects, which hopefully don't exist. Just one hint in case you want to check the disk state while logged in via ssh as root:

Code: Select all

hdparm -C /dev/sda

tells you the status. I just used a good power meter.
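For those without a power meter, the state reported by hdparm can be logged over time. A minimal sketch, assuming hdparm prints a line of the form "drive state is:  standby"; the function names drive_state and log_drive_state are made up for illustration:

```shell
#!/bin/sh
# Extract the state word (e.g. "standby" or "active/idle") from the
# output of 'hdparm -C', read on stdin.
drive_state() {
    sed -n 's/^ *drive state is: *//p'
}

# Print one timestamped state line for the given device (needs root).
# usage: log_drive_state <device>
log_drive_state() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M')" "$(hdparm -C "$1" | drive_state)"
}
```

Called as `log_drive_state /dev/sda >> <path on USB-stick>/state.log`, for example from a loop with a sleep in between, it builds up a simple history. The log should not be written to the disk being watched, since that write would itself keep the disk awake.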

Happy energy saving,
Ingo

P.S.: You may just use copy + paste to get the modifications into nano; that's how I filled the code fields. It avoids typos.

ingo2 · Post by **ingo2** » 13 Feb 2012, 12:23

I just discovered another hourly write to the disk:

Code: Select all

/var/lib/ntp/ntp.drift

This is the correction for the system clock maintained by ntp (which I mentioned above regarding dovecot's time sensitivity).

I'll search for a solution, that should reduce LCC's and SSC's by another 11 or 12 per day. Have 'ntp' also on my PC with Squeeze, so easy to play with.

Regards,
Ingo

ingo2 · Post by **ingo2** » 14 Feb 2012, 10:54

ingo2 wrote:I just discovered another hourly write to the disk:
Code: Select all
/var/lib/ntp/ntp.drift

I found a solution here as well:
You can disable writing of this 'driftfile' by modifying the configuration:
*** /etc/ntp.conf ***

Code: Select all

# driftfile /var/lib/ntp/ntp.drift

by commenting out that one line (putting a # at the beginning)
and restarting ntp:

Code: Select all

/etc/init.d/ntp restart

That's all.
Running that I now get:
7 SSCs/day -> (spec. = 100,000) -> 39 years
21 LCCs/day -> (spec. = 300,000) -> 39 years
(this corresponds to roughly 2 hours of activity at 9.8 watts and 22 hours of sleep at 5.6 watts per day)
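What that split means in energy terms can be worked out with a short calculation. This is only a sketch using the wattages quoted in this thread; the function names avg_watts and kwh_saved_per_year are invented for illustration.

```shell
#!/bin/sh
# Average power for a day split into active and sleeping hours.
# usage: avg_watts <active_hours> <active_watts> <sleep_watts>
avg_watts() {
    LC_ALL=C awk -v h="$1" -v pa="$2" -v ps="$3" \
        'BEGIN { printf "%.2f\n", (h * pa + (24 - h) * ps) / 24 }'
}

# Energy saved per year compared with a constant draw, in kWh.
# usage: kwh_saved_per_year <always_on_watts> <average_watts>
kwh_saved_per_year() {
    LC_ALL=C awk -v a="$1" -v b="$2" \
        'BEGIN { printf "%.1f\n", (a - b) * 24 * 365 / 1000 }'
}

avg=$(avg_watts 2 9.8 5.6)
echo "average draw: $avg W"
echo "saved per year: $(kwh_saved_per_year 9.8 "$avg") kWh"
```

With 2 active hours per day the average draw comes out at 5.95 watts, which is about 33.7 kWh per year less than a box that stays at 9.8 watts all the time.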

You really do not trade off anything important by this. The only difference is that upon start-up of ntpd (i.e. after reboot/power-on) ntp has forgotten the correction it used beforehand (some 100 ppm on my box) and starts again at zero drift. It then takes an additional 30 minutes to bring the time offset down to some 10 ms (milliseconds). IMO this is already overkill for the intended use of the box.

But those of you who want to impress their friends with 1 µs (microsecond) precision will have to wait 1-3 days anyhow. They may put the "driftfile" on a USB stick permanently attached to the box and adapt /etc/ntp.conf accordingly. Additionally you should choose timeservers geographically close to your location and take care to have a stable internet connection. Precision is limited by transmission delays of the internet connection and also by the stability of your hardware clock.

Best regards,
Ingo

ingo2 · Post by **ingo2** » 15 Feb 2012, 13:01

Some additional info regarding the "dovecot-timefix" issue in /etc/cron.d/:

As already said earlier: for me it's a dirty hack and obsolete with ntp being installed and configured. I leave it disabled because I do not use dovecot.
Those of you who are concerned that ntp may lose synchronisation may add 2 lines (the last two below) in the servers section of /etc/ntp.conf. This makes ntpd fall back to the B3's own local clock if no timeserver can be reached at all. The section then looks like this:

Code: Select all

server ntp1.sda.t-online.de prefer
server 0.excito.pool.ntp.org iburst
server 1.excito.pool.ntp.org iburst
server 2.excito.pool.ntp.org iburst
server 3.excito.pool.ntp.org iburst
server  127.127.1.0
fudge   127.127.1.0 stratum 10

There is another oddity, actually not with dovecot itself but with the web interface for it:

1. You are able to switch off email in the web interface and it works fine. It appears to issue the following commands:

Code: Select all

update-rc.d -f dovecot remove
/etc/init.d/dovecot stop

This is absolutely correct: it removes the autostart entries for the init.d-script to avoid starting at next reboot and stops the daemon - great!

2. The web interface is then not able to start the e-mail service dovecot again (in case you want to reactivate it). This should be done with the following commands:

Code: Select all

update-rc.d dovecot defaults
/etc/init.d/dovecot start

Doing so manually works perfectly, but via the web interface the "autostart links" are not set again, nor does dovecot start.

And the installed init.d script is fine: it even checks for an installed ntpd and then issues

Code: Select all

ntp-wait

before calling dovecot, to make sure ntp is up and to wait until it is synchronized.

So finally: the "dovecot-timefix" is a hack to hide the inability of the web-interface to properly start-up dovecot.

With kind regards,
Ingo

P.S.: I would really appreciate it if there were a bug tracker available to report such things to the Excito team.

ingo2 · Post by **ingo2** » 17 Feb 2012, 15:42

ingo2 wrote: There is another oddity actually not with dovecot, but the web-interface for it:

1. You are able to switch-off email in the web-interface and it works fine. Appears to issue following commands:
Code: Select all
update-rc.d -f dovecot remove
/etc/init.d/dovecot stop
This is absolutely correct: it removes the autostart entries for the init.d-script to avoid starting at next reboot and stops the daemon - great!

2. The web-interface now is not able to start e-mail service dovecot again (just in case you want to activate it again). This should be done with following commands:
Code: Select all
update-rc.d dovecot defaults
/etc/init.d/dovecot start

I have to correct myself on this:

It is still possible to configure "dovecot" via the admin-interface in the browser!
What I wrongly assumed is that the menu item "Mail send/receive" is a parent switch for the 2 items below it, named "IMAP" and "POP3". I have now found that only the switch named "IMAP" controls dovecot, and it does so correctly. Sorry. It starts/stops the service "dovecot" and sets/deletes the autostart links in /etc/rc?.d/ correctly.

So, in short: dovecot is controlled completely with the "IMAP"-switch!

What else I did now (besides disabling the dovecot-timefix as described above) is to disable the check for dovecot not running although it should have been started. It is in the following file (put "//" in front of the commands):
/usr/share/web-admin/admin/controllers/settings.php

Code: Select all

if(query_service("dovecot") && !service_running("dovecot")) {
	// restart dovecot
	//	d_print_r("Restart dovecot\n");
	//	start_service("dovecot");
}

The reason is that if you have just started up your B3 and dovecot had been enabled before, it will be started at boot time. However, before starting dovecot the start script issues 'ntp-wait', which waits for the ntp service to get locked. The worst-case delay here is 10 minutes, so dovecot may not be running yet and the script issues the start command a second time, probably causing trouble.
BTW: the very same could happen with the "dovecot-timefix", which runs every 5 minutes and restarts dovecot if it has died. "ntp-wait" runs with its default time-out of 10 minutes.

So, that's all: you can use email and even IMAP services with the "dovecot-timefix" disabled.

Cheers,
Ingo

P.S.: the few HD spin-ups and LCCs I now get per day are mainly caused by avahi multicasts when a new PC or VM is booted and publishes its services in the local net.

ingo2 · Post by **ingo2** » 20 Feb 2012, 11:26

So, one last posting regarding this topic:

I have now located the source which was spamming my local net and waking up the B3 (*), and all is perfect with the modifications described above. When the device is not used/accessed by any PC I only get:
1 Start/Stop Count per day
2 Load Cycle Counts per day
which means the disk is in "suspend" during 95% of the time, with the B3 using only 5.6 watts. The disk only spins up once a day to perform all the daily cron jobs, which takes some 15 minutes altogether with the above configuration.
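To keep an eye on these two counters over time, the raw values can be pulled out of the smartctl attribute table. A rough sketch, assuming the full ten-column output of `smartctl -A` with the raw value in the last column (the excerpts earlier in this thread are shortened); the function name smart_raw is made up for illustration:

```shell
#!/bin/sh
# Print the raw value of one SMART attribute.
# Reads the attribute table of 'smartctl -A' on stdin; assumes the
# full 10-column layout with RAW_VALUE in column 10.
# usage: smartctl -A /dev/sda | smart_raw <attribute_name>
smart_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}
```

Running `smartctl -A /dev/sda | smart_raw Load_Cycle_Count` once a day and comparing with the previous value gives the daily figures quoted above.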

Kind regards,
Ingo

(*) Just a short note on the network spamming:
it happens when you run a guest in a VM (e.g. VirtualBox) with networking configured as NAT and the guest system also runs an avahi-daemon. I observed it here with a Linux guest on a Linux host. In this configuration the guest shows up in the local network under the same IP address as the host, and with 2 avahi-daemons on the same IP you can imagine what happens. The easiest solution is to configure the networking for the guest as "bridged networking". Then host and guest get different IPs, all is fine and the network is clean.

Ubi · Post by **Ubi** » 20 Feb 2012, 16:40

Impressive hack, and a substantial gain of functionality. Well done!

Gordon · Post by **Gordon** » 22 Feb 2012, 03:36

Seconded

If you're worried (or anyone else is) about time having shifted after a reboot, you could consider writing the system time (which is controlled by ntp) to the hardware clock.

Code: Select all

~# hwclock -uw

One place you could do that is hack into the ntp script and add this to the stop routine.

ingo2 · Post by **ingo2** » 03 Apr 2012, 07:04

Just some additional information for those not so familiar with "hdparm":

In all tests above I have set the spin-down delay manually by executing something like

Code: Select all

hdparm -S <number> /dev/sda

However, the WD HD apparently does not care about the <number> you specify for the idle delay and instead sets it to a fixed 5 minutes (the normal meaning of this parameter can be found with 'man hdparm').

What is important however:
the disk "forgets" the spin-down setting after a power cycle. To activate it on every power-on you have to append the following lines at the end of
/etc/hdparm.conf

Code: Select all

# activate spin-down of the HD
# value of 24 is not honoured by WD drives, normally means 24*5sec. = 2 minutes
/dev/sda {
  spindown_time = 24
}
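For drives that do honour the value, the encoding described in 'man hdparm' can be turned into seconds with a small helper. This is only a sketch; the function name spindown_seconds is invented for illustration, and the special values 252 to 255 are not decoded.

```shell
#!/bin/sh
# Convert an 'hdparm -S' / spindown_time value into seconds,
# following the encoding described in 'man hdparm'.
# usage: spindown_seconds <value>
spindown_seconds() {
    v=$1
    if [ "$v" -eq 0 ]; then
        echo "disabled"                 # 0 turns the timeout off
    elif [ "$v" -ge 1 ] && [ "$v" -le 240 ]; then
        echo $((v * 5))                 # multiples of 5 seconds
    elif [ "$v" -ge 241 ] && [ "$v" -le 251 ]; then
        echo $(((v - 240) * 1800))      # units of 30 minutes
    else
        echo "special"                  # 252-255: see the man page
    fi
}

echo "spindown_time = 24 -> $(spindown_seconds 24) seconds"
```

For the value 24 used above this prints 120 seconds, i.e. the 2 minutes mentioned in the comment.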

Best regards,
Ingo

ingo2 · Post by **ingo2** » 07 Apr 2012, 08:06

Here is another useful hint on how to find out which files are accessed on the disk, preventing spin-down or causing spin-up.

Use a great kernel feature: inotify!
Just install 'inotify-tools' and start monitoring a suspicious directory and all files it contains in the background with this command:

Code: Select all

inotifywait -mrq --format "%T %e %w%f" --timefmt %R /var/lib/smartmontools/ >> <path to directory on USB-stick>/inotify.out &

Don't forget to kill the background process later on before you start another instance or if you are done!

This example, monitoring /var/lib/smartmontools, is what I used on my NAS to trace the smartd actions waking up my HD from time to time. After placing the files from that directory on a USB stick (and configuring the path in /etc/default/smartmontools with the options "-A" and "-s") my disk stays spun down.

Another candidate to check this way is the directory /var/log/, to see which logfiles are accessed on the B3 in case you have enabled some features which prevent the HD from sleeping.
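Once inotify.out has collected some events, a per-file count shows quickly which files are touched most often. A minimal sketch, assuming the line format produced by the inotifywait command above (time, event, path) and paths without spaces; the function name summarize_inotify is made up for illustration:

```shell
#!/bin/sh
# Count events per file in a log written by the inotifywait command
# above (format: "<time> <events> <path>") and list the busiest first.
# usage: summarize_inotify <logfile>
summarize_inotify() {
    awk '{ count[$3]++ } END { for (p in count) print count[p], p }' "$1" | sort -rn
}
```

Called as `summarize_inotify <path to directory on USB-stick>/inotify.out`, it prints one line per file with the number of logged events in front.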

Happy Easter,
Ingo

P.S.: probably also useful for Excito to substitute some of their backend monitoring functions by inotify?
