Bingo

Saturday, August 16, 2008

Linux: Share files between two or more users

We need to maintain the situation where all files in a given directory satisfy these requirements:
1. Owned by a common group, say "team".
2. Permissions - 770 or thereabouts.
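
As a baseline, the two requirements can be applied once by hand to everything already in the directory. The sketch below is only an illustration: apply_shared_perms is a hypothetical helper name, and it assumes GNU find, chgrp and chmod, and a caller who is allowed to change the group (root, or an owner who is a member of that group).

```shell
#!/bin/bash
# Hypothetical helper: force the group owner and mode 770 on a whole tree.
# Usage: apply_shared_perms /path/to/shared/directory team
apply_shared_perms() {
    local dir=$1 group=$2
    # "{} +" batches many paths into each chgrp/chmod invocation.
    find "$dir" -exec chgrp "$group" {} + -exec chmod 770 {} +
}
```

This only fixes what exists at the moment it runs; files created later are not covered.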

I have faced this requirement many times, and many others have asked questions about this. So I decided to create a script for this.

The following script sets the owner of every file in the given directory to root, forces the group owner to be "team", and sets the file permissions to 770. It uses inotify to fix a file as soon as it is created or its permissions are modified. You need to install the inotify-tools package of your distribution. Tested with Fedora 9 Linux, and eeebuntu on an Eee PC.

If you save this as filepermd.sh, you have to call it as

sudo nice filepermd.sh /path/to/shared/directory/ team


#!/bin/bash

# Remove the trailing slash from the directory argument
dirRoot=`echo "$1" | sed 's/\(.*\)\/$/\1/'`
echo "$dirRoot"

groupname=$2

chown root:"$groupname" "$dirRoot"

# -c makes inotifywait print each event as a CSV line
inotifywait -c -e ATTRIB -e CREATE -r -m "$dirRoot" | while read line
do
    # First CSV field is the directory, last field is the file name
    dirname=`echo "$line" | awk -F"," '{print $1}'`
    filename=`echo "$line" | awk -F"," '{print $NF}'`
    filepath="${dirname}${filename}"

    # Remove the first letter from the permissions string. This is d for a
    # directory, and - for a file. Not useful for us.
    perm=`ls -ld "$filepath" | awk '{print $1}' | sed 's/.\(.*\)/\1/'`
    owner=`stat -c '%U' "$filepath"`
    groupowner=`stat -c '%G' "$filepath"`

    # Change something only when it differs, so that our own fixes
    # do not keep triggering new ATTRIB events
    if [ "rwxrwx---" != "$perm" ]; then
        chmod 770 "$filepath"
    fi
    if [ "root" != "$owner" ]; then
        chown root "$filepath"
    fi
    if [ "$groupname" != "$groupowner" ]; then
        chgrp "$groupname" "$filepath"
    fi
done


Known Limitations: The following characters are not supported in a file name or directory path. Files whose paths contain them will be silently ignored.
  1. Comma (,)
  2. BackSlash (\)
  3. Double-quote (")
  4. Single-quote (')
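
One possible way around these limitations, given here only as a sketch and not as the script actually used above, is to skip the CSV output and ask inotifywait for the plain path with --format '%w%f', then read each line with IFS= read -r so that commas, quotes and backslashes pass through untouched. The function names fix_path, fix_paths_from_stdin and watch_dir are made up for this illustration, the chown to root is left out for brevity, and file names containing a newline would still not work.

```shell
#!/bin/bash
# Bring one path in line with the policy, touching it only when needed.
# The checks matter: an unconditional chmod or chgrp would raise a new
# ATTRIB event each time and the watcher would never settle.
fix_path() {
    local group=$1 path=$2
    [ -e "$path" ] || return 0    # file vanished before we got to it
    [ "$(stat -c '%a' "$path")" = "770" ] || chmod 770 "$path"
    [ "$(stat -c '%G' "$path")" = "$group" ] || chgrp "$group" "$path"
}

# Read one path per line; IFS= and -r keep spaces and backslashes intact.
fix_paths_from_stdin() {
    local group=$1 path
    while IFS= read -r path; do
        fix_path "$group" "$path"
    done
}

# Assumed invocation: %w is the watched directory, %f the file name.
# Not called here; run it as: watch_dir /path/to/shared/directory team
watch_dir() {
    local dir=$1 group=$2
    inotifywait -q -m -r -e attrib -e create --format '%w%f' "$dir" |
        fix_paths_from_stdin "$group"
}
```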

This is somewhat crude currently, but it will be modified as I get more feedback.

Sunday, August 10, 2008

Desktop Linux performance comparison: 32 bit vs 64 bit

It has been a long time since processors supporting 64 bit operating systems became commonplace. But desktop users have still not migrated en masse to 64 bit operating systems, and with good reason:


1. Performance benefits were not compelling enough to trigger the move. In the few reviews that have been conducted and made publicly available, 64 bit did not come out as unquestionably better performing.

2. Distributions were saddled with compatibility issues. Flash, Java and mplayer codecs were some of the items that needed workarounds to make them work in 64 bit. Most of these workarounds actually ran a 32 bit executable inside a 64 bit operating system. There were always alternatives, like IcedTea for Java and Gnash for Flash.

3. Most users had RAM well within the 4GiB limit for 32 bit. Moreover, 64 bit is not the only way to use more than 4GiB RAM anyway. There is always PAE, though with an overhead. You can use 64GiB RAM with PAE, which is enough not only for most desktop users, but also for many servers.


Surprisingly, I did not find the internet infested with comparisons between 32 bit and 64 bit. Phoronix did a review about 20 months ago, which can easily be termed obsolete. In that review, 32 bit was the winner in performance in most applications that mattered (to me, anyway). Add that to the compatibility issues encountered in the 64 bit flavour of most distributions, and 64 bit is just not justified. The categorical defeat of 64 bit in the last battle makes it all the more important that more battles be fought and that users get to know the results, so that they can make an informed decision on whether to opt for the x86_64 or the x86 flavour of their favourite distribution.


Test Procedure:

Various programs were run on both Fedora 9 Live USB x86 and Fedora 9 Live USB x86_64. The time taken and the CPU time consumed by the programs were measured using the time command. The "real" time was used in the majority of cases, unless an exception is noted in the detailed description below. In all cases, the time readings for consecutive runs of the same test varied by no more than 2%. Each test was run 3 times and the median reading was considered. After each run, the filesystem on which the tests were run was unmounted and mounted again, so that filesystem caches were cleared and subsequent runs did not get an unfair advantage. The activities chosen for testing are such that I/O is not an overwhelming part of performing them.
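
The "run three times, take the median" part of this procedure can be sketched in bash as below. This is not the harness that produced the numbers in this review; real_seconds, median_of and median_real_seconds are names invented for the illustration, and it relies on bash's built-in time keyword and its TIMEFORMAT variable. The unmount and mount between runs needs root and is only mentioned in a comment.

```shell
#!/bin/bash
# Print the wall-clock ("real") seconds taken by a command.
real_seconds() {
    local TIMEFORMAT=%R
    # The timing report goes to stderr of the group; send it to stdout.
    { time "$@" >/dev/null 2>&1; } 2>&1
}

# Print the median of an odd number of readings.
median_of() {
    printf '%s\n' "$@" | sort -n | sed -n "$(( ($# + 1) / 2 ))p"
}

# Run a command three times and print the median "real" reading.
# On the actual test machine the test filesystem would be unmounted
# and mounted again between the runs.
median_real_seconds() {
    median_of "$(real_seconds "$@")" "$(real_seconds "$@")" "$(real_seconds "$@")"
}
```

For example, median_real_seconds lame testIn.wav testOut.mp3 would print the middle one of three readings.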

Why Fedora?
Well, I use Fedora anyway. When released, Fedora generally includes cutting edge versions of software, but it is now about 5 months since it was released. So Fedora 9 can be considered of intermediate cutting-edgedness at the time of testing.

Hardware
It is an 18 month old self-assembled PC that I consider eligible for this testing because it is close to the average newness of PCs. Newer systems might take better advantage of a 64 bit OS. You can judge for yourself whether the review is relevant to you by reading the specifications below:

Motherboard: ASUS P5B-VM, Intel 965G chipset
Graphics: Intel Integrated X3000
Processor: Intel Core 2 Duo E6300, 1.86 GHz.
RAM: Kingston 667 MHz, 2 sticks of 1GB each
Hard disk: Seagate 160GB SATA
No component has been overclocked.

Software
A detailed list of all software installed in a default Fedora 9 live CD can be obtained from the Fedora home page. I am listing the chief ingredients, and the additional software I used for testing.

Kernel: 2.6.25-14.fc9.i686
bc: bc-1.06-33.fc9
mencoder: livna's build, mencoder-1.0-0.94.20080531svn.lvn9
lame: livna's build, lame-3.97-6.lvn8
gpg: gnupg2-2.0.9-1.fc9



The bc function for factorial has been taken from the man page of bc. Example code for the factorial of 20,000 is given below.

define f (x) {
if (x <= 1) return (1);
return (f(x-1) * x);
}
f(20000)
A sample command to use the above bc code is given below. The output is redirected to the null device because printing the huge factorial would be a mammoth exercise in I/O, which is outside the scope of this review.

time bc factorial20k.bc > /dev/null

This is not to say that people love to calculate factorials of large numbers in their spare time, but the Wikipedia article about 64 bit architecture mentions that factorial can be calculated approximately twice as fast in 64 bit as compared to 32 bit. This makes it an important metric of 64 bit operating system performance. You will note that we are far from realizing this potential 2 times gain with 64 bit, but we get around 10% gain in all these factorial tests, which is significant.





A 700MB video was encoded using mencoder with the following options. Livna's build of mplayer was used.

time mencoder $videoIn -oac lavc -ovc lavc -lavcopts \
abitrate=112:vbitrate=736 -o $videoOut > /dev/null 2>&1
Gain is about 10% with 64 bit.



A 700MB file was encrypted using the default gpg algorithm, with a 4096 bit key.

time gpg --encrypt --recipient 'my name' $inputFile
The same file was decrypted again. The time considered here is 'user' time + 'system' time as reported by the time command. This is because I had to type the password each time and a comparison of my typing speed under 32 bit vs 64 bit Fedora is outside the scope of this review. So the 'real' time is irrelevant for our purposes.


time gpg --output $outputFile --decrypt $inputFile
Gain with 64 bit is about 17% for encrypt, and an awesome 32% for decrypt.
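
Adding up the 'user' and 'system' figures can be left to the shell. The snippet below is a rough sketch, not the exact commands used for this review: cpu_seconds is a hypothetical helper that uses bash's time keyword with TIMEFORMAT set to print only the user and system seconds, and awk to add them.

```shell
#!/bin/bash
# Print user + system CPU seconds of a command, ignoring time spent waiting.
cpu_seconds() {
    local TIMEFORMAT='%U %S'
    { time "$@" >/dev/null 2>&1; } 2>&1 |
        awk '{ printf "%.3f\n", $1 + $2 }'
}

# sleep waits for a second but burns almost no CPU, much like a
# program waiting for a password to be typed.
cpu_seconds sleep 1
```

With this, time spent typing the password adds to 'real' time only and does not distort the comparison.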



A 100 MB wav file was encoded to mp3 using the default settings of lame.

time lame testIn.wav testOut.mp3

Gain with 64 bit was noted to be about 5%.

Coming soon: multi-tasking performance of 64 bit vs 32 bit

Multi Tasking
Now we know that factorial computation is 9.8% faster for 64 bit, and video encoding is 10.6% faster. What happens when we run many CPU intensive processes together? All processes have to do their own thing, but in addition to this, the kernel has to switch between processes. This switching has an overhead, and it would be helpful to know whether this overhead is any lower on x86_64 as compared to x86. Admittedly the overhead is too small to measure reliably in a short experiment, but here is an attempt.

I started 5 processes of factorial computation and 5 processes of video encoding, all in parallel. I let them run for 20 minutes, and counted how many times they finished their task in the given 20 minutes. Running this test again yielded identical results, so I didn't try a third time. Results:
x86:
Calculated factorial of 20,000 85 times
Encoded a 5 minute sample of the above video 45 times

x86_64:
Calculated factorial of 20,000 95 times
Encoded a 5 minute sample of the above video 50 times
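
A test of this kind (several copies of a job looping in parallel for a fixed period, with completed runs counted) could be scripted roughly as follows. This is a sketch under assumptions, not the script used for the numbers above: count_runs and run_parallel are invented names, timing uses bash's SECONDS variable and so has whole-second resolution, and a run is counted only if it ends no later than the deadline.

```shell
#!/bin/bash
# Repeat a command until the time budget (in seconds) is spent and
# print how many runs finished within it.
count_runs() {
    local duration=$1 count=0
    shift
    local deadline=$(( SECONDS + duration ))
    while (( SECONDS < deadline )); do
        "$@" </dev/null >/dev/null 2>&1
        (( SECONDS <= deadline )) && count=$(( count + 1 ))
    done
    echo "$count"
}

# Start several copies of count_runs in parallel; prints one count per copy.
run_parallel() {
    local copies=$1 duration=$2 i
    shift 2
    for (( i = 0; i < copies; i++ )); do
        count_runs "$duration" "$@" &
    done
    wait
}
```

For instance, run_parallel 5 1200 bc factorial20k.bc would print five counts for a 20 minute run; adding them up gives a total of the kind listed above.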

It turns out that the context switching overhead is nearly identical in x86 and x86_64. The difference observed is within experimental error boundaries. The calculations:

Factorial: Ratio of time taken x86/x86_64 = 13.4/12.1 = 1.11
Expected ratio of iterations in a fixed time: 12.1/13.4 = 0.90
Observed ratio = 85/95 = 0.89
Encoding: Ratio of time taken x86/x86_64 = 596.18/533.22 = 1.12
Expected ratio of iterations in a fixed time = 533.22/596.18 = 0.89
Observed ratio = 45/50 = 0.90
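
The ratios above can be rechecked with a few lines of shell. The helper name ratio is made up for this illustration; the numbers are the ones quoted in this post.

```shell
#!/bin/bash
# Print a / b rounded to two decimal places.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

echo "Factorial: expected $(ratio 12.1 13.4), observed $(ratio 85 95)"
echo "Encoding:  expected $(ratio 533.22 596.18), observed $(ratio 45 50)"
```

In both cases the expected and observed values differ by 0.01 at most, which is why no clear difference in overhead can be read from them.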

For factorial computation x86_64 had a slightly lower overhead, but it had a slightly larger overhead for video encoding. No great difference can be concluded from these results. Admittedly, the multi-tasking test is the least convincing part of this review and there are surely better ways to test this. It seems the better methods involve more work, but suggestions of all kinds are welcome.


Conclusion
It can be safely concluded that the performance gain from a 64 bit OS has become significant enough for even a casual user to notice. It does not change the world, and most activities are not processor bound these days anyway. Support for 64 bit features must have improved significantly since the Phoronix review, where 64 bit took a beating from 32 bit.

Note that if the gain is not up to the potential of the x86_64 architecture, this could be the fault of the application (including libraries), the compiler or the kernel. The activities tested are not the typical workload of an average user, but an attempt has been made to test the typical CPU intensive activities that a user is likely to expect from his computer.

PS
After many years, disabling user comments. There is a lot of spam, and blogspot doesn't make it easy to manage user comments.