Posts Tagged ‘execution’

Pwned by error message

August 17th, 2009 6 comments

In the life of an IT-guy you often shake your head and smile while wondering about the current problem or its solution. So after a while not much can realy surprise you. But last week there was such a rare moment.

The problem is an easy one: we could not execute binaries. We have a cluster here with an 32 bit head node and 64 bit compute nodes both x86 Debian GNU Linux. The stuff to be shared comes over a NFS. Heterogenouse clusters have some difficulties because you cannot execute binaries from one architecture on another. This will fail with the error message: “wrong ELF format”.
Our problem first appeared by using the Modules-tool. For testing we executed it directly:

-bash: ./modulecmd: no such file or directory

But OK, as everybody could see, it was. We checked and changed the rights for users, groups, owndership, parent directories etc. Nothing helped. Next we put an eye on NFS. No miss-configuration or other distinctive feature.
Now we use the bigger weapons and strace the command.

strace ./modulecmd
execve("./modulecmd", ["./modulecmd"], [/* 25 vars */]) = -1 ENOENT (No such file or directory)
dup(2)                                  = 3
fcntl(3, F_GETFL)                       = 0x8002 (flags O_RDWR|O_LARGEFILE)
fstat(3, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff4b4700000
lseek(3, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
write(3, "strace: exec: No such file or dir"..., 40strace: exec: No such file or directory
) = 40
close(3)                                = 0
munmap(0x7ff4b4700000, 4096)            = 0
exit_group(1)                           = ?

Also nothing surprising in here.
The further isolate the root of the behaviour, we code a short “hello world” in C and do compiling and execution and it worked. So here we got the main point which confused us. Self created binaries and all system tools worked properly, just the shared ones or let’s say “not system contained” executables failed.
OK, maybe it’s situated in a deeper level. We installed dash and try to execute it in there. No changes. Slowly but steady we ran out of ideas.
From the strace run we could see that execve backfired. This command overwrites the current forked process with the binary given by a path as an argument and executes it. So maybe the reason belongs to the execution context. We tried


instead. So WTF? After ignoring some charset ideas in mind, we arrived where we started. An obligatory file on a system executable and our binary shared over NFS told us the wrong ELF type. The installation of ia32-libs fixed the problem finally, but why there was a “no such file of directory” instead of a “wrong ELF type” we still do not know.

You are welcome to give us solution or hints.

Categories: bashism, linux Tags: , ,