IBM support (sic) story

George Robert Boyce grboyce at rodan.acs.syr.edu
Thu Apr 18 05:54:25 AEST 1991


In trying to add a 3rd party scsi disk to my RS6000/530 server (BTW,
why does IBM support make us call it a 7013?), I ran into two small
problems.

The first was that one of the commands, I forget which, forks a copy
of "mkboot" and I had my own copy of such a program which was found in
my path ahead of /etc/mkboot. My program of the same name, needless to
say, didn't do the expected thing and the command seemed to hang.
Before I found the cause of the problem, I had decided that I needed
IBM software support since their procedure to add a 3rd party scsi
disk seemed to be failing. I was eager to test out IBM's support in
general, and IBM's support for 3rd party hardware in particular.
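
In hindsight, a shadowed command like this is easy to spot by listing
every copy of the name on the search path, in lookup order. What
follows is only a sketch in Python, not part of IBM's procedure; the
helper name find_on_path is made up for illustration, and it assumes a
POSIX-style PATH.

```python
import os


def find_on_path(name, path=None):
    """Return every executable called `name` on the search path, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    matches = []
    for directory in path.split(os.pathsep):
        # An empty PATH entry conventionally means the current directory.
        candidate = os.path.join(directory or os.curdir, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            matches.append(candidate)
    return matches


if __name__ == "__main__":
    # The first hit is the copy a PATH lookup would run; later hits are shadowed.
    for position, hit in enumerate(find_on_path("mkboot")):
        note = "  <- found first" if position == 0 else "  (shadowed)"
        print(hit + note)
```

If the first hit is not /etc/mkboot, any command that runs "mkboot" by
bare name will pick up the wrong program.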

That was on Friday morning and I wanted to get this resolved before
the weekend. But since I had followed comp.unix.aix and had called IBM
software support directly in the past, I knew the procedure was to
call my local SE first. I could have told him over the phone that
"lcreatevg" was failing, and I could have read or faxed him the error
message. But he insisted on coming out to help, on Monday. Fine...

So on Monday my SE arrives. We start from scratch and after two or
three commands we run into the problem; he records the error message
and agrees we should call software support. Level one support wasn't
of much help but they did suggest that we reboot the system and see if
that helped. It *seemed* like a reasonable suggestion so that is what
we did. Enter problem number two...

It seems that something (maybe me) had trashed the boot block, err
boot logical volume, of the system disk and the system would not
reboot. This was obvious to me and it seemed to me that there should
be a software solution to this new, more serious, problem. But level
one software support, and my SE, said we had a hardware problem. This
was despite the fact that I could boot the maintenance disks and mount
the system disk and play around without any problems.

Ok, so now I get to call hardware support, report the problem, and
they dispatch a local HW engineer to deal with the problem. A few
hours later, he shows up and we try to run the HW diagnostics. I
offered to run them hours earlier, but my SE seemed to insist that we
let the HW guy do it. His first question when he arrived was, "So, you
run diagnostics yet?". Sigh. Well, the diags run just fine (as I
expected) and so he now calls level one hardware support. We all guess
their answer and sure enough, they say to reload the system. We say
that is unacceptable, and the call gets bumped up to level two hardware
support.

We play around a couple more hours, including trying to boot
diagnostics from the internal disk. We get the same errors as when
we try to boot AIX, which seems to confirm, to me, that the boot
logical volume is messed up. It seems to confirm to level two hardware
support that we need to reload the system.

After insisting that reloading the system was not a valid option, and
hardware support insisting that there was no hardware problem, we get
the call transferred to level two software support. Once connected, we
got the magic commands needed to fix the problem. A third problem came
up; I was using an old set of maintenance disks and the instructions
didn't work. The level two support person was able to recognize my
error and correct it, and the whole procedure took 15 minutes.

15 minutes is a damn good time for any support call and I was very
happy. But I am still wondering how to cut down on the *six hours* it
takes to get to the right support person.

On 4/9/91, Pierre Asselin wrote:

> General conclusions from earlier exercises:
> 
>  o  Software Defect Support is officially limited to its narrow mandate.
>  o  Technical support is available for the RISC-6000's.
>     It's called comp.unix.aix.
>  o  Accurate information on the RISC-6000's is available, but only
>     on comp.unix.aix.
>  o  Accurate information on the IBM support structure is available,
>     but only on comp.unix.aix.
>  o  To this day, IBM is convinced that it's doing a fine job.
>  o  Hardware support does work.  Beats me.

I have to argue that level two software support knows their stuff. The
problem then is that IBM has a level one support system in place (a)
to protect the valuable and expensive resources of level two by (b)
answering the easy questions. I would argue that level one does
half of their job. They do a heck of a job of protecting the level two
folks.

So that leaves us with comp.unix.aix for level one support, and a good
but well protected level two support. It could be worse. I think there
are also other possible solutions to this situation. We could try to
convince IBM that (a) they have a support problem and (b) that it is a
serious problem. That seems like it could be a lot of work and we
haven't even solved the problem yet, just made IBM recognize it.

My own opinion is that IBM should subcontract level one support to
local and regional support service companies and provide all the
necessary support to make it work. But then I've just formed such a
company so my opinion is biased. Regardless, I am calling my local
office right now to suggest it...

George
--
George R. Boyce, Manager, Systems Engineering Group, george at spica.npac.syr.edu
CASE: Computer Applications and Software Engineering Center
NPAC: Northeast Parallel Architectures Center
SCCS: Syracuse Center for Computational Science

And now also: The Computing Support Team