posted on January 4, 2001 02:34:28 AM
From the Announcements Board ....
-----------------------------------------
*** Letter from Meg ***
Dear eBay Community:
Yesterday, the eBay site was unavailable to our users for almost 11 hours. This outage was the result of a series of failures that affected both our primary and backup systems. It was compounded by a decision we made to delay the replacement of certain hardware components in an effort to avoid disrupting service during the busy holiday season. While we have known for a while that a potential problem existed in our shared disk hardware, we chose to delay the recommended upgrade because we had developed a series of work-arounds that had previously proven effective. We apologize to you all for this lengthy interruption of service.
The outage began at about 11:34 PT yesterday with a hardware failure on our backup system. As we restarted the back-up system, another problem developed in the storage system shared by the primary and back-up systems, effectively bringing down the site.
We then brought up our third backup system which functioned effectively for 40 minutes beginning at about 14:45 PT before a database problem caused this system to become unavailable as well at about 15:22 PT. The site was fully restored and operational as of 21:56 PT.
In accordance with our Extension Policy, all auctions scheduled to end on Wednesday, January 3, 2001, between 11:34 PT and 23:59 PT have been extended by 24 hours. In accordance with our Extension Policy, all associated fees for auctions scheduled to end during this period will be refunded. This includes Insertion Fees, Optional Insertion Fees (such as Featured and Category Featured), and Final Value Fees.
Since our last major outage in August 1999, we have invested considerably in our technology infrastructure. For the last four quarters, we have demonstrated greater than 99% uptime. During this period, the backup systems have performed without incident.
We are taking steps to provide the level of service our members expect from eBay. First, we will upgrade the hardware components recommended by our vendor in the next few weeks, which will require scheduled downtime of approximately six hours during a low-traffic time. Second, we have already embarked on a longer-term program to distribute the database to many separate servers that will isolate any failure to a limited part of the site. We expect this to be complete within four months.
We have made some significant improvements to the site over the past year to improve site stability and scalability, and we still have some additional work to stay ahead of our growth curve. We remain committed to these efforts. We apologize for this outage. And we thank you for your support of eBay through this.
Sincerely,
Meg
-------------------------------------------
Not to be picky, but they knew there were problems and relied on "work-arounds" that had fixed them before. Hmmm .... Comments?
posted on January 4, 2001 03:00:21 AM
"...we have already embarked on a longer-term program to distribute the database to many separate servers that will isolate any failure to a limited part of the site. We expect this to be complete within four months."
This means they will never have to admit to a 2-hour outage again, because only part of the system will be down.
posted on January 4, 2001 04:07:16 AM
I think what Meg is trying to say is this:
They had a bad hard drive and ignored replacing it so that they could take their holiday vacations.
(“...compounded by a decision we made to delay the replacement of certain hardware components in an effort to avoid disrupting service.”)
They couldn’t find the floppy they had everything backed up on.
(“...another problem developed in the storage system shared by the primary and back-up systems, effectively bringing down the site.”)
The extra backup floppy in her purse was damaged too.
(“We then brought up our third backup system which functioned effectively for 40 minutes.”)