posted on December 3, 2004 05:00:00 PM
As one of the customers affected by the "intermittent (!BS!)" image hosting problems, I think that we deserve a full post mortem after the issues are resolved:
1. What happened
2. Why did it take so long to fix
3. What steps are being taken to make sure it doesn't happen again
posted on December 7, 2004 04:54:51 AM
I still expect a post mortem. It is more important to me than compensation. Vendio can't realistically compensate for the losses incurred without going belly up itself; I understand that. I do not understand, however, why we can't get a detailed post mortem of the image hosting problem and of last night's shorter but more extensive meltdown.
posted on December 7, 2004 06:24:03 AM
I agree, postcardpalace! There were warning signs that Vendio couldn't handle the listing volume last Tuesday (Nov 30; see all the posts about site slowness) and again on Dec 3 when images were down, and last night was the crowning blow! My losses were not substantial compared to those of the poster in the lost images thread, who had started auctions low to bring in lookers only to have the images disappear in the critical final hours.
Vendio has to learn to listen to users and not give pat answers like "clear your cache, reboot, disable your firewall" and finally "it isn't us, it must be your ISP". If lots of people are posting about the same problem, then all signs point to a Vendio problem, and it should be addressed immediately. Many people derive their living from eBay, and a good percentage of that comes during the few days of the holiday season. There should be no outages, and certainly not several in a week! If we wanted to take our chances with reliability, we would use Spare Dollar, where image hosting, listing and post-sale management is a flat $4.95/month. We expect more of Vendio and are willing to pay extra for it.
**********************************
"Life's journey is not to arrive at the grave safely in a well-preserved body, but rather to skid in sideways, totally worn out, shouting '...holy sh@#...what a ride!'"
posted on December 7, 2004 10:27:10 AM
Unfortunately, the conspiracy theorists out there will not be satisfied with our postmortem: as it turns out, the root causes of the two outages are independent.
For those of you interested in knowing what is going on, read on.
Friday's outage, which affected one-sixth of our users, was a compound fault on one segment of our image hosting system. We design systems like this to survive one or more faults. In this case, the affected segment suffered at least 3 faults [unrelated faults, incidentally]. Recovery was also complicated because we were near peak throughput during the selling season. Unfortunately, once we get to the point of multiple faults, recovery can take a long time.
Sure, we can design to tolerate more than 3 faults, but at what cost?
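To make the "at what cost?" trade-off concrete, here is a minimal sketch of the usual arithmetic, not Vendio's actual model. It assumes a hypothetical "N+k" design in which a segment needs `m` working units and carries `k` spares, and that units fail independently (the post notes the faults were unrelated), each with a made-up probability `p`. Every extra fault tolerated adds one unit of cost, while the reduction in outage probability shrinks with each step.

```python
from math import comb


def outage_probability(m: int, k: int, p: float) -> float:
    """P(segment outage) for a hypothetical N+k design.

    The segment needs m working units and carries k spares (m + k total),
    so it survives up to k simultaneous faults. Units are assumed to fail
    independently, each with probability p over the window considered.
    An outage occurs when more than k of the m + k units fail.
    """
    n = m + k
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1, n + 1))


if __name__ == "__main__":
    m, p = 5, 0.01  # hypothetical: 5 units needed, 1% failure chance each
    for k in range(5):
        cost = m + k  # hardware cost grows linearly with the number of spares
        print(f"k={k} spares, {cost} units: P(outage) = {outage_probability(m, k, p):.1e}")
```

Real faults are often correlated (shared power, shared load spikes at peak season), which this independence assumption ignores, so extra spares usually buy less than the formula suggests.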
Monday's outage was with our primary Internet Service Provider (ISP). According to their analysis, several of their systems in southern California were attacked, and the attack propagated throughout their network. They are a large ISP, and this outage affected other large internet services at the same time. Although we contract with 2 other ISPs, losing the primary at peak load under these conditions meant the remaining connections couldn't keep up. This was the partial outage you may have seen.
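The ISP failure above is a classic "N-1" capacity problem: having backup links only helps if the links that remain can carry peak load on their own. The sketch below illustrates that check; the link names, capacities and peak-load figure are hypothetical units chosen for illustration, not Vendio's real numbers.

```python
def survives_loss(capacities: dict[str, float], peak_load: float) -> dict[str, bool]:
    """Simple N-1 check: for each link, report whether the remaining links
    can still carry peak_load if that one link fails."""
    total = sum(capacities.values())
    return {name: total - cap >= peak_load for name, cap in capacities.items()}


if __name__ == "__main__":
    # Hypothetical capacities in arbitrary bandwidth units.
    links = {"primary": 300.0, "isp_b": 100.0, "isp_c": 100.0}
    peak = 350.0  # hypothetical peak-season load
    for name, ok in survives_loss(links, peak).items():
        print(f"lose {name}: {'OK' if ok else 'cannot keep up'}")
```

With these made-up figures, losing either secondary ISP is survivable, but losing the primary leaves only 200 units for a 350-unit load, which matches the kind of partial outage described.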
Incidentally, we've known about this vulnerability in our network for a while, and have been holding off on network improvements until we got through peak selling season. This was a calculated risk: the risk of creating network issues during peak season as a side effect of improving the configuration, versus leaving it alone until a less loaded time and hoping we wouldn't suffer an outage like the one we had last night.
We'll likely be scheduling some reconfigurations in the very near future [basically raising the priority of this change]. As with all changes, the goal is to make things better, but there is a risk that we make things worse and have to revert to a previous configuration.
Do note: we read pretty much all of your postings, we do hear your complaints [and the occasional thanks too], and we take action all the time. Yeah, we have a large system and a lot of customers, and yes, just like everyone else, we aren't 100% perfect, although we do strive for perfection.
Keep the comments coming. I've been mulling over the idea of a forum for system architecture and technical discussion. It's hard to say whether that will make sense, though.
posted on December 7, 2004 12:23:51 PM
THANK YOU! This makes much more sense to me when explained this way. A suggestion: put up a big warning when you first experience problems. Some of us will back off during that time, and that might help avoid a major meltdown.
Kevin
posted on December 8, 2004 01:26:55 PM
It's bad enough to start selling on eBay with a new account. It's bad enough that the area where I list on eBay is loaded with fraud and a shiny new ID with 0 feedback isn't trusted.
What is extremely annoying is that, on top of all that, the auction management service I'm using makes me look like an idiot and a rank amateur. I am losing money over these technical issues, and I don't like that.
Bottom line: why should I pay this month's bill?
The technical discussion above outlines a 3-level redundancy system that failed. Implement one with four or five levels of redundancy. I'm not interested in hearing that it will cost this or that. I'm interested in rock-solid performance and stability. Nothing else is important to me, and THAT is what I'm willing to pay for.
Does it sound like I'm a bit angry about this, Vendio? You bet. If it keeps up, I will take my business elsewhere, or host my own images and cut the code for my listings myself. Been there, done that, and it isn't a big deal...
posted on December 8, 2004 05:44:18 PM
gcmcnutt, thank you.
I think that one of the problems might be that Vendio has (at least) two classes of clients.
I see a lot of discussion about image hosting fees and an unexpected $30 charge. Those people appear to be running relatively low-overhead operations, possibly as a part-time income enhancer. To ward off potential flames: there is nothing wrong with that level of business, and from Vendio's perspective, it might be the target demographic.
Then there are others, my shop for one, that employ a half dozen people, have considerable overhead, and are willing to pay more for stability and features. My shop pays roughly $100 - $150 a month for Vendio, but we would be willing to pay more for stability, a real UPS and USPS interface, additional download capabilities for linking to our internal systems, etc.: something between the low end and Infopedia. We don't want to pay more for the same service, but (as an example) improved shipping functionality would probably save us 10 person-hours per week, and that is money, not to mention that it might increase bidding if we didn't have to "high ball" shipping estimates to accommodate NJ-to-CA shipping in our quotes.
It is difficult to have one level of service that satisfies both camps. I also understand that "dialing in" different levels of capability for different classes of customer is perhaps not technically feasible or cost-effective. I do sometimes wonder, though, whether my shop falls outside Vendio's "target customer base".
By the way, I'm going to assume that mentioning conspiracy theorists in a thread I began was not intended to throw me into such a group. I am generally very appreciative of what Vendio provides my company, and have said so publicly many times. My only previous major peeve has been the time required to deliver the long-promised UPS capabilities in Vendio.
Thank you for your explanation.
I appreciated the wording used. No complicated wizard stuff.
BUT about the conspiracy theory, you have to agree: two major Vendio outages exactly when eBay (and Vendio) reach a record number of new listings is a BIG coincidence.