Author Topic: An attempt to explain what went on that Wed night (a follow-up on the FP issue)  (Read 76197 times)

Offline Vlk

  • Avast CEO
  • Serious Graphoman
  • *
  • Posts: 11658
  • Please don't send me IM's. Email only. Thx.
    • ALWIL Software
Hi,

I decided to explain in a bit more detail what happened during that Wednesday night when we released the bad definitions that started flagging thousands of innocent programs as Trojans.

Normally, we have two definition updates a day: usually one in the morning and one in the afternoon/evening (unless there's some emergency). The actual release process is well defined and features multiple QA checks that ensure that the definitions we roll out don't cause any [major] problems. For example, every set of definitions that we push out has to pass a false positive (FP) test on our extensive cleansets. The cleansets currently contain terabytes of data from hundreds of thousands of applications (we run many tests in parallel, but the test still takes at least an hour to complete). Every single FP on this test set is a reason for the definitions to go back to the virus lab and be revised (and after a fix is made, a new full cleanset test is performed, until all is fine).
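To make the idea of that gate concrete, here is a minimal sketch in Python. It is only an illustration, not our actual tooling: `CleanSample`, `find_false_positives`, `may_release` and the toy detectors are all hypothetical names, and a "detector" is assumed to be any function that returns True when it flags a file.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class CleanSample:
    """One known-clean file from the (hypothetical) cleanset."""
    name: str
    data: bytes


# Assumption: a detector takes file contents and returns True if it flags them.
Detector = Callable[[bytes], bool]


def find_false_positives(detect: Detector, cleanset: Iterable[CleanSample]) -> List[str]:
    """Return the names of clean samples that the detector wrongly flags."""
    return [sample.name for sample in cleanset if detect(sample.data)]


def may_release(detect: Detector, cleanset: Iterable[CleanSample]) -> bool:
    """Definitions may go out only if the cleanset scan finds zero FPs."""
    return not find_false_positives(detect, cleanset)


# Toy data standing in for terabytes of real applications.
cleanset = [
    CleanSample("notepad.exe", b"plain text editor"),
    CleanSample("calc.exe", b"calculator"),
]


def good_definitions(data: bytes) -> bool:
    # Flags only content carrying a made-up malicious marker.
    return b"EVIL" in data


def bad_definitions(data: bytes) -> bool:
    # Flags everything, like a badly broken definition set.
    return True
```

With these toy inputs, `may_release(good_definitions, cleanset)` allows the release, while `find_false_positives(bad_definitions, cleanset)` lists every clean file, which would send the definitions back to the virus lab.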

Now, given what I've just described, how could it happen that we released definitions that produced so many FPs? Were we so unlucky that none of the affected applications was included in the cleanset? (i.e. is the cleanset so poor?)

No. In fact, an analysis done later showed that with the definitions in question (VPS 091203-0), we detected over 50 thousand unique samples from the cleansets as viruses!

The problem was that the FP test was not performed at all before the definitions were pushed out.


On December 2, at roughly 9pm, we had a normal (scheduled) VPS update, 091202-1. The update was working fine for most users, with no FPs or anything. However, due to a bug in it, the update wasn't working correctly in some Avast v5.0 (beta) installations. On these computers, the avast service wouldn't start after a reboot. Remember that avast 5 is still in beta and bugs like this can (and do) occur.

Soon after releasing 091202-1, we noticed the problems with v5, and after some analysis a decision was made to release another update that would fix the problem. It was around 1am local time and the situation was a bit stressful, because v5 users were experiencing the issue and something had to be done fast. A person not normally responsible for releasing VPS updates (but equipped with the knowledge of how it's technically done) went ahead and released the out-of-band update. Unfortunately, he didn't follow the prescribed process and used the wrong input files to generate the VPS. These files had been prepared for testing but were never actually tested. :(

Anyway, after the update was released (at around 12:30am GMT, i.e. 1:30am local time here in Prague), there was still a chance to get some early warnings that the update was a fiasco and needed to be rolled back immediately. The irony is that the person kept checking for at least one more hour whether anything was wrong, but the internal systems used to flag anomalies (such as increased load on the FP reporting servers) weren't showing anything special at that time. Had he checked the forum, he would certainly have noticed the buzz that had just started here, but unfortunately he didn't do so.

The responsible people were not alerted until 5:15am local time, by which point the problem was already massive. It took another 75 minutes to release the cure.


What's the conclusion? We will certainly be improving the process further so that such a thing is not possible anymore. In fact, this is our first major issue of this type, so we feel that even the current process works well, but only if it's strictly followed. But we need to make sure that it is really enforced in every possible case.
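One way to picture what "really enforced" could mean: the publishing step itself refuses any input that has no recorded clean FP test. The Python sketch below is purely illustrative and makes assumptions that are not part of our real system. `ReleaseGate`, `record_fp_test`, `publish` and `UntestedDefinitionsError` are hypothetical names, and the definition files are identified by their SHA-256 digest so that "files prepared for testing" can't be confused with files that actually passed.

```python
import hashlib


class UntestedDefinitionsError(Exception):
    """Raised when someone tries to publish definitions that never passed the FP test."""


class ReleaseGate:
    """Hypothetical gate: publishing works only for exactly the bytes that were tested."""

    def __init__(self) -> None:
        self._passed = set()  # digests of definition files with a clean FP test

    @staticmethod
    def fingerprint(definition_bytes: bytes) -> str:
        # Identify a definition file by content, not by file name or location.
        return hashlib.sha256(definition_bytes).hexdigest()

    def record_fp_test(self, definition_bytes: bytes, false_positives: int) -> None:
        # Only a run with zero false positives counts as a pass.
        if false_positives == 0:
            self._passed.add(self.fingerprint(definition_bytes))

    def publish(self, definition_bytes: bytes) -> str:
        digest = self.fingerprint(definition_bytes)
        if digest not in self._passed:
            raise UntestedDefinitionsError(
                "no clean FP test on record for " + digest[:12]
            )
        # A real system would push the update here; the sketch just returns the digest.
        return digest
```

In this sketch an emergency release at 1am goes through the same `publish` call as a scheduled one, so skipping the test is not a matter of discipline: untested input files simply raise `UntestedDefinitionsError`.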

Furthermore, we're thinking of some additional early warning systems. If for example the evangelists here on the forum had a phone number to call in case of emergency, the problem could have been contained much much faster and the harm done would be incomparably smaller. Automated alerting systems have their place, but in many cases, a human decision is the best. And better to be alerted falsely ten times than not alerted at all.

The overall process will also be completely revised, and crisis management plans defined. We plan to do this over the next week, and I'll be sharing the outcome of this with you.


Looking back, we feel really sorry for what happened. We have learned a lot from this incident and are making sure it will never, ever happen again.

So, if you believe in second chances, please stay with avast. We screwed up and we know it, but we have to look forward and keep fighting. The virus writers don't sleep.


Thanks
Vlk
« Last Edit: December 04, 2009, 07:43:10 PM by Vlk »
If at first you don't succeed, then skydiving's not for you.

Hermite15

  • Guest
OK VLK, thank you very much for taking the time to post this. I requested it in another thread and I'm glad you did it  ;) ... as I was also wondering why an update was released in the middle of the night, which isn't usual with avast, especially when an update was released just a few hours before. Now I see what happened...
 As far as I'm concerned, I consider such errors human, and I won't stigmatize Avast for this. So, np here, sticking to and with Avast  ;)
« Last Edit: December 04, 2009, 06:09:08 PM by Logos »

lindawing

  • Guest
Thank you, Vlk! That was about what I figured, in that I knew there MUST be something strange that had happened somewhere in the processing, because that third update came through only a short time after my second update! The little notice came up, and I immediately remarked to my son, "Wow! Avast! NEVER updates three times a day...there must be something strange going on!" Then just as immediately, the popups began...YIKES!

Thankfully, I didn't delete anything, and (even though it didn't help things later) I was able to restore everything from the chest (about 10 items). I am now going to do a full uninstall and clean install, because I'm having internet browser problems when the Standard Shield is active. I have a feeling that will cure my final problem after the big snafu.

At any rate, I want to thank you all for being so quick to work this out, and I'm totally confident that any new system you put in place will be great. In all the years I have been using Avast!, I have never, ever had this type of problem before, and here at the forum, I've found it very easy to get questions answered and help quickly delivered. You and the team are very, very friendly and efficient. I would never leave you just because this happened. I put my trust in your product a long time ago, and I don't believe it was misplaced.

Thank you again.

twl845

  • Guest
Let yesterday stay in the past. You guys at Avast are the best. Especially the employee who learned from the experience and taught everyone else that even the best can make mistakes.  ;)

sunsets

  • Guest
Vlk,

Thank you for taking the time to explain what happened. I will continue to use Avast.

enddays

  • Guest
I am staying with Avast  ;)  We are all human and can make mistakes, but it takes a big man to say sorry, Vlk.

Offline RejZoR

  • Polymorphic Sheep
  • Serious Graphoman
  • *****
  • Posts: 9406
  • We are supersheep, resistance is futile!
    • RejZoR's Flock of Sheep
Thx Vlk. But I found your decision to remedy the avast! 5 update problems a bit strange. avast! 5 is still in beta, so even a major bug is excusable. Also, fewer users use it compared to the stable 4.8.
Visit my webpage Angry Sheep Blog

pinnacle

  • Guest
Vlk, I accept the detailed information. Mistakes can happen.

Offline Vlk

Quote from: RejZoR
Thx Vlk. But I found your decision to remedy the avast! 5 update problems a bit strange. avast! 5 is still in beta, so even a major bug is excusable. Also, fewer users use it compared to the stable 4.8.

With an update frequency of twice a day, a 3rd update seemed like a natural thing to do (an easy fix). And, of course, if it were executed correctly, there would be no problem.

We can speculate whether it was the right or wrong decision, but I don't think it really matters.

lindawing

  • Guest
Might I just insert that Vlk didn't create the problem, nor has he laid the blame on anyone specific. He's simply stated what happened, and has apologized for it.

Vlk, would you please read this:

http://forum.avast.com/index.php?topic=51745.msg437873#msg437873

I still can't use any internet browser with the Standard Shield activated.

Thanks.

Offline MikeBCda

  • Avast Evangelist
  • Super Poster
  • ***
  • Posts: 2247
Hi and thanks, Vlk.

In one of the zillion threads relating to this (sorry can't find it easily, but you may have already seen it), there was an interesting suggestion for a preferably-automatic work-around, in effect permitting the user to "downgrade" back to the previous installed version of the database.  I agreed that it might be an idea for your crew to look into, although I agree the repair you did was admirably prompt.
Intel Atom D2700, 2 gig RAM, Win 7 x64 SP1 & IE-11, Firefox 51.0 (default). 320 gig HD, 15Mb DSL, Win firewall, Avast 12.3.2280 free, SpywareBlaster, MBAM Prem., Crypto-Prevent

Offline polonus

  • Avast Überevangelist
  • Probably Bot
  • *****
  • Posts: 33891
  • malware fighter
Hi MikeBCda,

Well, it could be a good idea for avast to come up with a sort of system snapshot holding a well-functioning version of avast5 to go back to whenever an incident of this magnitude might affect us (hopefully never).

polonus

« Last Edit: December 04, 2009, 10:05:36 PM by polonus »
Cybersecurity is more of an attitude than anything else. Avast Evangelists.

Use NoScript, a limited user account and a virtual machine and be safe(r)!

Hermite15

  • Guest
A good thing would be at least to generate a Windows restore point (just) before an update is applied. Tens of programs do that at setup time (sometimes initiated by Windows itself, sometimes by the programs), and Windows Defender as well as MSE do it too when they get updated (also manually  ;) )... so why not avast?

The problem that remains is what happens if system files necessary for the restore to complete have been sent to the Chest  ;D ... restore them first... yeah... :) That's a case-by-case situation; I can't give a universal solution here.
« Last Edit: December 04, 2009, 10:24:22 PM by Logos »

John_E

  • Guest
Vlk - Thank you for taking the time to explain what happened. As a "regular" user this gives me peace of mind to know the details and realize the likelihood is small this will happen again anytime soon.

I'm not sure how many other companies would do this. Covering up mistakes seems much too frequent these days with all products and services.

John in STL

Offline Lisandro

  • Avast team
  • Certainly Bot
  • *
  • Posts: 67195
Thanks for the explanation Vlk.
As usual, we can trust a company when it acknowledges its mistakes.
A telephone number would allow the Evangelists to raise the alarm.
The best things in life are free.