User:Sardisson/Crash Analysis Workflow

From Camino Wiki

bsmedberg asked ss and me to write up our thoughts on how we use (or would like to use) crash analysis reporting and make them available for the denizens of and the Socorro server engineering team.

Below is where I am working through my thoughts.

Development Branches

A. On a weekly basis, I want to look at the accumulated crash stats for each development branch and

  • Quickly see where the top crashes are occurring (crash signatures)
  • See which crashes are "increasing"
  • From the above screen, access crash stacks for each crash signature
    • In some cases, sort the resulting crash stacks by OS/build ID
    • In some cases, weed out known crashes caused by older versions of plug-ins or haxies
      • Often these stacks have a very generic frame 0 (e.g., libobjc.dylib) and a frame 1 that's unique to the crash. In a perfect world, I could re-list and examine only the crashes with the generic frame 0 whose frame 1 does not match a known crash, so that when looking for patterns among generic frame 0s I never have to waste time on incidents already matched to known crashers.
      • In a perfect world, have feedback people email the affected users to tell them their crash is caused by an old version of a plug-in or by a haxie
  • Possibly "compare" crash signatures to trunk and releases to see if they're occurring there, too
  • Cross-reference with any existing bugs (or recently closed bugs)
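
The weekly checks in A can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Report record and its fields (signature, frames, week) are hypothetical and do not reflect Socorro's actual schema or API, and "increasing" is taken here to mean "more reports this week than last week".

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """Hypothetical crash report record; field names are illustrative only."""
    signature: str   # crash signature
    frames: tuple    # stack frames, frame 0 first
    week: int        # week number in which the report arrived


def top_signatures(reports, week, limit=10):
    """Return (signature, count) pairs for one week, most frequent first."""
    counts = Counter(r.signature for r in reports if r.week == week)
    return counts.most_common(limit)


def increasing_signatures(reports, week):
    """Map each signature that grew since the previous week to its increase."""
    now = Counter(r.signature for r in reports if r.week == week)
    before = Counter(r.signature for r in reports if r.week == week - 1)
    # A Counter returns 0 for signatures not seen in the previous week.
    return {sig: n - before[sig] for sig, n in now.items() if n > before[sig]}


def unexplained(reports, generic_frame0, known_frame1):
    """Reports with the generic frame 0 whose frame 1 is not a known crasher."""
    return [
        r for r in reports
        if r.frames and r.frames[0] == generic_frame0
        and (len(r.frames) < 2 or r.frames[1] not in known_frame1)
    ]
```

For example, unexplained(reports, "libobjc.dylib", {"OldPlugin"}) would return only the libobjc.dylib crashes that have not already been attributed to the (made-up) OldPlugin frame.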

If there are no existing bugs on a major-seeming crash, attempt to reproduce from URLs and comments and file bug(s).

B. If I file a bug based on crash analysis, particularly one with a large number of incidents but which I was unable to reproduce, I want to

  • Continue to monitor the crash signature for new incidents which may provide URLs or comments that allow the bug to become reproducible
    • Develop a quick query for this, with an open-ended end date
  • Search across products for incidents if there's reason to believe it might be in shared code
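
The "quick query" in B might look something like the sketch below. The Incident record and the watch_signature function are hypothetical names invented for illustration; the point is only that the end date is optional and that the product filter can be dropped to search shared code across products.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Incident:
    """Hypothetical crash incident; field names are illustrative only."""
    signature: str
    product: str
    received: date
    url: str = ""
    comment: str = ""


def watch_signature(incidents, signature, since, until=None, products=None):
    """Incidents for `signature` on or after `since` that carry a URL or comment.

    `until=None` leaves the end date open; `products=None` searches all products.
    """
    hits = []
    for inc in incidents:
        if inc.signature != signature:
            continue
        if inc.received < since or (until is not None and inc.received > until):
            continue
        if products is not None and inc.product not in products:
            continue
        # Only incidents with a URL or comment help make the bug reproducible.
        if inc.url or inc.comment:
            hits.append(inc)
    return sorted(hits, key=lambda inc: inc.received)
```

Rerunning the same call each week with until left unset picks up whatever new incidents have arrived since the bug was filed.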

C. If a bug filed based on crash analysis, or having a crash analysis component, has been marked FIXED, I want to

  • Verify that the crash signature disappears (or, in the case of overly-generic crash signatures, declines significantly) in builds following the check-in
  • Have easy access to incidents of the crash signature from *only* builds after the check-in of the fix, in order to check whether the incidents appear to be the same crash or another crash whose top frame happens to produce the same signature
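
A rough sketch of the verification in C follows. It assumes, purely for illustration, that each incident is a dict with "signature", "build_id", and "frames" keys, and that build IDs are same-length, date-based strings so that plain string comparison orders them chronologically; neither assumption comes from the Socorro data model.

```python
from collections import Counter


def split_by_fix(incidents, signature, fix_build_id):
    """Split incidents of `signature` into (before_fix, after_fix) lists.

    Assumes same-length, date-based build ID strings (e.g. "2008031502"),
    so comparing the strings compares the build dates.
    """
    before, after = [], []
    for inc in incidents:
        if inc["signature"] != signature:
            continue
        (after if inc["build_id"] >= fix_build_id else before).append(inc)
    return before, after


def distinct_stacks(incidents, depth=3):
    """Count incidents by their top `depth` frames to tell lookalike crashes apart."""
    return Counter(tuple(inc["frames"][:depth]) for inc in incidents)
```

If the post-fix list is empty or much shorter, the signature has disappeared or declined; if it is not, comparing distinct_stacks(before) with distinct_stacks(after) gives a first hint as to whether the remaining incidents are the same crash or a different one sharing the top frame.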


For releases, steps are broadly similar but over different timeframes (more focused during the Release Candidate stage and for the first while after a release).