2018-06-02
    
Part 1 - The Three Ways
  - The foundation of DevOps can be traced back to Lean, The Theory of Constraints, and Toyota Kata. It has its roots in manufacturing process management
 
  - DevOps is an extension of the Agile movement paired with the Continuous Delivery movement
 
  - Value stream performance is measured in lead time and percent complete and accurate (%C/A)
 
The First Way - The Principles of Flow
  - We must make work visible
    
      - Unlike physical processes, technology work is largely invisible. It is hard to see where flow is impeded or where work is piling up
 
      - We should use tools to visualize how our work flows from left to right, like a kanban board
 
    
   
  - Limit work in progress (WIP)
    
      - Daily work is dominated by priority du jour, or urgent work
 
      - Disruptions are highly visible in physical processes but almost invisible for tech workers
        
          - An engineer who context switches must re-establish cognitive rules and goals, which results in slower and more error-prone work
 
        
       
    
   
  - Reduce batch sizes
    
      - Reduce the changeover cost between tasks, so that the team does not feel forced to complete work in large batches
 
      - Allows work defects to be discovered early, so that problems can be fixed before other items flow through
 
      - Large batch releases cause sudden large amounts of WIP and less parallelization
 
    
   
  - Reduce the number of handoffs
    
      - Each handoff incurs a loss of knowledge
 
      - Each handoff is a potential queue where work will wait
 
      - Each handoff requires various sorts of communication, signaling, prioritization, testing, scheduling, etc.
 
    
   
  - Continually Identify and Elevate Our Constraints
    
      - Every system has a constraint. Any optimization that is not on the main constraint is an illusion
        
          - Work will either queue up if the optimization happens before the constraint, or the optimized step will be starved of work if it comes after the constraint

          - The aim is for the constraint to be the product owner or development, not operations, testing, deployment, etc.
 
        
       
    
   
  - Eliminate hardships and waste in the value stream
    
      - Waste is anything that causes delay for the customer that could be bypassed without affecting the result
 
      - Waste includes
        
          - Partially done work that is blocked
 
          - Extra processes
 
          - Extra features outside of requirements
 
          - Task switching
 
          - Motion: effort spent moving information or work between work centers
 
          - Defects
 
          - Nonstandard or manual work
 
          - Heroics
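
The first two principles above (make work visible, limit WIP) can be sketched as a tiny kanban board. This is only an illustrative model, not a real tool: the `KanbanBoard`, `Column`, and `WipLimitExceeded` names, the column names, and the limit of 2 are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class WipLimitExceeded(Exception):
    """Raised when a card would push a column past its WIP limit."""


@dataclass
class Column:
    name: str
    wip_limit: Optional[int] = None  # None means unlimited (e.g. "Done")
    cards: List[str] = field(default_factory=list)


class KanbanBoard:
    """Columns are ordered left to right, matching the flow of work."""

    def __init__(self, columns: List[Column]) -> None:
        self.columns = list(columns)

    def column(self, name: str) -> Column:
        for col in self.columns:
            if col.name == name:
                return col
        raise KeyError(name)

    def add(self, card: str, column_name: str) -> None:
        col = self.column(column_name)
        if col.wip_limit is not None and len(col.cards) >= col.wip_limit:
            raise WipLimitExceeded(
                f"{col.name} is at its WIP limit of {col.wip_limit}"
            )
        col.cards.append(card)

    def move(self, card: str, src: str, dst: str) -> None:
        source = self.column(src)
        if card not in source.cards:
            raise ValueError(f"{card!r} is not in {src}")
        # Check the destination first so a refused move leaves the card in place.
        self.add(card, dst)
        source.cards.remove(card)

    def render(self) -> str:
        lines = []
        for col in self.columns:
            limit = "-" if col.wip_limit is None else str(col.wip_limit)
            cards = ", ".join(col.cards)
            lines.append(f"{col.name} [{len(col.cards)}/{limit}]: {cards}")
        return "\n".join(lines)


# Hypothetical board: only two items may be in progress at once.
board = KanbanBoard([Column("To Do"), Column("Doing", wip_limit=2), Column("Done")])
for card in ("deploy script", "login bug", "db upgrade"):
    board.add(card, "To Do")
board.move("deploy script", "To Do", "Doing")
board.move("login bug", "To Do", "Doing")
try:
    board.move("db upgrade", "To Do", "Doing")
except WipLimitExceeded as exc:
    print(exc)  # the third card is refused until something in "Doing" finishes
print(board.render())
```

Refusing the move, rather than silently accepting it, is the point of the limit: the pile-up becomes visible in "To Do" instead of hiding as half-finished work.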
 
        
       
    
   
The Second Way - The Principles of Feedback
  - Working Safely within Complex Systems
    
      - A complex system is one that defies any single person’s ability to see the system as a whole and understand how all the pieces fit together
 
      - Failure is inherent and inevitable
 
      - We must aim to work without fear because we are confident errors will be detected quickly before catastrophe occurs
 
      - To be safe within a complex system, we must meet the following conditions
        
          - Complex work is managed so that problems in design and operations are revealed
 
          - Problems are swarmed and solved, resulting in quick construction of new knowledge
 
          - New local knowledge is exploited globally throughout the organization
 
          - Leaders grow other leaders who continually grow these types of capabilities
 
        
       
    
   
  - See problems as they occur
    
      - We must tighten the feedback loops on the quality of our work within the system
 
      - When feedback is delayed and infrequent, it is too slow to prevent undesirable outcomes
 
      - Automated builds and tests allow us to identify when a change that’s incompatible with expectations is introduced to the system

      - Fast feedback not only detects issues early, but also shows how they can be prevented in the future
 
      - Feedback allows us to steer
 
    
   
  - Swarm and solve problems to build new knowledge
    
      - Swarm problems to contain them before they can spread, and to diagnose and treat them so they don’t happen again

      - The Andon cord is used in Toyota plants, where every worker is trained to pull the cord when something goes wrong
        
          - This could mean a defective part, a required part is not available, or even that work is taking too long
 
        
       
      - When things are wrong or slow, the entire production line is stopped so that the problem can be fixed
        
          - This prevents the problem from continuing downstream
 
          - It prevents work centers from starting new work that will likely introduce more issues into the system
 
          - If the problem is not addressed, the work center could run into the same problem again and lose more work
 
        
       
      - Swarming seems contrary to common management practices, but it
        
          - Prevents loss of critical information due to fading memories or changing circumstances
 
          - Provides fast feedback into the system
 
          - Isolates the problem
 
          - Prevents further complicating factors
 
        
       
    
   
  - Keep pushing quality closer to the source
    
      - More inspection steps and approval processes introduce potential for more errors, since the distance between the people doing the work and the decision makers is larger
 
      - Ineffective quality controls involve manual processes, approvals from busy people, and lengthy documentation
 
      - Peer reviews should be implemented
 
      - Automated tests and other checks should be implemented and required before changes reach production
 
      - Quality is everyone’s responsibility
        
          - Developers are usually the furthest from the customer
 
          - Developers can’t learn when they’re punished for mistakes from months ago
 
        
       
    
   
  - Enable optimizing for downstream work centers
    
      - Lean defines two customers - internal and external
 
      - Our most important customer is the next step downstream
 
      - Operational non-functional requirements are prioritized as highly as user features
 
      - This creates quality at the source
 
      - Examples from manufacturing include asymmetrical parts that cannot be assembled backwards and screw fasteners that are impossible to over-tighten
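
The ideas above of pulling the Andon cord and requiring automated checks before a change moves on can be sketched as a small "stop the line" gate. This is a minimal sketch under assumptions: `run_gate` and the three checks are hypothetical names, and real checks would call an actual test runner or linter rather than return fixed values.

```python
from typing import Callable, List, Optional, Tuple

# A check is a name plus a function that returns True when it passes.
Check = Tuple[str, Callable[[], bool]]


def run_gate(checks: List[Check]) -> Optional[str]:
    """Run checks in order and stop at the first failure.

    Returns the name of the failing check, or None if every check passed.
    """
    for name, check in checks:
        try:
            passed = check()
        except Exception:
            passed = False  # a crashing check counts as a failure
        if not passed:
            return name  # "pull the cord": later checks are not run
    return None


# Hypothetical checks with hard-coded results, for illustration only.
def unit_tests() -> bool:
    return True


def lint() -> bool:
    return False


def integration_tests() -> bool:
    return True


failed = run_gate(
    [
        ("unit tests", unit_tests),
        ("lint", lint),
        ("integration tests", integration_tests),
    ]
)
if failed is None:
    print("change may proceed")
else:
    print(f"change blocked by: {failed}")
```

Stopping at the first failure mirrors the production line halting: nothing downstream runs on top of a known problem, and the feedback names exactly one thing to swarm.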
 
    
   
The Third Way - The Principles of Continual Learning and Experimentation
  - Enabling organizational learning and a safety culture
    
      - Never name, blame and shame the person who caused a problem. We are human and mistakes happen.
 
      - Our work is almost always performed within a complex system
        
          - How management chooses to react to failures and accidents may lead to a culture of fear which then makes it unlikely that problems and failure signals are ever reported
 
        
       
      - Conduct a blameless post-mortem after every incident to
        
          - gain the best understanding of how the incident occurred
 
          - agree on countermeasures to improve the system
 
        
       
    
   
  - Institutionalize the improvement of daily work
    
      - In the absence of improvements, processes don’t stay the same; due to chaos and entropy, they actually degrade over time
 
      - We improve daily work by explicitly reserving time to pay down technical debt, fix defects, and refactor and improve problematic areas of code
 
      - We schedule kaizen blitzes, which are periods when engineers self-organize into teams to work on fixing any problem they want
 
    
   
  - Transform local discoveries into global improvements
    
      - When new learnings are discovered locally, there must be a mechanism to enable the rest of the organization to benefit
 
      - e.g., making post-mortems searchable, sharing source code repos, etc.
 
    
   
  - Inject resilience patterns into daily work
    
      - Introduce tension into system to elevate performance
 
      - Seek to reduce deployment times
 
      - Reduce test execution times
 
      - Perform game day exercises that rehearse large-scale failures, or inject faults with a tool like Netflix’s Chaos Monkey
 
    
   
  - Leaders reinforce a learning culture
    
      - Leaders must elevate the value of learning and disciplined problem-solving
 
      - Coaching kata
        
          - the scientific method of stating True North goals
 
          - Cascade organizational goals down to individual, team-based, measurable goals
 
        
       
      - Conduct experiments, with the leader coaching the person running the experiment to continue iterating and learning
 
    
   
Part 2 - Where to Start
  - How do we practically implement a culture of DevOps into our organization?
 
  - How do we decide where to start?
 
  - How do we enable our teams to succeed?
 
Selecting Which Value Stream to Start With
  - Single product team rather than functional teams
    
      - Reduces handoffs
 
      - Aligns goals

      - Removes external team dependencies
 
    
   
  - Increasing team size not always best move
    
      - Improve the way work is done. Increase effectiveness
 
    
   
  - Greenfield vs. brownfield
    
      - Greenfield projects are new, so the culture can be built in from the start
 
      - Brownfield projects may be more receptive because it’s clear current process is not working
        
          - DevOps has been used to successfully transform brownfield projects
 
        
       
    
   
  - Start with sympathetic and innovative groups
    
      - Much like Crossing the Chasm, look for early adopters
 
      - Don’t spend time trying to convert conservative groups. They must first see a proven track record
 
    
   
  - Build critical mass and silent majority
    
      - Expand to more teams and value streams
 
      - Do not have to be most visible or influential groups, but expand the coalition
 
    
   
  - Identify the holdouts
    
      - Must have enough success to protect the initiative
 
    
   
  - Little fish learn to be big fish in little ponds
 
Understanding the Work in Our Value Stream, Making it Visible, and Expanding Across the Organization
  - Value stream mapping
    
      - Conduct a workshop with all the major stakeholders
 
      - First create high-level process blocks
 
      - Focus on places where
        
          - work must wait for weeks or months
 
          - work is waiting on other processes
 
          - significant rework is generated or received
 
        
       
      - Measure each block in %C/A, lead time, and value add time
 
      - Identify metrics that need to be improved
 
      - Unexpected insights
 
      - See obvious areas of improvement
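
The per-block measurements above can be rolled up into a few numbers for the whole value stream. The sketch below is illustrative only: the `ProcessBlock` type, the block names, and every figure are made up, %C/A is taken to mean the share of work a block hands downstream that is usable as-is, and multiplying the per-block %C/A values into a "rolled" figure is a common value stream mapping convention assumed here rather than something these notes state.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProcessBlock:
    name: str
    lead_time_hours: float        # total elapsed time, waiting included
    value_add_hours: float        # time spent actually working on the item
    pct_complete_accurate: float  # %C/A as a fraction between 0 and 1


def total_lead_time(blocks: List[ProcessBlock]) -> float:
    return sum(b.lead_time_hours for b in blocks)


def total_value_add(blocks: List[ProcessBlock]) -> float:
    return sum(b.value_add_hours for b in blocks)


def rolled_pct_ca(blocks: List[ProcessBlock]) -> float:
    """Chance that work passes through every block without rework."""
    result = 1.0
    for b in blocks:
        result *= b.pct_complete_accurate
    return result


def activity_ratio(blocks: List[ProcessBlock]) -> float:
    """Share of the total lead time that is value-add work."""
    return total_value_add(blocks) / total_lead_time(blocks)


def longest_wait(blocks: List[ProcessBlock]) -> ProcessBlock:
    """Block where work spends the most time waiting."""
    return max(blocks, key=lambda b: b.lead_time_hours - b.value_add_hours)


# Hypothetical numbers, for illustration only.
stream = [
    ProcessBlock("Design", lead_time_hours=40, value_add_hours=8, pct_complete_accurate=0.9),
    ProcessBlock("Test", lead_time_hours=80, value_add_hours=16, pct_complete_accurate=0.5),
    ProcessBlock("Deploy", lead_time_hours=8, value_add_hours=4, pct_complete_accurate=1.0),
]

print(f"lead time:      {total_lead_time(stream)} h")
print(f"value-add time: {total_value_add(stream)} h")
print(f"activity ratio: {activity_ratio(stream):.1%}")
print(f"rolled %C/A:    {rolled_pct_ca(stream):.0%}")
print(f"longest wait:   {longest_wait(stream).name}")
```

Even with invented numbers, the roll-up points at where to look first: the block with the largest gap between lead time and value-add time, and the block with the lowest %C/A.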
 
    
   
  - Identify teams supporting our value stream
    
      - No one person knows all the work that must be performed to create value for the customer
 
    
   
  - Initiatives like a DevOps transformation are inevitably in conflict with ongoing business operations
    
      - We are trying to improve business operations, but doing so ultimately requires disrupting how we work

      - Businesses are built to resist change
        
          - Good for maintaining status quo, but this puts us at odds with groups who are responsible for daily operations
 
        
       
    
   
  - Organizations must create a dedicated transformation team
    
      - Must be able to operate outside the rest of the organization that is responsible for daily operations
 
      - Allows “performance engine” to continue to operate the business
 
    
   
  - Dedicated team is accountable for achieving a clearly defined, measurable, system-level result
    
      - Kept separate from the teams running daily operations so as not to interrupt them
 
      - Create a separate space to maximize communication flow within the team
 
      - Select team members who have long-standing and mutually respectful relationships with the rest of the organization
 
    
   
  - Agree on a shared goal
    
      - It should require considerable work but not be impossible
 
    
   
  - Limit the number of these types of initiatives so as not to overtax the organization’s change management capacity
 
  - Keep improvement planning horizons short
    
      - Allows flexibility to reprioritize
 
      - Quicker realization of improvements that make meaningful differences
 
      - Less risk that project is killed before demonstrable outcomes
 
      - Early wins are important
 
    
   
  - Reserve 20% of each cycle for non-functional requirements and reducing technical debt
    
      - Organizations that need process improvements the most are those that have the least amount of time to spend
 
      - Organizations that do not pay down technical debt will soon be so burdened with daily workarounds that no new work can be completed

      - If the “tax” is not paid, technical debt will become a large burden
 
    
   
  - Identify technical debt early and prioritize it in backlog
    
      - Ongoing incidents should halt further work
 
    
   
How to Design Our Organization and Architecture with Conway’s Law in Mind
  - Conway’s Law is inevitable
    
      - Organizations which design systems are constrained to produce designs which are copies of the communication structures of these organizations
 
      - How we organize our teams has a powerful effect on the software we produce
 
      - If inevitable, we must use it to our advantage
 
    
   
  - Eliminate dependencies on other teams
 
  - Organization archetypes
    
      - Functional-oriented
        
          - Specialties are grouped together
 
        
       
      - Matrix-oriented
        
          - Combines functional and market orientation. Usually results in confusing and complicated organizational structures
 
        
       
      - Market-oriented
        
          - Optimized for responding quickly to customer needs
 
          - Cross-functional
 
          - Potential for redundancies across organizations
 
        
       
    
   
  - Overly functional team issues
    
      - Long lead times
 
      - Work requires opening up tickets with multiple groups
 
      - Implementer often does not have context about why change is being implemented
 
    
   
  - Market-oriented teams
    
      - Don’t do top-down reorganization, as it’s very scary and disruptive
 
      - Embed engineers in existing service teams
 
    
   
  - How people act and react is very important, not just how the teams are organized
 
  - Align incentives to spur change or resilience
    
      - Developers should be on call
 
      - Implementers work on the front-lines to gain understanding
 
    
   
  - Encourage learning
    
      - Team must overcome learning anxiety
 
      - Hiring must look for potential, not just the current skill set
 
    
   
  - Design team boundaries with Conway’s Law in mind
 
  - Development should result in loosely coupled services with bounded contexts
    
      - Service-oriented architecture
 
    
   
  - Align teams with their products in a way that reduces handoffs, external communication, and cross-team dependencies
 
How to Get Great Outcomes by Integrating Operations into the Daily Work of Development
  - If Operations resources are limited, use the Ops Liaison model
    
      - Dedicated release engineer for each team who becomes intimately familiar with the team’s needs and executes the work
 
      - Business relationship manager who helps their product teams navigate the Operations landscape, prioritizes work, and streamlines requests
 
    
   
  - Create shared services to increase developer productivity
    
      - “Without self-service Operations platforms, the cloud is just Expensive Hosting 2.0”
 
      - Customers are not external customers but internal Dev teams
 
      - Includes pre-blessed security libraries, deployment pipeline, and tools
 
    
   
  - Embed Ops Engineers into Service Teams
    
      - Priorities are driven entirely by the goals of the product teams they’re embedded in
 
      - Efficient way to cross-train operations knowledge and expertise
 
      - Transform operations knowledge into automated code
 
    
   
  - Integrate Ops into Dev Rituals, and invite Ops to Dev stand-ups
 
  - Make ops work visible on shared Kanban boards
    
      - Only work that is relevant to product delivery
 
      - People may not be aware of necessary Operations work until it becomes an urgent crisis