Notes from DevOpsDays 2019

My notes from DevOpsDays 2019, held at Mile High Station in Denver, Colorado, on April 29 and 30, 2019. It was also great to volunteer at this wonderful conference for technologists. Special thanks to the organizers for this great opportunity. #devopsdays2019 #devops #devopsdays


DevOpsDays 2019 – people, process & technology

·      Kim Schlesinger, a junior engineer at ReactiveOps, gave a talk on Zero to SRE
o   The talk covered focused mentoring and growing junior engineers on a team filled with senior engineers. 20% learning time is also critical: bake it into the Kanban board and make it part of the daily Scrum status
o   The engineering leveling document is important to share
o   Share the apprentice learning plan – An example is here
·      Critical role of the SRE (Site Reliability Engineer) in DevOps; SLAs and SLOs are important for SREs
·      Datadog – monitoring of applications
·      Rundeck – for self-service ops
·      Runbooks for when something breaks

·      Talk on Product Management and DevOps by James Heimbuck
o   Qualifying and quantifying the problem is essential
o   Learn, build, measure
o   Customer conversations – ask open-ended questions, cross-check problems across the team, and bring a note taker or record the conversations
o   Good book: “User Story Mapping” by Jeff Patton
o   Group like stories together and draw a line for the MVP
o   Assessment of MVP Stories
§  Level of Effort vs Impact
§  RICE score – (Reach × Impact × Confidence) / Effort
§  Force rank
·      Emerging field of DataOps
·      Serverless vs Containers
·      Automated Canary Releases
·      Research done by Rayner & Keashly – 2005, Sutton – 2010
·      Service meshes
·      Ignite Talk: My best sources of learning
o   BrightTalk
o   EDX.org
o   Udemy
o   Coursera
o   Udacity
o   Qwik Labs
o   Eduonix
o   DataCamp
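
The RICE score from the product management talk above, (Reach × Impact × Confidence) / Effort, followed by a force rank, can be sketched in Python. This is a minimal illustration and was not shown in the talk: the Story class, the force_rank helper, the units and the sample stories with their numbers are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Story:
    """A candidate MVP story with hypothetical RICE inputs."""
    name: str
    reach: float       # assumed unit: users affected per quarter
    impact: float      # assumed relative scale, e.g. 0.5 (low) to 3 (high)
    confidence: float  # 0.0 to 1.0
    effort: float      # assumed unit: person-weeks

    def rice(self) -> float:
        """Return (reach * impact * confidence) / effort."""
        if self.effort <= 0:
            raise ValueError("effort must be positive")
        return self.reach * self.impact * self.confidence / self.effort


def force_rank(stories):
    """Order stories from highest to lowest RICE score."""
    return sorted(stories, key=lambda story: story.rice(), reverse=True)


if __name__ == "__main__":
    # Made-up backlog, purely for illustration.
    backlog = [
        Story("Saved searches", reach=800, impact=2, confidence=0.8, effort=4),
        Story("Dark mode", reach=1000, impact=0.5, confidence=1.0, effort=2),
        Story("Bulk export", reach=200, impact=3, confidence=0.5, effort=6),
    ]
    for story in force_rank(backlog):
        print(f"{story.name}: {story.rice():.1f}")
```

The score is only as good as its inputs, so it works best as a conversation starter before the team draws the MVP line.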

·      So you work with an asshole by Kaylee Koch
o   Someone who makes every conversation about themselves and their work
o   Contagious – creeps into the culture
o   “The Ripple Effect: Emotional Contagion and Its Influence on Group Behavior” by Sigal G. Barsade – Yale University
o   Temporary vs Certified
o   Certified – “Needs to display a persistent pattern… history of episodes…”
§  Causing targets to feel “belittled, put down, humiliated, disrespected, oppressed, de-energized and generally worse about themselves…”
§  Personal insults
§  Threats
§  Intimidation
§  Sarcastic jokes
§  Teasing
§  Bullying
§  Flaming someone in communication channels
§  Public shaming
§  Rude interruptions
o   Psychological
§  Depression
§  Anxiety
§  PTSD
§  Increased turnover intentions
§  Increased intent to quit
§  Counterproductive work behaviors
§  Turning into assholes themselves
§  Stifled creativity
o   Physiological
§  Poor-quality sleep
§  Fatigue
§  Cardiovascular disease
§  Fibromyalgia
o   Equation: (9/80) × x × y, where x = total number of employees and y = replacement cost
o   What to do
§  Da Vinci Method
§  Avoid the person at all costs
·      Standing meetings
·      Teleconferencing vs meeting in person
§  Reframing
§  Decompress
§  Leave the organization
§  Implement the no asshole rule – e.g. Southwest Airlines
§  Use assholes as an example of poor behavior that your company will not tolerate
§  Ensure your HR department is taking claims of poor behavior seriously
·      Mediation
·      Coaching
·      Anger management
o   Books: “The Asshole Survival Guide” and “The No Asshole Rule” by Robert I. Sutton
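
The cost equation noted above, (9/80) × x × y, can be evaluated directly. The sketch below only plugs numbers into the formula as recorded in these notes. The function name and the sample headcount and replacement cost are made up, and the notes do not say what time period the result covers.

```python
def estimated_cost(total_employees: int, replacement_cost: float) -> float:
    """Evaluate (9/80) * x * y, with x = total employees and y = replacement cost."""
    if total_employees < 0 or replacement_cost < 0:
        raise ValueError("inputs must be non-negative")
    # Multiply before dividing to limit floating-point rounding.
    return 9 * total_employees * replacement_cost / 80


if __name__ == "__main__":
    # Hypothetical company: 400 employees, 50,000 replacement cost per employee.
    print(estimated_cost(400, 50_000))
```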

·      Open Source Developers are Security’s new front line by Curtis Yanko
o   Automation accelerates OSS downloads
o   Network attacks -> My code attacks -> Open Source attacks -> Ecosystem attacks
o   2013 – CVE-2013-2251 (Apache Struts 2)
o   Widespread compromise post-disclosure
o   Attacking kit
o   2014 – Heartbleed and Shellshock (including logos!!)
o   2015 – Commons Collections, CWE-502 (deserialization of untrusted data)
§  Ransomware attack at Hollywood Presbyterian Hospital – shut down the hospital
§  23 million downloads in 2016; 78% of downloads were vulnerable
o   Equifax breach – Time to respond before exploit
o   The economics of cybercrime –
§  In 2016: cybercrime $450 billion vs. illicit drug trade $435 billion
§  2018 – cybercrime $1.5 trillion
o   Cryptocurrency
§  Your server has CPU cycles
§  Your visitors have CPU cycles
§  Your build infra has CPU cycles
§  Cryptocurrency allows the attack to be directly monetized
o   Jenkins under attack – Jenkins miner: one of the biggest mining operations ever discovered; $3.4 million has been mined
o   AngularJS
o   The new front line – OSS developers and Package Maintainers
o   Malicious npm packages “typosquatted” – shipping info to Russia
o   10 malicious Python packages – shipping info to China
o   Golang go-bindata: GitHub ID deleted and reclaimed
o   conventional-changelog compromised and turned into a Monero miner
o   Backdoor discovered in npm get-cookies module published since March
o   Unauthorized publishing of mailparser
o   ssh-decorator Python module stealing private SSH keys – shipping info to Eastern Europe
o   Gentoo Linux repository compromised
o   Malicious eslint discovered to be stealing npm credentials
o   Homebrew repository compromised
o   npm event-stream attack on Copay
o   RubyGems bootstrap-sass RCE backdoor
o   Social engineering the way into an open source project and taking over
o   Consumers need to continuously evaluate/monitor changes
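
One small piece of the "continuously evaluate/monitor" advice can be automated: flagging dependency names that are near-misses of packages you actually intend to use, which is how typosquatting works. The sketch below is an illustration and not a tool from the talk. The suspicious_names helper, the 0.85 similarity cutoff, the allowlist contents and the misspelled name "mailparsr" are all assumptions made for the example. It uses difflib from the Python standard library.

```python
import difflib

# Hypothetical allowlist of packages the team intends to depend on.
KNOWN_PACKAGES = ["mailparser", "event-stream", "conventional-changelog"]


def suspicious_names(dependencies, known=KNOWN_PACKAGES, cutoff=0.85):
    """Map each dependency that closely resembles, but does not equal,
    a known package name to the name it resembles."""
    findings = {}
    for name in dependencies:
        if name in known:
            continue  # exact match: an expected dependency
        # get_close_matches returns the best matches scoring at least `cutoff`.
        matches = difflib.get_close_matches(name, known, n=1, cutoff=cutoff)
        if matches:
            findings[name] = matches[0]
    return findings


if __name__ == "__main__":
    # "mailparsr" is a made-up typo of "mailparser" for demonstration.
    declared = ["mailparser", "mailparsr", "requests"]
    for name, lookalike in suspicious_names(declared).items():
        print(f"review '{name}': looks like '{lookalike}'")
```

A name check like this only catches lookalikes. Compromised maintainer accounts and malicious updates to legitimate packages, as in several incidents listed above, need version pinning and review of what changed between releases.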

·      No More Flaky Tests: Building Trust in Your Automated Tests by Jyoti Mittal
o   Speed without quality kills
o   A recent survey of DevOps practitioners found that 63% of delays were occurring in the testing cycle
o   Succeeding with Agile by Mike Cohn – Test automation pyramid
o   Technology Stack
§  Use Docker containers on GitLab CI and Jenkins
§  Performance – sitespeed.io, New Relic test monitors
§  Netsparker security testing
§  Gridlastic test execution platform – Selenium in the cloud
o   Test flakiness is a big problem
o   Treat Flaky tests as your friend
o   Flaky tests should not be tolerated
o   Google has around 4.2 million tests that run on its continuous integration system. Of these, around 2% are flaky. Google has a blog that solely talks about how they are fixing flaky tests
o   Quarantine the flaky Tests
o   Don’t abandon the flaky tests
o   Katalon Analytics – Groups flaky tests over time – free tool
o   ReportPortal.io – open source analytics dashboard using machine learning to analyze test results
o   Use Analytics for your test reporting
o   Consider flaky tests as a friend
o   Understand the nature of flakiness
§  Asynchronous behavior
§  Concurrency/race conditions
§  Third party web service calls
§  Test order dependency
o   Don’t use sleeps, use intelligent waits
o   Thread-safe the test code
o   Tests should be atomic
o   Avoid shared resources
o   Use Mock servers
o   Love your Test code
§  SOLID software design principles, DRY principle
§  Design patterns like page object model, factory pattern
§  Reusable packages
§  Test code quality should be same as application code quality and standards
o   Machine-learning-based self-healing algorithms for testing – mabl, Testim, ReTest, Functionize
o   Quality is a team sport
o   Selenium testing blog and Google testing blog
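
The advice "don't use sleeps, use intelligent waits" can be sketched as a small polling helper: instead of sleeping for a fixed time and hoping the work has finished, the test polls a condition and continues the moment it holds, failing with a clear error if it never does. This is a generic illustration and not code from the talk. The wait_until name, its default timeout and interval, and the timer-based background job are assumptions made for the example.

```python
import threading
import time


def wait_until(condition, timeout=5.0, interval=0.05):
    """Poll `condition` until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)  # short pause between polls, not a fixed guess


if __name__ == "__main__":
    # Simulated asynchronous work that finishes after roughly 0.2 seconds.
    job_done = threading.Event()
    threading.Timer(0.2, job_done.set).start()

    # Flaky style would be time.sleep(0.2) followed by an assert on job_done.
    wait_until(job_done.is_set, timeout=2.0)
    print("job finished")
```

In Selenium-based suites, explicit waits (WebDriverWait with an expected condition) play the same role for page elements.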

·      When DevOps go Awry: A Retrospective
o   Political Coin (a.k.a. political capital)
o   Privilege
o   Awry
o   Agency
o   Confession
o   Ivory Tower
o   Are you in the tower?
o   Recovering from it
o   Focus on empowering those that are not in the tower
o   Intentions
§  Best practices standardization
o   Cargo Culting – Following principles != Success
o   Unintentional Edicts
o   We will set up a new deployment pipeline -> it goes live Thursday
o   DevOps requires collaboration and trust
§  Edicts must be discussed and potentially challenged
o   The harm of unbalanced teams
o   What to do?
§  On call runbooks for undesirable tasks
§  Collaborative sessions
o   Firehose of unsolicited opinions
o   Imply a lack of trust towards recipients
o   Imposter syndrome
o   Five Whys: Power Dynamics, Trust, Communication

·      How I learned to stop worrying and love remote employees by Kristina Vincent
o   Works at Gannett – USA Today and 130+ news websites
o   18 Engineers, 11 states, 4 US time zones, 14 distributed
o   Spend the 90 minutes of drive time actually working
o   Increased productivity when you can control your achievement
§  Commute
§  Environment
§  Stress/happiness
o   Trust remote employees
o   Larger talent pool to recruit from
§  Candidates with diverse backgrounds and ideas
o   After-hours incident response is much faster
o   No need to incentivize relocation
o   Problems & Solutions
§  Time zone differences – define core availability hours
§  Lack of face time – use specific communication tools like Slack and video conferencing; regular travel brings us together
§  Accountability/Distractions – screen candidates for comfort level with remote work; be on camera (on mute) in a group meeting that is always up during work hours; hold all meetings in remote formats; all ranks of the org can be remote
§  Remote workers don’t have access to onsite IT – HQ IT staff are trained to help remote employees be successful



