Smells Like Teen Systems: DevOps Nirvana

Frank Wiles, @fwiles @revsys  Slides will be online later.

Smells Like Teen Systems: Advice for raising healthy happy systems and getting to DevOps nirvana

People are fearful of change, so changes must be small at first: baby steps. Be agile (little a, not big A): be spiritual, not fundamentalist. Just because you read about a practice somewhere doesn’t mean you must mandate it if it doesn’t work for your organization. Have ammunition: managers need data and explanations to make decisions.

Apply metrics mentality to:

  • change requests
  • trouble tickets and bugs
  • deployments
  • outages of the smallest magnitude
  • interoffice political fights
  • approved and denied requests for equipment or funds
  • hires, fires, and quits
  • money, labor hours, etc.

“We spend on average 19 hours per week requesting more information”

Guilt tripping — no other option to keep up.

“Once we put <insert system> in place, we realized we no longer needed that weekly meeting…”

DevOps: Develop Everything Visibly, Operate/Automate Paranoid Services

DEV: Develop Everything Visibly: “Everything has to happen out in the open”

OPS: Operate/Automate Paranoid Services: “Automate everything with ridiculous amounts of monitoring and metrics”

Everything is version-controlled. Log of why things happened.
Everything is tracked. Ticketing; Trello; Bugs; etc.

Even more visibility:

  • Level 1: Team Chat. Like Slack. Email is for outsiders.
  • Level 2: Chat Ops <– mmmmmbot!
  • Level 3: Have some fun <– Fun bots

Chat ops suggestions

  • Deployments and config changes
  • Status summaries: bot check load db3
  • Maintenance: bot start maintenance file-server-1
  • Display Alerts and Warnings
  • Server boot/shutdown messages
  • Ops logs: bot log Upgraded redis to 2.8.19
  • Resolutions: bot resolve ticket #8 Ended up just needing to restart Apache
  • Common actions: bot restart apache on production
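
The example commands above share a simple shape: the bot’s name, a verb, then arguments. A minimal sketch of how such a dispatcher might look, assuming a bot addressed as `bot` and made-up handler functions (the names `check`, `log`, `restart`, and `dispatch` are illustrative only, not from the talk or from any real chat bot framework):

```python
from typing import Callable, Dict, List, Optional


def check(args: List[str]) -> str:
    # "bot check load db3" arrives here as args == ["load", "db3"]
    if len(args) != 2:
        return "usage: bot check <metric> <host>"
    return f"checking {args[0]} on {args[1]}"


def log(args: List[str]) -> str:
    # "bot log Upgraded redis to 2.8.19" keeps the free-text note intact
    return "logged: " + " ".join(args)


def restart(args: List[str]) -> str:
    # "bot restart apache on production"
    if len(args) != 3 or args[1] != "on":
        return "usage: bot restart <service> on <environment>"
    return f"restarting {args[0]} on {args[2]}"


# Hypothetical verb-to-handler table; a real bot would call out to
# deployment, monitoring, or ticketing tools instead of returning strings.
HANDLERS: Dict[str, Callable[[List[str]], str]] = {
    "check": check,
    "log": log,
    "restart": restart,
}


def dispatch(message: str, bot_name: str = "bot") -> Optional[str]:
    """Route a chat message to a handler; return None if not addressed to the bot."""
    words = message.split()
    if len(words) < 2 or words[0] != bot_name:
        return None
    handler = HANDLERS.get(words[1])
    if handler is None:
        return f"unknown command: {words[1]}"
    return handler(words[2:])
```

Because every command and its reply pass through the shared channel, the chat history doubles as the ops log.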

Tools: This is how we do it

  • Python: a scripting language that is relatively easy to learn and readable, with libraries for talking to everything. Fabric is highly recommended: shell scripting on steroids.
  • SaltStack: a master, and then salt (minion) code. As simple or as complicated as you want; fast communication even among hundreds of systems (ZeroMQ + AES); extensible via Python; ability to return data to the master for monitoring or metrics purposes; simple to crazy-complicated orchestration between systems. Examples of uses: targeting (/srv/salt/top.sls); pillars (/srv/pillar/*, config differences expressed as data); templating.
  • Consul: service discovery and monitoring: health checks; discover services via DNS or HTTP REST APIs; deadman health checks.
  • ELK: Elasticsearch/Logstash/Kibana <– fast log searching for when you don’t.
  • “Logs that aren’t centralized are rarely checked and logs that aren’t searchable are never correlated” -Frank Wiles
  • Grafana: for metrics visualization; pretty graphs.
  • Don’t capture exceptions in your inbox; put them in a system. Exception.io; Rollbar. Rollbar also tracks deployments.
  • What to capture? As much as you can store.
    • general collectd system stats
    • logins/signups/emails sent
    • failed login attempts/emails bounced
    • run time of crons and batch jobs
    • backup run times and file size(s)
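
As one illustration of capturing the run time of crons and batch jobs, here is a minimal Python sketch. It assumes a made-up in-memory `METRICS` list standing in for a real metrics backend, and a placeholder `nightly_backup` job; neither comes from the talk.

```python
import time
from contextlib import contextmanager

# Stand-in for a real metrics backend (an assumption for this sketch);
# in practice each measurement would be shipped to a metrics system.
METRICS = []


@contextmanager
def timed(job_name):
    """Record how long the wrapped block takes, even if it raises."""
    start = time.monotonic()
    try:
        yield
    finally:
        METRICS.append((job_name, time.monotonic() - start))


def nightly_backup():
    # Placeholder for the real batch job.
    time.sleep(0.01)


with timed("nightly_backup"):
    nightly_backup()

job, seconds = METRICS[-1]
print(f"{job} took {seconds:.3f}s")
```

Recording the duration in a `finally` clause means failed runs are measured too, which is often when the numbers are most interesting.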

Resistance. Route around it. If you don’t work with the process….

Maverick, by Ricardo Semler (1993)

Turn resistance back on others: sometimes make it so cumbersome that it burdens their way of thinking.

Open Source and Scale at Twitter

Keynote at 2015 Kansas Linux Fest, hosted at Lawrence Public Library

Dave Lester @davelester

OSS Advocate at Twitter, Inc.
Apache Mesos and Aurora PMC Member

A lot of metadata are in tweets.

Twitter is a big proponent of open source; see their website. Some projects are on GitHub; some are not.

Front-end development: Bootstrap; typeahead.js

Dave focuses on key infrastructure projects at Twitter: Finagle, Scalding; analytics and infrastructure

1. How is Twitter scaling?

What is scaling? See Wikipedia entry. Reaching beyond your current capacity — social and technical solutions.

Twitter numbers (2014):

  • 500 million tweets /day
  • 3.5 billion/week
  • 6000+ tweets/sec (steady state)
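
The three figures above can be sanity-checked against each other with a little arithmetic. This is only a back-of-the-envelope sketch, assuming the daily figure is an average spread evenly over the day:

```python
TWEETS_PER_DAY = 500_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

per_week = TWEETS_PER_DAY * 7
per_second = TWEETS_PER_DAY / SECONDS_PER_DAY

print(f"{per_week:,} tweets/week")       # matches the 3.5 billion/week figure
print(f"{per_second:,.0f} tweets/sec")   # roughly 5,800, near the quoted 6000+
```

The even-spread average comes out slightly under the quoted steady-state rate, which suggests the quoted numbers are rounded.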

Twitter is “the pulse of the planet”. Can sometimes predict spikes (live, popular events, like the World Cup); sometimes can’t. Could throw 10x the servers at the problem OR improve scalability.

Remember the Fail Whale?

Previously, Twitter was a Ruby on Rails application with 200 engineers pushing code; it needed a solution to isolate failure and isolate feature development.

During the 2010 World Cup there were lots of issues keeping Twitter up; by 2014, after the scalability and open source projects, it was much more stable.

Breaking up monolithic applications into microservices. Common pattern among companies; see Groupon talk, “Breaking up the monolithic…”

Today, building a distributed system.

2. Twitter’s Open Source infrastructure

  • “Twitter Stack” including Apache Mesos, Aurora, Finagle
    • Mesos: top-level project at Apache; began as a research project at UC Berkeley; a layer of abstraction between the machines in a datacenter and the applications that run on them: cluster manager & resource manager. Mesos actively monitors what’s happening across the cluster (ZooKeeper). Addresses the problems of fault tolerance and resource efficiency and utilization.
      • Design Challenges: each framework may have different scheduling needs; must scale to tens of thousands of nodes running hundreds of jobs with millions of tasks; must be fault-tolerant and highly available
      • Master-Worker architecture + ZooKeeper cluster
      • Marathon scheduler
      • A lot of #klf15 scalability preso is going over my head, but I do wonder what #kohails project could “get” from Mesos/Aurora scalability
  • Why care about resource utilization? Fewer machines; fewer human resources.
    • How to best reuse idling times? Early research
    • Quasar — users specify performance target for applications instead of typical resource reservations; machine-learning used to predict resources usage and for cluster scheduling; research by Christina Delimitrou and Christos Kozyrakis at Stanford
    • Google Borg — Google’s cluster management solution; AMP Lab, and John Wilkes spoke at MesosCon 2014.
    • Aurora provides deployment and scheduling of jobs; rich DSL for defining services; health checking; one scheduler to rule them all: can manage both long-running services and cron jobs; can mark production and non-production jobs; production jobs can pre-empt non-prod jobs; has an additional priority system. Aurora has executor features: the executor is responsible for executing code on individual worker machines and sending status to Mesos when a task completes.
  • Hundreds of separate services with different owners
  • Managed by Site Reliability Engineer (SRE) teams
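
To make the idea of production jobs pre-empting non-production jobs concrete, here is a toy Python sketch. It is not Aurora’s actual scheduling algorithm or API: the `Job` class, the `place` function, the slot-based capacity, and the "higher number is more important" priority convention are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    name: str
    production: bool
    priority: int = 0  # toy convention: higher means more important


def place(running: List[Job], capacity: int, new: Job) -> Optional[Job]:
    """Add `new` to `running`; return the job evicted to make room, if any.

    Raises RuntimeError when the cluster is full and nothing may be pre-empted.
    """
    if len(running) < capacity:
        running.append(new)
        return None
    # Only a production job may pre-empt, and only non-production jobs.
    victims = [j for j in running if not j.production] if new.production else []
    if not victims:
        raise RuntimeError(f"no room for {new.name}")
    victim = min(victims, key=lambda j: j.priority)  # evict least important
    running.remove(victim)
    running.append(new)
    return victim


cluster = [Job("batch-a", production=False, priority=5),
           Job("batch-b", production=False, priority=1)]
evicted = place(cluster, capacity=2, new=Job("web", production=True))
print(evicted.name, [j.name for j in cluster])
```

The point of the sketch is the policy, not the mechanics: spare capacity goes to low-stakes work, and that work gives way when production needs the room.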

3. How and why OSS?

“many parts building on and amplifying each other” –Gordon Haff, Red Hat

Building an ecosystem.

Frameworks

Services: Aurora; Marathon; Kubernetes; Singularity

Big Data: Spark; Storm; Hadoop

Batch: Chronos; Jenkins

Framework bindings — C++, Java, Clojure, Haskell, Python, or write your own.

Resources for writing mesos frameworks — his slides will go online with links to this info.

Community > Code. Very very much true.

Let’s Scale in the Open: increased speed of innovation; more-reliable software; more-visible contributions and impact; broader peer group and sense of community.

Out with the Old, In with the New (KLC Closing Keynote)

Awful Library Books @awfullibbooks

slides

Holly Hibner and Mary Kelly, authors of Making a Collection Count: A Holistic Approach to Library Collection Management

People come to libraries to get the materials that meet their needs. We need to have the right information for them, and it needs to be correct.


Getting CLASSy with Lifelong Learning

Morgan Davis, Salina Public Library, Community Learning Coordinator
Outreach Department. Has a background in PR and communication. Job is a great way to be infused in the community.

Library’s mission statement: “Connecting people to information, learning, and culture.”

CLASS: Community Learning and Skill Sharing

Each semester CLASS has 50-60 classes. Instructors are community members who have been recruited to teach the classes.

People who come to classes never stop learning.

What is CLASS? A program of non-credit classes offered by community members at a low cost and with low commitment. The library chose not to offer certification or for-credit courses, so it doesn’t compete with other community organizations.

Classes range from a single 1.5-hour session up to six-week courses. The most expensive class is a beginning Spanish class for $89: 20 hours over five weeks, textbook included.

“Programs”: library programs are typically free to attend and may serve a specific purpose or present a specific point of view.

“Classes”: lifelong learning classes require a course fee, are broadly educational in nature, and should offer a value-added takeaway.

Are instructors paid? $15/class hour offered or volunteer time. This is built into the class fees. The library makes no money off this program.

From the beginning:
Grassroots: Learning for Life began with 6 people and a vision for community learning. 410 people enrolled in the first semester. Learning for Life was a trademarked name, so it had to be changed.

Non-profit status: In 2004, CLASS was granted non-profit status.

Move to the library: After CLASS reached out to the library board, they were allowed a trial run for the fall 2005 semester. Library director was one of the original creators of the program.

Today: our semesters average 700 enrollments and are almost entirely self-sustained.

Program website: www.salinapubliclibrary.org/class

Think through registration process, including simple approaches at the beginning.

Signup registration software. Switching to CourseStorm for credit card processing. Can add cash & check registrations on the back end.

Close registration a week before the class. Have a course-enrollment minimum set, and if that minimum isn’t met, cancel the class and let registrants know.

LERN is the bomb! http://www.lern.org International learning organization that the library belongs to.

Community support is critical for this program. If the community isn’t willing to invest time, money, interest, buy-in and more, the program won’t succeed. Financial support is only a small part of what makes CLASS successful.

Positive word of mouth keeps CLASS going. Friends sign people up for gifts, surprises. Invest in the people.

Have a liberal refund policy for people. Registration refundable if a participant cancels a week before.

How can you afford to offer classes?

How to price course:
Course fee involves: instructor fee, staff time, room use fee, materials, library supplies, and perceived monetary value (will this class be worth the fee set?). Divide the total cost to run the class by the course minimum to determine the course fee.
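
The pricing rule can be written out as a short calculation. This is a sketch with made-up numbers: only the $15 per class hour instructor rate comes from the notes, while the six-hour length, the other cost figures, and the minimum of six students are hypothetical.

```python
def course_fee(costs: dict, course_minimum: int) -> float:
    """Per-student fee: total cost to run the class divided by the course minimum."""
    if course_minimum <= 0:
        raise ValueError("course minimum must be at least 1")
    return sum(costs.values()) / course_minimum


# Hypothetical six-hour class.
costs = {
    "instructor": 15 * 6,  # $15 per class hour
    "staff_time": 40,
    "room_use": 20,
    "materials": 30,
}

print(course_fee(costs, course_minimum=6))  # 30.0
```

Pricing against the minimum rather than the expected enrollment means a class that only just runs still covers its costs, and any enrollment above the minimum becomes the surplus that evens out other classes.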

One course may have a surplus, but one may not. It all evens out in the end.

Program called Pass the Buck, for people to contribute toward a scholarship fund for people who can’t afford classes. Someone who asks for a scholarship will attend a class at half-cost.

Is there a dedicated space for classes? Community learning center has 2 classrooms, but also find spaces in the community to host the classes.

Offer classes in the best venue for the class. Community kitchen. Cabinetry company has kitchen as well. Schools. Churches. Main library building used. Some of these do charge for the space.

A week before the class, a reminder email is sent, including a map to the class location. If they don’t have email, a phone call will be made.

Do people object to going outside of town for some classes? Every now and then someone does; in that case, allow other people to drive the attendee out to the class location.

Sounds great, but who’s going to teach?
*Teachers will come to you (credentialed and passionate teachers)
*Students will come to you (“I want to learn x…” who can teach that?)

But if they don’t come to you, check these places for people and topics:
*Newspaper
*Facebook
*Community calendars
*Art center, museums, or libraries
*Schools
*Colleagues and friends

This helps you find topics and attendees and teachers, but it also helps you know which dates to avoid (especially the school calendar).

For teachers who receive negative reviews: ask the instructor how s/he thought the class went and let the instructor read the evaluations; that opens the door for further conversation. People usually know; they’re good at self-evaluation.

Asking someone to teach is usually a compliment and an encouragement to them.

If library staff teach a class, it’s voluntary time, still have to go through the application process.

The numbers part
The statistics and the type of information you need to track will differ with each library.

Good things to know in building your program:
*Attendance
*Income
*Expenses
*Marketing reach
*Types of payment being taken

Always get participant and instructor feedback via evaluations

Talk to people! If they have to pay for something, they won’t be afraid to let you know whether they thought it was worth their time and money.

Course catalogs are mailed to people who have taken classes over the last 3 years. They are also sent to three targeted carrier routes and left in common community spaces. Track where people hear about classes.

Targeted course catalogs cheaper than newsprint.

Staff attend beginning of first class, to make sure things go smoothly. LERN suggests leaving sticky note with “feel good” message for instructor.

Challenges: inter-departmental relationships. Library departments and communication. Have conversations with library staff often. Understand what other library departments do, their time, their challenges…that helps communications.

Future:
*CLASS 4 Kids! 10 classes targeted at kids + family classes
*New demographics
*Increased participant input
*Stronger online presence

If 50 percent of classes go the first time on a new approach, that’s a good success rate. Try things twice, at two different times of year, times of day, venues, etc.

@morgandavis2011

Leadership for the Common Good: Lessons from the Kansas Leadership Center

Kansas Leadership Center was set up by the Kansas Health Foundation, based in Wichita, KS.

Emporia State University is an official partner of the Kansas Leadership Center, and is infusing parts of the KLC curriculum into its undergraduate and graduate courses, including SLIM.

KLC’s materials are not copyrighted, and free to use. Methods come out of Harvard Kennedy School of Government.

Adaptive vs. technical problems, Dr. Andrew Smith

Adaptive and technical problems differ in approaches and outcomes. Technical problems can be complex and critically important, but they are resolved through the application of authoritative expertise and within the organization’s current structures, procedures, and ways of doing things.

Adaptive problems are bigger than technical issues. Only can be addressed through changes in people’s priorities, beliefs, habits, and loyalties. Q: Why are we doing this? A: It’s always been this way. If you start changing things, that’s adaptive.

The Practice of Adaptive Leadership, by Ronald Heifetz

Technical problems are clearly defined, have a clear solution, and the locus of work is authoritative. Doesn’t mean that it’s simple and easy to do and may very well be expensive.

Problems that are both technical and adaptive are clearly defined, but the solutions require learning, and the locus of work is both the authority and the stakeholders.

Adaptive problems require learning to be defined, the solution requires learning, and the locus of work is the stakeholders.

If you start thinking about problems and solutions from this perspective, it’s very helpful when you are the one who is having to come up with a solution. Getting to the stage where you can figure out what type of problem it is, can better help you figure out how to best solve the problem.

Distinguishing leadership from authority

Leadership isn’t the same as authority. The leadership principles we are learning are things we can take and apply without being the boss. Authority: one person in charge can take a problem, determine that it’s technical, and say here’s how to fix it. Adaptive problems involve more people in the decision-making and solution process.

Leadership problem: not asking the people at the ground level how to fix something — they can know the solutions, but aren’t in authority positions to make changes; those problems aren’t treated as adaptive, but technical problems. The TV show Undercover Boss reveals this again and again.

You don’t have to be the person in charge to do adaptive leadership.

There may be more going on in a situation; we may need a lot more information than what is first presenting. Doctors often ask: am I fixing a symptom or the illness?

Adaptive: change something, not just solve a problem. Management may need to change approaches, processes, workers may need to change how a task is done, etc.

Seldom do you have all the information you need, you need to ask for more. More people may need to be asked. Who are the stakeholders? Paying, consuming, doing, paying from a distance. Asking the right people and enough people, not just diving in and solving the technical issue.

If it truly is a technical problem, it has a clear, authoritative expert solution.

If it is adaptive, you need to look further, make changes, and talk to more people.

Competing Values/Commitments, Dr. Gwen Alexander

Robert Kegan and Lisa Lahey, Immunity to Change: How to Overcome It and Unlock the Potential in Yourself and Your Organization

What’s keeping you from achieving your goals?

A hidden dynamic in the challenge of changes: competing values/commitments

The technical fix often works for a while, but there may be underlying causes that need further changes and not just a technical fix.

You may say, I want to make a change, but parts of you contradict that need to make a change.

We’re unable to make the change we want to make because we misdiagnose it as technical, and it’s really adaptive, and requires much more thought to solve the challenge.

Adaptive solutions change yourself — changing to the situation.

Competing values/commitments cause IMMUNITY to change.

How can you bring your competing values and commitments together so they work together rather than acting as barriers to change?

There can be other stakeholders involved and politics at play with competing values and commitments. A lot in the environment could be affecting the opportunity for change.

Exercise: Write down your goal. Which of your behaviors allow progress toward the goal, and which prevent you from achieving it? What are the hidden competing values and commitments? Defensiveness comes out as we rationalize our behavior. It’s easier to come up with a rational defense than to come up with the steps to make it happen and suffer the consequences/repercussions.

You cannot use technical means to solve adaptive challenges. Technical issues: the skill sets necessary to perform those complicated behaviors are known. Adaptive issues require you to develop a more sophisticated approach.

If you have worries, you may have competing commitments that are preventing you from achieving your goals. Do you have competing commitments or do they have you? When you have competing commitments, you’re driving with one foot on the accelerator and one foot on the brake.

Observe your thoughts, emotions, and behaviors, and learn to use this information. Bring your new capacity for adaptive changes to other issues in your work and personal lives.

Care of Self, Dr. Robin Kurz

Competing priorities/commitments: We tend to put everything above ourselves.

As adaptive leaders, we must know our strengths, vulnerabilities, and triggers; know the stories others tell about us (self-image vs. reality); choose among competing values. Most of what we do isn’t immediately life-threatening/critical. Take a sick day when getting sick.

As adaptive leaders, we must get used to uncertainty and conflict (organizational – especially around change; internal; external); experiment beyond our comfort zones; take care of ourselves as individuals (and not put our expectations on others).

Sometimes we misinterpret a situation.

Too close to a situation, need help understanding other viewpoints, shift perspectives, adapt to a situation.

Conclusion

Leadership on Demand from the Kansas Leadership Center is a 10-week video series to help you make progress on an issue you care most about in your community or organization. For $50 you will have access to KLC online curriculum and a Leadership on Demand workbook. Watch the first video in the Leadership on Demand series free.