Systems Navigator for Configuration Management

Christian Pearce

Perfect Order, Inc.

Quick Background

Overview

I was told not to make this presentation a pure product demonstration, so I decided to break down what I have learned about Configuration Management and automation into its basic parts. I am going to briefly cover what is happening in the Configuration Management community. Then I will discuss the core parts that make up SysNav from an abstract perspective. Ideally you could take this away and understand what is needed to build a Configuration Management system. Lastly I will give examples of how we do it with SysNav.

Points:
  • Not a product demonstration
  • A breakdown of a CM system as I learned it
  • What is taking place in the community
  • Finish with some examples of SysNav
  • The slides will be published online with all my notes

State of CM

  • The field of System Configuration is preferred
  • lssconf (Theory)
  • Configuration Workshops (read the proceedings)
  • config.sage.org (Practice)
Configuration Management itself is an ambiguous phrase. It has come to mean a few different things depending on the context, and what it means to System Administrators is still up for debate. Currently "System Configuration" seems to be how people in the field refer to managing the configurations of servers and network nodes through some form of automation. For a more theoretical definition or discussion, join the lssconf mailing list or read the archives [1]. For a more practical understanding, config.sage.org [2] has a good FAQ [3]. Other communities and efforts exist to solve this problem, but I tend to focus on these two groups.

Points:
  • Configuration Management means different things
  • The field of System Configuration is the preferred nomenclature
  • lssconf is the theory community
  • config.sage.org is the practice community
  • There are others, but I am not involved in them

Review of CM models

  • Alva Couch presented "Why people don't adopt configuration management tools"
  • "no compelling cost model."
  • Four models were defined
    • Ad-hoc
    • Incremental
    • Proscriptive
    • Enterprise
You might have trouble justifying the adoption of higher levels of configuration management through a tool that performs automation. Alva Couch has put together a wonderful presentation that dives into the details of why [4]. One big reason people don't adopt configuration management tools is that there is "no compelling cost model." Alva describes four models of configuration management, each with a specific meaning.

Points:
  • Having trouble justifying a Configuration Management system?
  • Examine Alva Couch's presentation on "Why people don't ..."
  • People don't adopt because there is not a compelling cost model
  • Cover the four models

Review of CM models (cont.)

  • Choose a model based on cost v. time and scale.
    • Plot of the methods
    • Defines the barrier to enter the next model
    • Defines the maturity levels for each asymptote
Essentially Alva has plotted the evolution of configuration management. This plot provides us with a series of views.

Review of CM models (cont.)

This graph shows how costs increase with each model over time. This means that, depending on your situation, it might be cost effective to remain Ad-hoc.

Points:
  • The four models graphed by cost over time and scale

Review of CM models (cont.)

Next he discusses the barriers to moving to the next model. Going from Ad-hoc to incremental is what I have to deal with when selling SysNav. This is the same barrier many senior administrators face when attempting to sell to management. Essentially the staff needs to be retrained in order to achieve maturity for the given model. It is important to note that these barriers are inclusive, so you cannot skip retraining by going straight from Ad-hoc to enterprise.

Points:
  • Moving to each model poses a barrier
  • Moving from Ad-hoc to incremental requires retraining; this is always the hardest hurdle to overcome
  • Barriers are inclusive

Review of CM models (cont.)

The last thing he discusses with the graph is the maturity levels evident in the asymptotes. These levels give us a new definition of organizational maturity. Ad-hoc is still anything goes. Documentability means you know the incremental changes you make. Reproducibility means you should be able to drop kick a new server into your environment and have it pick up where the one that died left off. Finally, interchangeability means you can replace staff with no issues. It is important to point out that the incremental and proscriptive models can be accomplished with the same set of tools that exist today. The difference is merely practice and discipline of use.

Points:
  • Discuss the maturity levels (new definition for organizational maturity)
  • The difference between incremental and proscriptive is practice and discipline of use

Review of CM models (cont.)

  • Graphs the integral of each Model for Lifecycle cost
    • The cross over is the break even point.
    • Attempts to build a cost model
    • We can do this with approximations
Graphing the integral of each model gives us an understanding of total lifecycle cost. The point where two graphs cross is considered the break-even point, where it becomes cost effective to move to the next model. From here Alva attempts to define the cost model through a series of refinements to Patterson's "cost of downtime" formula. He does this by borrowing from the discipline of Software Engineering: figure out where the dominant costs exist and use them to compute project estimates for each model. The moral of the presentation is that we don't need to be perfect with our numbers. Put together an approximation and the numbers should speak for themselves.

Points:
  • Integral of each model is the total cost
  • The crossings are the break-even points
  • Use Patterson's "cost of downtime" formula; downtime never saved you money
  • Use software engineering to find the dominant costs (maintenance is a biggie)
  • Use approximations for each model to decide
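To make "use approximations" concrete, here is a minimal sketch that treats each model's lifecycle cost as the integral of an approximate monthly cost rate, then reports the first month where the candidate model's cumulative cost drops to or below the current one (the crossover on the graph). The downtime helper follows Patterson's estimate as it is usually stated (employee cost per hour times the fraction of employees affected, plus revenue per hour times the fraction of revenue affected); Alva's refinements are not reproduced. Every rate and number below is a hypothetical placeholder, not a figure from Alva's presentation.

```python
import math
from typing import Callable, Optional

CostRate = Callable[[float], float]  # approximate cost per month at month t


def downtime_cost_per_hour(employee_cost_per_hour: float, fraction_employees_affected: float,
                           revenue_per_hour: float, fraction_revenue_affected: float) -> float:
    """Patterson's estimate of the cost of one hour of downtime."""
    return (employee_cost_per_hour * fraction_employees_affected
            + revenue_per_hour * fraction_revenue_affected)


def lifecycle_cost(rate: CostRate, months: float, steps: int = 1000) -> float:
    """Integrate a cost-rate curve from 0 to `months` with the trapezoid rule."""
    h = months / steps
    total = 0.5 * (rate(0.0) + rate(months))
    for i in range(1, steps):
        total += rate(i * h)
    return total * h


def break_even(current: CostRate, candidate: CostRate, horizon: float,
               step: float = 0.5) -> Optional[float]:
    """First month at which the candidate's cumulative cost is no higher than the current model's."""
    t = step
    while t <= horizon:
        if lifecycle_cost(candidate, t) <= lifecycle_cost(current, t):
            return t
        t += step
    return None


# Hypothetical inputs: replace every number with your own approximations.
HOURLY_DOWNTIME = downtime_cost_per_hour(1000.0, 0.5, 4000.0, 0.25)  # 1500.0


def ad_hoc(t: float) -> float:
    # Staff time grows with the server count; about 1 hour of downtime a month.
    return 500.0 + 150.0 * t + 1.0 * HOURLY_DOWNTIME


def incremental(t: float) -> float:
    # Front-loaded retraining and tooling cost that decays over time,
    # slower growth, and about 15 minutes of downtime a month.
    return 1125.0 + 40.0 * t + 0.25 * HOURLY_DOWNTIME + 12000.0 * math.exp(-t)
```

Calling `break_even(ad_hoc, incremental, horizon=60)` answers "when does the switch pay for itself?" for these made-up curves. Swapping in your own cost-rate functions is the whole exercise; the trapezoid rule is accurate enough for rough approximations like these.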

Review of CM models (cont.)

CM Architecture

  • Modeled after experience with SysNav,
    (i.e. this isn't theory, this is what we do)
  • Best of Breed with current tools
  • What does not exist we build
The next part of my presentation dives into the Configuration Management system as it exists in SysNav. Essentially SysNav is a CM system built from the best-of-breed tools available, plus our own proprietary codebase. We marry this with models for successful deployment in any organization. I am going to break the system up into its key areas and discuss the purpose of each and how we do it.

Points:
  • SysNav uses the best tools available, and there are a lot to choose from
  • Models help manage the complexity
  • A breakdown of the parts in the SysNav architecture

CM Architecture (cont.)

  • Bootstrapping
  • Interface
  • CMDB (ITIL buzzword)
  • Grouping
  • Components
  • Proxy Nodes (think scale)
  • Autonomous Agent
  • A model for automating
Here is the set of building blocks I have identified that make up my Configuration Management system. It is not exhaustive, but together they provide the path for implementing the model that transitions us from Ad-hoc to an incremental or proscriptive solution.

Points:
  • Define the list of parts we will cover
  • Not exhaustive, but together they provide for incremental and proscriptive solutions

CM Architecture - Bootstrapping

  • We never start in a green field (unless you are lucky)
  • Minimal steps to get a server into your CM system
    • Two Methods:
      • Phase in with technology refreshes
      • Implement site wide by hand or with tools
  • SysNav Strategy:
    • Add the server through the SysNav interface
    • Automate the process for building a "shar"
    • Query the user for a login with superuser privileges
    • Execute a shell script to install Cfengine and its package dependencies, plus a minimal set of configs so the target begins the process of being managed.
    • Bootstrapping is idempotent
Every organization is faced with adding new software to a growing set of servers, let alone managing what it already has. Configuration Management should help you quickly adopt new software as it is released. This is the impetus for getting a CM system in place. You are faced with two choices when building a new CM system. The first method is to phase in your CM system slowly with technology refreshes. The second method is to implement it site wide, by hand or with the assistance of the CM system. The phase-in approach is slower but easier to implement. It only requires that you put the bootstrapping of your tool into your server builds. But you lose the benefit of having it on all your systems, so the value is incremental. It could take you a year or more to get all your systems migrated. The site-wide method should happen quickly, but requires more scripting to get the servers online. This could cost more in development time or implementation time, depending on whether you automate it or do it by hand. Give careful consideration to your needs. Some businesses adopt and acquire existing servers to be managed on a routine schedule. An example would be an IT outsourcer, which is faced with bringing in new machines on a regular basis. It would be cost effective for them to have an automated method for putting a server into their CM system. On the other hand, a business whose main focus isn't server management might be content to phase in its CM system.
Since I am in the business of providing a CM tool for a range of companies, we provide for all methods. A scripted solution for automating the bootstrapping of existing systems is available through the SysNav interface. The same set of scripts is easily added to a server build's post-installation routine. This solution is essentially a set of scripts that are built into a shar with the necessary binaries for the target server. The behavior of this script can be changed depending on the needs of the site. As an example, we needed to add some patches to some Solaris 8 servers in order to get the blastwave.org packages installed. We automated this with our script rather than going to each server and patching it by traditional means. This is what I refer to as an integration issue. When automating the bootstrapping, keep in mind that you will face irregularities across your existing install base. Your bootstrapping code needs to be capable of handling these issues. When one comes up, update your bootstrapping scripts, document it and move on.
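The shar itself can be generated rather than maintained by hand. The sketch below assumes text-only payloads (scripts and config files) and writes each one as a quoted here-document; the `build_shar` function, the `SHAR_EOF` delimiter and the file names in the usage are illustrative assumptions, not the SysNav build. Real packages and binaries would need an encoding step (for example uuencoding) that is omitted here.

```python
import posixpath
from typing import Dict, Optional

DELIM = "SHAR_EOF"


def build_shar(files: Dict[str, str], entry_point: Optional[str] = None,
               modes: Optional[Dict[str, int]] = None) -> str:
    """Return a /bin/sh script that recreates `files` and optionally runs `entry_point`."""
    modes = modes or {}
    out = ["#!/bin/sh", "# Generated bootstrap archive: extract, then run the entry point.", "set -e"]
    for path in sorted(files):
        body = files[path].rstrip("\n")
        # Refuse paths that could escape the extraction directory or break quoting.
        if posixpath.isabs(path) or ".." in path.split("/") or "'" in path:
            raise ValueError(f"unsafe path in archive: {path!r}")
        # A line equal to the delimiter would end the here-document early.
        if DELIM in body.splitlines():
            raise ValueError(f"{path!r} contains the here-document delimiter")
        parent = posixpath.dirname(path)
        if parent:
            out.append(f"mkdir -p '{parent}'")
        # Quoted delimiter: the shell performs no expansion inside the body.
        out.append(f"cat > '{path}' <<'{DELIM}'")
        if body:
            out.append(body)
        out.append(DELIM)
        out.append(f"chmod {modes.get(path, 0o644):o} '{path}'")
    if entry_point:
        if entry_point not in files:
            raise ValueError(f"entry point {entry_point!r} is not in the archive")
        out.append(f"sh '{entry_point}'")
    return "\n".join(out) + "\n"
```

For example, `build_shar({"bootstrap.sh": "...", "inputs/cfagent.conf": "..."}, entry_point="bootstrap.sh", modes={"bootstrap.sh": 0o755})` yields one file that can be copied to the target and run with `sh` once the superuser login is available. Because extraction always overwrites the same paths, running it twice leaves the same files behind.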

Points:
  • Two methods for implementing into existing environments
  • Phase in (which is essentially forcing green field)
    • Slower, but easier to implement
    • Fewer OS versions to contend with
    • As a rule newer OS versions are easier to support
    • Lose benefit of full control with your CM system until the phase in is complete
    • Value is incremental
  • Site wide implementation (which means do it all at one shot)
    • Faster, but tougher to implement
    • Potentially a larger variety of OSes to support
    • Costs more in development or implementation depending on approach (by hand vs. automate)
    • Scripts provide bootstrapping for new server builds
    • Value is immediate
  • Evaluate the needs of your company to make the decision
    • IT outsourcers that bring in new machines every month would benefit from an automated site wide method
    • Smaller shops that do frequent technology refreshes would benefit from a phase in approach
  • SysNav provides all methods due to the range of companies we support
  • We build a shar from a set of scripts and packages for a target platform
  • Make the bootstrapping system extensible to handle integration issues
  • We had to deploy patches automatically to a set of Solaris 8 servers before we could install packages
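Idempotent bootstrapping with room for integration issues can be structured as a list of check/apply steps, with per-platform hooks that run first. This is a minimal sketch under stated assumptions: the host is simulated as a dictionary, and the `Step` and `run_bootstrap` names, the `prereq-patch` hook and the Solaris 8 rule are hypothetical stand-ins for the blastwave.org patch problem described above, not real patch IDs or SysNav code.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Host = Dict[str, Any]


@dataclass
class Step:
    name: str
    check: Callable[[Host], bool]   # True when the host already satisfies the step
    apply: Callable[[Host], None]   # change the host so that check() becomes True


def run_bootstrap(host: Host, steps: List[Step], hooks: Dict[str, List[Step]]) -> List[str]:
    """Run the host's platform hooks, then the common steps, skipping satisfied ones.

    Returns the names of the steps that changed something; a second run against
    the same host returns an empty list, which is what idempotent means here.
    """
    changed: List[str] = []
    for step in hooks.get(host["os"], []) + steps:
        if step.check(host):
            continue
        step.apply(host)
        if not step.check(host):
            raise RuntimeError(f"step {step.name!r} did not converge on {host['name']}")
        changed.append(step.name)
    return changed


# Illustrative simulation: packages on the hypothetical Solaris 8 host
# cannot be installed until the prerequisite patch is present.
def install_package(pkg: str) -> Callable[[Host], None]:
    def apply(host: Host) -> None:
        if host["os"] == "solaris8" and "prereq-patch" not in host["patches"]:
            raise RuntimeError(f"{pkg} cannot be installed before the prerequisite patch")
        host["packages"].add(pkg)
    return apply


common = [Step("cfengine", lambda h: "cfengine" in h["packages"], install_package("cfengine"))]
hooks = {"solaris8": [Step("prereq-patch",
                           lambda h: "prereq-patch" in h["patches"],
                           lambda h: h["patches"].add("prereq-patch"))]}
```

When a new irregularity turns up, it becomes one more hook for that platform while the common steps stay untouched.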

CM Architecture - Interface

  • Consolidated interface for management
  • SysNav Strategy:
    • A portal infrastructure
    • LDAP based users and groups (provide enterprise services)
    • Framework for Applications/Pidgets/Configurations screens
    • Role-based ACLs
A consolidated interface is important for clarity of usage. Nothing is worse than having several tools implemented on a variety of different admin servers in a non-uniform way. SysNav attempts to solve this problem by implementing everything inside a portal. Web based tools obviously fit this model best, but there are usually ways to configure and display information for a component that has nothing web based.
Our portal infrastructure uses LDAP for its users and groups, which is convenient when implementing LDAP authentication on servers. The portal also provides a consistent interface for incorporating applications, pidgets and forms for gathering configurations from administrators.

Points:
  • Interface provides consistency
  • Foundation for a single integration
  • Putting everything on one machine is probably enough (use a wiki for links)
  • This is usually neglected by people building their own system
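Role-based ACLs on top of LDAP groups reduce to two mappings: directory groups to roles, and roles to permitted actions. In this sketch the group names, role names and `action:resource` permission strings are all hypothetical, and the user's groups are passed in directly rather than fetched from LDAP.

```python
from typing import Dict, Iterable, Set

# Directory (LDAP) group -> portal roles. Names are illustrative.
GROUP_ROLES: Dict[str, Set[str]] = {
    "sysadmins-senior": {"senior"},
    "sysadmins-junior": {"junior"},
    "dba": {"junior"},
}

# Portal role -> permitted actions, written as "action:resource".
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "junior": {"view:host", "run:component", "configure:component"},
    "senior": {"view:host", "run:component", "configure:component",
               "create:component", "bootstrap:host", "edit:acl"},
}


def roles_for(groups: Iterable[str]) -> Set[str]:
    """Collect every role granted by the user's directory groups."""
    roles: Set[str] = set()
    for group in groups:
        roles |= GROUP_ROLES.get(group, set())
    return roles


def is_allowed(groups: Iterable[str], action: str) -> bool:
    """True if any role granted by the user's groups permits `action`."""
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in roles_for(groups))
```

Keeping the group-to-role mapping separate from the role-to-permission mapping means a new team only needs a group entry, not a new set of permissions.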

CM Architecture - CMDB

  • Configuration Management Database
  • Store all your configs in one place
  • SysNav Strategy:
    • Object Relational DB Schema
    • APIs implement factory pattern for persistence
    • A template system to generate configs (Middle Layer)
The CMDB is a convenient place to store all the configs that get repeated with every component installed across your enterprise. If implemented correctly, you can leverage the information about each host over and over again. Your CMDB could also contain organizational information about your hosts, for example groups of servers defined by OS, service or department. The final step is to add a template system that queries the database and generates configs. Don't forget to consider the context you are implementing this CMDB in. If you plan on an incremental solution, you might not need to store as much information in the database.
SysNav has an object relational database with a rich set of APIs that implement the OO factory pattern. A core SysNav schema provides for all the objects we manage in our system. This includes clients, which can be a group or a host (a host inherits from a node, which inherits from a client), as well as components, schedules, logical addresses, OSes and ACLs. It also provides the ability to add configuration APIs for any component an SA chooses to integrate. Finally, all of this is available to a template system called the Middle Layer, which takes templates and turns them into configuration files that are available to Cfengine.

Points:
  • Centralized store provides leverage
  • Build in only what you need depending on your model
  • SysNav provides object relational schema
  • Schema is rich and extensible with APIs
  • Middle Layer provides a method for generating configurations
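The object hierarchy and the Middle Layer idea can be sketched in a few lines. This assumes an in-memory dictionary instead of an object relational database and uses Python's `string.Template` as a stand-in for the Middle Layer; the class fields, `ClientFactory` and `render` are hypothetical names that only mirror the inheritance described above (a host inherits a node, which inherits a client; a group is also a client).

```python
from dataclasses import dataclass, field
from string import Template
from typing import Any, Dict, List


@dataclass
class Client:
    name: str


@dataclass
class Group(Client):
    members: List[Client] = field(default_factory=list)


@dataclass
class Node(Client):
    address: str = ""


@dataclass
class Host(Node):
    os: str = ""


class ClientFactory:
    """Create clients by kind and keep them in one store (a stand-in for the database)."""

    _kinds = {"group": Group, "node": Node, "host": Host}

    def __init__(self) -> None:
        self.store: Dict[str, Client] = {}

    def create(self, kind: str, name: str, **attrs: Any) -> Client:
        obj = self._kinds[kind](name=name, **attrs)
        self.store[name] = obj
        return obj


def render(template_text: str, host: Host) -> str:
    """Turn a template into a config file using values from the host's CMDB record."""
    return Template(template_text).substitute(name=host.name, address=host.address, os=host.os)
```

The point of the factory is that callers ask for "a host" or "a group" by kind and always get an object that is also stored, so every generated config is driven by the same records.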

CM Architecture - Grouping

  • Reduces burden of single host management
  • Provides scalability
  • Introduces complexity
  • Exist in a variety of flavors:
    • Organizational
    • Service
    • Type
  • SysNav Strategy:
    • Group of Groups
    • Conflict Resolution
    • ACLs
Groups provide a lot of power in practice, but they also add complexity. They are a necessary evil in order to scale to a large Configuration Management system. Without them you are stuck dealing with each system through independent configurations, which would leave you with an Ad-hoc Configuration Management tool. If groups implement configurations and configurations are components, then one should implement components on groups the same way they are implemented on a single host. The complexity starts to arise when you consider how a component should work if it is configured for both a group and a host. Add groups of groups to the equation and you really have to come up with a good set of constraints in order to avoid ambiguous configurations or conflicts that cannot be resolved. These constraints are discussed further on the components slide.
Groups come in a variety of forms. Combine them with a good strategy to achieve the highest value possible. Group your servers based on organizational unit, service and type; for example DBAs, web and Solaris 9.
SysNav supports groups, and groups of groups. It provides the necessary conflict resolution to implement components safely. It also provides ACLs to secure large groups of servers.

Points:
  • Groups are required for scale
  • Don't implement them unless required
  • Think about how constraints could cause conflicts
  • Groups provide value based on strategy
  • SysNav provides very flexible groups giving considerations to constraints
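Groups of groups need a well-defined expansion to hosts. A minimal sketch, assuming groups are stored as a mapping from group name to member names (hosts or other groups) and that host names never collide with group names; cycles are rejected because they make membership, and therefore configuration, ambiguous. The group and host names are made up.

```python
from typing import Dict, List, Set, Tuple


def expand_group(group: str, groups: Dict[str, List[str]],
                 _path: Tuple[str, ...] = ()) -> Set[str]:
    """Return every host reachable from `group`, following nested groups."""
    if group in _path:
        raise ValueError("group cycle: " + " -> ".join(_path + (group,)))
    hosts: Set[str] = set()
    for member in groups.get(group, []):
        if member in groups:   # the member is itself a group
            hosts |= expand_group(member, groups, _path + (group,))
        else:                  # the member is a host
            hosts.add(member)
    return hosts


GROUPS = {
    "solaris9": ["db01", "web01"],        # type
    "web": ["web01", "web02"],            # service
    "dba": ["db01"],                      # organizational
    "all": ["solaris9", "web", "dba"],    # group of groups
}
```

A host that appears in several groups (web01 above) is still one host; only a group that eventually contains itself is treated as an error.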

CM architecture - Components

  • Any configurable entity that needs management
  • Four different types:
    • Server only (Nessus)
    • Client only (Logwatch)
    • Client/Server (Big Brother)
    • Interdependent (Complex relationships, clusters)
  • Four different constraints:
    • Single Instance - No configs
    • Allow Duplication - Every instance of the component is unique
    • Allow Override - The configuration closest to the host wins
    • Inheritable - Take configurations from parent and child
Components provide the content for managing your site. Anything that can be configured or used on your servers is a candidate for component integration. I have identified four possible types of components, each of which exhibits different characteristics that affect development. Which constraint you choose depends on the application. Not all constraints can be implemented for all applications, and it is important to know that constraints also affect the development of the component.
Server only means the application targets clients but does not run on them. Client only means the application runs solely on the client. Client/Server means there are parts that run on both the client and the server. Interdependent is something I have only thought about conceptually. The idea is that two applications are somehow interrelated, and this relationship requires a specific sequence of actions to take place on two or more clients. The canonical example is configuring a web server and having the firewall open the appropriate port, provided the web server is set to public. This could be an attribute of a component rather than a type, but there isn't much theory available on the topic to give us a formal understanding.
Constraints are really only necessary for groups, unless you want to be able to duplicate your components. Single Instance is simple: there is only one instance that can be configured for a given host. The Allow Duplication constraint means a component can exist more than once on the same server without conflict. This is a harder constraint to achieve and is not always possible. A good example is implementing Awstats, where each virtual host would have a separate component. The Allow Override constraint is an easy way to implement conflict resolution for groups. It simply means the component configuration closest to the host is used. Last is the idea of inheritable configurations, which is the toughest to implement. It means a component's configuration comes from both the parent component and the child component. This ultimately makes managing groups the easiest. I could define a bunch of stock configurations for a wide range of servers in a few groups. As I need to make adjustments to those components, I only have to change what is different, rather than reconfiguring the entire component a second time. An example is to configure integrit for a group containing all your Solaris 9 servers. This would be a stock config for a base install of the OS. Some of these servers might run different applications that need different rules for how integrit should report what is changing. Adding the integrit component to one of these service based groups would incorporate all of the rules from the Solaris 9 group plus the rules you just added.

Points:
  • Components are the content that make up your Configuration Management system
  • Identifying the type of component makes integration easier
  • The four different constraints provide group flexibility
  • Constraints are not necessary, but an implied resolution is needed
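Two of the constraints can be sketched as resolution functions over the chain of configurations from the most general group down to the host. This assumes configurations are plain dictionaries, that `None` marks a level where the component is not configured, and that list values (such as integrit-style rule lists) accumulate under inheritance while scalar values are replaced. The keys and paths are illustrative and are not integrit's configuration syntax.

```python
from typing import Any, Dict, List, Optional

Config = Dict[str, Any]


def resolve_override(chain: List[Optional[Config]]) -> Optional[Config]:
    """Allow Override: the configuration defined closest to the host wins."""
    for cfg in reversed(chain):
        if cfg is not None:
            return cfg
    return None


def resolve_inherit(chain: List[Optional[Config]]) -> Config:
    """Inheritable: merge from general to specific, so the child only states differences."""
    merged: Config = {}
    for cfg in chain:
        if cfg is None:
            continue
        for key, value in cfg.items():
            if isinstance(value, list) and isinstance(merged.get(key), list):
                merged[key] = merged[key] + value   # child rules add to the parent's
            elif isinstance(value, list):
                merged[key] = list(value)           # copy so inputs are never shared
            else:
                merged[key] = value                 # scalars: child replaces parent
    return merged


# Chain for one host: Solaris 9 group -> web group -> host (not configured).
solaris9 = {"rules": ["/etc", "/usr/bin"], "mailto": "ops"}
web = {"rules": ["/opt/apache/conf"]}
chain = [solaris9, web, None]
```

With Allow Override the web group's two-line config replaces the stock one outright; with inheritance the host gets the stock Solaris 9 rules plus the web additions, which is why inheritance is harder to implement but easier to manage.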

CM architecture - Components (cont.)

  • SysNav Strategy:
    • Implements everything but Interdependent types and the Inheritable constraint
    • Restriction by OS
    • Leverages interface for Applications and configuration screens
    • Provides scheduled events that are cfrunable
The SysNav component provides the ability to extend the core framework, with hooks at every level. A component can have applications, pidgets, configuration templates parsed by the Middle Layer, CMDB APIs, configuration screens in the Client Manager, scheduled events and Cfengine code executed on clients. This gives an implementor a complete array of options when integrating an application.
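Restriction by OS and the Single Instance / Allow Duplication constraints can be enforced at the moment a component is attached to a host. The `Component` and `ManagedHost` classes below are hypothetical and only cover those two checks; the other hooks a SysNav component provides (applications, pidgets, templates, schedules, Cfengine code) are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Component:
    name: str
    supported_os: FrozenSet[str]       # restriction by OS
    allow_duplication: bool = False    # False means Single Instance


@dataclass
class ManagedHost:
    name: str
    os: str
    instances: Dict[str, List[str]] = field(default_factory=dict)

    def attach(self, comp: Component, instance: str = "default") -> None:
        """Attach one instance of `comp`, refusing unsupported OSes and disallowed duplicates."""
        if self.os not in comp.supported_os:
            raise ValueError(f"{comp.name} is restricted to {sorted(comp.supported_os)}")
        existing = self.instances.setdefault(comp.name, [])
        if existing and not comp.allow_duplication:
            raise ValueError(f"{comp.name} allows a single instance per host")
        if instance in existing:
            raise ValueError(f"{comp.name}/{instance} is already attached")
        existing.append(instance)


# Illustrative components: one instance per virtual host for Awstats.
awstats = Component("awstats", frozenset({"linux", "solaris9"}), allow_duplication=True)
logwatch = Component("logwatch", frozenset({"linux"}))
```

Checking at attach time keeps invalid combinations out of the CMDB entirely, so nothing downstream (templates or the agent) has to cope with them.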

Components sidebar

  • What components should you implement first?
    • Monitoring
    • Auditing
    • Security Hardening
    • SOE policies
    • Low hanging fruit
It is easy to forget why we need a CM system in the first place. First and foremost we need to automate everything to provide a consistent and known state of all our servers. This allows us to manage our servers proactively rather than reactively. There are a lot of components that need to be deployed to manage your servers effectively. There are plenty of books and practices for managing your servers. The CM system is just one part. Implementing monitoring, auditing, security hardening, and any SOE policies is a great place to start. Also it helps to implement anything that is easy, typically what is referred to as the low hanging fruit.
Points:
  • "The Practice of System and Network Administration" is a good source for where to begin

CM architecture - Proxy Nodes

  • Proxy nodes act as a substitute for the portal
  • Provide the following benefits:
    • Traverse firewalls
    • Scalability
    • Redundancy
  • Affects component development for scale and delivery
  • SysNav Strategy:
    • Managed by parent
    • Do not need to talk to parent (firewalls)
    • Handle all cfengine and rsync requests for children
    • Scaling inherent in design
Proxy nodes provide a lot of value depending on the size and type of organization. A proxy node is capable of handling all the backend requests that would otherwise go to the portal. Essentially it should be able to do anything the portal can do minus the interface, APIs and CMDB. In the context of SysNav it will traverse firewalls and provide scalability. Redundancy is planned for a future version of the product.

Points:
  • Proxy nodes help in a bunch of ways
  • Proxy nodes perform base server activity

CM architecture - Autonomous Agent

  • Running scripts
  • File delivery
  • Perform mundane tasks based on a component's configuration
  • SysNav Strategy:
    • Write configs that target Cfengine
    • Configs are dynamically produced via templates
    • Templates are built from the CMDB
    • Configs handle all component activity
    • Build scripts that are idempotent and self healing
The autonomous agent we use is Cfengine. This tool is designed to perform configuration tasks on servers automatically, using a higher level declarative language that lets you express more in a tighter syntax. With SysNav and the structure we put around Cfengine, you can automate the delivery and execution of scripts, install and upgrade packages, manage configuration files, start and stop processes, and copy files back to the portal for viewing in the interface.
Points:
  • This is where the rubber meets the road for the CM system
  • The autonomous agent handles automating all the configured components
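The "idempotent and self healing" bullet is easiest to see in a single convergent step: compare the current state to the desired state, change only what differs, and report whether anything changed. This is plain Python written in the spirit of what Cfengine does for managed files, not Cfengine syntax; `converge_file` is a hypothetical helper.

```python
import os
import tempfile


def converge_file(path: str, desired: str, mode: int = 0o644) -> bool:
    """Make `path` hold `desired` with permissions `mode`; return True only if something changed."""
    data = desired.encode()
    changed = False
    try:
        with open(path, "rb") as fh:
            current = fh.read()
    except FileNotFoundError:
        current = None
    if current != data:
        # Write a temp file in the same directory, then rename it into place
        # so a reader never sees a half-written file.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
        try:
            with os.fdopen(fd, "wb") as fh:
                fh.write(data)
            os.replace(tmp, path)
        except BaseException:
            if os.path.exists(tmp):
                os.unlink(tmp)
            raise
        changed = True
    if (os.stat(path).st_mode & 0o777) != mode:
        os.chmod(path, mode)
        changed = True
    return changed
```

Run it repeatedly: the first run creates the file, later runs change nothing, and a hand edit or a permission change is repaired on the next run.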

CM architecture - Models for automating

  • Incremental
  • Proscriptive
  • Mixing should be considered
  • Not all your servers will be managed by CM. Oddball servers are probably cheaper to manage Ad-hoc.
  • Transitioning work

Points:
  • Use these models when planning the Lifecycle of a service
  • Oddball servers are a trap; don't fall into it. (AIX story)
  • SysNav has a model for migrating tasks to a CM system

CM architecture - Models for automating (cont.)

  • The role of a junior administrator


Points:
  • The junior is not given root access
  • But has more power to manage servers through a controlled interface
  • Attempts to diagnose problems with the tool
  • If possible he tries to fix them
  • Unresolved problems are escalated

CM architecture - Models for automating (cont.)

  • The role of a senior administrator


Points:
  • The senior has root access
  • He has the power to implement components juniors can use in the interface
  • Ideally he delegates tasks that often need repeating to juniors with the interface
  • Only problems that cannot be resolved with the interface are kicked up to him

SysNav - Architecture

  • Tying it all together

SysNav - Demo

  • Client Manager
  • Bootstrapping
  • Solaris Patch Manager
  • cf.local

Conclusion

  • The field of System Configuration is young but advancing daily
  • Attempt to understand the costs involved in the different models
  • There are a lot of moving parts to a quality CM system
  • There are a lot of ways to implement one; give careful consideration to your needs
  • Get involved in the communities, read papers about CM
  • SysNav is a CM tool for System Administrators
  • I will put this presentation online
  • Please provide feedback

Resources

  • SysNav - http://www.sysnav.com/
  • Cfengine - http://www.cfengine.org/
  • cfwiki.org - http://www.cfwiki.org/
  • [1] lssconf - http://homepages.informatics.ed.ac.uk/group/lssconf/ (theory)
  • [2] config-mgmt - http://config.sage.org/ (practice)
  • [3] config-mgmt FAQ - http://config.sage.org/faq.html
  • [4] Why people don't adopt configuration management tools - http://homepages.informatics.ed.ac.uk/group/lssconf/config2004/slides/alva/workshop.pdf

Christian Pearce <pearcec@perfectorder.com>