Tuesday, December 16, 2014

Agent-based vs. Agent-less Monitoring

Customers frequently ask me about the difference between agent-based and agent-less monitoring solutions. There is a lot of confusion in this area, so this post attempts to explain each of these solution types, focusing mainly on the "paid" solutions. Open source solutions may be covered in a future article.

A bit of history

In the early 90s, when the client-server architecture became dominant in the market, IT managers started to look for a solution to monitor these new assets. New types of tools had to be introduced, as the old monitoring tools suited to the age of central computing (mainframe) were now completely useless.

Network Equipment

At that time, network equipment was monitored by various solutions that were based on the SNMP standard. Monitoring software from vendors such as Sun (SunNet Manager), HP (OpenView) and IBM (NetView) used the early versions of SNMP agents incorporated into the network devices.
IBM Netview 6000

Such agents were called "monolithic". As the SNMP standard was further developed, more sophisticated agents were introduced. These agents were more flexible and could be extended according to customer needs, so they were called "extensible".

Servers and Applications

Servers (mostly versions of UNIX) had some minimal SNMP implementations that were quite limited in their capabilities. Since each server had its own resources (CPU, memory, disks, network interfaces etc.), it was critical to understand the status of each component of the architecture in order to identify possible faults and performance bottlenecks. Additional requirements, such as real-time reading of system logs and running automatic actions either on a schedule or in response to an issue, were also of great interest.

Windows servers had their own implementation of SNMP (limited, as with UNIX servers). Interestingly enough, though, Microsoft came up with a new proprietary protocol (WMI) to allow agent-less remote management of Windows servers.

Oddly enough, using SNMP agents or WMI to manage servers and applications is still considered "Agent-less".


The introduction of proprietary system-based Agents

Following the increased market demand, software vendors such as CA, IBM and HP quickly developed combinations of monitoring consoles and agents. Because the SNMP standards were too weak to provide comprehensive monitoring of operating systems and applications, these vendors introduced proprietary agent software instead of relying on SNMP.
HP Operations for Unix
Obviously, the major drawback for customers was that they had to use the agent and console from a single vendor and could not mix and match them.

Agent-less Monitoring systems

During the early 2000s it became quite obvious that there was a place for cheaper and simpler monitoring solutions for mid-market customers. Smaller vendors such as SolarWinds, Paessler, Freshwater (whose product later became HP SiteScope) and many others emerged and provided suites of products that utilized agent-less technologies.
PRTG Enterprise Console

When vendors say "Agent-less" they actually mean that they use the native SNMP agent, or protocols such as WMI, RSH, SSH or some other API, to collect data from the server. In other words, situations where you don't need to install any additional proprietary software.

When vendors say "Agent-based" they mean that you need to install their own software and use their console to manage your IT assets.  
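To make the agent-less idea concrete, here is a minimal Python sketch of collecting one metric over SSH with nothing installed on the monitored server beyond its standard SSH service. It is an illustration only, not how any particular product works. The function names are hypothetical, and the sketch assumes a Linux target that exposes /proc/loadavg and key-based SSH login already set up for the monitoring account.

```python
import subprocess


def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into (1 min, 5 min, 15 min) load averages."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def collect_loadavg(host, timeout=10):
    """Read the load average of a remote host over SSH (hypothetical helper).

    Assumes key-based authentication; BatchMode=yes makes ssh fail
    instead of prompting for a password.
    """
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, "cat", "/proc/loadavg"],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return parse_loadavg(result.stdout)
```

The management station would call collect_loadavg() for each monitored server on a polling schedule. All the collection logic lives on the management side, which is what keeps the impact on the host low.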

Comparing Agent-less vs. Agent-based features 

Feature                               Agent-based    Agent-less
Built into the OS                     No             Yes
Cost                                  $$$            Free
Open protocols                        No             Yes
In-depth OS/application monitoring    Yes            Yes
Network load imposed by monitoring    Low            Low-Medium
Impact on host OS                     Medium-High    Low
Deployment effort                     Medium-High    Low
Use of 3rd-party management console   SNMP only      Yes

Monday, November 10, 2014

DNS Configuration Best Practices for Network Management

In my previous post, "Why DNS is vital for Network Management success - Part 1", I described the importance of a solid DNS system for successful network management.

In this post I will present the best practices of DNS configuration for network management.

Typically, Microsoft servers and workstations are capable of registering themselves with a DNS server that is part of an Active Directory server. Other IT assets may not have such functionality, so it is important to add the DNS records for each IT asset that needs to be managed.

DNS Configuration Best Practices:

1. A device with a single network interface with a single IP address needs 2 record types:
  •     A record (also known as Forward Lookup) - from a hostname to an IP address
  •     PTR record (also known as Reverse Lookup) - from an IP address to a hostname
Example: A network switch that has no routing enabled (Layer 2 only)

Both records for a new device can be configured via the Microsoft DNS configuration tool.

2. A device that has multiple IP addresses (also called multi-homed) but communicates with the management system via only one of them - same as above.
Example: A server with several IP addresses.

3.  A device that has multiple IP addresses and communicates with the management system via several IP addresses (not advisable, but can happen) requires 2 record types for each IP address that is communicating with the management station.

Example: A router or firewall that uses one IP address for responding to SNMP queries but sends Syslog alerts and/or SNMP traps via another IP address.
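The three cases above boil down to one rule: every IP address that talks to the management station gets an A record and a matching PTR record. The short Python sketch below lists the record pairs a device would need. The function name and the example hostname and addresses are made up for illustration, and the sketch covers IPv4 only.

```python
import ipaddress


def dns_records(hostname, management_ips):
    """Return (name, type, value) tuples: one A and one PTR record per management IP."""
    fqdn = hostname if hostname.endswith(".") else hostname + "."
    records = []
    for ip in management_ips:
        addr = ipaddress.IPv4Address(ip)  # raises ValueError on a malformed address
        # Forward lookup: hostname -> IP address
        records.append((fqdn, "A", str(addr)))
        # Reverse lookup: IP address (as an in-addr.arpa name) -> hostname
        records.append((addr.reverse_pointer + ".", "PTR", fqdn))
    return records


# Case 1: a Layer 2 switch with a single management IP address
for record in dns_records("sw-floor2.example.com", ["192.0.2.10"]):
    print(record)
```

For case 3, pass every address the device uses towards the management station, for example the SNMP polling address and the Syslog or trap source address.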

How can DNS help with network troubleshooting?

Some network engineers configure unique names for router interfaces to get a more readable traceroute output.

In cases where unique naming is required for each IP address of a router, it is very common to use forward and reverse records for the main IP address (normally the loopback) and to configure only reverse lookup (PTR) records for the other interfaces, pointing to the name of the router.

Hop  (ms)  (ms)  (ms)  Host name
1    17    0     0     8-1-18.ear1.dallas1.level3.net
2    122   122   122   vlan70.csw2.dallas1.level3.net
3    120   120   120   ae-73-73.ebr3.dallas1.level3.net
4    *     *     *     -
5    120   120   120   ae-2-2.ebr1.washington1.level3.net
6    122   121   121   ae-91-91.csw4.washington1.level3.net
7    121   121   121   ae-92-92.ebr2.washington1.level3.net
8    120   120   121   ae-44-44.ebr2.paris1.level3.net
9    120   120   120   ae-47-47.ebr1.frankfurt1.level3.net
10   120   125   120   ae-71-71.csw2.frankfurt1.level3.net
11   121   121   121   ae-2-70.edge4.frankfurt1.level3.net
12   122   122   122

Why would you need to configure 2 records for each IP address?

Most network management software systems insist on using reverse lookup to verify IT asset names and IP addresses. Failing to have both records for each device can lead to performance issues in the network management software, such as long discovery and polling times, delays in processing SNMP traps etc.
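A quick way to spot devices with a missing or mismatched record is to resolve the name forward, resolve the result backward, and compare. The Python sketch below does that with the standard socket module. It only approximates the kind of check a management system might perform, and the function name is hypothetical. The resolver functions are passed in as parameters so the logic can be tried without a live DNS server.

```python
import socket


def forward_reverse_ok(hostname,
                       forward=socket.gethostbyname,
                       reverse=socket.gethostbyaddr):
    """Return True if hostname resolves to an IP whose PTR record points back to it."""
    try:
        ip = forward(hostname)        # A record: hostname -> IP address
        reverse_name = reverse(ip)[0]  # PTR record: IP address -> primary hostname
    except OSError:
        # socket.gaierror and socket.herror are both OSError subclasses,
        # so a missing A or PTR record ends up here.
        return False
    return reverse_name.rstrip(".").lower() == hostname.rstrip(".").lower()
```

Running this over the list of managed devices before a discovery cycle gives a simple report of names that need attention in DNS.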

Yigal Korolevski, KMC Technologies http://www.nms-guru.com

Friday, November 7, 2014

How Network Discovery Works - Part 1

In the early 90s, HP Network Node Manager (NNM) was one of the pioneers in utilizing a network discovery algorithm to identify network nodes and their topology. The discovery algorithm was patented in the US by HP.

Basically, from NNM 3.0 through NNM 7.x the discovery algorithm worked as follows:
  • The NNM server's own IP configuration was used to determine the discovery targets (based on IP address, subnet mask and default gateway).
  • A ping sweep was initiated to identify all responding IP addresses.
  • All responding addresses were queried via SNMP to make sure they could communicate with NNM.
  • All the nodes that responded to SNMP were queried for interesting tables: System, ARP, Interfaces.
  • Node types and their vendors were identified according to their SNMP system ObjectID response (Router, Bridge, Computer).
  • Nodes that were identified as Routers were queried for their routing tables.
  • Then the process of identifying new devices was repeated for any additional IP segment discovered by the previous step.
Network administrators could control the network discovery by placing filters and various limits to ensure it would not run forever and discover the entire Internet (in 1995 it was still possible :) ).
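The steps above can be sketched as a simple loop over a queue of subnets. The Python below is a simplified illustration and not HP's actual implementation: the ping and SNMP calls are replaced by lookups in a small made-up network (SIM_NODES), the ARP and Interfaces tables are left out, and the scope argument stands in for the administrator's discovery filter. All names and addresses are hypothetical.

```python
import ipaddress
from collections import deque

# Hypothetical simulated network standing in for real ping and SNMP queries.
SIM_NODES = {
    "10.0.1.1": {"snmp": True, "kind": "router",
                 "routes": ["10.0.1.0/29", "10.0.2.0/29"]},
    "10.0.1.2": {"snmp": True, "kind": "computer", "routes": []},
    "10.0.1.3": {"snmp": False, "kind": "computer", "routes": []},  # pings, no SNMP
    "10.0.2.1": {"snmp": True, "kind": "router",
                 "routes": ["10.0.2.0/29", "192.168.9.0/29"]},
    "10.0.2.4": {"snmp": True, "kind": "bridge", "routes": []},
    "192.168.9.1": {"snmp": True, "kind": "computer", "routes": []},
}


def ping(ip):
    """Simulated ping: a node answers if it exists in SIM_NODES."""
    return ip in SIM_NODES


def snmp_query(ip):
    """Simulated SNMP query: returns node data, or None if SNMP does not answer."""
    node = SIM_NODES[ip]
    return node if node["snmp"] else None


def discover(server_ip, netmask, scope):
    """Discover SNMP-manageable nodes, starting from the server's own subnet."""
    start = ipaddress.ip_network(f"{server_ip}/{netmask}", strict=False)
    limit = ipaddress.ip_network(scope)  # discovery filter: never leave this range
    pending, seen, found = deque([start]), {start}, {}
    while pending:
        subnet = pending.popleft()
        for host in subnet.hosts():            # ping sweep of the segment
            ip = str(host)
            if not ping(ip):
                continue
            node = snmp_query(ip)              # keep only nodes that answer SNMP
            if node is None:
                continue
            found[ip] = node["kind"]           # router, bridge or computer
            if node["kind"] == "router":       # routers reveal further segments
                for route in node["routes"]:
                    net = ipaddress.ip_network(route)
                    if net not in seen and net.subnet_of(limit):
                        seen.add(net)
                        pending.append(net)
    return found


print(discover("10.0.1.2", "255.255.255.248", "10.0.0.0/8"))
```

In this made-up network the 192.168.9.0/29 segment is learned from a routing table but never swept, because it falls outside the scope filter. That is the same mechanism that kept a real discovery from wandering off into the Internet.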

The NNM map used to look like the picture below:

As networks became more complex and new technologies and protocols were added, this algorithm was modified and adapted to handle VLANs, CIDR and various Layer 2 technologies. However, it became too limited to handle newer technologies such as virtual interfaces and multi-link protocols (EtherChannel, MLT, SMLT etc.). It was also quite challenging to display the new topology using the outdated map, which was based on old X/Motif technology (later ported to Windows as well).

In "How Network Discovery Works - Part 2" I will discuss the latest advances in this field.

Yigal Korolevski, KMC Technologies http://www.nms-guru.com

Sunday, October 19, 2014

Why DNS is vital for Network Management success - Part 1

DNS (Domain Name System) has been around for many years now, but I still find many customer sites that use IP addresses as identifiers for their IT assets in their management systems.

Is it a bad thing? Well, if your network is well organized, IP addressing conventions are in place and the IT organization consists of only a handful of team members, then go ahead and use IP addresses.

However, if you run a medium to large network with many different IT assets, an up-to-date and well-maintained DNS is a must. Names make it easier to identify the type of a device, its function and its location, and can make network troubleshooting much easier.

Here are some examples for device naming conventions (just to give you an idea):

rtr-nyc-r2.acme.com = Router in New York, Rack 2
fw-dc01 = Firewall, located in a Data Center numbered as 01
fw-dr01 = Firewall, located in a Disaster Recovery site numbered as 01

sw-core-nyc01-fl2-r3 = Core switch in New York, 2nd Floor, Rack 3
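One practical benefit of such a convention is that tools can decode names automatically. The small Python sketch below reads the device type from the first part of the name. The prefix table is an assumption based only on the examples above, and a real site would extend it to match its own convention.

```python
# Assumed prefix table, taken from the example names above.
DEVICE_PREFIXES = {
    "rtr": "Router",
    "fw": "Firewall",
    "sw": "Switch",
}


def device_type(hostname):
    """Return the device type encoded in the leading prefix of a hostname."""
    short_name = hostname.split(".", 1)[0]           # drop the DNS domain, if any
    prefix = short_name.split("-", 1)[0].lower()     # text before the first hyphen
    return DEVICE_PREFIXES.get(prefix, "Unknown")


for name in ("rtr-nyc-r2.acme.com", "fw-dc01", "sw-core-nyc01-fl2-r3"):
    print(name, "=", device_type(name))
```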

Now, if you have several network management systems in place, DNS becomes even more important. Actually it serves as a "glue" that bonds these systems.

For example, you may have a SolarWinds Orion NPM tool that monitors your network for availability and performance, while HP Operations is the main console of your NOC. Alerts sent from Orion to HP Operations are meaningful and understandable to any IT person (not just yourself) when they carry correct DNS names.

NOC personnel can launch the SolarWinds console by right-clicking on an icon or message in HP Operations, and vice versa. Using the same names across the various management systems is what allows this magic to work.

In the second chapter on DNS I will discuss the actual configuration details of a DNS server.