Netwatcher

August 1998 Volume 16.8


Netwatcher (ISSN 0890-5800) is a monthly publication of CIMI Corporation. Copyright © 1998, CIMI Corporation. All rights reserved. No publication or reproduction of this document is permitted without the express written consent of CIMI Corporation.


Management Briefing

LAN equipment vendors have been promoting a lot of new concepts to try to make up for the downward spiral in prices and profit margins.

We talk about higher speeds and higher OSI level switching in a later article in this issue. Here, we want to introduce a feature that is less directly linked to LAN switching and thus harder for many users to relate to. It’s the area of policy management and directory-enabled networking (DEN). Both seem to promise to bring some order to the LAN and internetwork, order that many managers would welcome. But bringing order to LANs is a challenge, both because most organizations have lost control of the applications users run, and because LAN technology is somewhat disorderly by nature.

LANs, as connectionless networks, provide neither a way of controlling how traffic is admitted to the network nor a way of allocating resources to any given traffic flow. That can create problems for organizations that have resource limits (not necessarily everywhere, but at least in some critical places like the LAN/WAN boundary). Lack of admission control and priority management makes congested LANs into a gambling proposition: will the stuff that gets impacted be the least, or most, important part of the traffic picture?

The umbrella approach to providing some order to connectionless chaos is "policy management". The implementation strategies differ among vendors, but policy management systems give users a way to create rules on how traffic is to be handled in their network. These rules are then enforced by the network hardware, and provide some assurance that if network resources become scarce, key applications will be the least impacted. Policy management rules can also sometimes govern the behavior of the network in failure modes, ensuring that fail-over procedures favor critical applications.

What’s Inside Policy Management/DEN?

Policy management filters have been a part of routers for ages. They usually take the form of a flow mask that defines the characteristics of the packets to which the rule applies, and a rule set that defines how packets meeting the flow mask criteria are handled.
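A flow mask and its rule set can be sketched in a few lines of Python. This is only an illustration of the idea, not any vendor's filter syntax: the `FlowMask` class, the `classify` function, the field names, the sample addresses and ports, and the "first match wins" ordering are all assumptions made for the example.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class FlowMask:
    """Header fields a packet must match; None means 'match anything'."""
    src_net: Optional[str] = None
    dst_net: Optional[str] = None
    protocol: Optional[str] = None
    dst_port: Optional[int] = None

    def matches(self, pkt: dict) -> bool:
        if self.src_net and ip_address(pkt["src"]) not in ip_network(self.src_net):
            return False
        if self.dst_net and ip_address(pkt["dst"]) not in ip_network(self.dst_net):
            return False
        if self.protocol and pkt["protocol"] != self.protocol:
            return False
        if self.dst_port is not None and pkt["dst_port"] != self.dst_port:
            return False
        return True


# Hypothetical rule set: (flow mask, handling). The first matching rule wins.
RULES = [
    (FlowMask(dst_net="10.1.0.0/16", protocol="tcp", dst_port=1521), "high"),
    (FlowMask(protocol="tcp", dst_port=80), "low"),
]


def classify(pkt: dict, rules=RULES, default: str = "normal") -> str:
    """Return the handling for a packet, or the default if no mask matches."""
    for mask, handling in rules:
        if mask.matches(pkt):
            return handling
    return default


if __name__ == "__main__":
    pkt = {"src": "10.2.3.4", "dst": "10.1.5.6", "protocol": "tcp", "dst_port": 1521}
    print(classify(pkt))
```

Every device that applies a rule set like this has to walk the list for each packet, which is the per-device matching cost that the third objection below refers to.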

Rule sets of this type, based as they are on packet header data, can be applied easily by any device that handles the packets. But despite this benefit, they aren’t particularly popular for a number of reasons:

    1. Many LANs use dynamic address assignment (DHCP, for example) and this could reduce the value of header-policy systems because the addresses on which the policies are based are changeable over time.
    2. Some of the information that users would like to base policy on, such as organizational affiliation, may not be readily inferred by examining packet headers. There’s no "organization" field to examine, for example.
    3. The process of matching rules to data in every device that handles flows can be a tremendous waste of resources, and add delays to processing that would impact application performance.

A solution to these problems, at least in theory, is the use of the directory process to control the policy enforcement, hence the DEN name. The theory is that by linking all of the directories that are used in conjunction with policy management, address resolution, etc. (presumably via LDAP initially, and eventually by providing a unified directory), all the data that is known about a client or resource is concentrated in one place. At the time the client-to-resource linkage is made (when the client asks for a DNS decode of a server’s logical name, for example), the DEN system would apply all the rules associated with handling the relationship. The packets would be tagged with the result of the policy process, and only this handling instruction would have to be examined by the nodes in the network.

A good example of this is the support for a priority service level for mission-critical applications. The client would hit the DNS to get the IP address of the mission-critical application server. At that point, the server would deliver handling instructions to a kind of policy proxy, based on the application of policy rules that could examine the identity of the requesting client and the requested resource in some detail, using any and all directory data known for each. This policy proxy would then set type of service (TOS) bits to indicate a high-level or ordinary-level handling priority on the packets, as the policy statements require.
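The priority example can be sketched as a single lookup made when the name is resolved. Everything here is hypothetical: the `DIRECTORY` contents, the attribute names, the `resolve_with_policy` function, and the choice of IP precedence 5 for "high" handling are assumptions for illustration, not part of any DEN specification.

```python
# TOS byte values: IP precedence occupies the top three bits of the byte.
TOS_HIGH = 0xA0      # precedence 5 (assumed here to mean "high-level handling")
TOS_ORDINARY = 0x00  # precedence 0 (routine)

# Hypothetical unified directory holding what is known about clients and resources.
DIRECTORY = {
    "clients": {
        "10.0.0.15": {"user": "jsmith", "organization": "finance"},
        "10.0.0.77": {"user": "guest7", "organization": "visitors"},
    },
    "resources": {
        "ledger.example.com": {
            "address": "10.1.0.20",
            "mission_critical": True,
            "priority_orgs": {"finance"},
        },
        "intranet.example.com": {
            "address": "10.1.0.30",
            "mission_critical": False,
            "priority_orgs": set(),
        },
    },
}


def resolve_with_policy(client_ip, server_name, directory=DIRECTORY):
    """Resolve a server name and return (address, TOS byte) for the policy proxy."""
    client = directory["clients"].get(client_ip, {})
    resource = directory["resources"][server_name]
    wants_priority = (
        resource["mission_critical"]
        and client.get("organization") in resource["priority_orgs"]
    )
    return resource["address"], (TOS_HIGH if wants_priority else TOS_ORDINARY)


if __name__ == "__main__":
    address, tos = resolve_with_policy("10.0.0.15", "ledger.example.com")
    print(address, hex(tos))
```

The rule can use an attribute such as organization that never appears in a packet header, and network nodes then only need to look at the TOS byte the proxy sets.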

DEN clearly has some benefits. The network, after all, has only a limited number of handling capabilities. It has to be easier to make each node aware of them (which would have to be the case if the nodes were going to do the handling) than to make the nodes individually aware of the rules for selecting one option or the other. The use of a central directory to store both rules and characteristics makes the process of resource allocation and policy management easier.

Is There a Dark Side?

DEN also has some issues associated with it:

    1. The user has to maintain the policy rules and the directory values correctly, or exceptionally weird and bad things happen. Most users we’ve surveyed have had problems with the management of these databases.
    2. The mechanism by which the rules are applied, and by which the results are linked back to the source to set handling instructions in packets, has to be foolproof. Some systems might cache IP addresses internally, for example, and defeat policy management by not using DNS.
    3. The handling options selected by the rules have to be supported through the whole scope of the network, or at least through the places where congestion would be possible and which must therefore handle data with priority in mind.

DEN is likely to be extended from handling priority to security and access rights. As this kind of extension occurs, it becomes more critical that the process of DEN be secure and stable. Thus, it becomes critical that the policy proxy be placed at the source and destination systems to ensure that the handling options are properly set. Placed there, the proxy could also ensure that no cached addresses bypassed the policy system, because the proxy could check that every incoming packet matched some rule it was running, even if the rule said "do nothing with this packet". The cooperation of Cisco and Microsoft in DEN is clearly aimed in part at ensuring this is the case.

Conclusion

The technical issues associated with DEN can be resolved, and the idea of having a policy proxy is a good one in any case (3Com really initiated this with its Transcend architecture, though it rarely gets any credit for the fact because it mismanaged the PR). What is less clear is whether the other issues, those associated with the value of DEN, can be resolved as well.

LAN congestion is clearly solvable by either management of priority or by increasing resources. The low cost of LAN switching has opened the second choice, one that isn’t yet an option for WANs. Thus, the policy management/DEN value proposition may be limited to applications that have to transit WAN devices. If that is the case, it isn’t clear whether having a far-reaching policy control process is justified, especially if a large number of the relationships under control will never transit a congested resource and thus never benefit from policy control.

There is also the truth that in a very dynamic user environment, the administrative burden of DEN would be difficult to bear, particularly in organizations that admit that line departments buy pretty much what they want, and run it on the LAN when it suits them. You can’t order that kind of environment, no matter how good the tools may be.

DEN is a good technology, but it’s something that we need to be looking at carefully if it’s promoted by a vendor, consultant, or inside expert. The impact of DEN in an administrative sense could be formidable (every new application would have to be linked with a policy or it would fall into the "let it happen" default anarchy state), and the benefits might not be all that impressive.


In the Know

One of the many destructive consequences of mindless market hype is that it often disguises real issues. Since these issues aren’t ever brought forward, key events are missed and opportunities lost.

Convergence, meaning the supposed re-invention of public network infrastructure along IP lines, is one of those hype concepts. The latest casualty of this broad-based illusion may be the recent SS7-oriented acquisitions by Cisco and Ascend.

The media didn’t give much play to either the Ascend purchase of Stratus or the Cisco acquisition of Summa Four. SS7 tends to be interesting today to the extent that it supports either the convergence hype or the voice over IP (VoIP) space. Since both Cisco and Ascend had already made VoIP announcements and were assumed to be in the convergence business, there was no news here.

The media may be right in this case. Just because vendors have lots of money and lots of technical humans doesn’t make them insightful planners. But the acquisitions may also signal a sea change in the telephone switching market for a reason having nothing to do with either convergence or IP.

If you're interested in our views on this topic, you'll have to consult Tom Nolle's column in Network World, because this section of Netwatcher is for subscribers only, this month!


Strategies

LAN switching has changed the premises network market in many ways, some of which we’ve covered in prior pieces on the vendor impacts of the rapidly shifting scene.

One way that directly impacts the user is in the area of network design. We all know how to build shared-media LANs, and how to network them together. Do we have similar confidence in our skills in building switched LANs? If so, that confidence is probably misplaced according to our research, because about a third of switched LANs installed today are non-optimal and a fifth experience significant problems.

The LAN market today is being driven by a new wave of features on switches and in management systems in an attempt to drive up margins in a space where competition and other factors are thinning them down considerably. Will this new stuff solve our problems, or simply add more confusion? That’s what we’ll consider here this month.

The Old Way

The original LANs were so simple and logical that you have to yearn for their return. Users tapped into a fat Ethernet cable that went to the host and shared that cable among themselves. Traffic was so sparse that a collision was as likely as alien abduction, and handling it didn’t tax even 8080-based computers.

Shared-media LANs allow a number of stations to tap into a single connecting medium and use it cooperatively. The cooperation may involve random access (carrier sense multiple access with collision detection, the Ethernet approach) or "mother-may-I" token passing, as FDDI and Token Ring use. In either case, media speed was fixed, stations all attached at that speed, and increasing either traffic or stations resulted in some form of push-back that normalized throughput.

Connecting these LANs together was a job for bridges or routers. These devices might be used to link local LANs or remote LANs. In both cases, most of the traffic tended to stay on a LAN rather than move through the interconnect device, because most LANs were used to share local resources like disks or printers.

The expansion in traffic on shared-media LANs created congestion that was solved originally by segmentation; the LAN was broken up into two or more smaller LANs whose members were selected based on how they interacted with common resources. The objective was always to include in a LAN a community of workers and resources that were substantially interdependent rather than dependent on those outside the community. This gave rise to the so-called "80-20" rule that 80% of traffic is local to the LAN and 20% goes off-net to another LAN or location.

As traffic grew on LANs, the problem of segmentation bit users twice. First, they found that the segment sizes were getting smaller and smaller to permit the total traffic to stay within media limits. This created an increased demand for internetworking hardware to link the segments, a costly proposition. The second bite was that "micro-segmentation", as it was called, tended to make it impossible to create segments containing most user-to-resource interactions, so the off-net traffic component increased. This often created enough off-net traffic to congest backbone LANs, so users had to move to FDDI backbones to get relief.

All of this eventually gave rise to the concept of LAN switching, where a fast LAN virtual media (the backplane or matrix) was speed-matched to user ports. The LAN switch allowed a very large number of users to fit on a "LAN" or LAN segment because virtual media in the gigabit speed range was hard to congest.

With LAN switching, there was no longer a need for micro-segmentation. LANs could be relatively large, even hundreds of users per LAN/segment. It would seem that the problems of the world were ended.

They weren’t, of course. That was only the beginning.

The Switch Problem

Two factors contributed the poisoned apple to our switched LAN paradise: inertia and greed.

For users, LAN switches were the next step in an evolving LAN topology ballet, and virtually every user adopted LAN switching by sticking a switch in to replace the hubs that had been used to create the last LAN segmentation. Thus, most switched LANs were already partially segmented and not optimally designed when the switches were introduced. Rather than go back to the original segment structure, the user codified the latest round of micro-segments.

Then there was greed, on the part of the vendors. No market develops fast enough to feed the hungry mouths of product managers’ families. No market develops fast enough to meet the heady expectations contained in analyst forecasts. Thus, something had to be done to fuel additional deployment in cases where there wasn’t enough traffic growth to force switching adoption.

One strategy was the introduction of 100Base or Fast Ethernet. In truth, the number of client systems that need 100 Mbps LAN capacity is very small, particularly if we assume switched LANs. But the introduction of 100Base LANs provided an opportunity to churn out some additional hubs and NICs. Switches also allowed speed-matching of 10Base and 100Base users, so incremental use of 100Base technology for key clients and servers was possible.

Gigabit Ethernet was the next play, and the media and analyst community hyped this market mercilessly; one analyst report had a 2002 market size so big it would have meant one Gigabit Ethernet port would be sold that year for every other desktop!

Elevating speed is good, but elevating layers was also good. Level 2 LAN switching is the rule, but vendors introduced Level 3, then Level 4, and even a few Level 5s. Never mind the fact that anything above Level 3 is inconsistent with the OSI model.

The effect of all of this was to confuse the buyer and obscure the real issue set. As a result, LAN switching has been delivered primarily at the workgroup level, and the exceptions tend to be organizations that are high-tech in nature. Clearly some better approach to the problem is needed.

Models for LAN Switching

We have to accept the fact that LAN switches are already being bought and installed—at the workgroup level for the most part, as we’ve said. The fact that this is already happening means that the first switch LAN model we should look at is the model that accommodates in-place workgroup switches. That is the hierarchical model.

In the hierarchical model of LAN switching, workgroups or groups of workgroups are supported on a LAN switch, or as per-subnet VLANs, depending on whether the users are all members of one subnet. This structure exists totally at Level 2, and it is interconnected with a second tier of switches that provide Level 3 switching.

If the workgroup users connect at 10 Mbps, the trunk from tier one to tier two should be run at 100 Mbps. This becomes the port speed for the second-tier switches, which then require trunks at the gigabit level. That rule of "trunk equals ten times port" is a way of ensuring that random congestion doesn’t occur at the points of interconnection. It is needed because moving away from shared media removes the backpressure mechanism that limited traffic, so switch buffers could be overrun as a result.
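The "trunk equals ten times port" arithmetic can be written out as a small sketch. The `tier_plan` function and its parameter names are hypothetical; the factor of ten and the 10 Mbps starting point come from the text.

```python
def tier_plan(access_port_mbps, tiers, factor=10):
    """Apply the 'trunk equals ten times port' rule tier by tier.

    Each tier's uplink trunk runs at `factor` times its port speed, and that
    trunk speed becomes the port speed of the next tier up.
    """
    plan = []
    port = access_port_mbps
    for tier in range(1, tiers + 1):
        trunk = port * factor
        plan.append({"tier": tier, "port_mbps": port, "trunk_mbps": trunk})
        port = trunk
    return plan


if __name__ == "__main__":
    for row in tier_plan(10, 2):
        print(row)
```

For 10 Mbps workgroup ports and two tiers, this gives 100 Mbps trunks into the second tier and gigabit trunks out of it, matching the figures above.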

If the organization is big enough to require it, there may be a third-tier switch or switches that are gigabit products, using both gigabit trunks to the second tier and gigabit trunks among themselves. These products, like all higher-tier switches, are probably Level 3 switches.

The hierarchical model is a logical way of building switched LANs when there are divided procurement policies and where low-level switches will probably lead the charge. It’s also a good model for multi-vendor applications, because the layers can be used to separate Level 2/3 functions, and the use of Level 3 switching tends to buffer differences between vendors.

A second model of LAN switching is the "collapsed" model. In this model, workgroup switches are linked among themselves using Gigabit Ethernet trunks. This linkage eliminates the need for a second switching tier. Because most Level 2 network models couldn’t scale or provide error recovery in this configuration, the workgroup switches must offer both Level 2 and Level 3 features, with the former used within VLANs/subnets and the latter between them, and between switches. Going to gigabit trunking here is necessary because the traffic has fewer switches to pass, and thus fewer buffers to collect in, so congestion can’t be tolerated or there’ll be data loss.

The collapsed model of switched LAN is a good model for applications where central purchase control ensures an orderly and somewhat simultaneous migration to switching. It doesn’t tolerate vendor differences among the switch products well, so it probably won’t accommodate in-place workgroup switches.

In either model, it’s a good idea to think over the low-level topology before you make the connections. If the current LAN structure is the result of one or more phases of segmentation, it might be useful to move backward to a Level 2 structure that approximated the original LAN. In most cases, users have too small a population of subnet users on their LANs. If the change to switched LANs is coincident with an adoption of IP addressing, this is a good time to restructure the subnets. Once you’ve started assigning addresses, there’s a lot of incentive to preserve the subnet structure as is.

Servers and Switched LANs

There are a lot of really interesting approaches to server location proposed by vendors. Many are non-optimal and some are a recipe for disaster.

Workgroup servers should generally be located on the workgroup switch and attached if possible using the same media speed as the clients. This will reduce the risk that the server will overrun the slower client interface, resulting in lost packets. If you have to increase server speed to avoid congestion on the server’s LAN port, use a switch that will provide some form of MAC-layer flow control to push back on the server if the switch buffers are exhausted. Sometimes having multiple ports is better than having a faster one if you don’t have flow control.

When servers are centrally "farmed" rather than distributed, the problem of server attachment is trickier. Again, it’s a good idea to assume the server and client attachment speeds should be the same. It is particularly important to ensure that no low-speed trunks exist between fast clients and fast servers, again to avoid the buffer-overrun problem.

Servers always present a problem to switched LAN designers because the speed of a server is sometimes limited by congestion on the adapter connecting it to the network. This makes users upspeed these connections on the assumption that it can’t hurt, which isn’t true. If the LAN server is used for short transactional activity, making its LAN connection super-fast won’t hurt in most cases. Where the server is likely to blast kilobits of data to a single client in a rush, a fast server connection can overrun the LAN switch buffers if the client is slower, or if there’s congestion. If you are using this kind of server, check your utilization before upspeeding.

If applications provide some mechanism for window size management, problems with server overrun of buffers can be corrected. Using a small window size will stop the sender from transmitting and filling buffers, if earlier traffic is already being held for congestion reasons.
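A rough back-of-the-envelope model shows why speed mismatch fills buffers and why a small window helps. This is a simplified fluid model that ignores framing overhead, other traffic, and protocol details; the function names and the sample burst and window sizes are assumptions for illustration.

```python
def peak_buffer_bytes(burst_bytes, in_mbps, out_mbps):
    """Peak switch buffer occupancy for one burst arriving faster than it drains.

    While the burst arrives at in_mbps it drains at out_mbps, so the share
    (in - out) / in of the burst is still queued when the last byte arrives.
    """
    if out_mbps >= in_mbps:
        return 0.0
    return burst_bytes * (in_mbps - out_mbps) / in_mbps


def peak_with_window(burst_bytes, window_bytes, in_mbps, out_mbps):
    """Upper bound on buffer use when the sender may have at most
    window_bytes unacknowledged, so no more than that can be queued."""
    return min(peak_buffer_bytes(burst_bytes, in_mbps, out_mbps), float(window_bytes))


if __name__ == "__main__":
    # A 1,000,000-byte burst from a 100 Mbps server to a 10 Mbps client.
    print(peak_buffer_bytes(1_000_000, 100, 10))
    # The same transfer with a hypothetical 8,192-byte window.
    print(peak_with_window(1_000_000, 8_192, 100, 10))
```

Under these assumptions the unwindowed burst needs roughly 900,000 bytes of buffer at the speed-matching point, while the windowed transfer can never queue more than the window itself.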

Increasing buffer sizes in the switches may also help, but most switch vendors don’t provide options for that. As an alternative, a router can be used to provide matching between the server area and the client systems. Routers almost always provide buffer expansion options. If you pick this approach, try to site the router close to the point of constriction, such as the point where 100 Mbps server traffic is converted to 10 Mbps client links. Otherwise, the buffers will be too far away to provide effective protection.

Sometimes the only solution to relieving server congestion is to make the server faster. If that’s the case, you should expect to upspeed clients as well. Believe it or not, 100Base all around is often less troubling than a mixture because of the buffer problems speed-matching can present. This isn’t a recommendation for gigabit servers and clients, though; most of the time, that’s not necessary.

ATM Switched LANs

ATM presents its own special issues as well. Most of our readers know that we believe ATM LANs won’t be common in the US market at any point, owing to the success of the very LAN switching we’re talking about. With premises bandwidth cost plummeting, bandwidth conservation architectures like ATM are less useful.

Where ATM does have a play is in applications where there is no value to Level 3 switching and some measure of reliability is required in the second-tier switching structure. The use of ATM at the second tier allows rerouting of connections without Level 3 features, something most likely to be useful in big Token Ring networks, but which may also be valuable in applications where there are many types of LAN protocols being run.

ATM can also be useful when a single set of servers must support both Token Ring and Ethernet clients, because a LANE server and the proper software on the application server can allow a single NIC to support both types of client architectures.

A final interesting application of ATM on LANs is the use of MPOA and route servers to support the integration of two or more colliding IP address spaces on a single LAN. Occasionally mergers or other business activities will create a problem where two different users of RFC 1918 happen to have selected the same address range, and then consolidated infrastructure.
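Whether two merged organizations actually have colliding address spaces can be checked before any infrastructure is consolidated. The sketch below uses Python's standard `ipaddress` module; the `find_collisions` function and the sample address ranges are hypothetical.

```python
from ipaddress import ip_network


def find_collisions(site_a, site_b):
    """Return every pair of prefixes, one from each site, that overlap."""
    collisions = []
    for a in map(ip_network, site_a):
        for b in map(ip_network, site_b):
            if a.overlaps(b):
                collisions.append((str(a), str(b)))
    return collisions


if __name__ == "__main__":
    # Hypothetical RFC 1918 allocations at two merging organizations.
    company_a = ["10.1.0.0/16", "192.168.10.0/24"]
    company_b = ["10.1.128.0/17", "172.16.0.0/12"]
    print(find_collisions(company_a, company_b))
```

Any pair reported here is a range that both organizations have assigned independently, which is the situation the route-server approach is meant to handle.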

Remember that using ATM switching to link workgroup switches works best if the workgroup products have an ATM trunk option. It’s possible to employ ATM in either model of switch deployment if this is the case.

Also remember that the marriage of ATM and non-ATM LANs, essential to support the second-tier missions we’ve outlined, is dependent on vendor implementation of the ATM interfaces, and the way that LAN services are mapped to ATM virtual circuits. Check carefully before you buy.


Down the Line

In our next issue we’re going to take a look at the migration of SNA networks to something other than SNA—like IP. We’ll talk about the drivers, the benefits, the tools, and the burdens.

