Saturday, July 2, 2011

Voice IE: Mysterious phone registration issue

The topology was hub-and-spoke: CUCM was at the HQ site, and two branches, BR1 and BR2, were connected to the HQ router via Frame Relay.

One of the phones at BR2 was not able to register with CUCM. The phone's screen displayed "Registering".

Since the other phone at BR2 was able to register, I thought the problem was specific to the phone. So I checked the phone configuration, reset the phone, restored it to factory defaults, relocated it to the HQ site, etc.

The phone was able to register while it was at HQ, but it didn't work while it was at BR2. So the problem seemed to be site specific. However, if it was site specific, why could the other phone at the same site register?

Since the network path was "phone -> BR2 router -> Frame Relay -> HQ router -> CUCM", I focused on every element on the path. The phone got the correct IP configuration (subnet mask, default gateway, TFTP server) from DHCP. I wiped the router configuration and reconfigured it. I deleted the phone from CUCM and re-added it. I reloaded CUCM and the routers. Still to no avail.

I pinged the BR2 phones from the CUCM CLI. I could ping phone1 but not phone2. Then I realized it HAS TO be the network. But how come? For two hosts in the same subnet, if I can ping one but not the other, it HAS TO be a host issue. But I had already proved that phone2 worked fine when it was at the HQ site.

With the help of "debug ip packet detail", I found out that the HQ router chose different paths for phone1 and phone2. But I hadn't configured any host routes. How could this happen?

"show ip route" revealed that BR1 was advertising the same route as BR2: both claimed to have the BR2 phones' subnet. Since the two routes had equal cost, the HQ router load-balanced across them per destination (it used BR2 to reach phone1 and BR1 to reach phone2). That's why phone1 was always reachable while phone2 was always unreachable.
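The "one phone always works, the other never does" pattern can be reproduced with a toy model of per-destination load sharing. This is only a sketch: the phone addresses are made up, and the modulo-based `pick_next_hop` is a stand-in for whatever hash the router really uses. The only property it shares with the real thing is that a given destination always maps to the same next hop.

```python
import ipaddress


def pick_next_hop(dst, next_hops):
    """Pin a destination to one of the equal-cost next hops.

    Toy stand-in for per-destination load sharing (NOT the router's real
    hash): the same destination always gets the same next hop.
    """
    index = int(ipaddress.ip_address(dst)) % len(next_hops)
    return next_hops[index]


def is_reachable(dst, next_hops, owning_router):
    """A packet arrives only if it is forwarded to the router that really owns dst."""
    return pick_next_hop(dst, next_hops) == owning_router


# Hypothetical addresses in the BR2 voice subnet (not from the original lab).
phones = {"phone1": "10.2.20.10", "phone2": "10.2.20.11"}

# Both branches advertise the BR2 voice subnet at equal cost.
equal_cost_next_hops = ["BR2", "BR1"]

results = {
    name: is_reachable(ip, equal_cost_next_hops, "BR2")
    for name, ip in phones.items()
}

for name, ip in phones.items():
    state = "reachable" if results[name] else "unreachable"
    print(name, ip, "via", pick_next_hop(ip, equal_cost_next_hops), "->", state)
```

With these assumed addresses, phone1 is sent toward BR2 and phone2 toward BR1, so the failure follows the destination address rather than the phone hardware.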

By reviewing the BR1 router's config, I found out that I had fat-fingered the IP address for the data VLAN (I had typed an address from the BR2 voice subnet).
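A subnet claimed by two routers can also be caught before it reaches the routing table, by comparing interface addresses across routers. The sketch below assumes the addressing plan is available as a simple list; the router names come from the post, but every address and the `find_duplicate_subnets` helper are hypothetical. It only flags identical networks, not partially overlapping ones with different masks.

```python
import ipaddress
from collections import defaultdict


def find_duplicate_subnets(interfaces):
    """Return {network: [(router, interface), ...]} for every network
    that is configured on more than one router."""
    owners = defaultdict(list)
    for router, name, addr in interfaces:
        network = ipaddress.ip_interface(addr).network
        owners[network].append((router, name))
    return {
        net: who
        for net, who in owners.items()
        if len({router for router, _ in who}) > 1
    }


# Hypothetical addressing plan (illustrative values only).
interfaces = [
    ("BR1", "voice", "10.1.20.1/24"),
    ("BR1", "data", "10.2.20.1/24"),  # assumed typo: lands in the BR2 voice subnet
    ("BR2", "voice", "10.2.20.1/24"),
    ("BR2", "data", "10.2.10.1/24"),
]

duplicates = find_duplicate_subnets(interfaces)
for net, who in duplicates.items():
    print(net, "is configured on", who)
```

Run against the assumed plan, the check reports the BR2 voice subnet on both branch routers, which points straight at the interface to fix.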

This kind of mistake could easily cost you a couple of hours in the lab unless you have experienced it before and have some routing knowledge.

2 comments:

  1. This is very similar to configuring an incorrect mask, where the IP of the affected phone falls outside the range covered by the subnet.
    But your reasoning was thorough and it was correct!
    This happens sometimes!

  2. The tricky parts were:
    1) the problem only affected half of the phones.
    2) to fix the problem, I needed to modify the config on a 'non-relevant' component.
