Monday, 26 August 2013

Fibre channel long distance woes

I need a fresh pair of eyes.
We're using a 15km fibre optic line across which Fibre Channel and 10GbE are
multiplexed (passive optical CWDM). For FC we have long distance lasers
suitable for up to 40km (Skylane SFCxx0404F0D). The link is limited by
the SFPs, which can do at most 4Gb Fibre Channel. The FC switch is a Brocade
5000 series. The respective wavelengths are 1550, 1570, 1590 and 1610nm for
FC and 1530nm for 10GbE.


Optics
Here is the power output of one port summarized (collected using sfpshow
on different switches):

SITE-A              units = uW (microwatt)              SITE-B
***************************************************************
FAB1
SW1   TX 1234.3   -->   RX   49.1   SW3    1550nm (ko)
      RX   95.2   <--   TX 1175.6
FAB2
SW2   TX 1422.0   -->   RX  104.6   SW4    1610nm (ok)
      RX   54.3   <--   TX 1468.4

What I find curious at this point is the asymmetry in the power levels.
SW2 transmits at 1422uW, of which SW4 receives 104uW, yet SW2 receives only
54uW of the SW4 signal, which was sent at a similar power (1468uW). The
same holds for SW1 and SW3, but in the opposite direction.
Anyway, the SFPs have an RX sensitivity down to -18dBm (ca. 16uW), so in any
case it should be fine... But nothing is.
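
To make the asymmetry easier to compare, the microwatt readings can be converted to dBm and turned into a per-direction path loss. The following is only a rough sketch in Python: the numbers are copied from the sfpshow summary above, the -18dBm sensitivity is the figure quoted for the SFPs, and the function and variable names are made up for illustration.

```python
import math

def uw_to_dbm(power_uw: float) -> float:
    """Convert optical power from microwatts to dBm (0 dBm = 1 mW = 1000 uW)."""
    return 10.0 * math.log10(power_uw / 1000.0)

def path_loss_db(tx_uw: float, rx_uw: float) -> float:
    """Loss between the transmitter and the far-end receiver, in dB."""
    return uw_to_dbm(tx_uw) - uw_to_dbm(rx_uw)

# (direction, TX power in uW at the sender, RX power in uW at the receiver)
LINKS = [
    ("FAB1 SW1->SW3 1550nm", 1234.3, 49.1),
    ("FAB1 SW3->SW1 1550nm", 1175.6, 95.2),
    ("FAB2 SW2->SW4 1610nm", 1422.0, 104.6),
    ("FAB2 SW4->SW2 1610nm", 1468.4, 54.3),
]

RX_SENSITIVITY_DBM = -18.0  # sensitivity quoted for the SFPs

for label, tx_uw, rx_uw in LINKS:
    loss = path_loss_db(tx_uw, rx_uw)
    margin = uw_to_dbm(rx_uw) - RX_SENSITIVITY_DBM
    print(f"{label}: TX {uw_to_dbm(tx_uw):+.1f} dBm, "
          f"RX {uw_to_dbm(rx_uw):+.1f} dBm, "
          f"loss {loss:.1f} dB, margin {margin:.1f} dB")
```

With these readings the weaker direction of each fabric loses about 14dB and the stronger one about 11dB, so the asymmetry is roughly 3dB in both fabrics, and even the weakest receiver stays about 5dB above the quoted sensitivity.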
Some SFPs have been diagnosed as malfunctioning by the manufacturer (the
1550nm ones marked "ko" above). The 1610nm ones are apparently ok;
they have been tested using a traffic generator. The leased line has also
been tested more than once. All is within tolerances. I'm awaiting the
replacements, but for some reason I don't believe they will make things
better, as the apparently good ones don't produce ZERO errors either.
Earlier there was active equipment involved (some kind of 4GFC retimer)
before putting the signal on the line. No idea why. That equipment was
eliminated because of the problems so we now only have:
- the long distance laser in the switch,
- (new) 10m LC-SC monomode cable to the mux (for each fabric),
- the leased line,
- the same thing but reversed on the other side of the link.


FC switches
Here is a port config from the Brocade portcfgshow (it's like that on both
sides, obviously)
Area Number: 0
Speed Level: 4G
Fill Word(On Active) 0(Idle-Idle)
Fill Word(Current) 0(Idle-Idle)
AL_PA Offset 13: OFF
Trunk Port ON
Long Distance LS
VC Link Init OFF
Desired Distance 32 Km
Reserved Buffers 70
Locked L_Port OFF
Locked G_Port OFF
Disabled E_Port OFF
Locked E_Port OFF
ISL R_RDY Mode OFF
RSCN Suppressed OFF
Persistent Disable OFF
LOS TOV enable OFF
NPIV capability ON
QOS E_Port OFF
Port Auto Disable: OFF
Rate Limit OFF
EX Port OFF
Mirror Port OFF
Credit Recovery ON
F_Port Buffers OFF
Fault Delay: 0(R_A_TOV)
NPIV PP Limit: 126
CSCTL mode: OFF
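
As a sanity check on the "Desired Distance" and "Reserved Buffers" values, the number of buffer credits needed to keep a long link busy can be estimated from first principles: enough full-size frames must be in flight to cover one round trip. The sketch below is only an approximation and is not Brocade's own formula. It assumes about 5us/km propagation delay in fibre, full-size frames of 2148 bytes on the wire, 8b/10b encoding at the 4GFC line rate of 4.25 Gbaud, and it ignores the fill words between frames. All names are illustrative.

```python
import math

FIBRE_DELAY_S_PER_KM = 5e-6   # assumption: ~5 us per km in glass
FULL_FRAME_BYTES = 2148       # assumption: maximum-size FC frame incl. SOF, header, CRC, EOF
BITS_PER_BYTE_ON_WIRE = 10    # 8b/10b encoding used by 1/2/4/8GFC

def frame_time_s(line_rate_gbaud: float, frame_bytes: int = FULL_FRAME_BYTES) -> float:
    """Time needed to serialize one frame onto the wire."""
    return frame_bytes * BITS_PER_BYTE_ON_WIRE / (line_rate_gbaud * 1e9)

def credits_needed(distance_km: float, line_rate_gbaud: float,
                   frame_bytes: int = FULL_FRAME_BYTES) -> int:
    """Frames that fit into one round trip (frame out, R_RDY back)."""
    round_trip_s = 2 * distance_km * FIBRE_DELAY_S_PER_KM
    return math.ceil(round_trip_s / frame_time_s(line_rate_gbaud, frame_bytes))

for km in (15, 32):
    print(f"{km} km at 4GFC: ~{credits_needed(km, 4.25)} credits for full-size frames")
```

Under these assumptions the real 15km line needs about 30 credits and the configured 32km about 64, which is in the same range as the 70 reserved buffers shown above. Smaller frames would need proportionally more credits, so this is a lower bound and not a guarantee.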
The problem is that the 4GbFC fabrics are almost never clean. Sometimes they
are clean for a while, even with a lot of traffic on them. Then they may suddenly
start producing errors (RX CRC, RX encoding, RX disparity, ...) even with
only marginal traffic on them. I am attaching some error and traffic
graphs. Errors are currently on the order of 50-100 per 5 minutes
with 1Gb/s of traffic.
Forcing the links to 2GbFC produces no errors, but we bought 4GbFC and we
want 4GbFC.
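
To put the error counts into perspective, they can be turned into an approximate bit error ratio and compared with the 1e-12 target that Fibre Channel links are normally specified for. This is a back-of-the-envelope sketch only: it assumes that every counted error corresponds to a single bad bit and that the link runs continuously at the 4GFC line rate of 4.25 Gbaud (fill words are sent even when there is little traffic). The names are illustrative.

```python
LINE_RATE_BAUD = 4.25e9   # 4GFC line rate, bits on the wire per second
INTERVAL_S = 5 * 60       # errors were counted per 5 minutes
TARGET_BER = 1e-12        # usual Fibre Channel bit error ratio target

def error_ratio(errors: int, interval_s: float, bits_per_s: float) -> float:
    """Errors per transmitted bit, treating each counted error as one bad bit."""
    return errors / (interval_s * bits_per_s)

for errors in (50, 100):
    ber = error_ratio(errors, INTERVAL_S, LINE_RATE_BAUD)
    print(f"{errors} errors / 5 min: ~{ber:.1e} "
          f"({ber / TARGET_BER:.0f}x the {TARGET_BER:.0e} target)")
```

Under these assumptions a link that just meets the target would show about one error per 5 minutes, so 50-100 errors is well over an order of magnitude too many.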

I don't know where to look anymore. Any ideas what to try next or how to
proceed?
If we can't make 4GbFC work reliably I wonder what the people working with
8 or 16GbFC do... I don't assume that "a few errors here and there" are
acceptable.
Oh and BTW we are in contact with every one of the manufacturers (FC
switch, MUX, SFPs, ...). Except for the SFPs to be changed (some have been
changed before), nobody has a clue. Brocade SAN Health says the fabric is
ok. The MUX, well, it's passive, it's only a prism, nature at its best.
Any shots in the dark?
