triple redundancy

zaax · 19 May 2017

When building an important computer system, triple redundancy is built in, i.e. three processors normally work together, but one can do the job of all three if the others break down.

Is this the same with the railway signalling system?

RailUK Forums

ComUtoR · 19 May 2017

You sneeze and the system breaks.

Leaf mulch can prevent a track circuit from operating.

I think you need to define more precisely what you mean by 'redundancy' being built in. The system will tend to fail safe, so when a set of points fails it fails in a specific direction, and that will interlock with other points and signals. That is a form of redundancy, I suppose.

Joseph_Locke · 19 May 2017

zaax said:
When building an important computer system, triple redundancy is built in, i.e. three processors normally work together, but one can do the job of all three if the others break down.

Is this the same with the railway signalling system?

In SSI there are three duplicated systems - three entirely separate Zilog Z80B processors in fact - and two of these must agree before the system takes action. In this case a complete failure of one processor subsystem can be tolerated, but not two.

I'm not sure how CBI achieves its SIL compliance - you should remain on the platform and await the arrival of an interlocking technologist.

MarkyT · 19 May 2017

The successors to SSI are Westlock and Smartlock, and these use a similar technique but with more modern hardware. Not sure about other suppliers, although SSI and its derivatives almost completely dominate the mainline market. Note the triple redundancy only applies to the central interlocking modules, not the distributed trackside function modules (TFMs), which have only duplicated processors. A failure to agree in a TFM will result in a signals-at-red shutdown for that module, but will only affect the small quantity of trackside equipment that the one module interfaces to, usually no more than two signals or two sets of points.

edwin_m · 19 May 2017

SSI actually has hardware to shut down the processor that disagrees with the other two and if only two remain to shut down both if they disagree. This is done by blowing a fuse I think. The data links between the processor and the trackside are also duplicated for reliability, but the data on each link is digitally encoded so it won't do anything unsafe if it is corrupted or, for example, connected to the trackside equipment for a different interlocking.
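The idea of a link message that is ignored when corrupted or misdirected can be sketched roughly as below. This is not the real SSI telegram format: the three-byte layout, the field names and the use of CRC-32 as the check code are all assumptions made purely for illustration.

```python
import zlib

def encode(interlocking_id: int, module_id: int, command: int) -> bytes:
    """Build a hypothetical frame: 3 payload bytes plus a 4-byte CRC-32."""
    payload = bytes([interlocking_id, module_id, command])
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def accept(frame: bytes, my_interlocking: int, my_module: int):
    """Return the command only if the frame is intact and addressed to us."""
    if len(frame) != 7:
        return None
    payload, check = frame[:3], frame[3:]
    if zlib.crc32(payload).to_bytes(4, "big") != check:
        return None  # corrupted in transit: do nothing
    if payload[0] != my_interlocking or payload[1] != my_module:
        return None  # meant for other equipment: do nothing
    return payload[2]

frame = encode(interlocking_id=1, module_id=5, command=2)
corrupted = frame[:2] + bytes([frame[2] ^ 0x01]) + frame[3:]
print(accept(frame, 1, 5))      # intact and correctly addressed
print(accept(corrupted, 1, 5))  # one bit flipped
print(accept(frame, 2, 5))      # plugged into the wrong interlocking
```

In this sketch "do nothing" is the safe outcome: a module that stops receiving valid messages would be expected to fall back to its restrictive state rather than act on doubtful data.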

Bletchleyite · 19 May 2017

zaax said:
When building an important computer system, triple redundancy is built in, i.e. three processors normally work together, but one can do the job of all three if the others break down.

Is this the same with the railway signalling system?

Provided it fails safe, there isn't really much need other than for reliability - "all trains stop now" is a safe if annoying failure outcome.

An Airbus's control systems are different - the plane can't just stop if there is a failure.

Edit: Though other more knowledgeable posters on this matter have confirmed it is indeed used.

edwin_m · 19 May 2017

Bletchleyite said:
Provided it fails safe, there isn't really much need other than for reliability - "all trains stop now" is a safe if annoying failure outcome.

An Airbus's control systems are different - the plane can't just stop if there is a failure.

Edit: Though other more knowledgeable posters on this matter have confirmed it is indeed used.

Indeed. In aviation it's probably more important that a system keeps working even if it isn't quite doing the right thing - the pilot can usually compensate. For a railway signalling system if it can't be guaranteed to do the right thing it shouldn't do anything. Doing the wrong thing may not be evident to anyone until too late, and could have catastrophic results.

snowball · 19 May 2017

I assume it's no use trying to triplicate the unit that compares the outputs of the triplicated processors.

MarkyT · 19 May 2017

edwin_m said:
SSI actually has hardware to shut down the processor that disagrees with the other two and if only two remain to shut down both if they disagree. This is done by blowing a fuse I think. The data links between the processor and the trackside are also duplicated for reliability, but the data on each link is digitally encoded so it won't do anything unsafe if it is corrupted or, for example, connected to the trackside equipment for a different interlocking.

It's a kind of enhanced Mexican standoff in classic SSI. Each of the three MPMs (main processor modules) has some built-in hardware for checking its own output against the other two. If the outputs differ, the minority module attempts to blow its own fuse, if it is able to. The other two modules are also hardwired to be able to kill the dissenting module, and both will attempt this if they detect a disagreement. The two remaining modules, if they survive the shoot-out, are capable of running the railway alone, but alarms are generated to summon technical assistance, because if any further disagreement is detected by either module, both are eliminated and the interlocking shuts down. Signals go to red. Points are immobile.
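As a rough model of that shoot-out, the sketch below works out which modules are still running after one comparison cycle. The function name and the string outputs are made up for illustration, and the case where all three modules disagree at once is not covered by the description above, so treating it as a full shutdown is an assumption.

```python
from collections import Counter

def surviving_modules(active, outputs):
    """Return the modules still running after one output comparison.

    active  -- names of modules currently running
    outputs -- mapping of module name to the output it produced
    """
    if len(active) < 2:
        return frozenset()  # a lone module is not trusted (assumption)
    votes = Counter(outputs[m] for m in active)
    value, count = votes.most_common(1)[0]
    if count < 2:
        return frozenset()  # no two modules agree: interlocking shuts down
    # Any module in the minority is eliminated; the agreeing ones carry on.
    return frozenset(m for m in active if outputs[m] == value)

modules = frozenset({"A", "B", "C"})
modules = surviving_modules(modules, {"A": "red", "B": "red", "C": "green"})
print(sorted(modules))  # C has been shot; A and B run the railway
modules = surviving_modules(modules, {"A": "red", "B": "green"})
print(sorted(modules))  # second disagreement: nothing left, signals to red
```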

Central Interlocking - Duplicated for safety, triplicated for reliability.

Trackside Datalinks - Duplicated for reliability. A and B links are also preferably fed along the trackside from nodes at opposite ends of the scheme so even if both cables are cut midway all trackside objects remain connected to the interlocking by one or the other link. Similar architecture is preferred for equipment power supplies.

Trackside Function Modules (Distributed I/O) - Duplicated for safety alone. As to reliability, failure of one module in an interlocking otherwise fully functional is considered acceptable.

Other Control Centre Equipment - Systems usually contain duplicated boards or modules for reliability.
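The point about feeding the A and B datalinks from opposite ends can be checked with a toy model. Everything here is an assumption for illustration: modules are numbered along the track, link A is fed from the low-numbered end, link B from the high-numbered end, and a cut at index i means the cable is severed between module i and module i+1.

```python
def reachable(n_modules, cut_a=None, cut_b=None):
    """Return, per module, whether either link still reaches it."""
    result = []
    for k in range(n_modules):
        via_a = cut_a is None or k <= cut_a  # A is fed from the low end
        via_b = cut_b is None or k > cut_b   # B is fed from the high end
        result.append(via_a or via_b)
    return result

# Both cables severed at the same place: every module keeps one link.
print(reachable(6, cut_a=2, cut_b=2))
# Cables severed at two different places: modules between the cuts are lost.
print(reachable(6, cut_a=1, cut_b=3))
```

The second case shows the limit of the arrangement: it protects against a single incident that takes out both cables at one location, not against two separate cuts.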

rf_ioliver · 19 May 2017

zaax said:
When building an important computer system, triple redundancy is built in, i.e. three processors normally work together, but one can do the job of all three if the others break down.

Quick rough answers:

If you are specifically referring to redundancy, then in the simplest system all three processors will perform the same job, and voting circuitry will ensure that the "correct" result is given in the case that one processor fails.
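For what it's worth, the textbook form of such a voter is a bitwise two-out-of-three majority, which is easy to show in a few lines. This is a generic sketch, not the voting logic of any particular signalling or avionics product.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 vote: each output bit follows at least two inputs."""
    return (a & b) | (a & c) | (b & c)

good = 0b1010
faulty = 0b0110  # one processor has two bits wrong
print(bin(majority(good, good, faulty)))  # the faulty processor is outvoted
```

Note that a plain voter like this masks the fault but does not say which processor was wrong; identifying and isolating the odd one out needs extra comparison logic.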

You can get redundancy in other ways, e.g. providing over-capacity, as might be done in certain "cloud" scenarios. Anyway, there are many, many forms of system redundancy. I think in your example, however, you're referring to a parallel processing system in which load can be shared. That's a different case from fault tolerance, which is really the case for railways.

There are a number of issues in such systems, this article gives a good overview of the most famous of these: https://en.wikipedia.org/wiki/Byzantine_fault_tolerance

Then someone mentioned avionics and fail-safe. In a simple system, e.g. a signal, if something fails then the system fails to a safer state, e.g. if a yellow bulb fails in a 4-aspect signal then either no signal is shown or only a single yellow is shown, both of which are more restrictive than a double yellow.

You can get variations of this, eg: circuitry which ensures that a red is shown for any failure.
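The double-yellow example above can be written out as a small check. The lamp names, the ranking numbers and the rule that a dark signal counts as the most restrictive indication are assumptions of this sketch, and it only considers lamps that fail to light.

```python
# Lamps that must be lit for each aspect of a hypothetical 4-aspect signal.
LAMPS = {
    "red": {"red"},
    "yellow": {"yellow_1"},
    "double_yellow": {"yellow_1", "yellow_2"},
    "green": {"green"},
}
# Lower number = more restrictive; a dark signal is ranked with red.
RANK = {"dark": 0, "red": 0, "yellow": 1, "double_yellow": 2, "green": 3}

def displayed(requested, failed):
    """Return the aspect actually shown when some lamps have failed."""
    lit = LAMPS[requested] - set(failed)
    for aspect, lamps in LAMPS.items():
        if lit == lamps:
            return aspect
    return "dark"  # nothing recognisable is lit

print(displayed("double_yellow", {"yellow_2"}))  # degrades to single yellow
print(displayed("double_yellow", {"yellow_1"}))  # no valid aspect: dark

# With any single lamp out, the display is never less restrictive.
all_lamps = set().union(*LAMPS.values())
for aspect in LAMPS:
    for lamp in all_lamps:
        assert RANK[displayed(aspect, {lamp})] <= RANK[aspect]
```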

The main point here is to get a system to fail *gracefully*. For example, in Airbus the general principle is that the system under failure returns more and more control, gracefully, to the pilot, eventually leaving the pilot with full control. And yes, Airbus aircraft can be flown fully manually.

One interesting point to note is that in such systems pretty much everything is done to avoid the system giving up and handing all control over to a human at once.

Big topic to discuss, if you have anything specific let me know by PM or reply here,

t.

Ian

mark-h · 19 May 2017

Bletchleyite said:
An Airbus's control systems are different - the plane can't just stop if there is a failure.

I think Airbus have the critical soft/firmware programmed by 3 different companies to reduce the risk of a bug causing an issue.

Three processors running the same software would give the same (wrong) response if there was a code issue.

najaB · 19 May 2017

mark-h said:
I think Airbus have the critical soft/firmware programmed by 3 different companies to reduce the risk of a bug causing an issue.

Three processors running the same software would give the same (wrong) response if there was a code issue.

That does then introduce the problem of three different but equally valid solutions to the same set of inputs, initial condition and desired outcome. For example, if you're at FL20 and want to go to FL25 one program might calculate the minimum time solution while another calculates minimum fuel and the third does something somewhere in between.

I don't know about Airbus, but in other applications the three computers have to run identical software for the majority voting system to work. There will be a fourth, completely isolated backup system running a different code stack which takes over in the case of a system freeze/crash on the main computers.

Edit to add: What's more, three different code stacks make bugs *more* likely rather than less, as that's three times as much code to test and validate.

asylumxl · 19 May 2017

Joseph_Locke said:
In SSI there are three duplicated systems - three entirely separate Zilog Z80B processors in fact - and two of these must agree before the system takes action. In this case a complete failure of one processor subsystem can be tolerated, but not two.

I'm not sure how CBI achieves its SIL compliance - you should remain on the platform and await the arrival of an interlocking technologist.

Amazing how prolific Zilog Z80 variants still are!

edwin_m · 19 May 2017

asylumxl said:
Amazing how prolific Zilog Z80 variants still are!

Now you mention it, they are actually Motorola 6800 variants in SSI not Z80 - even older I think!

asylumxl · 20 May 2017

edwin_m said:
Now you mention it, they are actually Motorola 6800 variants in SSI not Z80 - even older I think!

Makes perfect sense I suppose. Stability is one of the most important qualities of an embedded system and the software/hardware combo must be pretty stable by now!

Tim M · 20 May 2017

The WESTRACE Computer Based Interlocking (now Siemens Trackguard WESTRACE) system uses a single processor with a true and complementary system to achieve safety to Safety Integrity Level 4. The Mk2 system has redundancy capabilities by duplicating both the interlocking module and the various Input and Output modules, with, as the link below says, hot swap capabilities.
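For anyone unfamiliar with the general true-and-complement idea: each value is held twice, once as-is and once bit-inverted, and the two copies are cross-checked before use. The sketch below shows only that general principle on a single byte; it is not a description of how WESTRACE itself is implemented, and the function names are invented.

```python
MASK = 0xFF  # this sketch works on single bytes

def store(value: int):
    """Keep a value alongside its bitwise complement."""
    return (value & MASK, ~value & MASK)

def load(pair):
    """Return the value only if the two copies are exact complements."""
    true_copy, complement = pair
    if true_copy ^ complement != MASK:
        return None  # copies disagree: caller must go to the safe state
    return true_copy

cell = store(0x5A)
print(load(cell))                      # consistent pair
print(load((cell[0] ^ 0x01, cell[1])))  # one bit flipped in the true copy
```

A stuck-at fault that forces the same bit pattern into both copies is caught as well, since identical copies can never be complements of each other.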

WESTRACE has been in service in many countries around the world for about 25 years.

https://www.mobility.siemens.com/mo...-interlockings/trackguard-westrace-mk2-en.pdf
