Cambrian line 20 Oct 2017: loss of ERTMS speed restrictions. RAIB report released

Status
Not open for further replies.

deep south

Member
Joined
24 Jul 2012
Messages
74
And back to the original point; the root cause is a significant error in the system design. There should have been a "handshake" to confirm "message was received and processed". Similar to a letter, proof of posting is not proof of delivery.
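As a minimal sketch of the "handshake" idea (not how the Cambrian system works; the `Receiver` class, the `drop_all` flag and the tuple layout are all made up for illustration), the receiver acknowledges with a fingerprint of what it actually processed, and the sender only treats the transfer as complete if that matches what it sent:

```python
import hashlib
import json


def digest(restrictions):
    """Stable fingerprint of a list of (start_m, end_m, speed_kmh) tuples."""
    canonical = json.dumps(sorted(restrictions), separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class Receiver:
    """Hypothetical receiving system; drop_all simulates a silent failure."""

    def __init__(self, drop_all=False):
        self.drop_all = drop_all
        self.loaded = []

    def upload(self, restrictions):
        if not self.drop_all:
            self.loaded = list(restrictions)
        # The acknowledgement describes what was actually processed,
        # not what was sent.
        return digest(self.loaded)


def send_with_handshake(receiver, restrictions):
    """Return True only if the receiver confirms the same data it was sent."""
    ack = receiver.upload(restrictions)
    return ack == digest(restrictions)


tsrs = [(1000, 1500, 30), (4200, 4300, 50)]
print(send_with_handshake(Receiver(), tsrs))               # delivered and confirmed
print(send_with_handshake(Receiver(drop_all=True), tsrs))  # posted, never delivered
```

The second call is the "proof of posting" case: the data was sent, but the acknowledgement shows it was never processed.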
 
Railsigns

Established Member
Joined
15 Feb 2010
Messages
2,488
There was no interlocking between the barriers and the signals.

Yes, there was (it wasn't possible to clear the protecting signals with the barriers up).

What was missing at Moreton-on-Lugg was approach locking, which made it possible for the signalman to throw back the signal in front of the approaching train and then immediately raise the barriers.
 

carriageline

Established Member
Joined
11 Jan 2012
Messages
1,897
More modern interlockings will normally “make” them fail safe, and replace something behind them to red. But that won’t always be enough in some scenarios.

But it does make me laugh how someone can say that something which should be displaying an aspect telling a driver to stop is showing nothing. It would normally be protecting something in front of it! (Remember, that could be a T3, a line blockage protecting staff, an incident, a track defect, a failure etc. It’s extreme but always possible!)
 

Llanigraham

Established Member
Joined
23 Mar 2013
Messages
6,074
Location
Powys
Yes, there was (it wasn't possible to clear the protecting signals with the barriers up).

What was missing at Moreton-on-Lugg was approach locking, which made it possible for the signalman to throw back the signal in front of the approaching train and then immediately raise the barriers.

Considering that I worked a Box with exactly the same problem along the same line, and was the signaller on duty when we were the first Box inspected by the RAIB/ORR after the incident, I know exactly what was and was not possible.
That was a very "interesting" and fraught afternoon!
 

Tomnick

Established Member
Joined
10 Jun 2005
Messages
5,827
Considering that I worked a Box with exactly the same problem along the same line, and was the signaller on duty when we were the first Box inspected by the RAIB/ORR after the incident, I know exactly what was and was not possible.
That was a very "interesting" and fraught afternoon!
I worked a box (actually more than one), albeit on a different line, also with the same problem, and the problem wasn’t that there was no interlocking between the crossing and its protecting signals! What was missing was indeed approach locking. The former was provided, presumably when the box opened in 1890, by the Midland Railway in the form of tumbler locking. A simplified form of the latter was hurriedly fitted after Moreton-on-Lugg because, as at M-o-L, it was possible to put the protecting signals back and immediately raise the barriers even with a train closely approaching at linespeed. It wasn’t, also as at M-o-L, possible to pull off without first lowering the barriers nor to raise the barriers without first replacing the protecting signals, because there was interlocking between the two functions.
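The difference between the two controls can be sketched with a toy model. This is only an illustration under simplifying assumptions: the class and method names are invented, there is one protecting signal, and the timed release that real approach locking provides is left out.

```python
class CrossingFrame:
    """Toy model of a crossing box; names are illustrative only."""

    def __init__(self, approach_locking=False):
        self.approach_locking = approach_locking
        self.barriers_down = False
        self.signal_off = False        # "off" means cleared for a train
        self.train_approaching = False
        self.approach_locked = False

    def lower_barriers(self):
        self.barriers_down = True

    def clear_signal(self):
        # Interlocking: cannot pull off with the barriers up.
        if not self.barriers_down:
            return False
        self.signal_off = True
        return True

    def replace_signal(self):
        # Putting back is always allowed, but with approach locking the
        # crossing stays locked if a train is already approaching.
        if self.signal_off and self.train_approaching and self.approach_locking:
            self.approach_locked = True
        self.signal_off = False

    def train_passed(self):
        self.train_approaching = False
        self.approach_locked = False

    def raise_barriers(self):
        # Interlocking: cannot raise with the protecting signal off.
        if self.signal_off or self.approach_locked:
            return False
        self.barriers_down = False
        return True


def put_back_in_front_of_train(approach_locking):
    """Return True if the barriers can be raised in front of the train."""
    frame = CrossingFrame(approach_locking)
    frame.lower_barriers()
    frame.clear_signal()
    frame.train_approaching = True
    frame.replace_signal()
    return frame.raise_barriers()


print(put_back_in_front_of_train(approach_locking=False))
print(put_back_in_front_of_train(approach_locking=True))
```

In both cases the interlocking between signal and barriers is present; only the second case holds the barriers down once the signal has been put back in front of an approaching train.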
 

MarkyT

Established Member
Joined
20 May 2012
Messages
6,232
Location
Torbay
Approach locking of signals was and is still not a universal feature of mechanical locking in Absolute Block areas, especially at smaller boxes that have been less heavily modernised over the years. Typically it requires a B position electric lock on the signal lever concerned, or "back-lock" as it's sometimes known, that prevents the lever from going fully back to the normal position for a time, so preventing the release of any mechanical interlocking applied to other levers in the frame such as level crossing locks, points and FPLs, and other directly opposing signals. Alternatively a point or crossing lock lever might have a timer applied to its full N or R position electrical lock (according to which lever position locks the gates closed in the case of a level crossing). Some larger boxes still extant, or surviving until very recently, along the WCML were heavily modernised in the 1960s with new colour light TCB layouts having full electrical controls applied to their mechanical frames equivalent to modern relay interlockings, including sectional route locking and release, approach locking etc.
 

Tomnick

Established Member
Joined
10 Jun 2005
Messages
5,827
I shall just say that theory didn't always work in the real world!
Which part of the theory didn't work at either your box or at Moreton-on-Lugg? The protecting signals at each should have been interlocked with the crossing, either through the mechanical locking (so that the barrier release/locking - as appropriate - lever is locked when any of the levers operating protecting signals are reversed, and the protecting signals can't be cleared until the barrier release/locking lever is in the correct position) or electrically (with, ultimately, the same effect). If the interlocking had failed, then there was a very serious wrong-side failure (but nothing in the report, in my understanding or in the nature of the controls subsequently introduced indicates that this was the case).

What was missing was something to hold the barriers down if the protecting signal was replaced to danger before a train had passed.
 
Joined
10 Feb 2016
Messages
92
Which part of the theory didn't work at either your box or at Moreton-on-Lugg? The protecting signals at each should have been interlocked with the crossing, either through the mechanical locking (so that the barrier release/locking - as appropriate - lever is locked when any of the levers operating protecting signals are reversed, and the protecting signals can't be cleared until the barrier release/locking lever is in the correct position) or electrically (with, ultimately, the same effect). If the interlocking had failed, then there was a very serious wrong-side failure (but nothing in the report, in my understanding or in the nature of the controls subsequently introduced indicates that this was the case).

What was missing was something to hold the barriers down if the protecting signal was replaced to danger before a train had passed.

Indeed.
I have worked quite a few crossing boxes which (thankfully) had approach locking and in the rare circumstances where you need to put back before the train has passed, that 2 minute time-off seems a hell of a lot longer when you have irate motorists glaring up to the box at you!
 
Tomnick

Established Member
Joined
10 Jun 2005
Messages
5,827
Indeed.
I have worked quite a few crossing boxes which (thankfully) had approach locking and in the rare circumstances where you need to put back before the train has passed, that 2 minute time-off seems a hell of a lot longer when you have irate motorists glaring up to the box at you!
The additional controls fitted at one of my boxes after M-o-L worked simply by starting a timer when the plunger was pressed to clear a protecting signal (whether the lever was actually reversed or even free to be reversed!) and holding the barrier locking lever reverse until either a train had passed over the treadle outside the box in the relevant direction or the system eventually timed out. It took something like four or five minutes to time out. All that it took was a momentary brain fart, you’d got the barriers down and went for the plunger for the Up instead of the Down, and that was it...
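A rough sketch of that logic, assuming a timeout of 270 seconds (the post only says "four or five minutes") and with invented names throughout, might look like this. Times are passed in as plain seconds so the behaviour is easy to follow:

```python
class PlungerTimerLock:
    """Toy model of the post-M-o-L control described above (names invented)."""

    TIMEOUT_S = 270  # assumed value, somewhere in the "four or five minutes" range

    def __init__(self):
        # direction -> time (in seconds) at which its plunger was pressed
        self.armed = {}

    def press_plunger(self, direction, now):
        # The timer starts on the press itself, whether or not the
        # signal lever is then reversed.
        self.armed[direction] = now

    def treadle_operated(self, direction):
        # A train passing in that direction releases that direction's hold.
        self.armed.pop(direction, None)

    def barrier_lever_free(self, now):
        # Drop any holds that have timed out, then check what is left.
        self.armed = {
            d: t for d, t in self.armed.items() if now - t < self.TIMEOUT_S
        }
        return not self.armed


lock = PlungerTimerLock()
lock.press_plunger("up", now=0)      # wrong plunger: the train is on the Down
lock.treadle_operated("down")        # the Down train passes
print(lock.barrier_lever_free(now=60))    # still held
print(lock.barrier_lever_free(now=270))   # released only by the timeout
```

Pressing the plunger for the wrong direction leaves a hold that no passing train will clear, so the barriers stay locked down until the timeout expires.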
 

Chris M

Member
Joined
4 Feb 2012
Messages
1,057
Location
London E14
The RAIB have released an interim report about this incident. It doesn't seem to contain much new, but it seems that the investigation is coming along slowly because they are having to reverse engineer the software and then try to break it in various ways to simulate what actually happened, as the data was deleted when trying to get the system running again.
https://www.gov.uk/government/news/...ss-of-speed-restrictions-on-the-cambrian-line
 

MarkyT

Established Member
Joined
20 May 2012
Messages
6,232
Location
Torbay
Perhaps NR should consider reintroducing some new form of AWB at braking distance from the start of temporary restrictions in ETCS areas, accompanied by an additional passive balise encoded with the speed and distance by the engineer applying the restriction: e.g. "30kph starts 500m for 100m". The onboard ETCS equipment on the train would then use this information to override any higher speed transmitted in movement authorities over the radio. Would entirely avoid the need for this additional signalbox GEST system and the signaller duties associated with it.
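To illustrate the override rule only (this is not a real ETCS packet format, and the names and fields are assumptions), the onboard logic could take the lower of the radio speed and the balise speed while the train is inside the restriction. Train length and braking curves are ignored here:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaliseRestriction:
    """Hypothetical balise payload, e.g. "30kph starts 500m for 100m"."""
    speed_kmh: int
    starts_in_m: int
    length_m: int


def permitted_speed(radio_speed_kmh, balise_position_m, restriction, train_position_m):
    """Speed the train may run at, given both sources of information."""
    start = balise_position_m + restriction.starts_in_m
    end = start + restriction.length_m
    if start <= train_position_m < end:
        # The balise can only lower the permitted speed, never raise it.
        return min(radio_speed_kmh, restriction.speed_kmh)
    return radio_speed_kmh


tsr = BaliseRestriction(speed_kmh=30, starts_in_m=500, length_m=100)
for position in (2400, 2500, 2599, 2600):
    print(position, permitted_speed(80, 2000, tsr, position))
```

Because the result is a minimum, a missing restriction in the radio data would be caught by the balise, while a lower radio speed would still be respected.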
 

Muzer

Established Member
Joined
3 Feb 2012
Messages
2,773
...Which, to me, is completely crazy. How can safety critical software be that poorly-documented?

Perhaps they (wrongly) treat GEST as non-safety-critical?
 

AngusH

Member
Joined
27 Oct 2012
Messages
551
The report doesn't make happy reading, that's for sure and some of the details, if I've understood them correctly, are rather troubling.

I'm particularly puzzled that it seems possible to delete logging data so easily. This really should be recorded to some type of archiving system, and shouldn't be easily deleted (Ideally it should be permanently recorded)
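One common way to make log deletion at least detectable is to chain each entry to the previous one with a hash. The following is a minimal sketch of that general technique, with invented names, and is not a description of any logging the Cambrian system has:

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


class AppendOnlyLog:
    """Toy hash-chained log: entries can be added but not silently removed."""

    def __init__(self):
        self._entries = []

    def append(self, event):
        prev = self._entries[-1]["hash"] if self._entries else GENESIS
        body = json.dumps(event, sort_keys=True)
        entry_hash = hashlib.sha256((prev + body).encode("utf-8")).hexdigest()
        self._entries.append({"prev": prev, "event": body, "hash": entry_hash})

    def export(self):
        # Hand out copies so callers cannot edit the log in place.
        return [dict(entry) for entry in self._entries]


def verify(entries):
    """Return True if every entry correctly chains to the one before it."""
    prev = GENESIS
    for entry in entries:
        expected = hashlib.sha256((prev + entry["event"]).encode("utf-8")).hexdigest()
        if entry["prev"] != prev or entry["hash"] != expected:
            return False
        prev = entry["hash"]
    return True


log = AppendOnlyLog()
log.append({"t": 1, "msg": "restart requested"})
log.append({"t": 2, "msg": "restriction upload started"})
log.append({"t": 3, "msg": "display refreshed"})

copy = log.export()
print(verify(copy))   # intact chain
del copy[1]
print(verify(copy))   # an entry removed from the middle is detected
```

A chain like this does not by itself detect entries being trimmed from the end, so the latest hash would also need to be stored somewhere separate, such as the archiving system suggested above.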

I'm also concerned about having a procedure that involves removing and then somehow manually re-entering temporary speed limits.
This seems most dubious.

It also appears that the manufacturer is being asked to do the analysis. I expect that there is some additional oversight of this process by a technically knowledgeable but independent person. Allowing the manufacturer to do the analysis is sufficient for determining cause, but doesn't seem sufficient if there is a possibility that they might be at fault.


If it hasn't already been done, I think perhaps that the RAIB should call in an external expert of stature, such as a professor of computer science or software engineering from a serious university to do a complete and independent analysis of the system and its design?
An independent analysis obviously has the advantage that if they are found not to be at fault, it cannot be suggested that they affected the outcome in any way.
 

Crossover

Established Member
Joined
4 Jun 2009
Messages
9,247
Location
Yorkshire
The report doesn't make happy reading, that's for sure and some of the details, if I've understood them correctly, are rather troubling.

I'm particularly puzzled that it seems possible to delete logging data so easily. This really should be recorded to some type of archiving system, and shouldn't be easily deleted (Ideally it should be permanently recorded)

I'm also concerned about having a procedure that involves removing and then somehow manually re-entering temporary speed limits.
This seems most dubious.

It also appears that the manufacturer is being asked to do the analysis. I expect that there is some additional oversight of this process by a technically knowledgeable but independent person. Allowing the manufacturer to do the analysis is sufficient for determining cause, but doesn't seem sufficient if there is a possibility that they might be at fault.


If it hasn't already been done, I think perhaps that the RAIB should call in an external expert of stature, such as a professor of computer science or software engineering from a serious university to do a complete and independent analysis of the system and its design?
An independent analysis obviously has the advantage that if they are found not to be at fault, it cannot be suggested that they affected the outcome in any way.

I haven't read the report (yet) but understand the gist of the issue and I can't disagree with anything you've mentioned. For such a safety critical incident, one doesn't really want the manufacturer having the ability to "window dress" the findings in their favour.
 

Muzer

Established Member
Joined
3 Feb 2012
Messages
2,773
This really is quite concerning. I'm in the software industry myself - thankfully nowhere near any safety-critical stuff - and I'm aware of how in some industries safety-critical software is very poorly-regulated (see the incidents with mid-2000s Toyotas and the evidence given against them here as an example). I never really looked into it but I was hoping this wouldn't be the case in the rail industry. From the looks of this preliminary report, though, it's not looking good. I really hope I'm wrong.

And yes, RAIB absolutely need to be employing someone who is an expert in analysing such systems, and they need to be given full access to everything including source code to audit.
 

muz379

Established Member
Joined
23 Jan 2014
Messages
2,206
I see from the RAIB interim report that once the failure was detected a safeguard was brought into use by way of a test train being used to verify new TSRs. Out of interest, does anyone know how long this was in place for, or is it still in place?

In the past I have heard of situations where a driver has alleged a wrong side failure of a signal showing a less restrictive aspect than it should have been at the time, and the signal in question has been taken out of use until destructive testing can take place (usually at end of service). But in this case that clearly would have caused massive disruption, so it was not done. Is that not operational convenience being placed above safety?
 
Joined
7 Jan 2009
Messages
859
Not a happy picture:
  1. three drivers went through the affected section before the fault was notified, apparently not noticing that there were no TSR indications (even though they are safety critical, not least because of overspeed protection)
  2. no apparent safety documentation available for the GEST
  3. coming up for a year after the incident and RAIB, at least, unable to conclude what actually happened, and
  4. the system still in use, albeit with manual cross-checking when TSRs are uploaded following resets or for new ones (but could the fault occur without a reset having occurred?)

A bit more urgency needed? How could the GEST have been accepted into service with an apparent absence of documentation?
 

deep south

Member
Joined
24 Jul 2012
Messages
74
So the "workaround" to a system issue is to manually re-enter all the necessary information, with a much higher risk of error? For a safety critical system, this is not very good at all.
 

Chris M

Member
Joined
4 Feb 2012
Messages
1,057
Location
London E14
The RAIB report into this incident has been released today. I've not read it yet, but from the summary:
The temporary speed restriction data was not uploaded during an automated signalling computer restart the previous evening, but a display screen incorrectly showed the restrictions as being loaded for transmission to trains. An independent check of the upload was needed to achieve safety levels given in European standards and the system designer, Ansaldo STS (now part of Hitachi STS), intended that this would be provided by signallers checking the display. A suitable method of assuring that the correct data was provided to the display had not been clearly defined in the software design documentation prepared by Ansaldo STS and the resulting software product included a single point of failure which affected both the data upload and signallers’ display functions. The system safety justification was presented in a non-standard format based on documentation from another project still in development at the time of the Cambrian ERTMS commissioning and which, before completion, made changes that unintentionally mitigated the single point of failure later exhibited on the Cambrian system. Network Rail and the Independent Safety Assessor (Lloyd’s Register Rail, now Ricardo Rail/Ricardo Certification) were required to review the design documentation but did not identify the lack of clear definition in design documents and were not aware of the changes made during the development of the other project.
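The point about the signallers' check not being independent can be illustrated with a deliberately simple sketch. This does not describe how GEST or the RBC work internally; every name here is an assumption. It only shows why a check is worthless if the display is built from the same data as the request rather than from what was actually loaded:

```python
class Rbc:
    """Stand-in for the system that actually transmits restrictions to trains."""

    def __init__(self):
        self.active = []


def upload(stored, rbc, fault=False):
    """Copy stored restrictions into the RBC; a fault makes this silently do nothing."""
    if not fault:
        rbc.active = list(stored)


def display_from_request(stored, rbc):
    # Shows what was meant to be loaded; it never looks at the RBC.
    return list(stored)


def display_from_readback(stored, rbc):
    # Shows what the RBC reports it is actually holding.
    return list(rbc.active)


def signaller_check(display, expected):
    """The manual check: does the screen match the restrictions that should apply?"""
    return sorted(display) == sorted(expected)


stored = [(1000, 1500, 30), (4200, 4300, 50)]
rbc = Rbc()
upload(stored, rbc, fault=True)

print(signaller_check(display_from_request(stored, rbc), stored))   # fault hidden
print(signaller_check(display_from_readback(stored, rbc), stored))  # fault revealed
```

In the first design the screen and the upload share the same source, so the check passes even though nothing reached the RBC. Only the read-back design gives the signaller something independent to check against.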

Recommendations
The investigation makes five recommendations. Network Rail, aided by the wider rail industry, should improve its safety assurance process for high integrity software-based systems and improve safety learning from failures of such systems, and develop a process to capture the data needed to understand these failures. Hitachi STS (formerly Ansaldo STS) should review its safety assurance processes in the light of the learning from this investigation, and should provide a technical solution for the Cambrian lines that avoids the need for signallers to verify automatically uploaded speed restrictions.

Learning points cover train drivers reporting inconsistencies in information provided to them; the need for Independent Safety Assessors to understand the scope of checks undertaken by other bodies and to apply extra vigilance if documents form part of a non-standard process; the importance of clients undertaking their client role when procuring high integrity software; and achieving the specified level of safety when implementing temporary speed restrictions in ERTMS.
Link to the report: https://www.gov.uk/government/news/...al-signalling-data-on-the-cambrian-coast-line
 

MarkyT

Established Member
Joined
20 May 2012
Messages
6,232
Location
Torbay
I think there's definitely a case for a hard-coded passive temporary balise placed for each restriction, with an advance warning board, at the braking point for the restriction by the engineer putting the restriction on and programming the balise. There's no argument then. Such a system could coexist with the temporary data in the central system and each would act as a crosscheck against the other.
 

RichardGore

Member
Joined
15 Jul 2018
Messages
36
Location
Coulsdon
I think there's definitely a case for a hard-coded passive temporary balise placed for each restriction, with an advance warning board, at the braking point for the restriction by the engineer putting the restriction on and programming the balise. There's no argument then. Such a system could coexist with the temporary data in the central system and each would act as a crosscheck against the other.

Is there really that much benefit to running both systems in parallel as opposed to simplifying the RBC and operations at the signalling centre and only using temporary balises?
 

Belperpete

Established Member
Joined
17 Aug 2018
Messages
1,581
Is there really that much benefit to running both systems in parallel as opposed to simplifying the RBC and operations at the signalling centre and only using temporary balises?
I agree about not providing a temporary balise - if the ERTMS is working properly, then it should be protecting the restriction. But what happens in degraded mode, when the RBC has shut-down and trains are being signalled manually as described in the report - are the drivers expected to remember the temporary restrictions? Surely warning boards should be provided to cover such situations?
 

MarkyT

Established Member
Joined
20 May 2012
Messages
6,232
Location
Torbay
Is there really that much benefit to running both systems in parallel as opposed to simplifying the RBC and operations at the signalling centre and only using temporary balises?

I would agree with that sentiment. Duplication in this way could introduce new failure modes when one record conflicts with the other. Make the RBC and its data much simpler and encode all the static data about the infrastructure in the permanent lineside balises as well. After all, being static, the data only changes when the infrastructure itself changes, at which point the extra work of reprogramming the balises is not a great additional overhead. It means that even when the RBC was down and backup signalling methods were in operation, all civil speed restrictions could still be accurately followed. I'm liking this decentralised approach.
 

Belperpete

Established Member
Joined
17 Aug 2018
Messages
1,581
To be fair, this is 2 years old now
Presumably, lessons have been learnt...
The most worrying thing about this report to my mind is that the RAIB have had to make a recommendation that the underlying fault (that a non-safety system is being used for a safety-critical function contrary to the standards) should be fixed. I would have expected to see some mention that work was already under way to address this - the absence of that I find very concerning.
 

Belperpete

Established Member
Joined
17 Aug 2018
Messages
1,581
I would agree with that sentiment. Duplication in this way could introduce new failure modes when one record conflicts with the other. Make the RBC and its data much simpler and encode all the static data about the infrastructure in the permanent lineside balises as well. After all, being static the data only changes when the infrastructure itself changes when the extra work of reprogramming the balises is not a great additional overhead. Means even when the RBC was down and backup signalling methods were in operation, all civil speed restrictions could still be accurately followed. I'm liking this decentralised approach.
But do the balises have any effect when the backup signalling methods are in operation? Presumably the ERTMS is disconnected in the cab at such times.
 