
Cambrian line 20 Oct 2017: loss of ERTMS speed restrictions. RAIB report released

Status
Not open for further replies.

RailUK Forums

daikilo

Established Member
Joined
2 Feb 2010
Messages
1,623
Perfectly reasonable, but making a tightly-scoped "sync verification" module a mandatory part of the normal operating sequence would at least cause a right-side failure during a desync, no matter what unexpected scenario caused it.

I would go a step further: no reboot/restart should ever lead to a situation where functionality has been lost without it being apparent. As for the theory that complication makes it impossible, this is either rubbish or alarming. Every software sequence should be an ordered step, and none should ever be missed without a fault signature and probably a restart failure message.
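
To illustrate the "ordered step" idea, here is a minimal sketch in Python. It is an assumption-laden toy, not how any real signalling product works: the names `run_startup` and `StartupResult` are hypothetical, and each step is assumed to be a callable that returns True only when it has positively confirmed completion. Any step that fails, raises an error, or simply does not confirm stops the sequence and produces a fault signature naming the step.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class StartupResult:
    """Outcome of a restart attempt (hypothetical structure)."""
    ok: bool
    completed: List[str] = field(default_factory=list)
    fault_signature: Optional[str] = None


def run_startup(steps: List[Tuple[str, Callable[[], bool]]]) -> StartupResult:
    """Run the steps strictly in order and stop at the first one
    that does not positively confirm success."""
    completed: List[str] = []
    for name, step in steps:
        try:
            passed = step()
        except Exception as exc:
            # A crash inside a step is reported, never swallowed.
            return StartupResult(False, completed,
                                 f"{name}: raised {type(exc).__name__}")
        if passed is not True:
            # Anything other than an explicit True counts as a failure.
            return StartupResult(False, completed,
                                 f"{name}: did not confirm completion")
        completed.append(name)
    return StartupResult(True, completed)
```

The design choice is that silence is treated as failure: a step that returns nothing at all is handled the same way as one that reports an error, so the restart cannot be declared good by default.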
 

OneOffDave

Member
Joined
2 Apr 2015
Messages
453
I would go a step further: no reboot/restart should ever lead to a situation where functionality has been lost without it being apparent. As for the theory that complication makes it impossible, this is either rubbish or alarming. Every software sequence should be an ordered step, and none should ever be missed without a fault signature and probably a restart failure message.


For complex, tightly coupled systems it is impossible to model every possible interaction in the system. It's not just about the software but how the software interacts with the real world and what impacts that has. If you feel that experts in the field are spouting rubbish, why not get your own research published in peer-reviewed journals?
 

daikilo

Established Member
Joined
2 Feb 2010
Messages
1,623
For complex, tightly coupled systems it is impossible to model every possible interaction in the system. It's not just about the software but how the software interacts with the real world and what impacts that has. If you feel that experts in the field are spouting rubbish, why not get your own research published in peer-reviewed journals?

I should have been more specific in that I was referring to safety-critical rail signalling software, the subject of this thread, and also to the specific case of start-up or reboot. If complication is added to the point where safe working cannot be ensured then steps should be simplified even if it then takes longer to operate.
 

OneOffDave

Member
Joined
2 Apr 2015
Messages
453
I should have been more specific in that I was referring to safety-critical rail signalling software, the subject of this thread, and also to the specific case of start-up or reboot. If complication is added to the point where safe working cannot be ensured then steps should be simplified even if it then takes longer to operate.

Yes, I agree there should be some method of checking that volatile information survives update and upgrade processes.
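
One possible way to check that such information survives, sketched in Python under stated assumptions: the restrictions are assumed to be plain dictionaries, the digest is assumed to be recorded somewhere that the restart does not touch, and the function names `restriction_digest` and `verify_after_restart` are hypothetical. This is an illustration of the general idea, not a description of the system used on the Cambrian.

```python
import hashlib
import json
from typing import Dict, List


def restriction_digest(restrictions: List[Dict]) -> str:
    """Return a SHA-256 digest of the restriction list.

    Each entry is serialised with sorted keys and the entries are
    sorted, so the digest does not depend on storage order.
    """
    canonical = sorted(json.dumps(r, sort_keys=True) for r in restrictions)
    return hashlib.sha256("\n".join(canonical).encode("utf-8")).hexdigest()


def verify_after_restart(expected_digest: str, restored: List[Dict]) -> bool:
    """True only if the restored data matches the digest taken
    before the restart; an empty or partial list will not match."""
    return restriction_digest(restored) == expected_digest
```

The digest would be taken before the restart and compared afterwards. If the comparison fails, the safe response would be to refuse to return to service rather than to carry on with whatever data happened to be reloaded.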
 

YorkshireBear

Established Member
Joined
23 Jul 2010
Messages
8,692
This is just a string of words without meaning.
No it is not.

It is quite clear what it means. Each sequence should happen in order, and if any of them fails or does not happen it should be obvious via either a fault signature or a restart failure message, which would have prevented this incident.
 

dtaylor84

Member
Joined
14 Apr 2013
Messages
128
OK, that clears up the "ordered step" part.

I'm still not really sure what set of "software sequences" this argument applies to, or what it means for something to "be obvious via a fault signature" (other than by reporting an error message, which is mentioned separately and is presumably something distinct.)
 

carriageline

Established Member
Joined
11 Jan 2012
Messages
1,897
I imagine that no one would have thought it was possible, hence why it happened!

The signallers are given a list of speed restrictions on a display. That’s programmed to be updated by the RBC/SCT/whatever the Cambrian use for speed restrictions. For some reason it wasn’t.

It’s not like this was something that was just not thought about. Ok yes, the system should be more robust. But how can you work out every possible fault if it hasn’t happened yet?

IIRC in the RAIB preliminary statement, it said the manufacturer hadn’t even found out why it happened.
 

nickswift99

Member
Joined
7 Apr 2013
Messages
273
There are relatively new approaches to risk management that ought to apply here but were almost certainly too new for this rollout.

A systems approach will enable you to identify previously unknown/unexpected faults. STAMP is an example, for which academic papers can be found here http://sunnyday.mit.edu/
 

daikilo

Established Member
Joined
2 Feb 2010
Messages
1,623
I imagine that no one would have thought it was possible, hence why it happened!

The signallers are given a list of speed restrictions on a display. That’s programmed to be updated by the RBC/SCT/whatever the Cambrian use for speed restrictions. For some reason it wasn’t.

It’s not like this was something that was just not thought about. Ok yes, the system should be more robust. But how can you work out every possible fault if it hasn’t happened yet?

IIRC in the RAIB preliminary statement, it said the manufacturer hadn’t even found out why it happened.

The railway signalling system has been considered fail-safe for over a century. It had weaknesses like fog and snow but these weren't hidden. No "fault" should be hidden and no-one should ever be forced to say "it failed-unsafe and we don't know why". One could argue that the whole system should have been shut down in case other hidden failure cases had also occurred during that reboot.
 

Dieseldriver

Member
Joined
9 Apr 2012
Messages
974
The railway signalling system has been considered fail-safe for over a century. It had weaknesses like fog and snow but these weren't hidden. No "fault" should be hidden and no-one should ever be forced to say "it failed-unsafe and we don't know why". One could argue that the whole system should have been shut down in case other hidden failure cases had also occurred during that reboot.
100% agree. From a driver's perspective we rely implicitly on signal aspects, safety systems/indications and signage (as well as our own extensive knowledge, which can only be so much). A modern system behaving in this way is actually pretty worrying and suggests that the system in use on the Cambrian is unreliable for the safe running of trains.
This time it related to a Temporary Speed Restriction, but how are we to trust this system given that its primary function is to stop trains bumping into each other at high speeds?
 

Dave1987

On Moderation
Joined
20 Oct 2012
Messages
4,563
I imagine that no one would have thought it was possible, hence why it happened!

The signallers are given a list of speed restrictions on a display. That’s programmed to be updated by the RBC/SCT/whatever the Cambrian use for speed restrictions. For some reason it wasn’t.

It’s not like this was something that was just not thought about. Ok yes, the system should be more robust. But how can you work out every possible fault if it hasn’t happened yet?

IIRC in the RAIB preliminary statement, it said the manufacturer hadn’t even found out why it happened.

I find it incredibly worrying that they don't know why it happened. When people's lives are at risk it's absolutely not acceptable to say "this fault has never happened before so how can we have put things in place to stop it happening". If it doesn't 'fail safe' like everything else on the railway currently does then it's very very concerning.
 

HSTEd

Veteran Member
Joined
14 Jul 2011
Messages
16,745
I find it incredibly worrying that they don't know why it happened. When people's lives are at risk it's absolutely not acceptable to say "this fault has never happened before so how can we have put things in place to stop it happening". If it doesn't 'fail safe' like everything else on the railway currently does then it's very very concerning.
Everything on the railway does not always fail safe.

Occasional Wrong Side failures are a fact of life
The important thing is to work out why this failure happened and remove the vulnerability.
 

Dave1987

On Moderation
Joined
20 Oct 2012
Messages
4,563
Everything on the railway does not always fail safe.

Occasional Wrong Side failures are a fact of life
The important thing is to work out why this failure happened and remove the vulnerability.

I would like you to cite an example of something on the railway that does not fail safe. You clearly know of something, or else you would not make statements like that.

Imagine this failure had happened with a train operating under ATO where drivers' route knowledge had been cut to the bone like some are proposing and it had been over some dodgy track. There you have the perfect recipe for a huge accident. Things like this show the weaknesses of systems like this. I have a fair amount of experience with coding and know that you can have bugs in a system that lie unseen for years until they rear their ugly heads.
 

HSTEd

Veteran Member
Joined
14 Jul 2011
Messages
16,745
I would like you to cite an example of something on the railway that does not fail safe. You clearly know of something, or else you would not make statements like that.

Well the obvious example is Clapham Junction in '88
Fail Safe systems are designed to fail safe, but like all engineered systems they occasionally fail to perform their designed function.
Imagine this failure had happened with a train operating under ATO where drivers' route knowledge had been cut to the bone like some are proposing and it had been over some dodgy track. There you have the perfect recipe for a huge accident. Things like this show the weaknesses of systems like this. I have a fair amount of experience with coding and know that you can have bugs in a system that lie unseen for years until they rear their ugly heads.

The driver would have been over the route dozens of times under ATO control anyway, and it is likely he would have noticed something was wrong before the accident anyway - as the train failed to brake in the manner that it normally did.
 

ComUtoR

Established Member
Joined
13 Dec 2013
Messages
9,460
Location
UK
The driver would have been over the route dozens of times under ATO control anyway, and it is likely he would have noticed something was wrong before the accident anyway - as the train failed to brake in the manner that it normally did.

I can't speak for the specifics but there are plenty of routes that I rarely go over and it is very easy to go 6 months without going over a specific route. It can also be a case where a Driver goes over a route for the first time since signing it etc. etc.
 

Dave1987

On Moderation
Joined
20 Oct 2012
Messages
4,563
Well the obvious example is Clapham Junction in '88
Fail Safe systems are designed to fail safe, but like all engineered systems they occasionally fail to perform their designed function.

Well I actually thought you were going to quote an incident that had happened in the last decade that I had not heard about.

The driver would have been over the route dozens of times under ATO control anyway, and it is likely he would have noticed something was wrong before the accident anyway - as the train failed to brake in the manner that it normally did.

Do you understand how TSRs and ESRs work? You are talking about one TSR that had been in for a very long time that the driver knew about. What if this was for a 20mph TSR over a bit of dodgy track that had only come in the previous day and the driver had been on holiday? You could end up with a train doing line speed through a severe speed restriction, which is extremely dangerous. This kind of thing is the prime reason there will be a driver at the front with full route knowledge and full training.
 

HSTEd

Veteran Member
Joined
14 Jul 2011
Messages
16,745
Well I actually thought you were going to quote an incident that had happened in the last decade that I had not heard about.
Well there may have been one, but Clapham Junction was merely the first example of how any engineered system will inevitably fail eventually that came to my head

Do you understand how TSRs and ESRs work? You are talking about one TSR that had been in for a very long time that the driver knew about. What if this was for a 20mph TSR over a bit of dodgy track that had only come in the previous day and the driver had been on holiday? You could end up with a train doing line speed through a severe speed restriction, which is extremely dangerous. This kind of thing is the prime reason there will be a driver at the front with full route knowledge and full training.

How does full route knowledge protect against that? If they haven't been told about the TSR, how on earth are they going to divine it from their route knowledge?
You could provide the driver with a list at the start of shift of all the extant TSRs, and have a track mileage counter visible to the driver in the cab.
 

DY444

Member
Joined
16 Sep 2012
Messages
138
I would like you to cite an example of something on the railway that does not fail safe. You clearly know of something, or else you would not make statements like that.

Imagine this failure had happened with a train operating under ATO where drivers' route knowledge had been cut to the bone like some are proposing and it had been over some dodgy track. There you have the perfect recipe for a huge accident. Things like this show the weaknesses of systems like this. I have a fair amount of experience with coding and know that you can have bugs in a system that lie unseen for years until they rear their ugly heads.

There have been incidents where systems which were thought to be fail safe turned out not to be. One I can think of was on the Washington Metro, where a track circuit module failed in such a way that it failed to detect a train, resulting in a fatal collision. I can think of others in the UK which were less serious, but they have happened very occasionally.
 

carriageline

Established Member
Joined
11 Jan 2012
Messages
1,897
Wrong Side Failures are still an occurrence (IE one every 12 months?)

It’s mostly signals showing aspects they shouldn’t, or track circuits not showing occupied when they should. It happens.
 

Llanigraham

On Moderation
Joined
23 Mar 2013
Messages
6,103
Location
Powys
I can't speak for the specifics but there are plenty of routes that I rarely go over and it is very easy to go 6 months without going over a specific route. It can also be a case where a Driver goes over a route for the first time since signing it etc. etc.

But not on the Cambrian! They are up and down it day in, day out.
 

Llanigraham

On Moderation
Joined
23 Mar 2013
Messages
6,103
Location
Powys
Wrong Side Failures are still an occurrence (IE one every 12 months?)

It’s mostly signals showing aspects they shouldn’t, or track circuits not showing occupied when they should. It happens.

Quite!!
And for a more recent example, I cite Moreton on Lugg.
 

bramling

Veteran Member
Joined
5 Mar 2012
Messages
17,776
Location
Hertfordshire / Teesdale
The driver would have been over the route dozens of times under ATO control anyway, and it is likely he would have noticed something was wrong before the accident anyway - as the train failed to brake in the manner that it normally did.

This statement is extremely naive.

Firstly I love the use of the word "likely". The railway doesn't do things based on what's "likely" to happen (or not happen).

Secondly there's absolutely no guarantee at all that the driver would have been over the route many times at all - it could for example be his first trip back after a lengthy period of leave.

Also it's well known that with ATO systems drivers are less likely to react to things as it takes time for them to re-focus.
 

Wilts Wanderer

Established Member
Joined
21 Nov 2016
Messages
2,493
For an up to date example of a wrong side failure, look at the VTEC HST that had an external door open unexpectedly at 125mph a few days ago.

Systems should be designed to fail safe, but not all failure-prone objects on the railway are a system. Engineering is as much about good judgement as it is about compliance with rules and standards. This is where the modern railway and Network Rail frighten me. It is increasingly all about compliance and less about common sense and critical judgement.
 

Chris M

Member
Joined
4 Feb 2012
Messages
1,057
Location
London E14
Well I actually thought you were going to quote an incident that had happened in the last decade that I had not heard about.
Waterloo?
Cardiff East Junction?
Watford tunnel?
Broad Oak level crossing, Kent?

That's just from RAIB reports published in 2017.
 

Bald Rick

Veteran Member
Joined
28 Sep 2010
Messages
29,220
Wrong Side Failures are still an occurrence (IE one every 12 months?)

It’s mostly signals showing aspects they shouldn’t, or track circuits not showing occupied when they should. It happens.

Signalling Wrong Siders are much more frequent than that. Mostly TCs showing clear when occupied (usually rail or wheel contamination), but AWS bell vice horn is quite common also, and signals showing a less restrictive aspect than they should have, or a ‘wrong’ junction indicator, are not unknown. Much more rarely points throwing the wrong way or similar - the Waterloo derailment in August was one of these.

In any case such events are risk scored, and those with a score over 50 are the ones to be really worried about. There were 102 in 2015/16. https://www.networkrail.co.uk/who-w...rformance/infrastructure-wrong-side-failures/
 

cjmillsnun

Established Member
Joined
13 Feb 2011
Messages
3,254
Well there may have been one, but Clapham Junction was merely the first example of how any engineered system will inevitably fail eventually that came to my head

CLJ was not an engineered system that failed. It was human error. No ifs and buts. That failure was caused by rogue wires not being cut back after changes to the system.
 

Bald Rick

Veteran Member
Joined
28 Sep 2010
Messages
29,220
CLJ was not an engineered system that failed. It was human error. No ifs and buts. That failure was caused by rogue wires not being cut back after changes to the system.

It was still a wrong side failure.

The human error was somebody not doing their job properly, by not completing a wiring task, and it not being properly checked.

Change the word ‘wiring’ for ‘software’, and you have a possible cause of the ETCS failure.
 

cjmillsnun

Established Member
Joined
13 Feb 2011
Messages
3,254
It was still a wrong side failure.

The human error was somebody not doing their job properly, by not completing a wiring task, and it not being properly checked.

Change the word ‘wiring’ for ‘software’, and you have a possible cause of the ETCS failure.

No arguments that it was a wrong side failure.

A wire count is a much simpler task than deciphering millions of lines of code but you are correct that one simple error can cause a dangerous situation.
 