
ECML/MML major power problems (09/08)

Status
Not open for further replies.

Nicholas Lewis

Established Member
Joined
9 Aug 2019
Messages
6,093
Location
Surrey
From the description of the rationale for the software change, it appears that the permanent lockout for low frequency was effectively a regression error in the 3.27.x software, albeit at the specification level rather than in the implementation:


This does not reflect well on Siemens' change control processes.

Indeed, and I would have thought the software would have been locked, certainly for safety-critical functions, at its configuration version when the train was authorised by the ORR, and thus Siemens shouldn't up-version it without re-approval. I do appreciate that train management systems do a load of other functionality these days, and many reliability issues stem from software problems that need to be fixed quickly, but surely safety-critical software needs far more care.
 
RailUK Forums

edwin_m

Veteran Member
Joined
21 Apr 2013
Messages
24,876
Location
Nottingham
Indeed, and I would have thought the software would have been locked, certainly for safety-critical functions, at its configuration version when the train was authorised by the ORR, and thus Siemens shouldn't up-version it without re-approval. I do appreciate that train management systems do a load of other functionality these days, and many reliability issues stem from software problems that need to be fixed quickly, but surely safety-critical software needs far more care.
Not sure either Siemens or ORR would be very happy with that, going by the number of versions these things are said to go through. This problem actually arises from a change to requirements not to the software itself. At the very least the change should have been subject to safety review before implementation, considering not just the possibility of damage to the electronics but also the operational consequences.
 

Facing Back

Member
Joined
21 May 2019
Messages
904
The following thoughts are not based on experience in the rail industry, but on software development in a variety of industries and the public sector...

Software is rarely "locked" these days, but updates to live systems are typically managed through a rigorous change management/control process, which includes assessing the need (often a change in requirements from the users of the software) and then testing the hell out of the new features while making sure they haven't broken any existing functionality; the latter can take much more effort than the former. The process for deploying the new software is also heavily tested.

Mistakes do happen of course - I don't know whether the permanent train lock in the new version of the software was a requirement which was incorrectly designed, or the software not delivering the documented requirements. My experience in software development (30+ years now) is that poor requirements are now more often the problem - testing generally picks up when the software doesn't deliver to those requirements.

The fact that the trains are not all running on the same version of the software is not automatically a cause for concern. My experience is that it is fairly common. If the new software release was to correct a safety critical defect of course then you would expect it to be deployed across the fleet.
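To make the regression-testing point above concrete, here is a toy sketch of a behavioural diff between two releases. The trigger names and the two-release table are entirely invented for illustration; they are not Siemens' actual fault matrix.

```python
# Each release's fault handling, reduced to one question per trigger:
# does a battery reset clear the lock-out? (Illustrative values only.)
V326_RESET_CLEARS = {"door_fault": True, "low_supply_frequency": True}
V327_RESET_CLEARS = {"door_fault": True, "low_supply_frequency": False}

def regressions(old: dict, new: dict) -> list:
    """Triggers whose existing behaviour changed in the new release."""
    return [t for t in old if t in new and old[t] != new[t]]

print(regressions(V326_RESET_CLEARS, V327_RESET_CLEARS))
```

A behavioural diff like this flags the change; whether the flagged change is a defect or an intended requirement change is then a review question rather than a testing one.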
 

hwl

Established Member
Joined
5 Feb 2012
Messages
7,384
...that poor requirements are now more often the problem - testing generally picks up when the software doesn't deliver to those requirements....
Completely agree on this being a big issue.
 

edwin_m

Veteran Member
Joined
21 Apr 2013
Messages
24,876
Location
Nottingham
Mistakes do happen of course - I don't know whether the permanent train lock in the new version of the software was a requirement which was incorrectly designed, or the software not delivering the documented requirements. My experience in software development (30+ years now) is that poor requirements are now more often the problem - testing generally picks up when the software doesn't deliver to those requirements.
From this quote...
In order to prevent the situation arising where a driver is unaware of a safety-related fault or out-of-specification condition and carries out a battery reset with the potential to worsen the situation, Siemens decided to implement a “permanent lock-out” in software version 3.27.x that could not be cleared by a battery reset. Such a permanent lock-out would require a technician to attend the train and perform an analysis of the causes of the lock-out before clearing it. Siemens identified a range of trigger conditions for the permanent lock-out, intended to be those conditions that could be made worse by clearing a lock-out that had been imposed. Among the conditions selected was the detection of a low power supply frequency.
...it would appear to be a conscious decision by someone in Siemens to specify this functionality as a requirement, rather than a mistake in implementation of requirements. As I've probably said somewhere in this thread already, the software people then proceeded to implement the wrong thing extremely well.
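The behaviour described in that quote, two classes of lock-out where only one survives a battery reset, can be sketched in a few lines. Only the low-frequency trigger latching permanently is taken from the report; the class names, the other trigger, and the structure are assumptions for illustration.

```python
from enum import Enum, auto

class Lockout(Enum):
    NONE = auto()
    DRIVER_RESETTABLE = auto()   # cleared by a battery reset
    PERMANENT = auto()           # survives a battery reset; technician required

# Hypothetical trigger table: the report confirms low supply frequency was
# among the permanent triggers; anything else here is invented.
PERMANENT_TRIGGERS = {"low_supply_frequency"}

class TrainControlUnit:
    def __init__(self):
        self.lockout = Lockout.NONE

    def report_fault(self, condition: str):
        if condition in PERMANENT_TRIGGERS:
            self.lockout = Lockout.PERMANENT
        else:
            self.lockout = Lockout.DRIVER_RESETTABLE

    def battery_reset(self):
        # A battery reset only clears the driver-resettable class.
        if self.lockout is Lockout.DRIVER_RESETTABLE:
            self.lockout = Lockout.NONE

    def technician_clear(self):
        # Clearing after attendance and diagnosis, per the quoted rationale.
        self.lockout = Lockout.NONE

tcu = TrainControlUnit()
tcu.report_fault("low_supply_frequency")  # even a transient dip latches
tcu.battery_reset()
print(tcu.lockout)  # remains permanently locked out
```

Implemented exactly as specified, in other words: the latch is doing precisely what someone asked it to do.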
 

Facing Back

Member
Joined
21 May 2019
Messages
904
That looks very plausible to me, although I'll be slightly surprised if Siemens get to make that kind of functionality decision alone, without agreement from the TOC/ROSCO/DfT/NR.
 

w1bbl3

Member
Joined
6 Mar 2011
Messages
325
The permanent lock-out introduced by the specification for version 3.27.x appears to have been intended to cover situations where resetting without diagnosis could result in damage to the train, which would then have been a problem for Siemens to fix. With the lock-out on low power supply frequency being implicit in the specification requirement, I'd suspect its triggering in transient situations was unintended. I'd expect that, as maintenance is the responsibility of Siemens, they would have been able to introduce such a software modification without needing direct approval from GTR/DfT/NR, under the guise of reliability improvements.
 

Facing Back

Member
Joined
21 May 2019
Messages
904
The permanent lock-out introduced by the specification for version 3.27.x appears to have been intended to cover situations where resetting without diagnosis could result in damage to the train, which would then have been a problem for Siemens to fix. With the lock-out on low power supply frequency being implicit in the specification requirement, I'd suspect its triggering in transient situations was unintended. I'd expect that, as maintenance is the responsibility of Siemens, they would have been able to introduce such a software modification without needing direct approval from GTR/DfT/NR, under the guise of reliability improvements.
That sounds plausible too. Even for a bug which was Siemens' responsibility to fix, I'd still expect a crowd to sign off on the design; they do in other regulated environments. However, if the consequences were not anticipated or intended, then they wouldn't have been in the design specs for anyone to worry about. It's disappointing that it wasn't picked up in regression testing (i.e. "we haven't broken anything which currently works"), but without knowing what is in the detailed design/use cases, I guess it's hard to know for sure.
 

edwin_m

Veteran Member
Joined
21 Apr 2013
Messages
24,876
Location
Nottingham
Per my quote above it looks like a specific requirement was set to lock out permanently at low supply frequency, so the testing would have been to confirm that happened as "intended".
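That is the crux: a verification test written against the stated requirement passes even when the requirement itself is the defect. A self-contained sketch, with a purely illustrative threshold, function name, and frequency reading:

```python
LOW_FREQ_THRESHOLD_HZ = 49.0  # illustrative figure, not the real setting

def lockout_after_reset(measured_hz: float) -> bool:
    """True if the train remains locked out after a battery reset."""
    permanent = measured_hz < LOW_FREQ_THRESHOLD_HZ  # latch on low frequency
    return permanent  # a battery reset has no effect on a permanent latch

# A test written to the requirement "low frequency shall impose a lock-out
# that a battery reset cannot clear" passes...
assert lockout_after_reset(48.8) is True   # example sub-threshold reading
# ...even though a brief dip that recovers also satisfies it, which is
# exactly the operational case nobody probed.
```

Testing to the specification confirms the implementation; it cannot tell you the specification was the wrong thing to ask for.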
 

hwl

Established Member
Joined
5 Feb 2012
Messages
7,384
Per my quote above it looks like a specific requirement was set to lock out permanently at low supply frequency, so the testing would have been to confirm that happened as "intended".
Exactly: a failure to engage brains enough on the requirements at an early stage.
Too many people thinking happy inside-the-box thoughts, and not enough negative outside-the-box thinking being done initially.

Thankfully the software was all properly sorted and rolled out ASAP.

Software is giving most train manufacturers nasty existential thoughts currently...
 

duffield

Established Member
Joined
31 Jul 2013
Messages
1,342
Location
East Midlands
My opinion is that even if there are circumstances where it's appropriate to prevent the driver resetting the system off their own bat, there should always be the *option* of a remote reset, either directly (if it's possible for the technician to communicate remotely with the relevant train) or indirectly (by the technician supplying a 'reset code' for the driver to type in).
This is how my burglar alarm works if a fault develops - if the technician on the phone agrees the circumstances are appropriate, they'll give me a reset code; if not they will schedule a visit.

In this particular case, the technicians might well have needed to visit a few trains physically to diagnose the issue but as soon as they realised it was fine for the drivers to reset the system without train damage, all the other drivers could have been given reset codes; this would have considerably reduced the length of the disruption.
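The indirect reset-code idea described above could work along the lines of the following sketch: the technician derives a short code from a shared secret plus the unit number and lock-out event, reads it to the driver over the phone, and the train verifies it locally. The secret, the names, and the code format are all assumptions for illustration, not anything from the actual fleet.

```python
import hashlib
import hmac

SHARED_SECRET = b"depot-provisioned-secret"  # per-fleet secret, hypothetical

def reset_code(unit: str, event_id: str) -> str:
    """Technician side: derive a short one-off code for this lock-out event."""
    digest = hmac.new(SHARED_SECRET, f"{unit}:{event_id}".encode(),
                      hashlib.sha256).hexdigest()
    return digest[:6].upper()  # short enough to read out over the phone

def train_accepts(unit: str, event_id: str, entered: str) -> bool:
    """Train side: recompute locally and compare in constant time."""
    return hmac.compare_digest(reset_code(unit, event_id), entered.upper())

code = reset_code("700151", "LOCKOUT-0009")        # technician issues code
assert train_accepts("700151", "LOCKOUT-0009", code)      # driver enters it
assert not train_accepts("700151", "LOCKOUT-0010", code)  # wrong event fails
```

Because the code is bound to the specific unit and lock-out event, it can't be reused across the fleet, which preserves the "technician in the loop" intent while avoiding a physical visit to every train.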
 

Nicholas Lewis

Established Member
Joined
9 Aug 2019
Messages
6,093
Location
Surrey
My opinion is that even if there are circumstances where it's appropriate to prevent the driver resetting the system off their own bat, there should always be the *option* of a remote reset, either directly (if it's possible for the technician to communicate remotely with the relevant train) or indirectly (by the technician supplying a 'reset code' for the driver to type in).
This is how my burglar alarm works if a fault develops - if the technician on the phone agrees the circumstances are appropriate, they'll give me a reset code; if not they will schedule a visit.

In this particular case, the technicians might well have needed to visit a few trains physically to diagnose the issue but as soon as they realised it was fine for the drivers to reset the system without train damage, all the other drivers could have been given reset codes; this would have considerably reduced the length of the disruption.
Good point, and this is the sort of issue I would have thought the ORR would have pontificated on. I'm pretty sure if these had been 345s marooned in Crossrail tunnels they wouldn't have been so quiet.
 