
ECML/MML major power problems (09/08)

Status
Not open for further replies.

Nicholas Lewis

Established Member
Joined
9 Aug 2019
Messages
6,093
Location
Surrey
From the description of the rationale for the software change, it appears that the permanent lockout for low frequency was effectively a regression error in the 3.27.x software, albeit at the specification level rather than in the implementation:


This does not reflect well on Siemens' change control processes.

Indeed, and I would have thought the software would have been locked, certainly for safety-critical functions, at its configuration version when the train was authorised by the ORR, and thus Siemens shouldn't up-version it without re-approval. I do appreciate that train management systems do a load of other functionality these days, and many reliability issues stem from software problems that need to be fixed quickly, but surely safety-critical software needs far more care.
 
RailUK Forums

edwin_m

Veteran Member
Joined
21 Apr 2013
Messages
24,876
Location
Nottingham
Indeed, and I would have thought the software would have been locked, certainly for safety-critical functions, at its configuration version when the train was authorised by the ORR, and thus Siemens shouldn't up-version it without re-approval. I do appreciate that train management systems do a load of other functionality these days, and many reliability issues stem from software problems that need to be fixed quickly, but surely safety-critical software needs far more care.
Not sure either Siemens or ORR would be very happy with that, going by the number of versions these things are said to go through. This problem actually arises from a change to requirements not to the software itself. At the very least the change should have been subject to safety review before implementation, considering not just the possibility of damage to the electronics but also the operational consequences.
 

Facing Back

Member
Joined
21 May 2019
Messages
904
The following thoughts are not based on experience in the rail industry, but on software development in a variety of industries and the public sector...

Software is rarely "locked" these days, but updates to live systems are typically managed through a rigorous change management/control process, which includes assessing the need (often a change in requirements from the users of the software) and then testing the hell out of the new features while making sure they haven't broken any existing functionality; the latter can take much more effort than the former. The process for deploying the new software is also heavily tested.

Mistakes do happen of course - I don't know whether the permanent train lock in the new version of the software was a requirement which was incorrectly designed, or the software not delivering the documented requirements. My experience in software development (30+ years now) is that poor requirements are now more often the problem - testing generally picks up when the software doesn't deliver to those requirements.

The fact that the trains are not all running on the same version of the software is not automatically a cause for concern. My experience is that it is fairly common. If the new software release was to correct a safety critical defect of course then you would expect it to be deployed across the fleet.
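To make the regression-testing point above concrete, here is a toy sketch of a behavioural diff between two releases. The trigger names and the two-release table are entirely invented for illustration; they are not Siemens' actual fault matrix.

```python
# Each release's fault handling, reduced to one question per trigger:
# does a battery reset clear the lock-out? (Illustrative values only.)
V326_RESET_CLEARS = {"door_fault": True, "low_supply_frequency": True}
V327_RESET_CLEARS = {"door_fault": True, "low_supply_frequency": False}

def regressions(old: dict, new: dict) -> list:
    """Triggers whose existing behaviour changed in the new release."""
    return [t for t in old if t in new and old[t] != new[t]]

print(regressions(V326_RESET_CLEARS, V327_RESET_CLEARS))
```

A behavioural diff like this flags the change; whether the flagged change is a defect or an intended requirement change is then a review question rather than a testing one.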
 

hwl

Established Member
Joined
5 Feb 2012
Messages
7,384
...that poor requirements are now more often the problem - testing generally picks up when the software doesn't deliver to those requirements....
Completely agree on this being a big issue.
 

edwin_m

Veteran Member
Joined
21 Apr 2013
Messages
24,876
Location
Nottingham
Mistakes do happen of course - I don't know whether the permanent train lock in the new version of the software was a requirement which was incorrectly designed, or the software not delivering the documented requirements. My experience in software development (30+ years now) is that poor requirements are now more often the problem - testing generally picks up when the software doesn't deliver to those requirements.
From this quote...
In order to prevent the situation arising where a driver is unaware of a safety-related fault or out-of-specification condition and carries out a battery reset with the potential to worsen the situation, Siemens decided to implement a “permanent lock-out” in software version 3.27.x that could not be cleared by a battery reset. Such a permanent lock-out would require a technician to attend the train and perform an analysis of the causes of the lock-out before clearing it. Siemens identified a range of trigger conditions for the permanent lock-out, intended to be those conditions that could be made worse by clearing a lock-out that had been imposed. Among the conditions selected was the detection of a low power supply frequency.
...it would appear to be a conscious decision by someone in Siemens to specify this functionality as a requirement, rather than a mistake in implementation of requirements. As I've probably said somewhere in this thread already, the software people then proceeded to implement the wrong thing extremely well.
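The behaviour described in that quote, two classes of lock-out where only one survives a battery reset, can be sketched in a few lines. Only the low-frequency trigger latching permanently is taken from the report; the class names, the other trigger, and the structure are assumptions for illustration.

```python
from enum import Enum, auto

class Lockout(Enum):
    NONE = auto()
    DRIVER_RESETTABLE = auto()   # cleared by a battery reset
    PERMANENT = auto()           # survives a battery reset; technician required

# Hypothetical trigger table: the report confirms low supply frequency was
# among the permanent triggers; anything else here is invented.
PERMANENT_TRIGGERS = {"low_supply_frequency"}

class TrainControlUnit:
    def __init__(self):
        self.lockout = Lockout.NONE

    def report_fault(self, condition: str):
        if condition in PERMANENT_TRIGGERS:
            self.lockout = Lockout.PERMANENT
        else:
            self.lockout = Lockout.DRIVER_RESETTABLE

    def battery_reset(self):
        # A battery reset only clears the driver-resettable class.
        if self.lockout is Lockout.DRIVER_RESETTABLE:
            self.lockout = Lockout.NONE

    def technician_clear(self):
        # Clearing after attendance and diagnosis, per the quoted rationale.
        self.lockout = Lockout.NONE

tcu = TrainControlUnit()
tcu.report_fault("low_supply_frequency")  # even a transient dip latches
tcu.battery_reset()
print(tcu.lockout)  # remains permanently locked out
```

Implemented exactly as specified, in other words: the latch is doing precisely what someone asked it to do.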
 

Facing Back

Member
Joined
21 May 2019
Messages
904
That looks very plausible to me, although I'll be slightly surprised if Siemens get to make that kind of functionality decision alone, without agreement from the TOC/ROSCO/DfT/NR.
 

w1bbl3

Member
Joined
6 Mar 2011
Messages
325
The permanent lock-out introduced by the specification for version 3.27.x appears to have been intended to cover situations where resetting without diagnosis could result in damage to the train, which would then have been a problem for Siemens to fix. With the lock-out on low power supply frequency being implicit in the specification requirement, I'd suspect its triggering in transient situations was unintended. I'd expect that, as maintenance is the responsibility of Siemens, they would have been able to introduce such a software modification without needing direct approval from GTR/DfT/NR, under the guise of reliability improvements.
 

Facing Back

Member
Joined
21 May 2019
Messages
904
The permanent lock-out introduced by the specification for version 3.27.x appears to have been intended to cover situations where resetting without diagnosis could result in damage to the train, which would then have been a problem for Siemens to fix. With the lock-out on low power supply frequency being implicit in the specification requirement, I'd suspect its triggering in transient situations was unintended. I'd expect that, as maintenance is the responsibility of Siemens, they would have been able to introduce such a software modification without needing direct approval from GTR/DfT/NR, under the guise of reliability improvements.
That sounds plausible too. Even for a bug which was Siemens' responsibility to fix, I'd still expect a crowd to sign off on the design; they do in other regulated environments. However, if the consequences were not anticipated or intended, then they wouldn't have been in the design specs for anyone to worry about. It's disappointing that it wasn't picked up in regression testing (i.e. "we haven't broken anything which currently works"), but without knowing what is in the detailed design/use cases, I guess it's hard to know for sure.
 

edwin_m

Veteran Member
Joined
21 Apr 2013
Messages
24,876
Location
Nottingham
Per my quote above it looks like a specific requirement was set to lock out permanently at low supply frequency, so the testing would have been to confirm that happened as "intended".
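That is the crux: a verification test written against the stated requirement passes even when the requirement itself is the defect. A self-contained sketch, with a purely illustrative threshold, function name, and frequency reading:

```python
LOW_FREQ_THRESHOLD_HZ = 49.0  # illustrative figure, not the real setting

def lockout_after_reset(measured_hz: float) -> bool:
    """True if the train remains locked out after a battery reset."""
    permanent = measured_hz < LOW_FREQ_THRESHOLD_HZ  # latch on low frequency
    return permanent  # a battery reset has no effect on a permanent latch

# A test written to the requirement "low frequency shall impose a lock-out
# that a battery reset cannot clear" passes...
assert lockout_after_reset(48.8) is True   # example sub-threshold reading
# ...even though a brief dip that recovers also satisfies it, which is
# exactly the operational case nobody probed.
```

Testing to the specification confirms the implementation; it cannot tell you the specification was the wrong thing to ask for.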
 

hwl

Established Member
Joined
5 Feb 2012
Messages
7,384
Per my quote above it looks like a specific requirement was set to lock out permanently at low supply frequency, so the testing would have been to confirm that happened as "intended".
Exactly: a failure to engage brains enough on the requirements at an early stage.
Too many people thinking happy inside-the-box thoughts, and not enough negative outside-the-box thinking being done initially.

Thankfully the software was all properly sorted and rolled out ASAP.

Software is giving most train manufacturers nasty existential thoughts currently...
 

duffield

Established Member
Joined
31 Jul 2013
Messages
1,342
Location
East Midlands
My opinion is that even if there are circumstances where it's appropriate to prevent the driver resetting the system off their own bat, there should always be the *option* of a remote reset, either directly (if it's possible for the technician to communicate remotely with the relevant train) or indirectly (by the technician supplying a 'reset code' for the driver to type in).
This is how my burglar alarm works if a fault develops - if the technician on the phone agrees the circumstances are appropriate, they'll give me a reset code; if not they will schedule a visit.

In this particular case, the technicians might well have needed to visit a few trains physically to diagnose the issue but as soon as they realised it was fine for the drivers to reset the system without train damage, all the other drivers could have been given reset codes; this would have considerably reduced the length of the disruption.
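The indirect reset-code idea described above could work along the lines of the following sketch: the technician derives a short code from a shared secret plus the unit number and lock-out event, reads it to the driver over the phone, and the train verifies it locally. The secret, the names, and the code format are all assumptions for illustration, not anything from the actual fleet.

```python
import hashlib
import hmac

SHARED_SECRET = b"depot-provisioned-secret"  # per-fleet secret, hypothetical

def reset_code(unit: str, event_id: str) -> str:
    """Technician side: derive a short one-off code for this lock-out event."""
    digest = hmac.new(SHARED_SECRET, f"{unit}:{event_id}".encode(),
                      hashlib.sha256).hexdigest()
    return digest[:6].upper()  # short enough to read out over the phone

def train_accepts(unit: str, event_id: str, entered: str) -> bool:
    """Train side: recompute locally and compare in constant time."""
    return hmac.compare_digest(reset_code(unit, event_id), entered.upper())

code = reset_code("700151", "LOCKOUT-0009")        # technician issues code
assert train_accepts("700151", "LOCKOUT-0009", code)      # driver enters it
assert not train_accepts("700151", "LOCKOUT-0010", code)  # wrong event fails
```

Because the code is bound to the specific unit and lock-out event, it can't be reused across the fleet, which preserves the "technician in the loop" intent while avoiding a physical visit to every train.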
 

Nicholas Lewis

Established Member
Joined
9 Aug 2019
Messages
6,093
Location
Surrey
My opinion is that even if there are circumstances where it's appropriate to prevent the driver resetting the system off their own bat, there should always be the *option* of a remote reset, either directly (if it's possible for the technician to communicate remotely with the relevant train) or indirectly (by the technician supplying a 'reset code' for the driver to type in).
This is how my burglar alarm works if a fault develops - if the technician on the phone agrees the circumstances are appropriate, they'll give me a reset code; if not they will schedule a visit.

In this particular case, the technicians might well have needed to visit a few trains physically to diagnose the issue but as soon as they realised it was fine for the drivers to reset the system without train damage, all the other drivers could have been given reset codes; this would have considerably reduced the length of the disruption.
Good point, and this is the sort of issue I would have thought the ORR would have pontificated on. I'm pretty sure if these had been 345s marooned in Crossrail tunnels they wouldn't have been so quiet.
 