
train onboard software

Status
Not open for further replies.

Ken H

On Moderation
Joined
11 Nov 2018
Messages
6,288
Location
N Yorks
I was reading an article in the October issue of Modern Railways about how the railway coped with the power supply problems earlier this year.

One of the problems was that the power supply frequency dropped and the on-train protection tripped on some trains.

The supply frequency quickly recovered as National Grid brought generating capacity online and switched out consumers, but the railway kept its supplies.

The procedure after such a trip is to do a battery reset of the train, after which it should be good to go again.

It seems that the software on some trains had been changed by a new release, so two versions were in service.

The older version had a wider frequency tolerance, so it did not trip.

The newer version had a narrower tolerance, so it did trip. But the software had also been changed to disallow a battery reset after a supply frequency trip.

They could not reload the old version on the updated trains because the new release also contained a reliability fix to the CCTV system.

So, some questions:

1. Was the software not subject to User Acceptance Testing (UAT) by the ROSCO or the TOC? Did no one read the release documentation and think 'Hmm, that's quite a big change, I will escalate that'? Was there any release documentation?

2. Surely the software should be divided into separate applications, so that upgrading one does not affect the others. The CCTV application should be upgradeable without affecting the power protection functions.

3. Aren't settings such as the power supply frequency tolerances 'soft coded', i.e. kept in a parameter file rather than 'hard coded' in the programs?
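As an illustration of point 3, here is a minimal Python sketch of 'soft coding', assuming the limits live in a JSON parameter file. The key names, `FrequencyLimits` and `load_limits` are hypothetical, and the 49.0 Hz and 48.5 Hz values are simply the EN 50163 figures quoted later in the thread, not settings read from any real train.

```python
import json
from dataclasses import dataclass

# Hypothetical parameter file contents. In practice this text would be read
# from a file shipped alongside the program rather than embedded here.
PARAMS_JSON = """
{
  "traction_derate_below_hz": 49.0,
  "full_shutdown_below_hz": 48.5
}
"""


@dataclass(frozen=True)
class FrequencyLimits:
    traction_derate_below_hz: float
    full_shutdown_below_hz: float


def load_limits(text: str) -> FrequencyLimits:
    """Build the limits from parameter data instead of hard-coded constants."""
    raw = json.loads(text)
    limits = FrequencyLimits(
        traction_derate_below_hz=float(raw["traction_derate_below_hz"]),
        full_shutdown_below_hz=float(raw["full_shutdown_below_hz"]),
    )
    # Reject an inconsistent file rather than running with it.
    if limits.full_shutdown_below_hz > limits.traction_derate_below_hz:
        raise ValueError("full shutdown threshold must not be above the derate threshold")
    return limits


limits = load_limits(PARAMS_JSON)
print(limits.full_shutdown_below_hz)  # 48.5
```

With this arrangement a tolerance change is a new parameter file rather than a new program build, although on a train it would presumably still need the same approval as any other safety-related change.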
 

ComUtoR

Established Member
Joined
13 Dec 2013
Messages
9,431
Location
UK
I haven't read the article, but are you sure the sequence of events is correct?

Ken H said:
It seems that the software on some trains had been changed by a new release, so two versions were in service.

The older version had a wider frequency tolerance, so it did not trip.

The newer version had a narrower tolerance, so it did trip. But the software had also been changed to disallow a battery reset after a supply frequency trip.

They could not reload the old version on the updated trains because the new release also contained a reliability fix to the CCTV system.

There have always been multiple versions of the train software in service. As I understand it, the units tripped and were then rebooted. Because some didn't come back, they had to upload a software fix to get them to reboot. Is the article stating that the software uploaded before the incident also had issues?

Did they report the other issue that pretty much caused the problem?
 

edwin_m

Veteran Member
Joined
21 Apr 2013
Messages
24,880
Location
Nottingham
The article does say that the newer software release was the problem, and that somebody had intentionally taken away the ability of the driver to recover, "intended to protect some electronic components in the traction package". At the time of the article a patch was under test to restore this ability.

Sounds very much like the sort of unintended consequence that comes from software changes. Discussion on another thread suggests a frequency deviation of this magnitude is pretty much unprecedented, so perhaps whoever it was just thought it wouldn't happen.
 

Ken H

On Moderation
Joined
11 Nov 2018
Messages
6,288
Location
N Yorks
ComUtoR said:
I haven't read the article, but are you sure the sequence of events is correct?

There have always been multiple versions of the train software in service. As I understand it, the units tripped and were then rebooted. Because some didn't come back, they had to upload a software fix to get them to reboot. Is the article stating that the software uploaded before the incident also had issues?

Did they report the other issue that pretty much caused the problem?

The article stated that the fleet was running with two versions of the software. The trains with the old version didn't trip; the new version did (because of different tolerances for line frequency). The new version did not allow the driver to do a battery reset, and the article implies this was a feature of the new version. That meant each failed train had to be visited by a technician with a laptop to reboot the train.
 

Ken H

On Moderation
Joined
11 Nov 2018
Messages
6,288
Location
N Yorks
edwin_m said:
The article does say that the newer software release was the problem, and that somebody had intentionally taken away the ability of the driver to recover, "intended to protect some electronic components in the traction package". At the time of the article a patch was under test to restore this ability.

Sounds very much like the sort of unintended consequence that comes from software changes. Discussion on another thread suggests a frequency deviation of this magnitude is pretty much unprecedented, so perhaps whoever it was just thought it wouldn't happen.

The article quotes Network Rail standards and conflicting European Norm (EN) standards. It was argued that the trains with the new software didn't conform to Network Rail standards.
 

ComUtoR

Established Member
Joined
13 Dec 2013
Messages
9,431
Location
UK
Ken H said:
The new version did not allow the driver to do a battery reset.

I find this quite odd, as rebooting is pretty standard and is a button press in the cab. I'm not sure what the article is suggesting.

Ken H said:
The trains with the old version didn't trip; the new version did

This is interesting, because although the old version tripped, they still didn't reboot.

Ken H said:
That meant each failed train had to be visited by a technician with a laptop to reboot the train.

Some of the units that did trip were still able to reboot. The drivers did a battery reset and the units rebooted correctly. I think it was more than just new version versus old version.

There are at least three versions currently running about.
 

hwl

Established Member
Joined
5 Feb 2012
Messages
7,389
Ken H said:
The article stated that the fleet was running with two versions of the software. The trains with the old version didn't trip; the new version did (because of different tolerances for line frequency). The new version did not allow the driver to do a battery reset, and the article implies this was a feature of the new version. That meant each failed train had to be visited by a technician with a laptop to reboot the train.

The article is wrong...

Several software and specification screw-ups:

EN 50163 permits traction electronics to start shutting down below 49 Hz (a good idea), with shut-off for everything (e.g. auxiliaries) at 48.5 Hz. However, they programmed in 49 Hz as the complete shut-off value by mistake. The auxiliary power supplies should never have shut down. The second issue was resetting (or not) after the shutdown.
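A minimal Python sketch of the two-stage behaviour described above, assuming 'below' means strictly less than the threshold. `Action` and `protection_action` are hypothetical names and this is not the manufacturer's code; 48.8 Hz is just an example value between the two thresholds.

```python
from enum import Enum


class Action(Enum):
    NORMAL = "normal running"
    TRACTION_SHUTDOWN = "traction shut down, auxiliaries stay on"
    FULL_SHUTDOWN = "everything shut down, including auxiliaries"


def protection_action(freq_hz: float,
                      traction_limit_hz: float = 49.0,
                      full_limit_hz: float = 48.5) -> Action:
    """Two-stage under-frequency response; thresholds default to the figures in the post."""
    if freq_hz < full_limit_hz:
        return Action.FULL_SHUTDOWN
    if freq_hz < traction_limit_hz:
        return Action.TRACTION_SHUTDOWN
    return Action.NORMAL


# Intended settings: a dip to 48.8 Hz only takes out traction.
print(protection_action(48.8).name)                      # TRACTION_SHUTDOWN
# Mistaken setting (complete shut-off at 49 Hz): the same dip shuts everything down.
print(protection_action(48.8, full_limit_hz=49.0).name)  # FULL_SHUTDOWN
```

Setting the full shut-off to the same value as the traction limit removes the intermediate stage entirely, which is the mistake described in the post.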

At the time of the Class 700 incident there were at least five software variants in service on the 700s. In the three latest software versions before the incident (3.27, 3.28 and 3.29, circa 60% of the fleet) they managed to remove the ability to do a battery-disconnect reset and didn't regression test it. All the problem sit-down units had the later software (3.27+). Units with 3.25 and 3.26 were able to do a battery-disconnect reset and get moving.
In version 3.30 (roll-out started the night of the incident) and later, battery-disconnect reset was restored.
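The version history above can be written as a capability table plus the kind of gate check a release process could run before roll-out. A minimal Python sketch: the table only restates the post (with 3.30 standing in for "3.30 and later"), and `BATTERY_RESET_AVAILABLE` and `failing_releases` are hypothetical names.

```python
from __future__ import annotations

# Battery-disconnect reset availability by software version, restating the
# post above. Illustrative table only, not real configuration data.
BATTERY_RESET_AVAILABLE = {
    "3.25": True,
    "3.26": True,
    "3.27": False,
    "3.28": False,
    "3.29": False,
    "3.30": True,
}


def failing_releases(capabilities: dict[str, bool]) -> list[str]:
    """Versions that would fail a 'driver can still battery-reset' regression check."""
    return [version for version, available in capabilities.items() if not available]


print(failing_releases(BATTERY_RESET_AVAILABLE))  # ['3.27', '3.28', '3.29']
```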


The newest software versions will auto-reset when the frequency returns above 49.5 Hz, and will also set the complete shutdown frequency to 48.5 Hz instead of 49 Hz.
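The auto-reset can be sketched as a latch with hysteresis: trip below 48.5 Hz, clear only once the frequency is back above 49.5 Hz. A minimal Python sketch under that assumption; the class and method names are hypothetical, and real protection would presumably add time delays and filtering that are left out here.

```python
class UnderFrequencyProtection:
    """Latching under-frequency shutdown with automatic reset.

    Trips when the frequency falls below shutdown_below_hz and clears only when
    it rises above reset_above_hz. The gap between the two (hysteresis) stops
    the equipment cycling on and off while the frequency hovers near one value.
    """

    def __init__(self, shutdown_below_hz: float = 48.5, reset_above_hz: float = 49.5) -> None:
        if reset_above_hz <= shutdown_below_hz:
            raise ValueError("reset threshold must be above the shutdown threshold")
        self.shutdown_below_hz = shutdown_below_hz
        self.reset_above_hz = reset_above_hz
        self.tripped = False

    def update(self, freq_hz: float) -> bool:
        """Process one frequency sample; return True while the shutdown is active."""
        if not self.tripped and freq_hz < self.shutdown_below_hz:
            self.tripped = True
        elif self.tripped and freq_hz > self.reset_above_hz:
            self.tripped = False  # auto-reset, no driver or technician action needed
        return self.tripped


prot = UnderFrequencyProtection()
samples_hz = [50.0, 48.4, 49.0, 49.4, 49.6, 50.0]
print([prot.update(f) for f in samples_hz])
# [False, True, True, True, False, False]
```

The 49.0 Hz and 49.4 Hz samples keep the shutdown latched even though they are above the trip point; that band is what prevents repeated trip/reset cycles.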
 

Ken H

On Moderation
Joined
11 Nov 2018
Messages
6,288
Location
N Yorks
hwl said:
The article is wrong...

Several software and specification screw-ups:

EN 50163 permits traction electronics to start shutting down below 49 Hz (a good idea), with shut-off for everything (e.g. auxiliaries) at 48.5 Hz. However, they programmed in 49 Hz as the complete shut-off value by mistake. The auxiliary power supplies should never have shut down. The second issue was resetting (or not) after the shutdown.

At the time of the Class 700 incident there were at least five software variants in service on the 700s. In the three latest software versions before the incident (3.27, 3.28 and 3.29, circa 60% of the fleet) they managed to remove the ability to do a battery-disconnect reset and didn't regression test it. All the problem sit-down units had the later software (3.27+). Units with 3.25 and 3.26 were able to do a battery-disconnect reset and get moving.
In version 3.30 (roll-out started the night of the incident) and later, battery-disconnect reset was restored.

The newest software versions will auto-reset when the frequency returns above 49.5 Hz, and will also set the complete shutdown frequency to 48.5 Hz instead of 49 Hz.

So how did 3.27 manage to get into production without proper version control and client sign-off after UAT?
 

dosxuk

Established Member
Joined
2 Jan 2011
Messages
1,760
Ken H said:
2. Surely the software should be divided into separate applications, so that upgrading one does not affect the others. The CCTV application should be upgradeable without affecting the power protection functions.

A reliability fix for the CCTV could mean many things, including (off the top of my head; I've no idea what they actually did):
- Making the display of timestamps more accurate
- Changing the way data is sent along the train
- Altering the power switching to reduce glitches when the train switches between AC and DC

That last idea, though, I could well see affecting other parts of the train's power systems. It's all very well saying things should be updated separately, but when systems are interconnected, some updates will affect more than the 'headline' system.
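The point above about interconnected systems can be sketched as a dependency walk: given which subsystems consume something another provides, find everything a change can reach. A minimal Python sketch; the subsystem names and the `CONSUMERS` map are invented for illustration and say nothing about the real Class 700 architecture.

```python
from __future__ import annotations

from collections import deque

# Hypothetical map of which subsystems consume something another subsystem
# provides. Made up for illustration only.
CONSUMERS: dict[str, list[str]] = {
    "power_switching": ["traction", "auxiliaries"],
    "auxiliaries": ["cctv", "pis"],
    "traction": [],
    "cctv": [],
    "pis": [],
}


def affected_by(changed: str, consumers: dict[str, list[str]]) -> set[str]:
    """Breadth-first walk returning the changed subsystem plus everything downstream of it."""
    seen = {changed}
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for downstream in consumers.get(node, []):
            if downstream not in seen:
                seen.add(downstream)
                queue.append(downstream)
    return seen


# A fix confined to the CCTV code touches only the CCTV.
print(sorted(affected_by("cctv", CONSUMERS)))
# ['cctv']
# A "CCTV fix" made by altering power switching reaches traction and auxiliaries too.
print(sorted(affected_by("power_switching", CONSUMERS)))
# ['auxiliaries', 'cctv', 'pis', 'power_switching', 'traction']
```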
 

Ken H

On Moderation
Joined
11 Nov 2018
Messages
6,288
Location
N Yorks
It was probably just a cock-up rather than the conspiracy I suspect you would prefer!
If I put software into production that severely impacted my client's business, I would find my contract ended and find myself being sued for damages.
Which is why we have UAT sign-off. Then it's the fault of the manager who signed it off.
But one would expect that manager to be told of any material changes, such as disabling battery reset or changing frequency tolerances.

But what is the point of type testing if the manufacturer can change the characteristics of the train? All the tests done in acceptance testing of the hardware are invalidated by software changes, now that software is a core component, not a bolt-on goody. How do we know a (hypothetical) bug hasn't been introduced that affects safety, such as braking?
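The "told of any material changes" idea can be sketched as a diff of parameters between two releases, with a watch list of items that must be escalated before sign-off. A minimal Python sketch; all parameter names and values (including `cctv_retry_count`) are hypothetical and not taken from any real release.

```python
from __future__ import annotations

# Hypothetical parameter sets for two releases, made up for illustration.
OLD_RELEASE = {"full_shutdown_below_hz": 48.5, "driver_battery_reset": True, "cctv_retry_count": 3}
NEW_RELEASE = {"full_shutdown_below_hz": 49.0, "driver_battery_reset": False, "cctv_retry_count": 5}

# Parameters the client wants to be told about before signing off.
ESCALATE_IF_CHANGED = {"full_shutdown_below_hz", "driver_battery_reset"}


def material_changes(old: dict, new: dict, watched: set[str]) -> list[tuple]:
    """Return (name, old value, new value) for each watched parameter that differs.

    A parameter missing from one release shows up as None on that side.
    """
    changes = []
    for name in sorted(watched):
        before, after = old.get(name), new.get(name)
        if before != after:
            changes.append((name, before, after))
    return changes


for name, before, after in material_changes(OLD_RELEASE, NEW_RELEASE, ESCALATE_IF_CHANGED):
    print(f"ESCALATE: {name} changed from {before} to {after}")
# ESCALATE: driver_battery_reset changed from True to False
# ESCALATE: full_shutdown_below_hz changed from 48.5 to 49.0
```

The CCTV parameter is not reported because only watched items are escalated, so the watch list itself has to be kept right.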
 

jon0844

Veteran Member
Joined
1 Feb 2009
Messages
28,046
Location
UK
The next big update will be to the PIS, fixing the audio/stuttering issue. This may mean we can expect a return of the full-screen graphical images and speeches about engineering works, safety etc.

Some will like this, some will hate it!
 

theageofthetra

On Moderation
Joined
27 May 2012
Messages
3,504
Ken H said:
If I put software into production that severely impacted my client's business, I would find my contract ended and find myself being sued for damages.
Which is why we have UAT sign-off. Then it's the fault of the manager who signed it off.
But one would expect that manager to be told of any material changes, such as disabling battery reset or changing frequency tolerances.

But what is the point of type testing if the manufacturer can change the characteristics of the train? All the tests done in acceptance testing of the hardware are invalidated by software changes, now that software is a core component, not a bolt-on goody. How do we know a (hypothetical) bug hasn't been introduced that affects safety, such as braking?

Spot on.
 

theageofthetra

On Moderation
Joined
27 May 2012
Messages
3,504
jon0844 said:
The next big update will be to the PIS, fixing the audio/stuttering issue. This may mean we can expect a return of the full-screen graphical images and speeches about engineering works, safety etc.

Some will like this, some will hate it!

I imagine this will be to ensure disability compliance?
 

DarloRich

Veteran Member
Joined
12 Oct 2010
Messages
29,276
Location
Fenny Stratford
Ken H said:
If I put software into production that severely impacted my client's business, I would find my contract ended and find myself being sued for damages.
Which is why we have UAT sign-off. Then it's the fault of the manager who signed it off.
But one would expect that manager to be told of any material changes, such as disabling battery reset or changing frequency tolerances.

But what is the point of type testing if the manufacturer can change the characteristics of the train? All the tests done in acceptance testing of the hardware are invalidated by software changes, now that software is a core component, not a bolt-on goody. How do we know a (hypothetical) bug hasn't been introduced that affects safety, such as braking?

I know how IT projects work, thanks. The problem is that sometimes $hit happens and communication, understanding and sign-off fail:

There was an important job to be done and Everybody was sure that Somebody would do it. Anybody could have done it, but Nobody did it. Somebody got angry about that, because it was Everybody’s job. Everybody thought Anybody could do it, but Nobody realized that Everybody wouldn’t do it. It ended up that Everybody blamed Somebody when Nobody did what Anybody could have.

The important thing is that the process fault is identified and fixed so it doesn't happen again. The lawyers can sort the rest out.
 

edwin_m

Veteran Member
Joined
21 Apr 2013
Messages
24,880
Location
Nottingham
Ken H said:
So how did 3.27 manage to get into production without proper version control and client sign-off after UAT?

It sounds to me like an issue with the requirements, not the software. Somebody changed the requirements relating to frequency-related shutdowns and resets without realizing that this arguably put them in breach of a standard. The requirement for the driver to be able to reset after a frequency deviation was deleted, either unintentionally or because someone had considered the scenarios in which it would be needed and decided they weren't likely enough to worry about. Once that happens, version control and sign-offs just ensure that it does the wrong thing very well.
 

rebmcr

Established Member
Joined
15 Nov 2011
Messages
3,849
Location
St Neots
Ken H said:
All the tests done in acceptance testing of the hardware are invalidated by software changes, now that software is a core component, not a bolt-on goody. How do we know a (hypothetical) bug hasn't been introduced that affects safety, such as braking?

I get your point, but this is not really materially different from 1980s stock having a non-standard design of obstacle deflector fitted during routine maintenance, which later causes an incident. (I seem to remember that this actually happened in the North West, resulting in a great many Sprinters being fixed overnight.)
 

coppercapped

Established Member
Joined
13 Sep 2015
Messages
3,098
Location
Reading
Ken H said:
If I put software into production that severely impacted my client's business, I would find my contract ended and find myself being sued for damages.
Which is why we have UAT sign-off. Then it's the fault of the manager who signed it off.
But one would expect that manager to be told of any material changes, such as disabling battery reset or changing frequency tolerances.

But what is the point of type testing if the manufacturer can change the characteristics of the train? All the tests done in acceptance testing of the hardware are invalidated by software changes, now that software is a core component, not a bolt-on goody. How do we know a (hypothetical) bug hasn't been introduced that affects safety, such as braking?

The contractual issue here is that the manufacturer's client is Cross London Trains, which in turn has a contract with the Department for Transport to supply trains to the franchisee, in this case GTR.

Cross London Trains is a subsidiary of Siemens, the manufacturer.

Who sues whom for damages in this case? :rolleyes:
 

PG

Established Member
Joined
12 Oct 2010
Messages
2,842
Location
at the end of the high and low roads
coppercapped said:
The contractual issue here is that the manufacturer's client is Cross London Trains, which in turn has a contract with the Department for Transport to supply trains to the franchisee, in this case GTR.

Cross London Trains is a subsidiary of Siemens, the manufacturer.

Who sues whom for damages in this case? :rolleyes:

I'm sure each party's lawyers will manage to work out who to claim against whilst lining their own pockets :smile:

edwin_m said:
It sounds to me like an issue with the requirements, not the software. Somebody changed the requirements relating to frequency-related shutdowns and resets without realizing that this arguably put them in breach of a standard. The requirement for the driver to be able to reset after a frequency deviation was deleted, either unintentionally or because someone had considered the scenarios in which it would be needed and decided they weren't likely enough to worry about. Once that happens, version control and sign-offs just ensure that it does the wrong thing very well.

Another case of GIGO: Garbage In, Garbage Out. If the specification against which something is being tested isn't right, then neither will the end product be.
 

Ken H

On Moderation
Joined
11 Nov 2018
Messages
6,288
Location
N Yorks
PG said:
I'm sure each party's lawyers will manage to work out who to claim against whilst lining their own pockets :smile:

PG said:
Another case of GIGO: Garbage In, Garbage Out. If the specification against which something is being tested isn't right, then neither will the end product be.