train onboard software

Ken H · 7 Oct 2019

I was reading an article in the October issue of Modern Railways about how the railway coped with the power supply problems earlier this year.

One of the problems was that the power supply frequency dropped, and the on-train protection tripped on some trains.

The power supply frequency quickly recovered as National Grid brought generating capacity online and switched out consumers, but the railway's supplies stayed on.

The procedure after such a trip is to do a battery reset of the train, after which it should be good to go again.

It seems that the software on some trains had been changed by a new release, so two versions were in service.

The older version had a wider frequency tolerance, so did not trip.

The newer version had a narrower tolerance, so did trip. But the software had also been changed to disallow a battery reset after a supply frequency trip.

They could not reload the old version on the trains running the new version, because the new version contained a reliability fix to the CCTV system.

So, some questions:

1. Was the software not subject to User Acceptance Testing (UAT) by the ROSCO or the TOC? Did no-one read the release documentation and think 'Hmm, that's quite a big change, I will escalate that'? Was there any release documentation?

2. Surely the software should be divided into applications. Upgrading one application should not affect the others, so the CCTV app should be upgradeable without affecting the power protection functions.

3. Are features like the power supply frequency tolerance levels not 'soft coded', i.e. kept in a parameter file rather than 'hard coded' in the programs?
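Question 3 can be illustrated with a minimal Python sketch, assuming a JSON parameter file. The field names are hypothetical and the values are simply the figures hwl quotes later in the thread, not any real train's settings. Keeping the limits in data means a tolerance change shows up as a one-line difference in the release, and a load-time check can reject values that are out of order.

```python
import json
from dataclasses import dataclass

# Hypothetical parameter file contents; the names are illustrative only.
PARAMS_JSON = """
{
    "traction_off_below_hz": 49.0,
    "all_off_at_or_below_hz": 48.5,
    "auto_reset_above_hz": 49.5
}
"""


@dataclass(frozen=True)
class FrequencyLimits:
    traction_off_below_hz: float
    all_off_at_or_below_hz: float
    auto_reset_above_hz: float


def load_limits(text: str) -> FrequencyLimits:
    """Parse the parameter file and reject thresholds in the wrong order."""
    limits = FrequencyLimits(**json.loads(text))
    # The full shutdown must sit strictly below the traction cut-out,
    # and the reset level strictly above it.
    if not (limits.all_off_at_or_below_hz
            < limits.traction_off_below_hz
            < limits.auto_reset_above_hz):
        raise ValueError(f"frequency thresholds out of order: {limits}")
    return limits


if __name__ == "__main__":
    print(load_limits(PARAMS_JSON))
```

With this check, a file that set the full shutdown to 49.0 Hz (the same as the traction cut-out) would be refused when loaded rather than discovered in service.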

RailUK Forums

ComUtoR · 7 Oct 2019

I haven't read the article, but are you sure that the sequence of events is correct?

Ken H said:
It seems that the software on some trains was changed by a new release, so 2 versions were in service.

The older version had a wider tolerance of frequency so did not trip

The newer version had a narrower tolerance, so did trip. But the software had also been changed to disallow a battery reset after a supply frequency trip

They could not reload the old version on the new trains because of a reliability fix to the CCTV system

There have always been multiple versions of the train software in service. As I understand it, the units tripped and then got rebooted. Because some didn't come back, they had to upload a software fix to get them to reboot. Is the article stating that the software uploaded before the incident also had issues?

Did they report the other issue that pretty much caused the problem?

edwin_m · 7 Oct 2019

The article does say that the newer software release was the problem, and that somebody had intentionally taken away the ability of the driver to recover, "intended to protect some electronic components in the traction package". At the time of the article a patch was under test to restore this ability.

Sounds very much like the sort of unintended consequence that comes from software changes. Discussion on another thread suggests a frequency deviation of this magnitude is pretty much unprecedented, so perhaps whoever it was just thought it wouldn't happen.

Ken H · 7 Oct 2019

ComUtoR said:
I haven't read the article but are you sure that the sequence of events is correct ?

There have always been multiple versions of the train software in service. As I understand it, the units tripped and then got rebooted. Because some didn't come back, they had to upload a software fix to get them to reboot. Is the article stating that the software uploaded before the incident also had issues?

Did they report the other issue that pretty much caused the problem?

The article stated the fleet was running with two versions of the software. The trains with the old version didn't trip; the new version did (because of different tolerances for line frequency). The new version did not allow the driver to do a battery reset, and the article implies this was a feature of the new version. That meant each failed train had to be visited by a technician with a laptop to reboot it.

Ken H · 7 Oct 2019

edwin_m said:
The article does say that the newer software release was the problem, and that somebody had intentionally taken away the ability of the driver to recover, "intended to protect some electronic components in the traction package". At the time of the article a patch was under test to restore this ability.

Sounds very much like the sort of unintended consequence that comes from software changes. Discussion on another thread suggests a frequency deviation of this magnitude is pretty much unprecedented, so perhaps whoever it was just thought it wouldn't happen.

The article quotes Network Rail standards and conflicting Euronorm standards. It was argued that the trains with the new software didn't conform to Network Rail standards.

ComUtoR · 7 Oct 2019

Ken H said:
The new version did not allow the driver to do a battery reset.

I find this quite odd, as rebooting is pretty standard and is a button press in the cab. I'm not sure what the article is suggesting.

The trains with the old version didn't trip, the new version did

This is interesting, because although the old version tripped, they still didn't reboot.

That meant each failed train had to be visited by a technician with a laptop to reboot the train.

Some of the units that did trip were still able to reboot. The drivers did do a battery reset and the unit rebooted correctly. I think it was more than just new version/old version.

There are at least 3 versions currently running about.

hwl · 7 Oct 2019

Ken H said:
the article stated the fleet was running with 2 versions of the software. The trains with the old version didnt trip, the new version did (because of different tolerances for line frequency). The new version did not allow the driver to do a battery reset. The article implies this was a feature of the new version. That meant each failed train had to be visited by a technician with a laptop to reboot the train.

The article is wrong...

Several software and specification screw ups:

EN 50163 permits traction electronics to start shutting down below 49 Hz (a good idea), with shut-off for everything (e.g. auxiliaries) at 48.5 Hz; however, they programmed in 49 Hz as the complete shut-off value by mistake. The auxiliary power supplies should never have shut down. The second issue was resetting (or not) after the shutdown.

At the time of the "700" incident there were at least five software variants in service on the 700s. In the latest three software versions before the incident (3.27/28/29, circa 60% of the fleet) they managed to remove the ability to do a battery-disconnect reset and didn't regression-test the change; all the units that sat down with problems had the later software (3.27+). Units with 3.25 and 3.26 were able to do a battery-disconnect reset and get moving.
In version 3.30 (roll-out started the night of the incident) and later, battery-disconnect reset was restored.

The newest software versions will auto-reset when the frequency returns to above 49.5 Hz, as well as setting the complete shutdown frequency to 48.5 Hz instead of 49 Hz.
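The behaviour described above for the newer releases (traction shedding below 49 Hz, everything off at 48.5 Hz, automatic reset once the supply is back above 49.5 Hz) can be sketched as a small state machine with hysteresis. This is a minimal Python illustration using only those figures; the mode names are made up, and treating exactly 48.5 Hz as a full shutdown is an assumption about which side of the boundary counts.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    TRACTION_OFF = "traction off"  # traction electronics off, auxiliaries running
    ALL_OFF = "all off"            # auxiliaries shut down as well


def next_mode(
    mode: Mode,
    freq_hz: float,
    all_off_hz: float = 48.5,       # assumed: at or below this, everything shuts down
    traction_off_hz: float = 49.0,  # below this, traction shuts down
    reset_hz: float = 49.5,         # above this, automatic reset to normal
) -> Mode:
    """Return the protection mode after one supply-frequency reading."""
    if freq_hz <= all_off_hz:
        return Mode.ALL_OFF
    if freq_hz < traction_off_hz:
        # A full shutdown is only cleared by the reset threshold, so a
        # partial recovery does not bring the auxiliaries back on its own.
        return Mode.ALL_OFF if mode is Mode.ALL_OFF else Mode.TRACTION_OFF
    if freq_hz > reset_hz:
        return Mode.NORMAL
    # Between the cut-out and the reset level: hold the current mode.
    return mode


def run(readings: list[float], **limits: float) -> list[Mode]:
    """Feed a sequence of frequency readings through the state machine."""
    mode, history = Mode.NORMAL, []
    for f in readings:
        mode = next_mode(mode, f, **limits)
        history.append(mode)
    return history


if __name__ == "__main__":
    dip = [50.0, 48.9, 49.2, 49.6]
    print([m.name for m in run(dip)])                   # intended limits
    print([m.name for m in run(dip, all_off_hz=49.0)])  # full shutdown mis-set to 49 Hz
```

With the intended limits a dip to 48.9 Hz only sheds traction; with the full-shutdown value mis-set to 49 Hz, the same dip takes the auxiliaries down too, and in both cases the sketch returns to normal once the supply is above 49.5 Hz.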

Ken H · 7 Oct 2019

hwl said:
The article is wrong...

Several software and specification screw ups:

EN 50163 permits traction electronics to start shutting down below 49 Hz (a good idea), with shut-off for everything (e.g. auxiliaries) at 48.5 Hz; however, they programmed in 49 Hz as the complete shut-off value by mistake. The auxiliary power supplies should never have shut down. The second issue was resetting (or not) after the shutdown.

At the time of the "700" incident there were at least five software variants in service on the 700s. In the latest three software versions before the incident (3.27/28/29, circa 60% of the fleet) they managed to remove the ability to do a battery-disconnect reset and didn't regression-test the change; all the units that sat down with problems had the later software (3.27+). Units with 3.25 and 3.26 were able to do a battery-disconnect reset and get moving.
In version 3.30 (roll-out started the night of the incident) and later, battery-disconnect reset was restored.

The newest software versions will auto-reset when the frequency returns to above 49.5 Hz, as well as setting the complete shutdown frequency to 48.5 Hz instead of 49 Hz.

So how did 3.27 manage to get into production without proper version control and client sign-off after UAT?

DarloRich · 7 Oct 2019

Ken H said:
So how did 3.27 manage to get into production without proper version control and client sign-off after UAT?

It was probably just a cock-up rather than the conspiracy I suspect you would prefer!

dosxuk · 7 Oct 2019

Ken H said:
2. Surely the software should be divided into applications. Upgrading one application should not affect the others. So the CCTV app should be up-gradeable without affecting the power protection stuff.

A reliability fix for the CCTV could mean many things, including (off the top of my head, I've got no idea what they actually did) :-
- Making the display of timestamps more accurate
- Changing the way data is sent along the train
- Altering the power switching to reduce glitches when the train switches between AC & DC

That last idea, though, I could well see affecting other parts of the train's power systems. It's all very well saying things should be updated separately, but when systems are interconnected, some updates will affect more than the 'headline' system.
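The point about interconnected systems can be made concrete with a minimal Python sketch, assuming each onboard application declares the shared interfaces it relies on. The component and interface names are hypothetical, loosely following the examples in the post above; the idea is that a change to a shared interface, rather than to an application, decides what needs retesting.

```python
# Hypothetical manifest: which shared interfaces each onboard application uses.
MANIFEST: dict[str, set[str]] = {
    "cctv": {"aux_power_switching", "train_data_bus"},
    "passenger_information": {"train_data_bus"},
    "traction_protection": {"aux_power_switching", "line_frequency_monitor"},
}


def needs_retest(changed_interfaces: set[str]) -> set[str]:
    """Applications that use any changed interface and so need retesting."""
    return {
        app
        for app, uses in MANIFEST.items()
        if uses & changed_interfaces
    }


if __name__ == "__main__":
    # A CCTV fix confined to the data bus vs. one that alters power switching.
    print(sorted(needs_retest({"train_data_bus"})))
    print(sorted(needs_retest({"aux_power_switching"})))
```

A fix confined to the data bus stays within CCTV and passenger information, but one that alters auxiliary power switching pulls traction protection into scope as well, even though the release notes might only mention CCTV.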

Ken H · 7 Oct 2019

DarloRich said:
It was probably just a cock-up rather than the conspiracy I suspect you would prefer!

If I put software into production that severely impacted my client's business, I would find my contract ended and find myself being sued for damages.
Which is why we have UAT sign-off. Then it's the fault of the manager who signed it off.
But one would expect that manager to be told of any material changes, such as disabling battery reset or changing frequency tolerances.

But what is the point of type testing if the manufacturer can change the characteristics of the train? All the tests done in acceptance testing of the hardware are invalidated by software changes, now that software is a core component, not a bolt-on goody. How do we know a (hypothetical) bug hasn't been introduced that affects safety, like braking?

jon0844 · 7 Oct 2019

The next big update will be to the PIS (passenger information system), fixing the audio stuttering issue. This may mean we can expect a return of the full-screen graphical images and announcements about engineering works, safety etc.

Some will like this, some will hate it!

ComUtoR · 7 Oct 2019

Ken H said:
How do we know a (hypothetical) bug hasn't been introduced that affects safety, like braking?

They fixed the braking issue a few versions back....

theageofthetra · 7 Oct 2019

Ken H said:
If I put software into production that severely impacted my client's business, I would find my contract ended and find myself being sued for damages.
Which is why we have UAT sign-off. Then it's the fault of the manager who signed it off.
But one would expect that manager to be told of any material changes, such as disabling battery reset or changing frequency tolerances.

But what is the point of type testing if the manufacturer can change the characteristics of the train? All the tests done in acceptance testing of the hardware are invalidated by software changes, now that software is a core component, not a bolt-on goody. How do we know a (hypothetical) bug hasn't been introduced that affects safety, like braking?

Spot on.

theageofthetra · 7 Oct 2019

jon0844 said:
The next big update will be to the PIS (passenger information system), fixing the audio stuttering issue. This may mean we can expect a return of the full-screen graphical images and announcements about engineering works, safety etc.

Some will like this, some will hate it!

I imagine this will be to ensure disability compliance?

DarloRich · 7 Oct 2019

Ken H said:
If I put software into production that severely impacted my client's business, I would find my contract ended and find myself being sued for damages.
Which is why we have UAT sign-off. Then it's the fault of the manager who signed it off.
But one would expect that manager to be told of any material changes, such as disabling battery reset or changing frequency tolerances.

But what is the point of type testing if the manufacturer can change the characteristics of the train? All the tests done in acceptance testing of the hardware are invalidated by software changes, now that software is a core component, not a bolt-on goody. How do we know a (hypothetical) bug hasn't been introduced that affects safety, like braking?

I know how IT projects work, thanks. The problem is that sometimes $hit happens, and communication, understanding and sign-off fail:

There was an important job to be done and Everybody was sure that Somebody would do it. Anybody could have done it, but Nobody did it. Somebody got angry about that, because it was Everybody’s job. Everybody thought Anybody could do it, but Nobody realized that Everybody wouldn’t do it. It ended up that Everybody blamed Somebody when Nobody did what Anybody could have.

The important thing is that the process fault is identified and fixed so it doesn't happen again. The lawyers can sort the rest out.

edwin_m · 7 Oct 2019

Ken H said:
so how did 3.27 manage to get into production without proper version control and client sign-off, after UAT?

It sounds to me like an issue with the requirements not the software. Somebody changed the requirements relating to frequency-related shutdowns and resets, without realizing this put them arguably in breach of a standard. The requirement for the driver to be able to reset after frequency deviation was deleted, either unintentionally or because someone had considered the scenarios when it would be needed and decided they weren't likely enough to worry about. Once that happens the version control and sign-offs just ensure that it is doing the wrong thing very well.
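The scenario described above, where a requirement is deleted and every later check faithfully verifies the reduced specification, is one that a simple baseline comparison can catch. A minimal Python sketch, assuming requirements are tracked by ID; the IDs and wording are hypothetical, and the check only flags removals that lack an explicit sign-off.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    req_id: str
    text: str


def unapproved_removals(
    baseline: list[Requirement],
    candidate: list[Requirement],
    signed_off: set[str],
) -> list[Requirement]:
    """Requirements present in the baseline, absent from the candidate,
    and not covered by an explicit sign-off."""
    kept = {r.req_id for r in candidate}
    return [r for r in baseline
            if r.req_id not in kept and r.req_id not in signed_off]


if __name__ == "__main__":
    # Hypothetical requirement IDs and wording, for illustration only.
    baseline = [
        Requirement("FREQ-01", "Shut down traction below 49.0 Hz"),
        Requirement("FREQ-02", "Driver can recover by battery-disconnect reset after a frequency trip"),
    ]
    candidate = [baseline[0]]
    for r in unapproved_removals(baseline, candidate, signed_off=set()):
        print(f"Removed without sign-off: {r.req_id} ({r.text})")
```

This would not stop a removal that someone deliberately approved, which is the second case in the post; it only makes sure the decision is visible to whoever signs the release off.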

rebmcr · 7 Oct 2019

Ken H said:
All the tests done in acceptance testing of the hardware are invalidated by software changes, now that software is a core component, not a bolt-on goody. How do we know a (hypothetical) bug hasn't been introduced that affects safety, like braking?

I get your point, but this is not really a materially different situation to 1980s stock having a non-standard design of object deflector installed through routine maintenance, which later causes an incident. (I seem to remember that this actually happened in the north west, causing a great many Sprinters to be fixed overnight).

coppercapped · 7 Oct 2019

Ken H said:
If I put software into production that severely impacted my client's business, I would find my contract ended and find myself being sued for damages.
Which is why we have UAT sign-off. Then it's the fault of the manager who signed it off.
But one would expect that manager to be told of any material changes, such as disabling battery reset or changing frequency tolerances.

But what is the point of type testing if the manufacturer can change the characteristics of the train? All the tests done in acceptance testing of the hardware are invalidated by software changes, now that software is a core component, not a bolt-on goody. How do we know a (hypothetical) bug hasn't been introduced that affects safety, like braking?

The contractual issue here is that the manufacturer's client is Cross London Trains, which in turn has a contract with the Department for Transport to supply trains to the franchisee, in this case GTR.

Cross London Trains is a subsidiary of Siemens, the manufacturer.

Who sues whom for damages in this case? :rolleyes:

PG · 7 Oct 2019

coppercapped said:
The contractual issue here is that the manufacturer's client is Cross London Trains, which in turn has a contract with the Department for Transport to supply trains to the franchisee, in this case GTR.

Cross London Trains is a subsidiary of Siemens, the manufacturer.

Who sues whom for damages in this case?

I'm sure each party's lawyers will manage to work out who to claim against whilst lining their own pockets :smile:

edwin_m said:
It sounds to me like an issue with the requirements not the software. Somebody changed the requirements relating to frequency-related shutdowns and resets, without realizing this put them arguably in breach of a standard. The requirement for the driver to be able to reset after frequency deviation was deleted, either unintentionally or because someone had considered the scenarios when it would be needed and decided they weren't likely enough to worry about. Once that happens the version control and sign-offs just ensure that it is doing the wrong thing very well.

Another case of GIGO (Garbage In, Garbage Out). If the specification against which something is being tested isn't right, then neither will the end product be.

Ken H · 7 Oct 2019

PG said:
I'm sure each party's lawyers will manage to work out who to claim against whilst lining their own pockets

Another case of GIGO (Garbage In, Garbage Out). If the specification against which something is being tested isn't right, then neither will the end product be.

kkong · 7 Oct 2019

Ahem.
