I've been following this thread (and the whole forum) for years as a lurker. I've been a software engineer for over 25 years, working on realtime embedded firmware for telecoms systems that are part of the critical national infrastructure, and the current issues with the 777s have made me want to join the discussion.
I looked up Teleste's PIS to see how it is put together (see this brochure, particularly the system diagram on page 7). It seems the location system on the train doesn't update the displays or cue the announcements directly: the location data goes back to a central server, which then sends the display updates and announcement cues to the train. This relies on the vehicles being in constant contact with the server over a network, which is presumably why they had to install the new trackside wireless network for the introduction of the 777s. Merseytravel have named their implementation of the Teleste system the Train Connectivity and Information System (TCIS).
As well as the location data and PIS updates, the train's wireless connection to the network (which is only 100 Mbps, according to the June 2020 RailEngineer article) also allows the trains to stream maintenance and CCTV data to Sandhills, and the on-board passenger WiFi is supplied as a byproduct of the requirement for the trains to be constantly connected.
It makes me suspect that the onboard PIS issues are due either to the connection to the server in Sandhills being dropped and failing to reconnect, or possibly to the onboard router that links all the train systems via Ethernet to the trackside network getting overloaded and crashing. As to the possible causes, it might be that there are now more nodes (trains) active on the network, that passengers are using the WiFi (though hopefully it has been configured with a protected bandwidth allocation for the essential train systems), or that a change to one of the other on-board systems that uses the location data is causing problems for the PIS (wasn't there an issue early on relating to the real-time data recorders getting the wrong time and date in their log messages?).
The telecoms systems I work on were designed to be resilient via the use of distributed processing from the start - if a lower node loses connectivity with an upper node, it has a built-in fallback mode to provide limited local functionality rather than just giving up. However, this is seen as old-fashioned nowadays, and systems like Teleste's are now the norm, where the user-facing nodes are 'dumb' or 'light' to make them cheaper, and all the processing is offloaded to central servers. That is fine, provided you can guarantee a connection to the server 100% of the time.
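To illustrate the kind of fallback I mean, here is a minimal Python sketch. It is purely hypothetical: the class, the method names, the station names and the 10-second timeout are all made up for the example, and it is not based on how Teleste's system (or the telecoms kit I work on) is actually implemented. The node shows whatever the server last sent while that message is fresh, and switches to locally-sourced location data once the server has been silent for too long.

```python
from dataclasses import dataclass
from typing import List, Optional

SERVER_TIMEOUT_S = 10.0  # assumed value, chosen only for illustration


@dataclass
class DisplayNode:
    """Hypothetical user-facing node with a built-in local fallback mode."""
    route: List[str]                           # station order held locally on the node
    last_server_time: float = float("-inf")    # time the last server update arrived
    server_message: Optional[str] = None       # most recent message from the server

    def on_server_update(self, message: str, now: float) -> None:
        """Record a display update pushed from the central server."""
        self.server_message = message
        self.last_server_time = now

    def server_is_fresh(self, now: float) -> bool:
        """True if the server has been heard from within the timeout."""
        return (now - self.last_server_time) <= SERVER_TIMEOUT_S

    def message_to_show(self, local_station_index: int, now: float) -> str:
        """Prefer the server's message; otherwise derive one from local data."""
        if self.server_message is not None and self.server_is_fresh(now):
            return self.server_message
        # Fallback mode: limited local functionality using the node's own
        # idea of where it is on the route.
        next_index = local_station_index + 1
        if next_index < len(self.route):
            return f"Next stop: {self.route[next_index]}"
        return "This train terminates here"
```

For example, a node that has never heard from the server, or last heard from it more than 10 seconds ago, would still show "Next stop: ..." from its own route list rather than a blank or stale display. The trade-off is that each node then has to carry route data and some logic of its own, which is exactly the cost that 'light' nodes are meant to avoid.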
Designing a PIS that (if Teleste's system diagram is correct) does not have a fallback mode that uses locally-sourced location data to provide its updates in the event of a wireless network outage seems a strange choice, especially for an environment with as much potential RF noise as a third-rail EMU. I'm reminded of Harvard Technology, the company that supplied councils with 'intelligent streetlights' that were turned on and off by a server owned by the company. When they went bust, the server was turned off and the lights just stayed on.