CHAS IT Error – A Project Manager’s Worst Nightmare

More than 7,000 Singapore citizens who submitted Community Health Assist Scheme (CHAS) applications or renewals between September 2018 and October 2018 received inaccurate healthcare subsidies due to an IT-related error.

The Ministry of Health (MOH) said in a statement that the software application, managed and administered by NCS Pte Ltd, had miscalculated the means-test results. NCS, a company owned by Singtel, is a multinational information technology and communications engineering company headquartered in Singapore. Most, if not all, of the Singapore government’s software development, software and hardware maintenance, software administration and facility management is done by NCS.

NCS and MOH jointly gave a timeline of the events that preceded the discovery of the error. You can read the entire article on the CNA website.

Based on my own experience as a project manager handling a number of software development and maintenance projects in Singapore, I would say that such an incident is a serious event for a company and its project team, and a nightmare scenario for the project manager. If things can go wrong, they will go wrong. That is why measures are put in place to ensure that such an incident does not happen. I believe there were serious lapses on the part of the management team in ensuring the integrity of the system after deployment, especially since the system was performing as expected prior to migration and the problem appeared only after the migration to different servers. The mere fact that a wrong version of a module (or file) was installed shows that very little due diligence was done by the software deployment team and almost none by the project management team.

Such an issue could have been avoided, or flagged and resolved, through a systematic migration activity. Typically, a systematic migration activity includes the following:

Development Environment

Typically, for a migration project, there is almost no involvement of a development environment unless newer features are being implemented. In this case, since the system was already in production, a version must have been fixed and made known to all stakeholders. Deployment scripts should have been in place to automatically flag any wrong version of a package being installed.
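A deployment script of the kind described above can be quite small. The following is only a minimal sketch, assuming the agreed versions are kept in a simple name-to-version mapping. The function name `find_version_mismatches` and the package names are hypothetical and are not taken from the CHAS system.

```python
from typing import Dict, List


def find_version_mismatches(expected: Dict[str, str],
                            installed: Dict[str, str]) -> List[str]:
    """Compare the agreed release manifest against what is on the server.

    Both arguments map a package name to a version string. The return
    value is a list of human-readable problems; an empty list means the
    installed packages match the manifest exactly.
    """
    problems = []
    for name, wanted in sorted(expected.items()):
        found = installed.get(name)
        if found is None:
            problems.append(f"{name}: missing (expected {wanted})")
        elif found != wanted:
            problems.append(f"{name}: installed {found}, expected {wanted}")
    # Anything on the server that is not in the manifest is also suspicious.
    for name in sorted(set(installed) - set(expected)):
        problems.append(f"{name}: not in manifest (installed {installed[name]})")
    return problems


if __name__ == "__main__":
    # Hypothetical package names and versions, for illustration only.
    manifest = {"means-test-module": "3.2.1", "web-portal": "5.0.0"}
    on_server = {"means-test-module": "3.1.0", "web-portal": "5.0.0"}
    for problem in find_version_mismatches(manifest, on_server):
        print("MISMATCH:", problem)
```

A script like this would stop the deployment whenever the returned list is not empty, so a wrong module version is caught before any user sees it.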

Provisioning of Testing and Production Environment

Before any migration, a new test environment that is standardized and closely aligned with the target production environment must be made available by the Authority. The vendor’s maintenance team should verify all the details and the provisioning of the new test environment before starting any deployment activity. Ideally, the systems team should go through a checklist of items and get sign-off to confirm that the provisioning is correct. At a minimum, the provisioning checks should cover port connectivity, server details and the version numbers of the server components, among many other items. The same procedure applies to the production environment as well.
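Part of such a provisioning checklist can be automated. The sketch below is an assumption about how that might look, not a description of any actual tooling: it checks that each required port accepts a TCP connection and that each server component reports the expected version. The function names and the shape of the inputs are hypothetical.

```python
import socket
from typing import Dict, List, Tuple


def port_is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_provisioning_checklist(
    endpoints: Dict[str, Tuple[str, int]],
    expected_versions: Dict[str, str],
    reported_versions: Dict[str, str],
) -> List[Tuple[str, bool]]:
    """Build a list of (checklist item, passed) pairs.

    endpoints maps a label to (host, port). reported_versions is assumed
    to have been collected from the servers beforehand by some other means.
    """
    results = []
    for label, (host, port) in endpoints.items():
        item = f"port for {label} ({host}:{port}) is reachable"
        results.append((item, port_is_reachable(host, port)))
    for component, wanted in expected_versions.items():
        item = f"{component} version is {wanted}"
        results.append((item, reported_versions.get(component) == wanted))
    return results


def ready_for_sign_off(results: List[Tuple[str, bool]]) -> bool:
    """Sign-off needs at least one check, and every check must pass."""
    return bool(results) and all(passed for _, passed in results)
```

The list of pairs can be printed straight into the sign-off form, so that the person signing sees every item and its result rather than a single pass or fail.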

Application Migration from Old to New Test Environment

Once the testing environment is verified and signed off, the application team will start the migration activity. This is broken down into two broad sub-activities: the database migration and the application migration.

For the database migration, an authorised DBA (Database Administrator) will create a new database in the newly provisioned test environment and then migrate the database from the old environment to the new one. The DBA should close the activity only after checking all the prerequisites for a successful database migration. The migration logs should then be checked by the Authority-appointed DBA, who endorses and signs off the database migration activity.

The second sub-activity is to deploy the test server packages. This includes the web application, the batch jobs, the mail jobs and any other application-specific packages. The deployment team should create deployment forms, prepare the necessary checklists with the correct version of each package and then start the deployment activity. For a migration project, it is essential to take the packages from the old test environment and deploy them unchanged to the new environment. There can be exceptions to this rule, which I am not going to discuss here.

Once the deployment is successful, the Authority should be informed and requested to do a user acceptance test (UAT) on the new testing environment. If the UAT is successful, an Authority-appointed project manager should sign off the UAT form.
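One of the simplest checks a DBA can run before closing the database migration is a table-by-table row count comparison between the old and the new database. The sketch below uses SQLite from the Python standard library purely as a stand-in, since the actual database engine is not known. The function names are hypothetical, and a real check would also compare schemas and checksums of the data, not just counts.

```python
import sqlite3
from typing import Dict, List


def table_row_counts(conn: sqlite3.Connection) -> Dict[str, int]:
    """Return {table name: row count} for every user table."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    )]
    counts = {}
    for name in names:
        # Table names come from the database catalogue itself; quoting
        # them guards against unusual characters in a name.
        quoted = '"' + name.replace('"', '""') + '"'
        counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts


def compare_databases(old: sqlite3.Connection,
                      new: sqlite3.Connection) -> List[str]:
    """List every table whose row count differs or that exists on one side only."""
    old_counts = table_row_counts(old)
    new_counts = table_row_counts(new)
    differences = []
    for table in sorted(set(old_counts) | set(new_counts)):
        before = old_counts.get(table)
        after = new_counts.get(table)
        if before != after:
            differences.append(f"{table}: old={before}, new={after}")
    return differences
```

An empty result does not prove the migration is correct, but a non-empty one is a clear reason not to sign off.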

Application Migration from Old to New Production Environment

Once UAT is successful, the maintenance team should start preparing for the production migration. All deployment files, packages and scripts should be taken from the test environment migration. No changes may be made to the deployment packages between the test environment and the production environment. A migration date and time should be fixed. The Authority should display a maintenance and downtime message to members of the public and then cut off the old production server at the stated date and time. The maintenance team should then follow the same steps: production database migration, verification, and then deployment of the application services. Once the deployment is done, an Authority-appointed user should log in to the system with a production account and do a sanity check. If the migration fails, the Authority can decide, based on its business continuity plan and/or contingency plan, to fall back to the old production server. If the deployment is a success, the Authority will open the services to members of the public. However, since this is a new environment, the Authority-appointed user and the vendor’s maintenance team should be on standby to help resolve any critical issues immediately.
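The rule that nothing changes between the test and production packages can be verified rather than trusted. One common approach, sketched here under the assumption that the packages are plain files in a directory, is to record a SHA-256 checksum of every file after UAT sign-off and compare it against the files staged for production. The function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of_file(path: Path) -> str:
    """Hash a file in chunks so large packages do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_manifest(package_dir: Path) -> Dict[str, str]:
    """Map each file's path (relative to package_dir) to its SHA-256 digest."""
    return {
        path.relative_to(package_dir).as_posix(): sha256_of_file(path)
        for path in sorted(package_dir.rglob("*"))
        if path.is_file()
    }


def changed_packages(uat_manifest: Dict[str, str],
                     prod_manifest: Dict[str, str]) -> List[str]:
    """Return the files that differ, or exist on one side only."""
    names = sorted(set(uat_manifest) | set(prod_manifest))
    return [n for n in names if uat_manifest.get(n) != prod_manifest.get(n)]
```

If `changed_packages` returns anything at all, the production deployment should not proceed until the difference is explained, because what is about to go live is no longer what was accepted in UAT.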

In the CHAS case, the problem was flagged at a later date yet was not resolved completely, which is gross negligence on the part of the application maintenance team. However, this is what is likely to happen when projects run on a tight budget and when people are almost always multitasking across several projects. People lose track because of context switching between projects, and then one mistake can snowball into a failure of this scale. Blaming the low-level employee who actually did the deployment is not going to solve the larger problem.
