Consultant's Corner: Migrating Mainframe Source Code to Git
In this article, ProData Consultant and Software Developer Kamil Kościesza invites us under the hood of his current project and takes us through the process of migrating mainframe source code to Git.
Welcome to the Consultant’s Corner series, a blog for independent IT freelancers. Here you can find out what fellow high-end IT consultants are up to in their current or recent projects. Read about trending technologies and technological solutions, and get inspiration from the freelance journey of other like-minded IT professionals.
Old, but not out
Replacing one Version Control System (VCS) with another is usually pretty straightforward, as it only involves changing the way that files and their history are stored. But is it just as easy in the context of an older mainframe environment?
The mainframe installation we are working on is almost 50 years old, and only the hardware has seen upgrades over the years. But the old mainframe and IBM z/OS still have a major benefit: backward compatibility, meaning that the tools developed decades ago are still in use today. The current VCS (Librarian) has been in use since 1976, and all the tools and processes are therefore built around this particular VCS.
All these tools have to be updated to use Git instead of Librarian as the data source, and this is a change we want to make as seamless as possible for our developers. In other words, we are not going to change how the source code is stored and how the development process works at the same time. The tooling aspects are outside the scope of this article.
Monorepo: No thanks
Tooling aside, the source code base is also of significant size. We have identified over 2000 repositories in Librarian. The main repository, containing the banking application, has a total of 100,000 different files and 1,200,000 historical versions. That amounts to 80 GB of source code that has to be retained for regulatory reasons and must remain accessible in Git in a way that does not limit the developers’ productivity.
We tested the monorepo strategy for hosting our banking application repository, but it turned out that this strategy does not perform well with a repository of this size. Even though the status, commit and push commands performed well, we experienced issues with the log command. Retrieving the full history of a file would take around 60 seconds, which is not feasible. We cannot ask our developers to wait that long whenever they want to check the previous version of a file. Therefore, we decided to create repositories based on business function instead of keeping the whole application in one big repository. This significantly improves performance, as each repository is considerably smaller.
We have divided the migration process into four stages:
- Extract the data from our source repository into raw flat files.
- Preprocess the data to prepare what is needed to convert the code and create the target repository structure.
- Convert the code from the mainframe encoding (EBCDIC) into the Unix encoding (UTF-8).
- Create the repository and insert the converted source code.
Let's dive deeper into the part of the project that I am responsible for: stages 2, 3 and 4. These three stages are performed by two Java applications.
Stage 2: Preprocessing
The preprocessing stage is fairly simple but important for the later stages. At this point we identify the file type and the target repository for every file that is migrated. The output of the preprocessing can be modified manually, which is a requirement because the Git repository structure will differ from the Librarian structure, and some files may need to be placed in a different repository than the one the general algorithm assigns.
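The combination of a general assignment rule and manual corrections could be sketched roughly as follows. This is only an illustration: the class name, the idea of deriving the repository from a member-name prefix, and the repository names are assumptions and do not come from the project. File type detection is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: assigns each migrated file a target Git repository. */
final class RepositoryAssigner {

    // Assumed general rule: a member-name prefix identifies the business function.
    private final Map<String, String> prefixToRepository = new LinkedHashMap<>();

    // Manual corrections always win over the general rule.
    private final Map<String, String> overrides = new LinkedHashMap<>();

    RepositoryAssigner addRule(String prefix, String repository) {
        prefixToRepository.put(prefix, repository);
        return this;
    }

    RepositoryAssigner addOverride(String memberName, String repository) {
        overrides.put(memberName, repository);
        return this;
    }

    /** Returns the override if one exists, else the first matching prefix rule. */
    String assign(String memberName) {
        String manual = overrides.get(memberName);
        if (manual != null) {
            return manual;
        }
        for (Map.Entry<String, String> rule : prefixToRepository.entrySet()) {
            if (memberName.startsWith(rule.getKey())) {
                return rule.getValue();
            }
        }
        // No rule matched: flag the file for manual review.
        return "unassigned";
    }
}
```

Keeping the overrides in a separate map means the general rule can be rerun at any time without losing the manual decisions.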
Stage 3 & 4: Converting the code and creating the repository
Converting the code from EBCDIC to UTF-8 and inserting it into Git happen almost at the same time. Changing the encoding should not be a big issue. Usually, it is only a question of reading bytes in one encoding and then saving them in another. Here, however, the long history of the source code presented us with another challenge. Before the mid-1980s, a developer who wanted to use text control characters had to key in hexadecimal values. Those values are, of course, invisible to the human eye, but they change how the text is rendered, so after the conversion to UTF-8 they would also change how the source code is displayed.
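As a minimal sketch of the plain re-encoding step, Java can decode EBCDIC bytes with one of the JDK's EBCDIC charsets and write the text back out as UTF-8. The article does not say which code page the installation uses. IBM037 is assumed here purely for illustration (IBM1047 is another code page commonly found on z/OS), and the class and method names are hypothetical. The second method counts the control characters that survive decoding, which are the ones behind the display problem described above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of the EBCDIC to UTF-8 step. */
final class EbcdicConverter {

    // Assumption: the source members are stored in code page 037.
    private static final Charset EBCDIC = Charset.forName("IBM037");

    /** Decodes EBCDIC bytes and re-encodes the text as UTF-8. */
    static byte[] toUtf8(byte[] ebcdicBytes) {
        String text = new String(ebcdicBytes, EBCDIC);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Counts decoded characters that are ISO control characters. */
    static long countControlCharacters(byte[] ebcdicBytes) {
        String text = new String(ebcdicBytes, EBCDIC);
        return text.chars().filter(Character::isISOControl).count();
    }
}
```

A non-zero count for a line that is not expected to contain tabs or line breaks is a hint that the line needs the special handling described next.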
To solve that issue we had to parse the code and replace the non-displayable control characters with the specific notation introduced by IBM with the COBOL-85 standard, while keeping in mind that this operation changes the length of the code and that COBOL is limited to 72 significant characters per line. We therefore also had to reorganize the code so that it still fits. After the source code is converted, a commit is generated with all the original metadata from Librarian and inserted into the Git repository.
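The replacement step might look something like the sketch below. It assumes that the notation in question is the hexadecimal alphanumeric literal of the form X'..', which the article does not name explicitly, and that the original EBCDIC bytes of the affected literal are available. All names are hypothetical, and the reorganization of lines that no longer fit is not implemented here, only detected.

```java
/** Hypothetical sketch: renders a literal in hexadecimal notation. */
final class CobolHexLiteral {

    // COBOL fixed format: nothing after column 72 is significant.
    static final int LAST_SIGNIFICANT_COLUMN = 72;

    /** Renders the raw EBCDIC bytes of a literal as X'..' with two hex digits per byte. */
    static String toHexLiteral(byte[] literalBytes) {
        StringBuilder sb = new StringBuilder("X'");
        for (byte b : literalBytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.append('\'').toString();
    }

    /** True if the rewritten line still fits within the significant columns. */
    static boolean fitsInLine(String line) {
        return line.length() <= LAST_SIGNIFICANT_COLUMN;
    }
}
```

Each byte becomes two hex digits, so a rewritten literal is at least twice as long as the original, which is why a line that used to fit can overflow column 72 after the conversion.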
Migration phases
Since we need to be 100% sure that everything works fine, and that we do not break the development flow, the whole migration process takes place over multiple phases.
In the first phase, we are going to create the target Git repositories with all the historical versions, but they will not be updated automatically with new versions from the Librarian repositories.
After that, we will be running weekly delta migrations to keep the target repositories synchronized with the source repositories.
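A delta run boils down to finding the versions that exist in the source but have not yet been migrated. The sketch below shows that selection under the assumption that every historical version has a stable identifier and that the identifiers of already migrated versions are recorded somewhere. Both the identifiers and the class name are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: selects the versions a delta migration still has to process. */
final class DeltaSelector {

    /**
     * Returns the source versions that are not yet migrated,
     * preserving the oldest-first order of the input list.
     */
    static List<String> pendingVersions(List<String> sourceVersionsOldestFirst,
                                        Set<String> alreadyMigrated) {
        List<String> pending = new ArrayList<>();
        for (String version : sourceVersionsOldestFirst) {
            if (!alreadyMigrated.contains(version)) {
                pending.add(version);
            }
        }
        return pending;
    }
}
```

Preserving the original order matters because the pending versions have to be committed to Git in the same sequence in which they were created.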
In the following phase, the Git repositories will be updated synchronously whenever the Librarian repositories are updated, but the main repository will still be held in Librarian, and the one in Git will be considered a backup. We will run in this setup for a while before we can fully confirm that everything is working smoothly and as expected. When we reach that point, we will be ready to switch and run Git as the main repository, while still keeping Librarian as a synchronized backup. Again, this configuration will run for a while before we decide that we are ready to phase Librarian out completely, which concludes the migration from Librarian to Git.
What are the benefits of this migration procedure?
This entire procedure opens up the mainframe architecture for use with modern software development tools. For instance, it will be possible to review code with market standard tools or to move the build process to Jenkins, which would then easily allow automated testing and static code analysis in the build process - all with quality gates and very clear reporting. Last but not least, there is a significant saving in licensing costs.
Who
ProData Consultant and Software Developer Kamil Kościesza is an expert software developer with experience in COBOL and Java development, large-scale banking systems and batch processing.
In a recent project, he was involved in the implementation and customization of a new core banking system for a large bank. He has worked with the following technologies: COBOL, JCL, SQL, ISPW, Control-M, DB2, Oracle 11g, Java, Eclipse, JSP, JUnit, WSDL, Git, IBM Rational Change, IBM Rational Synergy, UNIX, Jenkins, Atlassian JIRA.