ForgeFed as a Means for a Federated GitLab
Recently I have become involved in the ForgeFed Working Group, whose goal is to create a specification that helps different Software Development Forges (like GitLab, GitHub, Gitea, Gogs, etc.) form a federated network.
There were, and still are, some challenges I would like to write down here to sort out my thoughts. Somehow I feel more comfortable publishing this here than (just) sending it to the mailing list, so please bear with me. :)
The core workflows the WG quickly identified as central are Forking, Pull Requests and Issue Tracking. Because of the success of ActivityPub in the Fediverse, ForgeFed is currently meant to build on ActivityPub, although there are discussions (and have been from the beginning) about whether AP is required or should be used at all. I will leave that discussion aside for the moment, because the challenges I would like to discuss here live on a different level, as should become clearer while I explain them.
First, let’s explore the different workflows from above.
Forking
Forking has two meanings.
The first one is forking off a project, which refers to a group or individual
splitting from a project for whatever reason and continuing the project in a
different direction.
The second is much less drastic and simply describes a necessity of distributed
version control: for others to be able to incorporate your changes, you need a
public repository everyone can read, which is just a copy of another publicly
accessible repository (see the [distributed workflows] chapter of Pro Git).
In the context of Software Development Forges, a Fork can be both of these things. If you see a project you are interested in and would like to work on, you create a public copy of it and start working, making your changes, etc. If your changes are accepted back (we will come to the how in just a moment), everything is fine and the Fork was just a public developer copy. But if your changes are not accepted, e.g. because the project is dead, does not like the direction you are going, or for whatever other reason, and you continue maintaining your development fork as an independent project, it becomes a project fork.
Technically, however, the two kinds of forks are not very different: in the end, both are just diverged copies of a repository.
The interesting bit about Forking in the context of Forges is the traceability:
If you fork a project on GitLab or GitHub, your project is a) visually marked as
a fork and b) appears in the list of forks of the origin repository.
This could help developers (or even users), for example, to find forks of the
project that are still alive if the original project dies.
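The traceability part boils down to a walk over the fork graph. The Python sketch below assumes each forge knows, for every repository, which repository it was forked from and when it last saw activity; the `Repo` type, the one-year inactivity cutoff and the example URLs are hypothetical. In a federated setting this information would first have to be collected from several forges, which is exactly the part ForgeFed has to specify.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Repo:
    url: str                # forge URL of the repository
    fork_of: Optional[str]  # URL of the repository it was forked from, if any
    last_activity: date     # date of the most recent push, issue, comment, ...


def live_forks(repos: list[Repo], origin: str, today: date,
               max_idle: timedelta = timedelta(days=365)) -> list[str]:
    """Return the URLs of all direct and indirect forks of `origin`
    that saw activity within `max_idle` of `today`."""
    # Index repositories by their parent so the fork graph can be walked downwards.
    children: dict[str, list[Repo]] = {}
    for repo in repos:
        if repo.fork_of is not None:
            children.setdefault(repo.fork_of, []).append(repo)

    alive: list[str] = []
    stack = [origin]
    while stack:
        for fork in children.get(stack.pop(), []):
            if today - fork.last_activity <= max_idle:
                alive.append(fork.url)
            # Keep walking below inactive forks too: their forks may be alive.
            stack.append(fork.url)
    return alive


# Hypothetical example: the origin died, but one direct fork and one fork of a
# dead fork are still active.
ORIGIN = "https://gitlab.com/user/awesomeproject"
repos = [
    Repo(ORIGIN, None, date(2016, 1, 10)),
    Repo("https://mypersonalinstance.com/user/awesomeproject", ORIGIN, date(2019, 3, 1)),
    Repo("https://otherforge.example/bob/awesomeproject", ORIGIN, date(2016, 5, 2)),
    Repo("https://thirdforge.example/eve/awesomeproject",
         "https://otherforge.example/bob/awesomeproject", date(2019, 2, 1)),
]
print(live_forks(repos, ORIGIN, today=date(2019, 4, 1)))
```

The walk deliberately continues below inactive forks, since a dead intermediate fork says nothing about the forks made from it.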
So Forking in the second sense is nothing new to distributed version control, and you might ask why we should bother with it at all.
The challenge ForgeFed is addressing is the interoperability of Forges. While it is easy to create a Fork in a centralized environment (it’s just copying data around), it’s more complicated in a federated environment. Let me illustrate this with a small example.
Imagine you are on GitLab.com and see an interesting project (Awesome Project)
you would like to contribute to. If we assume you already have an account on
gitlab.com, you just click the fork button, et voilà, you have a fork.
Now imagine the same, but you are on your own GitLab instance at
mypersonalinstance.com and discover the project on gitlab.com. If you are not
interested in Federation, you could just create an account on gitlab.com and do
the same thing as above. But we don’t want that. We want you to be able to
click the fork button on gitlab.com, tell it you would like to fork to
mypersonalinstance.com, and then have the fork there! (Authentication on your
personal instance is left out for brevity.)
The problem now is: how does gitlab.com tell mypersonalinstance.com that you
would like to create a fork of the repository? As a developer, you might say
that gitlab.com could send a request to an endpoint on the personal instance.
And that is the way to go. The work ForgeFed does here is to specify where this
endpoint is, so that gitlab.com does not need to guess whether the request has
to go to mypersonalinstance.com/fork, mypersonalinstance.com/createFork or
mypersonalinstance.com/hello/plz/create/fork, and how this request should be
formatted. (A GET? A POST? A PUT? Repo as a parameter? As body? As header?) And
how does mypersonalinstance.com tell gitlab.com it has created the fork, so it
can be integrated into the forking graph? An HTTP POST? Sending an email to the
administrator? Sending a dove?
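If ForgeFed builds on ActivityPub, the "where" question is answered by the `inbox` URL that ActivityPub actors advertise, and the "how" by POSTing an ActivityStreams activity to it. The Python sketch below shows what the second half, mypersonalinstance.com telling gitlab.com that the fork now exists, could look like under that assumption. Only the `Create` activity, the ActivityStreams `@context` and the content type come from ActivityPub; the `Repository` object type, the `forkedFrom` property, treating the repository as an addressable actor, and all URLs are hypothetical placeholders, since that vocabulary is exactly what the WG is still working out. It builds the request without sending it.

```python
import json
import urllib.request

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"
# Content type ActivityPub requires for POSTs to an inbox.
AP_CONTENT_TYPE = 'application/ld+json; profile="https://www.w3.org/ns/activitystreams"'


def fork_created_request(actor: str, fork_url: str, origin_url: str,
                         origin_inbox: str) -> urllib.request.Request:
    """Build (but do not send) a POST announcing a new fork to the origin's inbox."""
    activity = {
        "@context": AS_CONTEXT,
        "type": "Create",
        "actor": actor,
        # Assumption: the origin repository is addressable like an ActivityPub actor.
        "to": [origin_url],
        "object": {
            "type": "Repository",      # hypothetical, not an ActivityStreams type
            "id": fork_url,
            "forkedFrom": origin_url,  # hypothetical property
        },
    }
    return urllib.request.Request(
        origin_inbox,
        data=json.dumps(activity).encode("utf-8"),
        headers={"Content-Type": AP_CONTENT_TYPE},
        method="POST",
    )


req = fork_created_request(
    actor="https://mypersonalinstance.com/user",
    fork_url="https://mypersonalinstance.com/user/awesomeproject",
    origin_url="https://gitlab.com/user/awesomeproject",
    origin_inbox="https://gitlab.com/user/awesomeproject/inbox",  # hypothetical
)
print(req.get_method(), req.full_url)
```

A real server would also have to authenticate such requests (many ActivityPub servers sign them with HTTP Signatures), which is omitted here just like the authentication above.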
How exactly all this needs to be done is currently being worked out by the WG. (Note that the working group focuses on the exchange specification only, not on UI matters.) Let’s now move on to Pull Requests.
Pull Requests
Pull Requests, also called Merge Requests, are the opposite of Forking: They are about bringing the changes from one (forked) repository to another repository (typically the origin repository).
Let’s continue the example from above and assume you have made some (obviously awesome) changes to Awesome Project. In a centralized environment, for the developer this is as easy as clicking a Create Pull Request button, typing in some informational blah-blah and then, et voilà, having created a Pull Request. In a federated environment, mypersonalinstance.com should offer you a similar button, but (again) it needs to know where to send the request (which your instance could memorize for you) and how to format it.
You might now say this is easy, but someone still has to sit down and write a document so that the different Forges know how to perform these steps. And this is what the working group is doing. It takes a little longer because the WG decided to make ForgeFed agnostic to the version control system in use, so you cannot simply specify something like the following, because the other side might use a completely different VCS:
# If you get the HTTP request
#
#   gitlab.com/user/awesomeproject/pr?
#   origin=git%40mypersonalinstance.com%3Auser%2Fawesomeproject.git&
#   branch=awesomefeature
#
# then add the fork as a remote (note the ":" in the SSH-style URL)
git remote add mypersonalinstance \
    git@mypersonalinstance.com:user/awesomeproject.git
# fetch the proposed branch
git fetch mypersonalinstance awesomefeature
# and create a Merge Request for the fetched
# `mypersonalinstance/awesomefeature` branch
# [...]
One could simply require all Pull Requests to be sent around as patch series (two Forges that both host Git repositories would still be free to handle the process through Git), but we are currently discussing a more generic approach to transferring changes. (In the worst case, one can still fall back to patch series.)
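One way to picture a VCS-agnostic request with such a fallback is a message that names the VCS and offers two alternatives: a fetchable branch for peers that speak the same VCS, and a plain unified diff that anyone can apply. The Python sketch below assumes exactly that; the message layout and its field names are hypothetical and not something the WG has decided. `difflib.unified_diff` from the standard library stands in for a real patch series (which, for Git, would typically come from `git format-patch`).

```python
import difflib
import json


def pull_request_message(vcs: str, clone_url: str, branch: str,
                         path: str, old: str, new: str) -> dict:
    """Build a hypothetical VCS-agnostic pull request payload for one changed file."""
    # Fallback for receivers that do not speak `vcs`: a plain unified diff.
    patch = "".join(difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    ))
    return {
        "type": "PullRequest",  # hypothetical
        "vcs": vcs,             # e.g. "git", "hg", "darcs"
        "source": {"clone": clone_url, "branch": branch},
        "patch": patch,
    }


def receive(message: dict, supported_vcs: set[str]) -> str:
    """Decide how a receiving forge would integrate the proposed change."""
    if message["vcs"] in supported_vcs:
        src = message["source"]
        return f"fetch {src['branch']} from {src['clone']}"
    return "apply patch"


msg = pull_request_message(
    vcs="git",
    clone_url="https://mypersonalinstance.com/user/awesomeproject.git",
    branch="awesomefeature",
    path="README",
    old="Awesome Project\n",
    new="Awesome Project\nNow even more awesome.\n",
)
print(json.dumps(msg, indent=2))
print(receive(msg, {"git"}))
print(receive(msg, {"hg"}))
```

A Git forge receiving this would fetch the branch, while a forge built on another VCS could still apply the patch.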
Another challenge (both for Forking and PRs) is to not require the user to
specify the VCS URL of the repository (git@gitlab.com:user/repo.git), but
simply the URL of the forge repository (https://gitlab.com/user/repo). While
both GitLab and GitHub allow cloning over HTTPS from that URL, there needs to be
a way for one forge to tell another forge “for /user/repo you can use HTTPS via
[URL A], and the Git protocol via [URL B]”.
The WG is trying to address this using JSON-LD: one Forge can request a JSON-LD
description of https://gitlab.com/user/repo and then determine all the
important URLs from it, e.g. for PRs, cloning, etc.
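A Python sketch of that discovery step, assuming the forge serves a JSON-LD description of the repository page when asked for `application/ld+json` (the same content negotiation ActivityPub uses for actors). The response document, its extra context URL and the property names `cloneUri` and `pullRequests` are hypothetical; only the `Accept` header value comes from ActivityPub. The request is built but not sent, and a canned response is parsed instead. A robust client would run the document through a JSON-LD processor rather than reading keys directly as done here.

```python
import json
import urllib.request
from urllib.parse import urlparse

# Accept header ActivityPub clients use to ask for the JSON-LD representation.
AS_ACCEPT = 'application/ld+json; profile="https://www.w3.org/ns/activitystreams"'


def discovery_request(repo_page: str) -> urllib.request.Request:
    """Build (but do not send) a GET for the JSON-LD form of a repository page."""
    return urllib.request.Request(repo_page, headers={"Accept": AS_ACCEPT})


def clone_urls(description: dict) -> dict[str, str]:
    """Map each URL scheme (https, ssh, ...) to the matching clone URL."""
    urls = description.get("cloneUri", [])  # hypothetical property name
    if isinstance(urls, str):  # JSON-LD may give a single value instead of a list
        urls = [urls]
    return {urlparse(url).scheme: url for url in urls}


req = discovery_request("https://gitlab.com/user/repo")

# Canned response standing in for what gitlab.com might return; the second
# context URL and every non-ActivityStreams term in it are hypothetical.
response = json.loads("""
{
  "@context": ["https://www.w3.org/ns/activitystreams",
               "https://example.org/hypothetical-forge-context"],
  "id": "https://gitlab.com/user/repo",
  "type": "Repository",
  "cloneUri": ["https://gitlab.com/user/repo.git",
               "ssh://git@gitlab.com/user/repo.git"],
  "pullRequests": "https://gitlab.com/user/repo/pulls"
}
""")
print(clone_urls(response))
print(response["pullRequests"])
```

The SSH URL is written in the `ssh://` form rather than the scp-like `git@host:path` form so that its scheme can be read off directly.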
I hope that makes sense. :) Anyway, let’s move to the most complex topic: Issue Tracking.
Issue Tracking
First, Issue Tracking here is a term borrowed from GitHub, but it should cover any Issue/Bug/Ticket tracking system (BITT system) that makes sense. There have been discussions about this in the WG, but mainly with regard to Distributed Issue Tracking. I have not thought this through thoroughly yet, but I would say that issue tracking might not need to be covered by ForgeFed in the same way as Forks and PRs are. Rather, similar to ActivityPub, the WG could provide a basic framework for how BITT systems can federate with each other.
But as I said, I have not thought this through yet, so let’s see whether this makes sense. :)
Thanks for reading.