curl-library

Re: Hosting in various countries

From: Jamie Lokier <jamie_at_shareable.org>
Date: Tue, 23 Mar 2010 19:03:51 +0000

Yang Tse wrote:
> 2010/3/23, Daniel Stenberg wrote:
>
> > A, B and C can sync up with the ME branch at any time and thus A B and C
> > all get each other's changes once they've been deemed to go public.
>
> I suppose that I was expecting too much. It seems that version control
> systems are still in the dark-age relative to what database systems
> have already achieved long time ago relative to things such as
> 'two-phase commit', 'multi-database transactions' , database
> partitioning across systems and online and offline database
> replication.
>
> It seems that version control systems use the term 'distributed'
> simply as a buzzword to indicate its capability to 'manually' get or
> transmit changes from a 'same-kind' version control system. After all
> someone has to push or pull.

No: It's *intentional* not to sync continuously.

It's a totally different intention than a coherent distributed database.

If it sync'd whole repos continuously, it would lose the DVCS
paradigm, which is the ability to work on your own repos independently
and sync when you are happy with the contents - and only those parts
you want sync'd.

There is no question that they are "distributed". Just in a different
sense to distributed-transaction databases.

Note that *adding* continuous, transactional synchronisation would not
be considered a step forward. There is no secret, unrealised desire
to do it. In some ways, that would be retrograde: If you depend on
distributed transactions, then you end up depending on being online
all the time.

Database offline replication, which you mentioned, requires
application-specific merging strategies. (Assuming both sides of an
offline partition can update the database.) You can't have fully
distributed transactions, offline partitions and updates on both sides
without providing merging strategies. For source code, that means
occasional manual interaction, so the process cannot be fully
automatic - not even in theory.
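
To make the point about merging strategies concrete, here is a rough
sketch of a three-way merge over simple key/value records. It is only
an illustration: the function name three_way_merge, the _MISSING
sentinel and the sample settings are all made up for this example, and
no real database or VCS is claimed to merge this way. Changes made on
one side only merge automatically; a key changed differently on both
sides cannot be resolved without someone deciding.

```python
_MISSING = object()  # marks "key absent" so that None stays a legal value


def three_way_merge(base, ours, theirs):
    """Merge two divergent copies against their common ancestor.

    Returns (merged, conflicts). Keys changed differently on both
    sides are left out of `merged` and reported in `conflicts`.
    """
    merged, conflicts = {}, {}
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b = base.get(key, _MISSING)
        o = ours.get(key, _MISSING)
        t = theirs.get(key, _MISSING)
        if o == t:        # both sides agree (or both deleted the key)
            value = o
        elif o == b:      # only their side changed it
            value = t
        elif t == b:      # only our side changed it
            value = o
        else:             # changed differently on both sides
            conflicts[key] = (o, t)
            continue
        if value is not _MISSING:
            merged[key] = value
    return merged, conflicts


# Hypothetical settings edited on both sides of an offline partition.
base = {"timeout": "30", "retries": "3", "proxy": "none"}
ours = {"timeout": "60", "retries": "3", "proxy": "a.example"}
theirs = {"timeout": "30", "retries": "5", "proxy": "b.example"}

merged, conflicts = three_way_merge(base, ours, theirs)
print(merged)
print(conflicts)
```

Here "timeout" and "retries" merge cleanly because each was changed on
one side only, while "proxy" ends up in conflicts and has to wait for
a person.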

And since it can't be fully automatic, even in theory, it doesn't make
sense to make it fully automatic 90% of the time, with the other 10%
feeding a delayed queue of "I can't reconcile these databases, you
must manually edit to progress" requests that trigger at inconvenient
times, such as when the network comes up, and block your work.

Anyway, in practice, you don't tend to keep the same contents in each repo.

For example, in your personal repo (on your laptop) you'll have the
upstream public branch(es), but you'll also have your own development
branch(es) that you don't want published (e.g. with debugging printfs
added, ideas you tried but aren't happy with yet), and perhaps staging
branches where you have prepared something for review, but don't want
it committed to the public mainline until it's reviewed.

The only time you'd want the same contents completely and
transactionally synchronised is for the reasons you have distributed
databases: geographic redundancy, network load spreading, offline
access etc.

DVCSs don't address those because they aren't considered relevant.
They use "distributed" to mean something different, which is just as useful.

> Even CVS is distributed. You set up a CVS repo and commit all you want.
> I can set up another CVS repo with all the contents of yours. I work on
> mine all I want, and when the moment arrives to place my changes on
> yours I can do it.

You can also just manage a patch queue, or lots of local source trees :-)

Nobody says these tools do something which wasn't possible before.
Only that they make it a lot easier - which they do.

> A poor man's replication system could equally be set up for CVS or git
> using post commit hooks. But if a clash arises, which is more likely
> on a very busy repo, manual aid is equally required.
>
> Or does someone know of a VCS capable of doing multi-repo transactions
> out there, commercial or not?

I don't know of any which support *transactional* commits to multiple
repos at once. A post-commit hook triggering a push or pull does what
you want in practice, and it's quite fast.
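
For illustration, a post-commit hook of that kind for git might look
roughly like the sketch below, saved as an executable
.git/hooks/post-commit. Treat it as a sketch under assumptions, not a
tested recipe: the config key "hooks.syncremote" is invented here as an
opt-in switch naming the remote to push to, and the helper names
git_output and push_command are made up too. The git commands used
(config --get, symbolic-ref --short, push --quiet) are ordinary ones.

```python
#!/usr/bin/env python3
"""Sketch of a post-commit hook that pushes the new commit in the background."""
import subprocess


def git_output(*args):
    """Run a git command; return its stripped stdout, or None on any failure."""
    try:
        result = subprocess.run(["git", *args], capture_output=True, text=True)
    except OSError:  # git is not installed or not executable
        return None
    return result.stdout.strip() if result.returncode == 0 else None


def push_command(remote, branch):
    """Build the push invocation; only the named branch is published."""
    return ["git", "push", "--quiet", remote, f"refs/heads/{branch}"]


def main():
    # "hooks.syncremote" is a made-up config key; if it is unset, do nothing.
    remote = git_output("config", "--get", "hooks.syncremote")
    # symbolic-ref fails on a detached HEAD, in which case there is no branch.
    branch = git_output("symbolic-ref", "--short", "HEAD")
    if not remote or not branch:
        return
    # Popen returns immediately, so the commit is not held up by the network.
    subprocess.Popen(
        push_command(remote, branch),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # POSIX only: detach from the committing shell
    )


if __name__ == "__main__":
    main()
```

With this sketch you would opt in per repo with something like
"git config hooks.syncremote origin". A rejected push (for example a
non-fast-forward) is silently discarded here, so a real setup would
want to log the failure somewhere you will see it.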

As far as I know, the issues which arise from the lack of distributed
transactions, namely merge conflicts arising in the background (during
post-commit sync) and no guarantee of immediate visibility, also occur
from time to time in distributed-transaction systems when the network
or a remote node is down. So you have to handle them anyway; there is
no point pretending they don't occur.

And as a bonus, background post-commit sync lets you get on with your
work more quickly, compared with waiting for an equivalent transaction
to complete. (You can just wait for the sync to complete if you
care.)

-- Jamie
Received on 2010-03-23