[Lumiera] Git question

Ichthyostega prg at ichthyostega.de
Fri Jul 12 02:24:36 CEST 2013

On 11.07.2013 04:00, Hendrik Boom wrote:
> Looking for advice as to best practice here.  I'm guessing something like
> this is available, but I don't know how this is actually put together out of
> the complicated git command options.

Hello Hendrik,

indeed, the greatest shortcoming of Git seems to be the seemingly endless
possibilities for yet another workflow or usage style.

> With monotone, the working directory (where you edit files and debug the program)
> is separate from the database (which is the revision management repository).
> When you join a project, you first clone the database, then set up one or
> more working directories that refer to the database.

Git is only marginally different.

Git doesn't treat the working directory and the database as strictly separate
things. But there is the option to use a database *without* a working directory
(this is the so-called "bare Git repository", typically used on a server,
where people are only supposed to push to it and pull from it remotely).
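
To make the difference concrete, here is a minimal sketch that creates one bare
and one ordinary repository and asks Git which is which. The scratch directory
and the names "server.git" and "work" are made up for illustration only.

```shell
set -e
demo=$(mktemp -d)            # scratch directory (hypothetical location)

# A bare repository: only the database, no checked-out files.
git init --quiet --bare "$demo/server.git"

# An ordinary repository for comparison: working tree plus database.
git init --quiet "$demo/work"

bare_flag=$(git -C "$demo/server.git" rev-parse --is-bare-repository)
work_flag=$(git -C "$demo/work" rev-parse --is-bare-repository)
echo "server.git is bare: $bare_flag"
echo "work is bare:       $work_flag"
```

The first line should report "true" and the second "false".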

But first things first:

Working directory:
  Actually, with Git we always use a whole tree, i.e. a root directory with
  all its subdirectories.

  Git stores the database in a subdirectory '.git', which resides within the
  root directory of the working tree.

Thus: each Git working tree has its own separate database. In Git parlance,
we call this the "local repository of this working tree".
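
As a small sketch of this layout: wherever you stand inside the tree, Git finds
the root directory, and the database sits in '.git' right below it. The
directory names used here (a temporary root and "src/lib") are invented for
the example.

```shell
set -e
tree=$(mktemp -d)            # hypothetical root directory of a working tree
git init --quiet "$tree"
mkdir -p "$tree/src/lib"

# Ask Git, from deep inside the tree, where the root of the working tree is.
top=$(cd "$tree/src/lib" && git rev-parse --show-toplevel)

# The database lives in the '.git' subdirectory of that root.
ls -d "$top/.git"
```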

> Those working directories contain the (perhaps edited) files of the revision
> being worked on and some local status information, but mostly they just refer
> to the database.

> It's quite useful to have multiple workspaces; each can be used for 
> investigating a different bug, for example.

Same with Git. You can use as many working trees as you like, and each
has its own local repository, i.e. each working tree can have its own
state, separate from all the others.

> Now I gather there's some way to do something like this in git.  Of course
> some cloned repository would take on the role of database, but how does one
> get another (maybe I should call it a virtual) clone to refer to the first one
> so that the entire repository isn't duplicated?

The fact that you duplicate the database storage is usually not considered
an issue. Git uses storage very efficiently, and modern hard drives are huge.

Thus: each working tree has its own database. But such a database can refer
to a "Git remote", which is just any other Git database.

This database is organised in terms of storage objects. Each such storage
object has a unique SHA1 hash. For a file (a binary BLOB), this hash is based
only on the contents of the BLOB. For a commit, it is based on the committed
tree, the commit metadata and the history leading up to this commit.

This way, a file in a specific revision will have the same SHA1 hash
in all repositories (irrespective of whether they are remote or local).
This makes the SHA1 hash the primary coordinate, while it is irrelevant
*where* this object is stored, or how the branch it resides on is *named*.

To stress this point: *only the concrete contents and the real
history line* count. You can have this object in different
repositories, and you can use different branch names
in these different repositories. The content and the
history are the real meat.
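
A quick way to see this for yourself, as a sketch: put the same contents under
two different file names in two unrelated repositories and let Git compute the
object id with "git hash-object". The repository and file names below are
invented for the example.

```shell
set -e
demo=$(mktemp -d)            # scratch directory (hypothetical location)
git init --quiet "$demo/repo_a"
git init --quiet "$demo/repo_b"

# Same contents, different file names, different repositories.
printf 'same contents\n' > "$demo/repo_a/notes.txt"
printf 'same contents\n' > "$demo/repo_b/renamed.txt"

# hash-object only computes the id here; it does not store anything.
id_a=$(git -C "$demo/repo_a" hash-object notes.txt)
id_b=$(git -C "$demo/repo_b" hash-object renamed.txt)
echo "repo_a: $id_a"
echo "repo_b: $id_b"
```

Both lines should show the same id, since neither the file name nor the
repository enters into the hash of the contents.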

Consequently, there are two working models when it comes to using
multiple local workspaces:

Model-1: one central remote.

You just git clone multiple local trees, all referring to the same
remote "origin" repository. E.g.

lumiera.org  <---->  local/workspace1
lumiera.org  <---->  local/workspace2

In this model, workspace1 and workspace2 aren't directly connected.
You can use each as you like, and each will see just the differences
to the remote "upstream state".
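
Model-1 can be tried out without any network, as a rough sketch: below, a local
bare repository "central.git" plays the role of lumiera.org. All paths, the
"seed" repository and the demo identity are assumptions made for the example.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.invalid
demo=$(mktemp -d)            # scratch directory (hypothetical location)

# Stand-in for the central repository, seeded with one commit.
git init --quiet "$demo/seed"
echo "first" > "$demo/seed/file.txt"
git -C "$demo/seed" add file.txt
git -C "$demo/seed" commit --quiet -m "initial commit"
git clone --quiet --bare "$demo/seed" "$demo/central.git"

# Two independent workspaces, both with "origin" pointing at the central repo.
git clone --quiet "$demo/central.git" "$demo/workspace1"
git clone --quiet "$demo/central.git" "$demo/workspace2"

# A change made in workspace1 reaches workspace2 only via the central repo.
echo "second" >> "$demo/workspace1/file.txt"
git -C "$demo/workspace1" commit --quiet -am "change made in workspace1"
git -C "$demo/workspace1" push --quiet origin HEAD
git -C "$demo/workspace2" pull --quiet --ff-only

head1=$(git -C "$demo/workspace1" rev-parse HEAD)
head2=$(git -C "$demo/workspace2" rev-parse HEAD)
echo "workspace1 is at $head1"
echo "workspace2 is at $head2"
```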

Model-2: a chain of workspaces

You start out with a git clone of a remote repository. Then you
create a second workspace as a clone from the first one. E.g.

lumiera.org <----> local/workspace1 <----> local/workspace2

You do this just by using a *relative file system path* when
performing the second git clone. With such a setup:

 - in workspace1, the git remote called "origin" points to
   lumiera.org. Thus, any local changes you make here are
   tracked relative to the remote upstream repository.

 - in workspace2, the git remote called "origin" points
   to the local sibling directory "../workspace1".
   Thus, any local changes you make here in this second
   workspace are tracked relative to the last committed
   state of the sibling workspace1.

Thus: when you're in workspace2 and you "git pull", then
you're pulling just from the local sibling workspace1.
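
The chain can be sketched like this, again without any network: a local
repository "upstream" stands in for lumiera.org. The paths and the demo
identity are assumptions for the example. Note that Git may record the
location of the sibling as an absolute path in its configuration, even
when you give a relative path to git clone.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.invalid
demo=$(mktemp -d)            # scratch directory (hypothetical location)

# "upstream" is a local stand-in for lumiera.org.
git init --quiet "$demo/upstream"
echo "first" > "$demo/upstream/file.txt"
git -C "$demo/upstream" add file.txt
git -C "$demo/upstream" commit --quiet -m "initial commit"

# workspace1 is cloned from upstream, workspace2 from workspace1;
# both clones use a relative file system path.
(
  cd "$demo"
  git clone --quiet upstream workspace1
  git clone --quiet workspace1 workspace2
)

# Commit in workspace1, then pull from within workspace2.
echo "second" >> "$demo/workspace1/file.txt"
git -C "$demo/workspace1" commit --quiet -am "change in workspace1"
git -C "$demo/workspace2" pull --quiet --ff-only

origin1=$(git -C "$demo/workspace1" config --get remote.origin.url)
origin2=$(git -C "$demo/workspace2" config --get remote.origin.url)
echo "origin of workspace1: $origin1"
echo "origin of workspace2: $origin2"
```

After this, workspace2 has the new commit, while "upstream" has not seen it,
because nothing was pushed there.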

In practice, you can (and often will) have all sorts of mixtures
of these two models. The key trick to achieve this is to define
multiple "git remotes". For example, you can have a git remote
called "lumiera", which points to the remote repository, and
another git remote called "workspace1", which points to "../workspace1"

And then you can push and pull your changes around to your heart's desire ;-)
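
A possible setup with two remotes might look like the following sketch. As
before, a local repository "upstream" stands in for lumiera.org, and the paths
and the demo identity are assumptions; the remote names "lumiera" and
"workspace1" are the ones from the example above.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.invalid
demo=$(mktemp -d)            # scratch directory (hypothetical location)

# "upstream" is a local stand-in for lumiera.org.
git init --quiet "$demo/upstream"
echo "first" > "$demo/upstream/file.txt"
git -C "$demo/upstream" add file.txt
git -C "$demo/upstream" commit --quiet -m "initial commit"
git clone --quiet "$demo/upstream" "$demo/workspace1"
git clone --quiet "$demo/upstream" "$demo/workspace2"

# In workspace2: name the upstream remote "lumiera" and add the sibling
# workspace as a second remote.
git -C "$demo/workspace2" remote rename origin lumiera
git -C "$demo/workspace2" remote add workspace1 ../workspace1

# Commit in workspace1, then pull that branch directly from the sibling.
echo "second" >> "$demo/workspace1/file.txt"
git -C "$demo/workspace1" commit --quiet -am "change in workspace1"
branch=$(git -C "$demo/workspace1" symbolic-ref --short HEAD)
git -C "$demo/workspace2" pull --quiet --ff-only workspace1 "$branch"

git -C "$demo/workspace2" remote
```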

Hopefully this makes the basic idea a bit clearer.

Hermann V.

PS: if you're still concerned with the waste of storage when using multiple
local working repositories, there is an option to git clone which allows
you to use another *local* repository as reference.

But I would *strongly* recommend against using such fine points until you're
fairly familiar with using several remotes and tracking multiple repositories.
As a litmus test: until you're using "git rebase" as a routine operation when
following upstream, without considering it a mystery, and feel familiar with
moving around through "git reset", better stay away from such fine points and
try to use the default setup.
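
Just for completeness, and with the above warning in mind: the option meant
here is presumably "git clone --reference". A minimal sketch, in which a local
repository "upstream" stands in for the remote and all paths are invented:

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.invalid
demo=$(mktemp -d)            # scratch directory (hypothetical location)

# "upstream" is a local stand-in for the remote repository.
git init --quiet "$demo/upstream"
echo "first" > "$demo/upstream/file.txt"
git -C "$demo/upstream" add file.txt
git -C "$demo/upstream" commit --quiet -m "initial commit"
git clone --quiet "$demo/upstream" "$demo/workspace1"

# Clone again, but borrow objects already present in workspace1.
git clone --quiet --reference "$demo/workspace1" \
    "$demo/upstream" "$demo/workspace3"

# The borrowing is recorded in the "alternates" file of the new repository.
alt="$demo/workspace3/.git/objects/info/alternates"
cat "$alt"
```

The new repository then depends on workspace1 staying intact, which is one
reason to treat this as an advanced setup.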
