Migrating from SVN to Git. Which option is best: giant trunk, submodules, subtrees?


Problem description


I know there are many questions about the same thing, but I still need more information.
I am investigating the possibility of migrating our SVN repo to Git and trying to understand which approach (monolith trunk, submodules, subtrees, etc.) would be best for our repo.
Here is some information about our project and SVN repository:
The project is a Java web application packaged as a WAR.
It is a modular application. Each module is developed by a separate team and then packaged as a JAR.
The WAR depends on these JARs.
Basically our structure looks like:
repo
|-application (war)
|-module1 (for example, ui stuff)
|--module1Submodule1
|--module1Submodule2
|-module2 (for example, database access stuff)
|-...
Each module has its own tags and branches.
The size of the SVN repo on my local machine with all branches, tags, etc. is:
over 2.5 million files
over 20 GB of space
there are 311615 revisions
Files are mostly source code, no large binary objects
Typical use cases:
200+ Dev and QA people in the whole team
Different teams commit to their own modules/submodules. (Could this be a problem with a monolith Git repo? Git requires pulling all changes before a push, whereas SVN only warns about out-of-date changes.)
Branch module
Branch application
Future use cases:
Gerrit
A developer commits, the commit is reviewed, tests are run against the commit, and if they are green the commit is approved for merging into the 'master' branch
The questions are:
Can we consider such a repo large for Git? (I mean, there are a lot of posts noting that Git scales badly for large repos, but what is 'large'?)
What are the pros and cons of each approach:
Monolith repo (just Git as SVN, anti-pattern?)
Submodules
Subtrees (Am I right that every change in a module requires a commit in the subtree repo and then pulling the change into the aggregating subtrees repo?)
Separate repos for each module
Any other...
Can history from SVN be preserved for each of them?
I need as many links as possible (I didn't find any official links for 'slow for large repo').
Thank you in advance!
Solution
History
History can be preserved for all of the mentioned approaches by using git svn: http://git-scm.com/book/en/Git-and-Other-Systems-Migrating-to-Git
Even switching back to previous commits is possible.
However, there were suggestions not to preserve history and to just leave the SVN repository frozen for about 6 months while all new history is made in the Git repo. I disagree with such advice because history is essential for our project. I bet no one would accept such a solution.
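As a rough sketch of the history-preserving conversion described above, the commands might look like the following. It assumes the SVN project uses the standard trunk/branches/tags layout and that an authors mapping file (here the hypothetical authors.txt) has been prepared; the helper names are made up for illustration. Commands are only printed unless DRY_RUN=0 is set.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Clone one SVN project with its full history.
# $1 = SVN URL, $2 = target directory, $3 = authors mapping file (hypothetical)
clone_with_history() {
  run git svn clone --stdlayout --authors-file="$3" "$1" "$2"
}

# Look up the Git commit that corresponds to an SVN revision such as r1234.
# With DRY_RUN=0 this must be run inside the converted repository.
commit_for_revision() {
  run git svn find-rev "$1"
}
```

For example, clone_with_history SVN_URL REPO_NAME authors.txt prints the clone command, and commit_for_revision r1234 prints the lookup that lets you check out the state of an old SVN revision.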
Giant trunk approach
You have to clone the whole big tree, even if you only plan on working on one subdirectory (the main use case).
Some git commands will be slow (for example git status, as it needs to check the whole tree).
Even if you tune Jenkins to trigger builds only for particular parts of the repo (this can be done using the "include" property of the Jenkins git plugin), it is still required to pull the whole repo to perform a build. This will heavily impact all the work, because a "clean" checkout will take a long time even when building small modules.
Concern: With 200+ Dev and QA people in the whole team, I suspect it will be quite difficult to eventually push changes.
Changes are pushed to the master branch only after the review is approved on Gerrit and the tests have passed, so we won’t have a continuous flow of pull-push-fail-pull-push.
However, Gerrit could reject a merge if the master branch has changed since the commit was pushed to Gerrit; it will then require clicking the ‘rebase’ button and rerunning the tests.
The Linux kernel has a monolith repo because C/C++ has no dependency management like Java has: building a kernel tar the way a WAR is built from JAR dependencies is not the case.
Quiz
What are the steps, their cost and total cost of migration using this approach?
git svn clone SVN_URL REPO_NAME
Jenkins stuff
How can it support code gating? What changes are required from VCS / tools perspective? Suppose here that full CI run takes 15 minutes.
Jenkins should have an "include" filter in the SCM trigger to filter changes for a particular part of the project. It’s not that hard, but it still requires some effort to set up and verify. For "wipe workspace before build" builds, the whole repo has to be cloned every time. This can increase the overall time from commit to "approved by tests", because checkout will be quite slow.
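To make the idea of an "include" filter concrete, here is a minimal sketch of the decision it has to make: given the list of changed paths, build a module only if at least one path matches that module's pattern. This is not the Jenkins plugin's implementation, just an illustration of the logic; the function name and the pattern are hypothetical.

```shell
# Succeeds (exit status 0) when at least one changed path read from stdin
# matches the include pattern in $1 (an extended regular expression).
should_build() {
  grep -E -q -- "$1"
}
```

In a monolith repo it could be fed from the last commit, for example: git diff --name-only HEAD~1 HEAD | should_build '^module1/' && echo "build module1". Note that the filter only decides whether to build; the job still has to fetch the whole repository first.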
What are efficient developer workflows?
Developers use local/remote feature branches
Push changes to gerrit
Gerrit verifies changes against tests
Change is merged to master branch
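The workflow above could be sketched roughly as follows. It assumes the remote named origin points at the Gerrit server and relies on Gerrit's convention of pushing to refs/for/<branch> to open a review; the helper names and the branch name are hypothetical. Commands are only printed unless DRY_RUN=0 is set.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Start a local feature branch. $1 = branch name (hypothetical)
start_feature() {
  run git checkout -b "$1"
}

# Send the current commits for review against the target branch in $1.
# Gerrit then runs verification and merges once the change is approved.
push_for_review() {
  run git push origin "HEAD:refs/for/$1"
}
```

A developer would call start_feature my-feature, commit locally, and then call push_for_review master.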
Submodules
Most caveats are explained here http://git-scm.com/book/en/Git-Tools-Submodules and here http://codingkilledthecat.wordpress.com/2012/04/28/why-your-company-shouldnt-use-git-submodules/
The main issue is that you will have to commit twice:
To the submodule itself
To the aggregating repo, to update the submodule
This makes no sense. Why would you ever need an aggregating repo if dependencies are managed through an artifact repo?
Submodules were actually created for cases where a library can be reused by different projects but you want to depend on a particular tag of the library, with the ability to update the reference in the future. However, we are not going to tag each commit (only release after each commit), and changing dependency versions (to released ones) in the WAR will be easier than maintaining the submodules approach. Java dependency management makes things simpler.
Pointing to a submodule's head is not recommended and leads to trouble with submodules, so this approach is a dead end for moving to snapshots. And again, we don’t need it because Java dependency management will do everything for us.
Quiz
What are the steps, their cost and total cost of migration using this approach?
git svn clone SVN_URL REPO_NAME for each module
Create aggregating git repo
Add module repositories as submodules to aggregating repo
How can it support code gating? What changes are required from VCS / tools perspective? Suppose here that full CI run takes 15 minutes.
Gerrit supports both merges and commits to submodules, so it should be OK.
Jenkins stuff: triggers on submodule changes and on aggregating repo changes (argh! two places make no sense!)
What are efficient developer workflows? (The Gerrit process is omitted.)
Developer commits into the submodule
Developer tags that commit
Developer goes into the aggregating repo
cd into the submodule and check out the tag
Commit the aggregating repo with the changed submodule hash
Or
Developer changes the submodule
Pushes the change to the submodule repo so the changes are not lost
Commits the aggregating repo with the changed submodule hash
As you can see, the developer workflow is cumbersome (it requires always updating two places) and doesn’t suit our needs.
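The second variant of the workflow above (change, push, update the hash) can be sketched like this to show where the double commit comes from. It assumes the developer is on a named branch inside the submodule and that origin is the submodule's remote; the function name and arguments are hypothetical. Commands are only printed unless DRY_RUN=0 is set, and the printed form drops the quoting around the commit message.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# $1 = submodule path inside the aggregating repo
# $2 = commit message, $3 = branch checked out in the submodule
commit_in_both_places() {
  # First place: commit and push inside the submodule itself.
  run git -C "$1" commit -a -m "$2"
  run git -C "$1" push origin "$3"
  # Second place: record the new submodule hash in the aggregating repo.
  run git add "$1"
  run git commit -m "Update $1: $2"
}
```

Every change to a module therefore produces two commits in two repositories, which is the overhead the answer objects to.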
Subtrees
The main issue is that you will have to commit twice:
To the tree-merged subdirectory
Push the changes to the original repo
Subtrees are a better alternative to submodules: they are more robust and merge the source code of the modules into the aggregating repo instead of just referencing it. This makes such an aggregating repo simpler to maintain; however, the problem with subtrees is the same as with submodules: making double commits is totally useless. You are not forced to commit changes to the original module repo and can commit them only to the aggregating repo, which can lead to inconsistency between repos...
The differences are explained quite well here: http://blogs.atlassian.com/2013/05/alternatives-to-git-submodule-git-subtree/
Quiz
What are the steps, their cost and total cost of migration using this approach?
git svn clone SVN_URL REPO_NAME for each module
Create aggregating repo
Perform subtree merge for each module
How can it support code gating? What changes are required from VCS / tools perspective? Suppose here that full CI run takes 15 minutes.
It looks like Gerrit does not support subtree merges very well (https://www.google.com/#q=Gerrit+subtrees)
But we can’t be sure until we try
Jenkins stuff: triggers on subtree repo changes and on aggregating repo changes (argh! two places make no sense!)
What are efficient developer workflows? (The Gerrit process is omitted.)
Developer changes something in subtree (inside aggregating repo)
Developer commits aggregating repo
Developer doesn’t forget to push the change to the original repo (no sense!)
Developer doesn’t forget NOT to mix subtree changes with aggregating repo changes in one commit
Again, as with submodules, there is no sense in having two places (repos) where code/changes are present. Not for our case.
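For comparison, a subtree setup might be sketched as below. This assumes the git subtree command is used (the answer may equally mean the plain subtree merge strategy), and the prefix, URL, and branch arguments are placeholders. Commands are only printed unless DRY_RUN=0 is set.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# One-time import of a module repo into a directory of the aggregating repo.
# $1 = prefix (directory), $2 = module repo URL, $3 = branch
import_subtree() {
  run git subtree add --prefix="$1" "$2" "$3"
}

# The extra step a developer must remember after committing in the
# aggregating repo: send the module's changes back to its original repo.
publish_subtree() {
  run git subtree push --prefix="$1" "$2" "$3"
}
```

Nothing forces publish_subtree to be run, which is how the two repositories can drift apart.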
Separate repos
Separate repos look like the best solution and follow the original Git intention. The granularity of repos can vary. The most fine-grained case is to have a repo per Maven release group; however, this can lead to too many repos. We also need to consider how often one particular SVN commit affects several modules or release groups. If we see that a commit usually affects 3-4 release groups, then these groups should form one repo.
Also, I believe it’s worth at least separating API modules from implementation modules.
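One way to estimate how often a single SVN commit touches several modules is to count distinct top-level directories per revision. The sketch below assumes the input has already been reduced to lines of the form "REVISION PATH" (for example extracted from svn log -v output), that the module name is the first path component, and that paths contain no spaces; the function name is hypothetical.

```shell
# Reads "REVISION PATH" pairs on stdin, one per line, and prints for each
# revision the number of distinct top-level modules it touched.
modules_per_revision() {
  awk '
    {
      split($2, parts, "/")   # path such as /module1/trunk/src/A.java
      module = parts[2]       # first component after the leading slash
      key = $1 SUBSEP module
      if (!(key in seen)) {   # count each module once per revision
        seen[key] = 1
        count[$1]++
      }
    }
    END { for (rev in count) print rev, count[rev] }
  ' | sort -n
}
```

Revisions that regularly show counts of 3 or 4 point at modules that are candidates for sharing one repository.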
Quiz
What are the steps, their cost and total cost of migration using this approach?
git svn clone SVN_URL REPO_NAME for each more or less fine-grained group of modules
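Running the clone once per module could be scripted along these lines. The sketch assumes every module directory under the base URL has its own trunk/branches/tags (the question says each module has its own tags and branches, but the exact layout is an assumption), and the function name and module names are placeholders. Commands are only printed unless DRY_RUN=0 is set.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# $1 = base SVN URL; remaining arguments = module names to convert.
# Each module becomes its own Git repository in a directory of the same name.
clone_modules() {
  base=$1
  shift
  for module in "$@"; do
    run git svn clone --stdlayout "$base/$module" "$module"
  done
}
```

For example, clone_modules SVN_URL module1 module2 prints one git svn clone command per module.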
How can it support code gating? What changes are required from VCS / tools perspective? Suppose here that full CI run takes 15 minutes.
Jenkins is triggered for each repo separately. No ‘include’ filters. Just checkout, build, deploy.
What are efficient developer workflows?
Developers use local/remote feature branches for each repo
Push changes to gerrit
Gerrit verifies changes against tests
Change is merged to master branch
