Git: learn how to manage your code

by mark · Published 26 October 2016 · Updated 24 October 2016

If you are a developer, chances are that sooner or later you will start working with lots of files, and if you distribute your code, chances are you will start working with lots of people too. When one or both of these happen, a question arises: “How do I manage it?” Let’s talk about VCSs and Git.

Too big a project, too many files

Small projects may contain a few dozen files; bigger ones may include tens of thousands. Managing all these files is complicated in itself, but what happens when you make a mistake? You suddenly realize that you have messed up an important portion of your code, and that your backup (I hope you made one) is buried somewhere in your filesystem. In the worst case you have nothing. You have officially lost many hours changing your code for the worse. On top of that, you will have to spend more hours undoing what has been done.

One developer, ten developers

The previous situation is frightening enough as it is. But what happens when several developers write to that very same code? Two of them might decide to work on the same feature and modify the same lines; each one wants their code incorporated into the main project, but that’s not possible since they modified the very same things in different ways. Things can’t work like this, and the two developers have wasted time on something that could have been developed by only one of them. Version Control Systems were invented to solve exactly these two problems.

Version Control Systems

A Version Control System (VCS) is a system that keeps track of versions. Each version represents the state of the project at a given point in time. Each version contains either the changes you made to the code since the previous version or a full snapshot of the code. Let’s now look again at the two problems above:

You realize you’ve made a mistake and that a previous version is better than the new one. That is not a problem: with a VCS you can roll back to a previous version in a matter of seconds, without losing anything.
“Developer A” and “Developer B” both start working on the “add salt” feature, neither knowing that the other is doing the same. “Developer A” finishes “add salt”; around the same time “Developer B” finishes it too, and has also built an “add pepper over salt” feature on top of it. “Developer A”’s version of “add salt” is slightly better than “Developer B”’s. With a VCS that is not a problem: “Developer A”’s “add salt” is included in the main project; “Developer B” looks at that code, discards his own “add salt” and adapts “add pepper over salt” to fit “Developer A”’s version. Both get their code incorporated, effectively cooperating (next time, let’s hope they use a public forum to announce what they are working on).
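To make the roll-back idea concrete, here is a minimal sketch in Python of a version store that keeps a full snapshot per version. The class `TinyVCS`, its methods and the file name `recipe.txt` are made up for illustration; this is not how Git or any real VCS is implemented.

```python
import copy


class TinyVCS:
    """Toy version store (hypothetical): every commit is a full snapshot."""

    def __init__(self):
        self.versions = []  # list of (message, snapshot) tuples, oldest first

    def commit(self, files, message):
        """Save a copy of the current files and return the new version number."""
        self.versions.append((message, copy.deepcopy(files)))
        return len(self.versions) - 1

    def checkout(self, version):
        """Return a copy of the files exactly as they were at `version`."""
        _message, snapshot = self.versions[version]
        return copy.deepcopy(snapshot)


vcs = TinyVCS()
project = {"recipe.txt": "boil water\n"}
good = vcs.commit(project, "first working recipe")

# A mistake sneaks in and gets committed too.
project["recipe.txt"] = "boil water\nadd sugar instead of salt\n"
vcs.commit(project, "add salt (broken)")

# Rolling back is just asking for the earlier snapshot.
project = vcs.checkout(good)
print(project["recipe.txt"], end="")
```

The broken version is still stored, so nothing is lost by going back: both snapshots remain available for comparison.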

Centralized vs. Distributed VCS

The first VCSs took a centralized approach to the problem. A central server holds all the versions and serves many clients; each time the main project changes, the clients have to be notified. The main problem with this approach is that the server is a single point of failure:

If the server of a centralized VCS goes down, no one can work on the main project, or even access or contribute to it.
If the server’s storage medium is corrupted or breaks, backups are needed to fix things up.

The next iteration of VCS is called Distributed VCS, and it aims to fix the two problems outlined above. In the centralized approach only the server holds all the versions. In the distributed approach every single client can hold all the versions. Each client knows the full history of the project and acts as a form of “backup” for it. Even if the “central” server goes down, the clients can still operate and make changes with the whole project at hand. And if the “central” server of the project is damaged, any one of the many clients can step in and offer its “backup” to the project’s authors.
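The difference can be sketched with a toy model in Python. The `Repo` class and the version labels below are invented for illustration and only mimic the idea that a clone carries the whole history; real systems are far more involved.

```python
class Repo:
    """Toy repository (hypothetical): just an ordered list of version labels."""

    def __init__(self, history=None):
        self.history = list(history or [])

    def commit(self, label):
        self.history.append(label)

    def clone(self):
        """A distributed clone copies the whole history, not just the latest version."""
        return Repo(self.history)


server = Repo()
server.commit("v1")
server.commit("v2")

alice = server.clone()  # Alice now holds v1 and v2 locally
server = None           # the "central" server is lost

alice.commit("v3")               # Alice can keep working without it
restored_server = alice.clone()  # and her copy can seed a new server
print(restored_server.history)
```

In a centralized model the client would hold only the files it checked out, so losing the server would mean losing the history.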

Finally Git

Git is a project started in 2005 by Linus Torvalds (yes, the same person who created Linux). Linus didn’t really like the VCSs that were around at the time (the story is a bit more complicated), so he started a new one to support the growth of his main project: Linux. Git is a distributed VCS that allows developers to store versions of their code, called commits, in a local database called a repository, which lives in a subdirectory of the project named .git. Git uses checksums to ensure integrity and to identify commits. Each commit records its author’s name and email address, and can optionally be cryptographically signed. Each repository contains one or more branches, which let several parallel lines of development (usually one per feature) live in the same repository. The local directory where you modify files is called the working directory; changes you make there are not affected by other developers. Git also supports a pseudo-centralized workflow built on push (send my commits to the server that hosts the main project) and pull (fetch the new versions from the server and merge them into my local copy).
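As a small illustration of the checksum idea, the sketch below computes in Python the identifier Git gives to the contents of a file (a “blob”), assuming the default SHA-1 object format. The function name `git_blob_id` and the sample contents are made up for this example; commits are identified in a similar way, but their hashed content also includes author, date and parent information.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the SHA-1 object ID of a file's contents, assuming Git's
    default SHA-1 format: hash of b"blob <size>\\0" followed by the bytes."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


original = git_blob_id(b"add salt\n")
changed = git_blob_id(b"add pepper\n")

print(original)
print(changed)
print(original == git_blob_id(b"add salt\n"))  # same content, same ID
```

Because the identifier is derived from the content itself, any change to the content, accidental or not, yields a different identifier, which is how corruption can be detected.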

Git is NOT GitHub

Beginners often think that Git and GitHub are the same thing. That’s not true at all! GitHub is a commercial site that provides a platform for developers, built on top of Git. Git is independent of GitHub and can be used without a GitHub account. As a matter of fact you can use Git without any other tool, but your experience will be significantly better alongside a platform like GitHub or GitLab.

Great, where do I start?

Starting with Git can be a bit painful, so don’t feel down if you don’t get everything at first. In the coming weeks I will write a thorough guide to help beginners take their first steps in the Git world.


