#1 Full-stack app – Git basics

Throughout the Full-stack app post series I’ll try to document the whole process of building a web application. I’ll cover common tools and principles that every software engineer needs in order to write an application on their own. These articles are part of my process of learning these technologies (or of reviewing tools that I’ve been using for some time). If you think something could be done more easily or efficiently, or if you completely disagree with me, I encourage you to start a discussion in the comment section. Knowledge is built by reading and strengthened through practice and discussion with others.

Intro

Writing code is usually an iterative process. You write some code today. Then you add some more code a few days later. Then you decide that your code needs refactoring. Then you add a new feature. After running the tests you see that this feature broke the rest of the code. So now you have to go through your code manually and remove all the lines that caused problems.

Next time you approach the problem differently. Before you make any major changes to your code, you make a copy of the directory where your code lives. Now if you break something, you can revert to your old code. Without even realizing it, you have just taken your first step toward version control. This is a pretty inefficient way of versioning code, but almost everyone starts this way. It works while the codebase is simple and small, and it stops being practical once codebases grow larger.

Version control

To work efficiently with codebases, programmers developed version control. Version control lets you manage changes to your source code: you can track who made a change and when, review code from the past (or even revert to it), and so on. The first such tools appeared almost 50 years ago! SCCS (Source Code Control System) was the pioneer of version control and let programmers track changes to individual files, one file per commit (change). SCCS is a local version control system, which means that all files and changes are stored locally. This worked well at the time because teams were small and the internet was still in its infancy.
Codebases eventually grew larger, and more than one team often needed to work on the same codebase. That led to the development of centralized version control systems (CVCS). The idea is that one central repository contains all the source code, and developers check out files as they need them. This works well: everyone has access to the codebase, commits can span multiple files, and administrators control who can do what. Although a CVCS seems great, it has one major drawback: storing the code in only one place creates a single point of failure. If the central server becomes unreachable, nobody can commit their changes or collaborate with others. The best-known and most widely used CVCS is Subversion.
Distributed version control systems (DVCS) were developed in the 1990s. The idea is that everyone clones the whole repository, including its full history, from the central server instead of checking out individual files. Every clone is an exact copy of the central repository and can be used to restore it if the server ever fails. Today the most popular DVCS is Git. Linus Torvalds developed Git in 2005, after the Linux kernel community’s relationship with BitMover, the company behind BitKeeper, broke down and the free-of-charge status of BitKeeper was revoked.

[Diagram: centralized (CVCS) vs. distributed (DVCS) version control flow]

Working with git

Git stores data as snapshots of the project over time.
source: https://git-scm.com/book

To work efficiently with Git, there are some basic concepts you need to grasp. Git thinks of its data as a set of snapshots of a miniature file system. Every time you commit, Git “takes a picture” of what all your files look like at that moment and stores a reference to that snapshot in its internal database. It’s important to note that Git doesn’t store unchanged files again: if a file hasn’t changed, the snapshot just keeps a link to the identical version that is already stored.
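To make the snapshot idea concrete, here is a toy Python model of snapshot storage. It is a minimal sketch under simplifying assumptions. The class `ToySnapshotStore` and its methods are hypothetical and only illustrate the idea. Real Git stores compressed blob, tree and commit objects, not Python dicts. The point it shows is that each distinct file content is stored once, so a snapshot of an unchanged file is only a link to content that already exists.

```python
import hashlib


class ToySnapshotStore:
    """Toy model of Git-style snapshots (hypothetical, not Git's object format).

    Each distinct file content is stored once, keyed by a SHA-1 of the content.
    A snapshot maps file paths to those keys, so an unchanged file costs only
    a link to content that is already stored.
    """

    def __init__(self):
        self.blobs = {}      # content key -> file bytes, each stored once
        self.snapshots = []  # one {path: content key} dict per commit

    def commit(self, files):
        """Record a snapshot of `files`, a {path: bytes} dict of all files."""
        snapshot = {}
        for path, content in files.items():
            key = hashlib.sha1(content).hexdigest()
            # If identical content is already stored, only the link is new.
            self.blobs.setdefault(key, content)
            snapshot[path] = key
        self.snapshots.append(snapshot)
        return len(self.snapshots) - 1  # index of the new snapshot

    def checkout(self, index):
        """Rebuild the full file tree recorded in snapshot number `index`."""
        return {path: self.blobs[key] for path, key in self.snapshots[index].items()}


store = ToySnapshotStore()
first = store.commit({"README.md": b"# My app\n", "app.js": b"console.log('v1');\n"})
second = store.commit({"README.md": b"# My app\n", "app.js": b"console.log('v2');\n"})

print(len(store.blobs))  # 3: README.md stored once, app.js stored twice
print(store.snapshots[first]["README.md"] == store.snapshots[second]["README.md"])  # True
print(store.checkout(first)["app.js"])  # the v1 contents of app.js
```

Each snapshot still describes the complete project, so checking out an old commit needs no replaying of changes. Storage only grows with content that actually changed.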
Almost every operation on a repository is performed locally. This lets us use the benefits of version control even without an internet connection, and everyday work doesn’t depend on a central server. Only the operations that synchronize the local repository with a remote one (fetch, pull, push) require a connection.

Everything in Git, including every commit, is checksummed and referred to by that checksum (a SHA-1 hash by default). This guarantees the integrity of committed data: it’s impossible to change a file’s contents without Git detecting it. You can’t fetch a corrupted file or lose information without Git knowing about it.
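The short Python sketch below shows how a file’s checksum is derived. It assumes a default SHA-1 repository, and the helper name `git_blob_hash` is hypothetical. Git hashes a small header (`blob`, the content size in bytes, and a NUL byte) followed by the raw content. The result should match `git hash-object --stdin` for the same bytes. Blob IDs feed into tree and commit IDs. That is why changing even one character of a file changes the checksum of any commit containing it.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Compute the object ID Git assigns to file content (a "blob").

    Assumes a default SHA-1 repository: the hash covers the header
    b"blob <size>\\0" followed by the raw bytes of the file.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


original = git_blob_hash(b"test content\n")
tampered = git_blob_hash(b"test c0ntent\n")

print(original)              # d670460b4b4aece5915caf5c68d12f560a9fe3e4
print(original == tampered)  # False: a one-character change gives a new ID
```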

Files in a Git repository can be in three states: committed, staged and modified. Committed means the data is safely stored in your local database, which keeps a record of all changes. Modified means the file has changed, but the changes haven’t been stored in the local database yet. Staged means you have marked a modified file to go into your next commit.

Working tree, staging area, and Git directory.

The .git directory holds the repository’s metadata and the object database of the project’s data. When you clone a Git repository, this is the data that gets copied. The working directory is a single checkout of one version of the project. A basic Git workflow goes like this:

  1. We modify files in the working directory.
  2. We stage the changes we want in the next commit, which copies the files’ current content into the staging area (git add).
  3. We commit, which takes the files as they are in the staging area and stores that snapshot permanently in the local database (git commit).
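The three steps and the three file states can be modeled with a small Python sketch. It is a simplified illustration, and the class `ToyRepo` and its method names are hypothetical, not Git’s internals or API. Each area is modeled as a dict of file contents. A file counts as modified when the working copy differs from the staged copy, as staged when the staged copy differs from the last commit, and as committed otherwise.

```python
class ToyRepo:
    """Toy model of Git's three areas (hypothetical, not Git's internals).

    working:   files as they are on disk right now
    staging:   what the next commit will contain (the index)
    committed: the snapshot recorded by the last commit
    """

    def __init__(self):
        self.working = {}
        self.staging = {}
        self.committed = {}
        self.history = []  # (message, snapshot) pairs, oldest first

    def modify(self, path, content):
        # Step 1: edit a file in the working directory.
        self.working[path] = content

    def add(self, path):
        # Step 2: copy the file's current content into the staging area.
        self.staging[path] = self.working[path]

    def commit(self, message):
        # Step 3: record whatever is staged as a new snapshot.
        snapshot = dict(self.staging)
        self.history.append((message, snapshot))
        self.committed = snapshot

    def state(self, path):
        """Classify a file as 'modified', 'staged' or 'committed'."""
        if self.working.get(path) != self.staging.get(path):
            return "modified"   # changed on disk, change not staged yet
        if self.staging.get(path) != self.committed.get(path):
            return "staged"     # marked to go into the next commit
        return "committed"      # matches the last recorded snapshot


repo = ToyRepo()
repo.modify("app.js", "console.log('hi');")
print(repo.state("app.js"))  # modified
repo.add("app.js")
print(repo.state("app.js"))  # staged
repo.commit("Add app.js")
print(repo.state("app.js"))  # committed
```

This model simplifies two cases. A brand-new file that was never added reports as "modified" here, whereas Git calls it untracked. A file that was staged and then edited again also reports only "modified", while git status lists it under both staged and unstaged changes.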

Basic operations

Although there are many GUI tools for working with Git, sooner or later you will end up in an environment where you have to use command-line Git. Most of these tools don’t even support all the commands Git is capable of. But fear not! Even if you aren’t comfortable using the command line, you can work with Git using fewer than 10 commands, and knowing about twenty will make you a Git master.
 

Command                              Description
git init                             Initializes an empty repository in the current directory
git clone repo_url                   Clones a remote repository into a new directory inside the current one
git add file_name                    Adds the file's current changes to the staging area
git commit -m "Commit message"       Records a snapshot of the staging area in the local repository
git fetch                            Downloads objects and refs from another repository
git pull                             Fetches from and integrates with another repository or a local branch
git branch branch_name               Creates a new branch (without switching to it)
git checkout branch_name             Switches to another branch
git diff                             Without arguments, shows changes in the working directory that are not yet staged
git log                              Shows the commit history of the current branch
git status                           Shows the state of the working directory and staging area, the current branch, and whether it is ahead of or behind its remote branch
git remote add origin remote_url     Adds a remote named origin that points to remote_url
git stash [save|apply|pop]           Saves uncommitted changes (staged and unstaged) onto the stash stack, or reapplies them from it
git merge branch_name                Merges the commits from branch_name into the current branch

This may seem overwhelming at first, but you will do most of your Git work with just the commands from this list (remember the 80:20 rule?). If you forget what a command does, you can always type git help command_name to get the detailed manual page for that command.

Summary

  • Version control is an important tool in software development that allows you to track, revise and back up all the changes you make to your codebase
  • There are a few types of version control: local VCS, centralized VCS and distributed VCS
  • Distributed VCSs are the youngest type, and they gained most of their popularity after Git was created back in 2005
  • Key concept: everyone has an exact copy of the remote repository, which lets them work offline and provides an excellent backup if the central repository fails
  • Files in Git can be in three states: modified, staged and committed
  • You can do most of your work with Git using only about 15 commands

This post is more theoretical, but I think it is very important to grasp the basic concepts before you start using Git. I’ve seen many of my colleagues struggle with Git simply because they don’t understand the key concepts. In the next article I will write about online Git repositories and tools, what a Git workflow is, and how to be most efficient with it. It’s important to get confident with version control before you actually start coding. On most projects you will collaborate with others, so knowing Git (or any other VCS, though Git is the most widely used) is the cornerstone of an efficient workflow.
If you have any questions or think that something is missing from the article, feel free to leave a comment, and if you liked the post, subscribe for more content like this.

Resources
Git SCM book – https://git-scm.com/book/en/v2
Wikipedia article about VCS – https://en.wikipedia.org/wiki/Version_control

Lead by example

The end of the semester: that wonderful time of year when your energy levels and motivation are at their lowest, but the amount of work to be done is huge. Work and project deadlines are tight, exams are coming, and all I can think about is how to get to bed and sleep a little bit more. But one thing still motivates me, and it has motivated me this whole semester.
This is the last year of my bachelor’s degree, so we had a group project where our assignment was to build a web application. That meant programming the application from the ground up, both the back end and the front end. The team chose me as team leader, arguing that I was the most experienced: I had already been working for a few months and had completed some extra classes that involved a lot of programming. I wasn’t really excited about it, because I knew that the role of a leader comes with a great amount of work and would be time-consuming.
Another challenge was that our mentor imposed a technology that none of us (including him) had worked with before: Node.js. In this scenario he was more of a product owner than a mentor. For me, that meant I had to learn the new technology so that I could help the others when they got stuck, propose the application and data architecture, and manage the codebase once we started programming. This changed my perspective on creating applications. Programming to a specification (what I’d been doing until then) is the easier part. Designing that specification is the hard part.
The team had little or no prior knowledge of source code management, multi-tier application architecture or database design, and had only worked with Java and C before. So my first task was to give them a quick start on all these topics so that we could begin the programming part. I came up with a quick-start lecture (about three hours), after which every one of them knew the difference between the logic and presentation tiers, what an ORM is and how it helps in object-oriented programming, and had a basic understanding of how Git works and how to use its basic commands. Throughout the semester we had pair programming sessions where I worked with each of them individually and helped them when they hit a wall.
Now, at the end of the semester, we have a fully functional (and pretty good-looking) application. But I think we got much more than just an application. The application is just a side effect of the knowledge and experience we gained while creating it. Every team member is now capable of using Git efficiently. We learned a new technology and added a few frameworks and tools to our toolkit (Express.js, Sequelize, NodeMailer, Mocha). Everybody gained some insight into how to create a data model for an application. And, last but for me most important, I hope everyone learned that there is no “I” in team: it’s always “We”.

“My team’s mistake is also my mistake!”

What did it mean for me? It gave me insight into what it’s like to lead a group of people. To be quite honest, it’s not easy, and at moments I wished somebody else were the leader. But it was rewarding in the end. Guiding people through their learning process and seeing the results over time feels good.
Teamwork is a skill you must have in the software engineering industry. Everyone should take the opportunity to develop that skill while still in college. Don’t be a lone wolf on team projects. Learn how to take suggestions. Shift from “What can the team do for me?” to “What can I do for the team?”. If you get the role of team leader, take it seriously. Help people as much as you can. Teach them the skills they may be missing, even if that means twice as much learning for you. It will be a valuable experience. If you are interested in developing your soft skills (and you should be), here is a good read from John Sonmez.