Basics of Git & GitHub

Lecture

1

CMSC 320

 

INTRODUCTION TO DATA SCIENCE

– FARDINA FATHMIUL ALAM

           (fardina@umd.edu)

2024

RECAP: Importance

  • Python: It's a user-friendly language for data analysis.
  • Git: Helps manage code and data changes when working with a team.
  • Pandas: Simplifies data cleaning and manipulation.
  • Databases: Needed for storing and retrieving data efficiently.

These skills are essential for effective data analysis, collaboration, and handling data in real-world projects.

Git

Version Control System (VCS)

A software tool used by developers to manage changes to source code over time.

  • Tracks modifications to files, allowing multiple developers to collaborate on a project simultaneously while maintaining a complete history of all changes made to the codebase.
  • If there are issues, a version control system allows you to revert to an earlier code version.

Git is the most popular version control system in the world.

VCS Types

Centralized VCS: There is a single "central" copy of your project somewhere (probably on a server), and programmers "commit" their changes directly to this central copy.

Distributed VCS: Each programmer has a full copy of the project on their local machine. They commit changes locally and use push and pull operations to sync with a shared server for collaboration.

Git

Git is a version control system that helps developers manage and track changes in their code. It's like a time machine for your code, allowing you to go back to previous versions if something goes wrong or if you need to see how your code looked at a certain point in time.

A distributed version control system, used by most major corporations and labs

  • Keep a history of previous versions
  • Everyone mirrors the entire repository instead of simply checking out the latest version of the code
    • Unlike a centralized version control system, where the project is stored on a central server, each team member using Git has a copy of the project with its history stored on their local machine.
  • Develop simultaneously on different branches
    • Easily try out new features, integrate them into production or throw them out
  • Collaborate with other developers
    • "Push" and "pull" code from hosted repositories such as GitHub

Why Git?

  • The most widely used tool for tracking changes in any set of files; typically used to coordinate work among programmers developing source code together.

Installing Git

Git Repository

After installing Git, you can initialize it for a project to create a new Git repository.

The Git Repository, represented by the '.git' directory, is a database that contains all the project source code and the history of all code updates. The '.git' directory is typically located at the root of the project directory.
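
To make the '.git' directory concrete, here is a minimal sketch that initializes a repository in a throwaway directory and looks at what was created. The project name my-project is a hypothetical placeholder, and the sketch assumes Git is installed and on your PATH.

```shell
set -e
# Work in a temporary directory so nothing on your machine is touched.
workdir=$(mktemp -d)
mkdir "$workdir/my-project"      # hypothetical project name
cd "$workdir/my-project"

git init -q                      # creates the hidden .git directory

ls -a                            # .git now sits at the root of the project
git_dir=$(git rev-parse --git-dir)
echo "repository database: $git_dir"
```

Deleting the '.git' directory removes the history but leaves your project files in place, which is one way to see that the history and the working files are stored separately.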

While a Git repository can be stored on your local machine, especially for solo projects, it's common practice to store project code in a repository on a remote server.

  • Git repositories may include configuration files such as .gitignore for specifying ignored files and directories, and .gitattributes for defining attributes like line endings or merge strategies (e.g., ensuring consistent line endings across platforms).
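
As a rough illustration of these two configuration files, the sketch below writes a small .gitignore and .gitattributes and asks Git how it treats a few paths. The file names and rules (*.log, build/, *.sh) are assumptions chosen for the example, not rules taken from the course.

```shell
set -e
workdir=$(mktemp -d)
cd "$workdir"
git init -q

# Hypothetical ignore rules: skip log files and a build directory.
printf '%s\n' '*.log' 'build/' > .gitignore
# Hypothetical attribute rule: shell scripts always use LF line endings.
printf '%s\n' '*.sh text eol=lf' > .gitattributes

mkdir build
echo "compiled output" > build/app.bin
echo "debug info" > run.log
echo "print('hello')" > main.py

# Ignored files are not listed as untracked.
status=$(git status --porcelain --untracked-files=all)
echo "$status"

# git check-ignore exits with 0 when a path matches an ignore rule.
git check-ignore -q run.log && ignored=yes || ignored=no

# git check-attr reports the attribute Git would apply to a path.
attr=$(git check-attr eol -- deploy.sh)
echo "$attr"
```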

GitHub

GitHub is a web-based platform that serves as a hosting service for Git repositories.

  • Provides a wide range of features and tools to facilitate collaborative software development and version control.
  • Serves as a central hub for software development, enabling collaboration, code sharing, and project management for individuals, teams, and communities worldwide.

Git Commands

  • git clone <project URL>
  • git add my_file.c
  • git commit
  • git status
  • git branch
  • git checkout 
  • git push

Git Command

command               description
git clone url [dir]   copy a Git repository so you can add to it
git add files         add file contents to the staging area
git commit            record a snapshot of the staging area
git status            view the status of your files in the working directory and staging area
git diff              show changes that are modified but not yet staged (git diff --staged shows what is staged)
git help [command]    get help info about a particular command
git pull              fetch from a remote repo and try to merge into the current branch
git push              push your new branches and data to a remote repository

others: init, reset, branch, checkout, merge, log, tag
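
The difference between the staged and unstaged views of git diff is easy to miss, so here is a small sketch of it. It assumes a scratch repository; the file name notes.txt and the demo identity are placeholders.

```shell
set -e
# Placeholder identity so commits work without any Git configuration.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

workdir=$(mktemp -d)
cd "$workdir"
git init -q
echo "line1" > notes.txt
git add notes.txt
git commit -q -m "add notes"

echo "line2" >> notes.txt                            # modify, but do not stage
unstaged_before=$(git diff --name-only)              # working directory vs. staging area
staged_before=$(git diff --staged --name-only)       # staging area vs. last commit

git add notes.txt                                    # stage the modification
unstaged_after=$(git diff --name-only)
staged_after=$(git diff --staged --name-only)

echo "before add: unstaged='$unstaged_before' staged='$staged_before'"
echo "after add:  unstaged='$unstaged_after' staged='$staged_after'"
```

The same change moves from the plain git diff view to the git diff --staged view once it has been added.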

To start a new project with Git

We can either

  • initialize a new repository in an existing project directory or
  • clone an existing repository from a remote location (like GitHub) onto your local machine.

Git init

Initialize a new repository:

git init creates an empty Git repository for project tracking. Files in the directory are not tracked automatically: once the repository exists, you tell Git which files to track with git add.

command: $ git init
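
A minimal sketch of what happens right after initialization, assuming a scratch directory: a newly created file starts out untracked and only becomes tracked once it is added. The status codes come from the short (porcelain) status format.

```shell
set -e
workdir=$(mktemp -d)
cd "$workdir"
git init -q

echo "line1" > firstfile.txt

# A brand-new file is untracked: "??" in the short status format.
before=$(git status --porcelain)
echo "$before"

git add firstfile.txt

# After git add it is staged as a new file: "A " in the short status format.
after=$(git status --porcelain)
echo "$after"
```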

Git Clone

  • Creates a clone/copy of an existing repository into a new directory.
    • E.g., clones a repository from a remote location to your local machine.

    • Downloads an exact copy of the files from the remote/central repository (hosted on a platform like GitHub, GitLab, etc.) into a newly initialized local repository on your computer.
    • From there, we can check out files and make changes.
    • This example establishes a new Git repository locally named 'team-project', allowing us to work on the project locally with version control.
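
A clone into a directory named 'team-project' could look like the sketch below. To keep it runnable offline, a local bare repository (remote.git) stands in for the hosted one; with GitHub you would pass the repository URL instead. The seed repository, its README.md, and the demo identity are assumptions made for the example.

```shell
set -e
# Placeholder identity so commits work without any Git configuration.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

workdir=$(mktemp -d)
cd "$workdir"

# Build a stand-in for the hosted repository (assumption: normally this already exists).
git init -q seed
(cd seed && git symbolic-ref HEAD refs/heads/main \
  && echo "# Team project" > README.md \
  && git add README.md && git commit -q -m "initial commit")
git clone -q --bare seed remote.git

# Clone it into a new directory named team-project.
git clone -q remote.git team-project
cd team-project

ls                          # the files from the remote repository
git log --oneline           # the full history came along with them
git remote -v               # "origin" points back at the repository we cloned
commit_count=$(git rev-list --count HEAD)
```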

Adding a file to git repository: git add, commit

  • $ cd desktop/test-GitDemo
  • $ git init
  • $ echo "line1" >> firstfile.txt
  • $ echo "line1" >> secondfile.txt
  • $ git status

git init initializes a new Git repository in a directory.

git status shows which files have been changed and need to be committed to the repository.

  • $ git add firstfile.txt : adds the file to the staging area. To stage both files at once: $ git add firstfile.txt secondfile.txt
  • $ git commit -m "first two files commit" : permanently stores the staged changes in the repository.
  • $ git status
  • $ git log : displays a history of commits in your Git repository

Step 1: Staging tells Git to include this file in the next commit.

Step 2: Committing records the staged changes together with a message.

Git Staging Area

The staging area in Git is an intermediate area where changes are prepared before they are committed to the repository.

  • Staging is the process of organizing and preparing our project files for a commit.
  • It is the intermediate step between modifying our files and storing them permanently in the repository.
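
One way to see the staging area at work is to stage only some of your changes. The sketch below, run in a scratch repository with a placeholder identity, creates two files but stages and commits just one of them.

```shell
set -e
# Placeholder identity so commits work without any Git configuration.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

workdir=$(mktemp -d)
cd "$workdir"
git init -q
echo "line1" >> firstfile.txt
echo "line1" >> secondfile.txt

git add firstfile.txt                         # stage only the first file
staged=$(git diff --staged --name-only)       # what the next commit will contain
git commit -q -m "add first file"

committed=$(git ls-files)                     # files tracked after the commit
remaining=$(git status --porcelain)           # secondfile.txt is still untracked
echo "staged: $staged"
echo "committed: $committed"
echo "remaining: $remaining"
git log --oneline
```

Only what was staged ends up in the commit; the second file stays in the working directory until it is added in a later step.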

Git Branch

Show all the branches that exist in the repository.

* → Current Branch

We can use the git branch command to

  • show/list all project branches
  • create a new branch
  • delete an existing branch

Create a new branch:

git branch <branch name>

We start with a repository, create our branch, make changes on our branch, and when we're finished, we merge those changes into the "main" branch. We ensure everything works together and then upload it back to GitHub.

Delete an existing branch:

git branch -d <branch name>
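
The three uses of git branch (list, create, delete) can be tried out with a sketch like the following. The branch name feature-x and the demo identity are hypothetical, and the first branch is named main explicitly so the result does not depend on your Git defaults.

```shell
set -e
# Placeholder identity so commits work without any Git configuration.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

workdir=$(mktemp -d)
cd "$workdir"
git init -q
git symbolic-ref HEAD refs/heads/main     # name the first branch "main"
echo "line1" > firstfile.txt
git add firstfile.txt
git commit -q -m "initial commit"

git branch feature-x                      # create a branch (does not switch to it)
git branch                                # list branches; * marks the current one
branches_before=$(git branch --format='%(refname:short)')

git branch -d feature-x                   # delete; -d refuses if the branch has unmerged work
branches_after=$(git branch --format='%(refname:short)')
```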

Git Checkout

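Switch between branches in a repository: git checkout <branch name>

REMEMBER:

  • Before switching to a new branch: changes in the existing branch should be committed (or stashed). Git refuses to switch if uncommitted changes would be overwritten.
  • The new branch must exist on your local machine.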
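
As a sketch of switching in practice, the example below creates a branch, commits on it, switches back to main, and then merges the finished work. The branch name feature-x, the file names, and the demo identity are assumptions for illustration.

```shell
set -e
# Placeholder identity so commits work without any Git configuration.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

workdir=$(mktemp -d)
cd "$workdir"
git init -q
git symbolic-ref HEAD refs/heads/main     # name the first branch "main"
echo "line1" > firstfile.txt
git add firstfile.txt
git commit -q -m "initial commit"

git branch feature-x                      # the branch must exist before switching
git checkout -q feature-x                 # switch to it
echo "experimental" > feature.txt
git add feature.txt
git commit -q -m "add feature file"       # commit before switching away

git checkout -q main                      # switch back
on_main=$(ls)                             # feature.txt is not part of main yet
echo "on main: $on_main"

git merge -q feature-x                    # bring the finished work into main
current=$(git rev-parse --abbrev-ref HEAD)
ls
```

Switching branches rewrites the working directory to match the branch, which is why feature.txt disappears on main until the merge.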

Git Push

Upload / push your local Git commits to a remote repository, such as one hosted on a platform like GitHub.

  • Crucial for collaborating with others and keeping your remote repository up to date.

$ git push origin main

The general form is git push <remote> <branch>; origin is the default name Git gives to the remote you cloned from.

Git Pull

Fetches changes from a remote repository: downloads them and updates our local repository with the changes made to the remote repository. Git first fetches the changes and then merges them into the working branch.

$ git pull <remote-repo-url>
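
The two halves of a pull can also be run separately, which makes the "fetch, then merge" behavior visible. In this sketch a local bare repository stands in for GitHub, and alice and bob are hypothetical collaborators with their own clones.

```shell
set -e
# Placeholder identity so commits work without any Git configuration.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

workdir=$(mktemp -d)
cd "$workdir"

# Local stand-in for a hosted remote, plus two clones.
git init -q seed
(cd seed && git symbolic-ref HEAD refs/heads/main \
  && echo "line1" > notes.txt \
  && git add notes.txt && git commit -q -m "initial commit")
git clone -q --bare seed remote.git
git clone -q remote.git alice
git clone -q remote.git bob

# alice commits locally and pushes to the remote.
cd alice
echo "line2" >> notes.txt
git commit -q -am "add line2"
git push -q origin main

# bob fetches: the new commit is downloaded, but his files are unchanged.
cd ../bob
git fetch -q origin
behind_after_fetch=$(git rev-list --count HEAD..origin/main)

# bob merges: his branch and working files now include alice's change.
git merge -q origin/main
behind_after_merge=$(git rev-list --count HEAD..origin/main)
cat notes.txt
```

Running git pull does both steps in one command.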

Git Overall

  1. Start from an existing remote master Git repository.
  2. Clone a private local copy of the remote master repo.
  3. Add and commit changes to your local private clone.
  4. Share local changes with others by pushing them to the master repo.
  5. Your partners then pull your pushed changes from the remote master into their local repos.
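
The five steps can be sketched end to end on one machine. This is only an illustration under stated assumptions: a local bare repository (remote.git) plays the remote master repo, and the directories you and partner are hypothetical clones belonging to two collaborators.

```shell
set -e
# Placeholder identity so commits work without any Git configuration.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

workdir=$(mktemp -d)
cd "$workdir"

# 1. An existing remote repository (a local bare repo stands in for it here).
git init -q seed
(cd seed && git symbolic-ref HEAD refs/heads/main \
  && echo "# project" > README.md \
  && git add README.md && git commit -q -m "initial commit")
git clone -q --bare seed remote.git

# 2. Each collaborator clones a private local copy.
git clone -q remote.git you
git clone -q remote.git partner

# 3. Add and commit changes in your local clone.
cd you
echo "analysis notes" > analysis.txt
git add analysis.txt
git commit -q -m "add analysis notes"

# 4. Share the changes by pushing them to the remote.
git push -q origin main

# 5. Your partner pulls the pushed changes into their local repo.
cd ../partner
git pull -q --ff-only origin main
git log --oneline
```

The --ff-only option is a cautious choice for the example: it makes the pull stop instead of creating a merge commit if the two histories have diverged.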

A Basic Git Workflow
