Backup AWS CodeCommit

Problem statement

A full backup of all CodeCommit repositories in an account protects against accidental repository deletion, a compromised account and similar incidents. Unfortunately, AWS Backup does not currently support CodeCommit natively. Amazon provides a solution that uses CodeBuild and EventBridge.

The downside of that solution is that it runs for a repository only when a commit is pushed to that repository. Rarely used repositories may therefore go a long time without being backed up. The solution below backs up all repositories in the account on a schedule.

Solution design

The solution uses the same components as the AWS solution: EventBridge, CodeBuild, CodeCommit and S3.


EventBridge triggers the CodeBuild project on a schedule. CodeBuild takes its source files from a separate CodeCommit repository and runs a script that goes through all CodeCommit repositories, clones each one with all of its branches, creates a compressed archive (tar.gz) and uploads it to an S3 bucket.


The “buildspec.yml” file:

version: 0.2

phases:

  install:
    commands:
      - apt-get update -y
      - apt-get install -y jq

  build:
    commands:
      - chmod +x backup_codecommit.sh
      - ./backup_codecommit.sh

The “backup_codecommit.sh” script:

#!/bin/bash

set -ex

# variable CodeCommitBackupsS3BucketPrefix is exported into CodeBuild environment variables
backup_s3_bucket_prefix="${CodeCommitBackupsS3BucketPrefix:-"my-s3-bucket"}"
# Region and Account ID
aws_region="${AwsRegion:-"us-east-1"}"
aws_account_id="${AwsAccountId:-"00000000"}"

git config --global credential.helper '!aws codecommit credential-helper $@'
git config --global credential.UseHttpPath true

declare -a repos=(`aws codecommit list-repositories | jq -r '.repositories[].repositoryName'`)

for codecommitrepo in "${repos[@]}"
do  
    echo "[===== Cloning repository: ${codecommitrepo} =====]"
    git clone --mirror "https://git-codecommit.${AWS_DEFAULT_REGION}.amazonaws.com/v1/repos/${codecommitrepo}" "${codecommitrepo}/.git"
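    cd "${codecommitrepo}"
    git config --bool core.bare false
    # List branch names with for-each-ref: "git branch --all" also prints the
    # "*" current-branch marker, which the shell would treat as a glob.
    for branch in $(git for-each-ref --format='%(refname:short)' refs/heads/); do
        git checkout "${branch}"
    done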
    cd ..


    dt=$(date -u '+%Y_%m_%d_%H_%M')
    zipfile="${codecommitrepo}_backup_${dt}_UTC.tar.gz"
    echo "Compressing repository: ${codecommitrepo} into file: ${zipfile} and uploading to S3 bucket: ${backup_s3_bucket_prefix}-${aws_account_id}-${aws_region}/${aws_account_id}/${aws_region}/${codecommitrepo}"

    tar -zcvf "${zipfile}" "${codecommitrepo}/"
    aws s3 cp "${zipfile}" "s3://${backup_s3_bucket_prefix}-${aws_account_id}-${aws_region}/${aws_account_id}/${aws_region}/${codecommitrepo}/${zipfile}" --content-type application/x-gzip --region $AWS_DEFAULT_REGION

    rm "${zipfile}"
    rm -rf "$codecommitrepo"
done
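
Since each archive contains a mirror clone under <repository>/.git, a backup can be checked locally with plain tar and git. The following is a minimal sketch and not part of the original solution: the function name list_backup_branches and its arguments are hypothetical, and the archive is assumed to have been downloaded from the S3 bucket already.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper (not part of the backup script above).
# Usage: list_backup_branches <archive.tar.gz> <repository-name>
# Extracts the archive into a temporary directory and prints the branch
# names stored in the backed-up repository.
list_backup_branches() {
    local archive="$1" repo="$2" workdir
    workdir="$(mktemp -d)"
    tar -xzf "${archive}" -C "${workdir}"
    # The backup script stores the Git data under "<repo>/.git".
    git --git-dir="${workdir}/${repo}/.git" \
        for-each-ref --format='%(refname:short)' refs/heads/
    rm -rf "${workdir}"
}
```

For example, list_backup_branches myrepo_backup_2024_01_01_09_10_UTC.tar.gz myrepo (an illustrative file name) would print one branch name per line. An empty list or an error suggests the archive should be inspected more closely.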

Full CloudFormation template:

AWSTemplateFormatVersion: '2010-09-09'
Parameters:
  RulePrefix:
    Type: "String"
    Default: "amd"
  CodeCommitBackupsS3BucketPrefix:
    Type: "String"
    Description: "S3 Bucket prefix for CodeCommit repository backups"
    Default: "amd-backup-codecommit-results"
  CodeCommitSourceRepoName:
    Type: "String"
    Description: "CodeCommit source repo name"
    Default: "amd-backup-codecommit-source"
  CodeCommitSourceRepoRegion:
    Type: "String"
    Description: "CodeCommit source repo region"
    Default: "us-east-1"
  BackupScriptsFile:
    Type: "String"
    Description: "Compressed file containing backup scripts and buildspec"
    Default: "codecommit_backup_scripts.zip"
  BackupSchedule:
    Type: "String"
    Description: "Backup schedule as a cron expression"
    Default: "cron(10 09 * * ? *)"
Resources:
  S3ResultBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Sub '${CodeCommitBackupsS3BucketPrefix}-${AWS::AccountId}-${AWS::Region}'
      BucketEncryption:
        ServerSideEncryptionConfiguration:
          - ServerSideEncryptionByDefault:
              SSEAlgorithm: aws:kms
              KMSMasterKeyID: !Sub 'arn:aws:kms:${AWS::Region}:${AWS::AccountId}:alias/aws/s3'
      LifecycleConfiguration:
        Rules:
          - NoncurrentVersionExpirationInDays: 30
            Status: Enabled
      Tags:
        - Key: "backup"
          Value: "true"
      VersioningConfiguration:
        Status: Enabled

  CodeBuildProjectRole:
    Type: "AWS::IAM::Role"
    Properties:
      RoleName: !Sub '${RulePrefix}-CodeBuildProjectRoleForCodecommitBackup-${AWS::Region}' 
      AssumeRolePolicyDocument: 
        Version: "2012-10-17"
        Statement: 
          - Effect: "Allow"
            Principal: 
              Service: 
                - "codebuild.amazonaws.com"
            Action: 
              - "sts:AssumeRole"
      Path: "/"
      Policies: 
        - PolicyName: "codecommit-readonly"
          PolicyDocument: 
            Version: "2012-10-17"
            Statement: 
              - Effect: "Allow"
                Action:
                  - "codecommit:BatchGet*"
                  - "codecommit:Get*"
                  - "codecommit:Describe*"
                  - "codecommit:List*"
                  - "codecommit:GitPull"
                Resource: "*"
        - PolicyName: "logs"
          PolicyDocument: 
            Version: "2012-10-17"
            Statement: 
              - Effect: "Allow"
                Action:
                  - "logs:CreateLogGroup"
                  - "logs:CreateLogStream"
                  - "logs:PutLogEvents"
                Resource: "*"
        - PolicyName: "s3-backup"
          PolicyDocument: 
            Version: "2012-10-17"
            Statement: 
              - Effect: "Allow"
                Action: 
                  - "s3:PutObject"
                Resource:
                  - !Sub "arn:aws:s3:::${CodeCommitBackupsS3BucketPrefix}-${AWS::AccountId}-${AWS::Region}/*"
  CodeBuildProject:
    Type: AWS::CodeBuild::Project
    Properties:
      Name: !Sub '${RulePrefix}-CodeCommitBackup-${AWS::Region}'
      Description: CodeBuild project that backs up all CodeCommit repositories in this region
      ServiceRole: !GetAtt CodeBuildProjectRole.Arn
      Artifacts:
        Type: no_artifacts
      Environment:
        Type: LINUX_CONTAINER
        ComputeType: BUILD_GENERAL1_MEDIUM
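        # Older image name; a current Ubuntu-based image (for example aws/codebuild/standard:7.0) may be required instead.
        Image: aws/codebuild/python:3.5.2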
        EnvironmentVariables: 
          - Name: CodeCommitBackupsS3BucketPrefix
            Value: !Ref CodeCommitBackupsS3BucketPrefix
          - Name: AwsRegion
            Value: !Ref AWS::Region
          - Name: AwsAccountId
            Value: !Ref AWS::AccountId
      Source:
        Type: CODECOMMIT
        Location: !Join
          - ''
          - - 'https://git-codecommit.'
            - !Ref 'CodeCommitSourceRepoRegion'
            - '.amazonaws.com/v1/repos/'
            - !Ref 'CodeCommitSourceRepoName'
      TimeoutInMinutes: 60

  EventRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service: events.amazonaws.com
            Action: sts:AssumeRole
      Policies:
        - PolicyName: events-codebuild
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - codebuild:StartBuild
                Resource: !GetAtt CodeBuildProject.Arn
      RoleName: !Sub '${RulePrefix}-event-role-backup-codebuild-${AWS::Region}'

  CodeCommitBackupScheduledRule: 
    Type: "AWS::Events::Rule"
    Properties: 
      Description: "Scheduled rule for CodeCommit backups"
      ScheduleExpression: !Ref BackupSchedule
      State: "ENABLED"
      Targets: 
        - Arn: !GetAtt CodeBuildProject.Arn
          Id: !Sub '${RulePrefix}-CodeCommitBackup'
          RoleArn: !GetAtt EventRole.Arn

The buildspec and the bash script are packaged in a zip archive, and the archive is placed in the source CodeCommit repository. The repository name and the archive name are set in the CloudFormation template parameters (CodeCommitSourceRepoName and BackupScriptsFile).
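
Deploying the template could be scripted roughly as follows. This is a sketch under assumptions: the stack name codecommit-backup, the template file name codecommit-backup.yml, the environment variable names and the DRY_RUN switch are all hypothetical. Because the template sets explicit IAM role names, CloudFormation requires the CAPABILITY_NAMED_IAM capability.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical names; adjust them to your environment.
stack_name="${STACK_NAME:-codecommit-backup}"
template_file="${TEMPLATE_FILE:-codecommit-backup.yml}"
default_schedule='cron(10 09 * * ? *)'

# Build the "aws cloudformation deploy" command and either run it or,
# when DRY_RUN=1, only print it.
deploy_backup_stack() {
    local cmd=(
        aws cloudformation deploy
        --stack-name "${stack_name}"
        --template-file "${template_file}"
        --capabilities CAPABILITY_NAMED_IAM
        --parameter-overrides
        "CodeCommitSourceRepoName=${SOURCE_REPO:-amd-backup-codecommit-source}"
        "BackupSchedule=${BACKUP_SCHEDULE:-${default_schedule}}"
    )
    if [[ "${DRY_RUN:-0}" == "1" ]]; then
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

Calling DRY_RUN=1 deploy_backup_stack prints the command without contacting AWS, which is a convenient way to review the parameter overrides before a real deployment.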


Conclusion

With the automation above, all CodeCommit repositories in a given region can be backed up on a schedule, without depending on commits to trigger a backup as the AWS-provided solution does.

