caddy s3 post

Jan Wolff 2024-08-06 19:26:41 +02:00
parent 133f6c4fc5
commit ab9fc93782
6 changed files with 126 additions and 11 deletions


@ -1,5 +1,3 @@
image: debian:12-slim
stages:
- build
- deploy
@ -14,19 +12,16 @@ build:
- public
expire_in: 1 week
deployment:
stage: deploy
image: minio/mc:latest
only:
- main
before_script:
- apt-get update -y && apt-get install openssh-client rsync -y
- mc alias set target $S3_HOST $S3_ACCESS_KEY $S3_SECRET_KEY
- eval $(ssh-agent -s)
- echo "$SSH_PRIVATE_KEY" | tr -d '\r' | ssh-add -
- mkdir -p ~/.ssh
- chmod 700 ~/.ssh
- echo "$SSH_KNOWN_HOSTS" >> ~/.ssh/known_hosts
- chmod 644 ~/.ssh/known_hosts
script:
<<<<<<< Updated upstream
- rsync -rv -e 'ssh -p 18775' --delete ./public/ www-user@janw.name:/var/www/janw.name/
=======
- mc mirror --overwrite public/ target/$S3_BUCKET
>>>>>>> Stashed changes


@ -11,6 +11,8 @@ You can find some of the stuff that I made on this website!
Code that I've written can be found on
[git.janw.name](https://git.janw.name/jw) or on GitHub and GitLab linked below.
You can subscribe to the [RSS feed](/index.xml) if you want.
## Contact
* Mail: `echo go@dkhw.hkgo | tr q-za-p q-zg-pa-f`

113
content/posts/6-caddy-s3.md Normal file

@ -0,0 +1,113 @@
---
title: "Hosting a static site from S3 via Caddy"
date: 2024-08-06T16:30:26+02:00
---
I had set up [MinIO](https://min.io/) as a way to self-host S3 buckets for an
unrelated project. As a way to force myself to test that setup regularly (and
to make my VPS setup more enterprise-y), I opted to host my site(s) from there.
Previously I just had the files on an XFS filesystem like a caveman and had
[Caddy](https://caddyserver.com/) serve them.

Using Caddy is really nice with its automatic SSL, sane config files and
whatnot, so I wanted to keep using it. Instead of simply serving static
files via the `file_server` directive, it now reverse proxies public S3 buckets
served by MinIO. This, at first, seemed pretty straightforward. A basic MinIO
setup provides public buckets via a URL like `minio.host/$BUCKET/$OBJECT`.
`$OBJECT` can, of course, be an identifier that resembles a directory
structure. So the initial hunch was to simply configure something like this in
the _Caddyfile_:
```
rewrite * /$BUCKETNAME{uri}
reverse_proxy minio:9000
```
This works. Somewhat. Obviously it doesn't serve `index.html` when the request
points towards a directory. This is bad. All routes on this damn page rely on
this to work... Actually, it's even worse. By default MinIO serves a listing of
all files in the bucket if you request "`/`". So all subdirectories are just
broken, because the object does not actually exist, and the root of the page
is an ugly XML listing of all files. Not good.

To prevent MinIO from providing a file listing for publicly readable buckets,
simply remove the following _Actions_ from the access policy: `s3:ListBucket`,
`s3:ListBucketMultipartUploads`. Put the other way around, the only permissions
you want to keep are `s3:GetObject` and `s3:GetBucketLocation`.

Now MinIO will return 403 when trying to access "`/`" and 404 when trying to
access a directory. We'll let _Caddy_ handle both errors by simply trying the
same route again, with `/index.html` appended to it.
```
rewrite * /$BUCKETNAME{uri}
reverse_proxy minio:9000 {
    @error status 403 404
    handle_response @error {
        rewrite * {uri}/index.html
        reverse_proxy minio:9000 {
            @nestedError status 404
            handle_response @nestedError {
                respond "not found" 404
            }
        }
    }
}
```
This retries the request when the first one returns 403 or 404. Only if the
second attempt also returns 404 do we present "not found" to the end user.

So. Done?

Not quite... In S3 one just _pretends_ that object names are fully qualified
paths. Right now, we always append `/index.html` to the request. This works
fine for `https://janw.name/blog` but falls apart if the request URL is
`https://janw.name/blog/`. That's because the second one ends up as a request
for the object `blog//index.html`, which does not exist. Only `blog/index.html`
exists. We'll need to trim the trailing slash if it is present in the request.
This can be done by adding the following to the configuration:
```
@pathWithSlash path_regexp dir (.+)/$
handle @pathWithSlash {
    rewrite @pathWithSlash {re.dir.1}
}
```
We can then wrap the whole thing in a nice template like so:
```
(s3page) {
    @pathWithSlash path_regexp dir (.+)/$
    handle @pathWithSlash {
        rewrite @pathWithSlash {re.dir.1}
    }
    rewrite * /{args[0]}{uri}
    reverse_proxy minio:9000 {
        @error status 403 404
        handle_response @error {
            rewrite * {uri}/index.html
            reverse_proxy minio:9000 {
                @nestedError status 404
                handle_response @nestedError {
                    respond "not found" 404
                }
            }
        }
    }
}
```
And then use the template like so:
```
janw.name {
    import s3page "janw.name"
}
```
In my case I simply hardcoded the address of the MinIO host on my internal
network into the template (`minio:9000`). But this could be made configurable
like the bucket name if required.
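
A minimal, untested sketch of what a configurable upstream could look like. It
assumes the MinIO address is passed as a second snippet argument; `{args[1]}`
and the second parameter on the `import` line are additions for illustration
and not part of the setup described above. Everything else is the same template
as before.
```
(s3page) {
    @pathWithSlash path_regexp dir (.+)/$
    handle @pathWithSlash {
        rewrite @pathWithSlash {re.dir.1}
    }
    # {args[0]}: bucket name, {args[1]}: MinIO upstream (assumed second argument)
    rewrite * /{args[0]}{uri}
    reverse_proxy {args[1]} {
        @error status 403 404
        handle_response @error {
            rewrite * {uri}/index.html
            reverse_proxy {args[1]} {
                @nestedError status 404
                handle_response @nestedError {
                    respond "not found" 404
                }
            }
        }
    }
}

janw.name {
    import s3page "janw.name" "minio:9000"
}
```
With this variant every site block has to pass both arguments, so a second
site served from a different bucket would only differ in its `import` line.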


@ -2,3 +2,4 @@
title: "Blog"
date: 2022-10-01T18:55:23Z
---


@ -8,4 +8,5 @@
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
<link rel="stylesheet" type="text/css" href="/css/style.css">
<link href="data:," rel="icon">
<link rel="alternate" type="application/rss+xml" title="janw.name Feed" href="/index.xml">
</head>


@ -208,3 +208,6 @@ a {
gap: 0.5rem;
}
.centered {
margin: auto;
}