An Analysis of high CPU and RAM usage caused by Go Program
1. Problem description#
Considering the limitation of the RAM on my VPS, when user attempting to upload a large file, I want to handle the file in chunks, so that I can read the file in small pieces and write them to the disk. And this process is called streaming.
The code snippet is like this:
func (s *server) handleUpload(w http.ResponseWriter, r *http.Request, currentDir string) (errs []uploadError) {
// limit the size of incoming request bodies.
r.Body = &LimitedReader{r: r.Body, n: int64(s.maxFileSize * 1024 * 1024)}
reader, err := r.MultipartReader()
if err != nil {...}
for {
// reader.NextPart() will close the previous part automatically.
part, err := reader.NextPart()
if err != nil {...}
// not a file, move to the next part.
if part.FileName() == "" {
continue
}
filename := getAvailableName(currentDir, part.FileName())
dstPath := filepath.Join(currentDir, filename)
dst, err := os.Create(dstPath)
if err != nil {...}
// io.Copy() will stream the file to dst, multipart.Part is an io.Reader.
_, err = io.Copy(dst, part)
if err != nil {...}
if err = dst.Close(); err != nil { ... }
}
return
}
But when I try to upload a file of 1.6GB, it takes a long time to finish (about 2 minutes), I thought it may caused by the CPU speed of my VPS or the I/O speed of the disk or the network speed.
But at the browser side, the uploading progress bar is always at 100%, and the browser is waiting for the response from the server, which indicates that the uploading process is finished, but the server is still processing the file.
I use htop
to check the CPU usage of my VPS, it’s around 20% when uploading the file (consists of network IO and disk IO), which indicates that the CPU is not the bottleneck.
Then I use dd
to test the I/O speed of my disk, it’s around 488MB/s, which is not bad.
Then why the server is still processing the file so long?
Besides, when my server is processing the file, if I refresh the page, the RAM and cpu usage of my VPS will increase to 100%, and the server will get killed by the OS. This is super weird.
2. Solution for high CPU usage#
The error output (call stack) looks like this:
goroutine 12 [running]:
runtime.systemstack_switch()
/Users/David/sdk/go1.20.4/src/runtime/asm_amd64.s:463 fp=0xc00004b7b8 sp=0xc00004b7b0 pc=0x4643e0
runtime.(*mheap).alloc(0x1c036000?, 0xe01b?, 0x78?)
/Users/David/sdk/go1.20.4/src/runtime/mheap.go:955 +0x65 fp=0xc00004b800 sp=0xc00004b7b8 pc=0x427025
runtime.(*mcache).allocLarge(0x47047e?, 0x1c036000, 0x0)
/Users/David/sdk/go1.20.4/src/runtime/mcache.go:234 +0x85 fp=0xc00004b848 sp=0xc00004b800 pc=0x4167e5
runtime.mallocgc(0x1c036000, 0x757080, 0x1)
/Users/David/sdk/go1.20.4/src/runtime/malloc.go:1053 +0x4fe fp=0xc00004b8b0 sp=0xc00004b848 pc=0x40d6de
runtime.growslice(0xc05ca3c000, 0x15?, 0xc00004b9c0?, 0x1?, 0x757080)
/Users/David/sdk/go1.20.4/src/runtime/slice.go:274 +0x4e9 fp=0xc00004b910 sp=0xc00004b8b0 pc=0x44c3a9
main.(*server).handleUpload(0xc00005c240, {0x0?, 0x820a00?}, 0xc000138600, {0xc000098008, 0x8})
/Users/David/Codes/GoLand/file-server/handleFile.go:87 +0x2ad fp=0xc00004ba00 sp=0xc00004b910 pc=0x70290d
main.(*server).taskDelegation(0xc00005c240, {0x823000, 0xc000012140}, 0xc000138600)
/Users/David/Codes/GoLand/file-server/route.go:25 +0x252 fp=0xc00004bbc0 sp=0xc00004ba00 pc=0x704652
main.(*server).ServeHTTP(0xc00005c240, {0x823000, 0xc000012140}, 0xc000138600)
/Users/David/Codes/GoLand/file-server/server.go:144 +0x4eb fp=0xc00004bca0 sp=0xc00004bbc0 pc=0x70608b
net/http.serverHandler.ServeHTTP({0xc00001a0a8?}, {0x823000, 0xc000012140}, 0xc000138600)
/Users/David/sdk/go1.20.4/src/net/http/server.go:2936 +0x316 fp=0xc00004bd50 sp=0xc00004bca0 pc=0x6a2ff6
net/http.initALPNRequest.ServeHTTP({{0x8232b8?, 0xc0001588a0?}, 0xc00010e700?, {0xc000146000?}}, {0x823000, 0xc000012140}, 0xc000138600)
/Users/David/sdk/go1.20.4/src/net/http/server.go:3545 +0x245 fp=0xc00004bed8 sp=0xc00004bd50 pc=0x6a4ee5
net/http.(*initALPNRequest).ServeHTTP(0x40b5b6?, {0x823000?, 0xc000012140?}, 0x0?)
<autogenerated>:1 +0x55 fp=0xc00004bf20 sp=0xc00004bed8 pc=0x6b3fb5
net/http.Handler.ServeHTTP-fm({0x823000?, 0xc000012140?}, 0xc00001c2e8?)
<autogenerated>:1 +0x42 fp=0xc00004bf50 sp=0xc00004bf20 pc=0x6b4dc2
It seems that the server is trying to allocate memory to a slice, and the size of the slice is too large, so the server is killed by the OS. Checked the code snippet I provided before, there is:
for {
// reader.NextPart() will close the previous part automatically.
part, err := reader.NextPart()
if err != nil {
if err == io.EOF { // finish reading all parts, exit the loop
break
}
errs = append(errs, uploadError{Message: fmt.Sprintf("reader.NextPart(): %v", err)})
continue
}
}
What if the error is not io.EOF
, but other errors? Then there will be a infinite loop, this is the reason why the CPU usage is high. What I do is to limit the size of the error message:
if err != nil {
if err == io.EOF { // finish reading all parts, exit the loop
break
}
errs = append(errs, uploadError{Message: fmt.Sprintf("reader.NextPart(): %v", err)})
// too many errors, stop uploading, you must limit the number of errors in case of infinite loop.
if len(errs) >= 10 {
errs = append(errs, uploadError{Message: "Maximum error limit reached"})
return
}
continue
}
3. Solution for slow uploading speed#
I ask this question in Go community that I joined, and I got a suggestion:
Form Upload(multipart/form-data
) is not good for large files, when uploading a large file, we usually use Binary Stream Uploads(application/octet-stream
) instead, use r.Body
directly to read the file content, and write the content to the disk this is real streaming.
Actually, with multipart/form-data
, the file content is read by block, not the real streaming, you can check the code I provided before: part, err := reader.NextPart()
.
So use PUT
with application/octet-stream
instead of multipart/form-data
to upload large files.
But after I change the code to use POST
with application/octet-stream
, the uploading speed is still slow.
Then I monitor the disk I/O speed and network speed and found the reason, the network speed is around 7MB/s, which is the bottleneck.
dstat -d # check the disk speed at real time
dsat -n # check the network speed at real time
But why the speedtest-cli shows the network speed is fast but the real speed is slow?
Download: 298.52 Mbit/s
Testing upload speed......................................................................................................
Upload: 192.82 Mbit/s