502 from flipt in k8s

Hello team,

My question is somewhat abstract because, to be honest, I don’t know where to start. We run Flipt inside Kubernetes, exposed through an Nginx Ingress. For its entire lifetime we have intermittently received 502 errors, at a rate of roughly 0.000854 failed requests per second (averaged per day). We recently updated to the latest version, but nothing changed.

I see no errors from the pods or the database (we use a separate PostgreSQL instance). Is there perhaps a known issue with Kubernetes and Flipt, or could you help me identify the cause of these 502 errors?

Thank you in advance for your assistance, and I must say, it’s a great product!

Hi @Mmladshev!

Sorry you are experiencing issues with Flipt and 502s.

We have not heard of any issues running Flipt in k8s. To help us debug the issue, have you tried turning on DEBUG logs?

You can do this for both the standard application logs and for gRPC itself via our config file (or environment variables).

Maybe setting both log levels to debug temporarily will help us find the root cause?
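For example, the relevant section of the Flipt config would look something like the sketch below (the `log.level` and `grpc_level` keys match the output you’d expect from grepping `values.yaml`; the equivalent environment variables would follow the usual `FLIPT_`-prefixed convention, e.g. `FLIPT_LOG_LEVEL`):

```yaml
# Flipt config (e.g. supplied via the Helm chart's values.yaml)
log:
  level: debug       # standard application logs
  grpc_level: debug  # gRPC server logs
```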

sure
cat values.yaml | grep level

  level: debug
  grpc_level: debug

cat flipt-58677485dd-kjvcm | grep -i error
cat flipt-58677485dd-wr5cn | grep -i error

cat flipt-58677485dd-kjvcm | grep 502

2024-05-30T15:53:43Z DEBUG boolean {“server”: “grpc”, “request”: “request_id:"4502edd3-6704-4e26-9ffd-652e645c5013" namespace_key:"dns-prod" flag_key:"enable-decider" entity_id:"all" context:{key:"targetingKey" value:"all"}”}
2024-05-30T15:53:43Z DEBUG boolean {“server”: “grpc”, “response”: “enabled:true reason:DEFAULT_EVALUATION_REASON request_id:"4502edd3-6704-4e26-9ffd-652e645c5013" flag_key:"enable-decider"”}
2024-05-30T15:53:43Z DEBUG boolean {“server”: “grpc”, “request”: “request_id:"61828b85-a502-42bf-b481-56982ac75172" namespace_key:"dns-prod" flag_key:"enable-decider" entity_id:"all" context:{key:"targetingKey" value:"all"}”}
2024-05-30T15:53:43Z DEBUG boolean {“server”: “grpc”, “response”: “enabled:true reason:DEFAULT_EVALUATION_REASON request_id:"61828b85-a502-42bf-b481-56982ac75172" flag_key:"enable-decider"”}
2024-05-30T15:53:44Z DEBUG boolean {“server”: “grpc”, “request”: “request_id:"340f150c-dea0-44fa-abae-b16df793d502" namespace_key:"dns-prod" flag_key:"enable-decider" entity_id:"all" context:{key:"targetingKey" value:"all"}”}
2024-05-30T15:53:44Z DEBUG boolean {“server”: “grpc”, “response”: “enabled:true reason:DEFAULT_EVALUATION_REASON request_id:"340f150c-dea0-44fa-abae-b16df793d502" flag_key:"enable-decider"”}
2024-05-30T15:54:48Z DEBUG boolean {“server”: “grpc”, “request”: “request_id:"5026dc5d-8528-4449-8cfe-4ccdcc5ae4f4" namespace_key:"dns-preprod" flag_key:"enable-event-publisher" entity_id:"all" context:{key:"targetingKey" value:"all"}”}
2024-05-30T15:54:48Z DEBUG boolean {“server”: “grpc”, “response”: “reason:DEFAULT_EVALUATION_REASON request_id:"5026dc5d-8528-4449-8cfe-4ccdcc5ae4f4" flag_key:"enable-event-publisher"”}
2024-05-30T15:55:01Z DEBUG boolean {“server”: “grpc”, “request”: “request_id:"b8650276-a72a-4552-afeb-2bfc94769037" namespace_key:"dns-prod" flag_key:"enable-decider" entity_id:"all" context:{key:"targetingKey" value:"all"}”}
2024-05-30T15:55:01Z DEBUG boolean {“server”: “grpc”, “response”: “enabled:true reason:DEFAULT_EVALUATION_REASON request_id:"b8650276-a72a-4552-afeb-2bfc94769037" flag_key:"enable-decider"”}
2024-05-30T15:55:13Z DEBUG boolean {“server”: “grpc”, “request”: “request_id:"384f51b9-234d-4e44-9502-3ba6f80aca19" namespace_key:"dns-prod" flag_key:"enable-decider" entity_id:"all" context:{key:"targetingKey" value:"all"}”}
2024-05-30T15:55:13Z DEBUG boolean {“server”: “grpc”, “response”: “enabled:true reason:DEFAULT_EVALUATION_REASON request_id:"384f51b9-234d-4e44-9502-3ba6f80aca19" flag_key:"enable-decider"”}

cat flipt-58677485dd-wr5cn | grep 502

2024-05-30T15:53:10Z DEBUG boolean {“server”: “grpc”, “request”: “request_id:"2502e79a-6343-45f3-9404-0462e3893d0b" namespace_key:"dns-prod" flag_key:"enable-decider" entity_id:"all" context:{key:"targetingKey" value:"all"}”}
2024-05-30T15:53:10Z DEBUG boolean {“server”: “grpc”, “response”: “enabled:true reason:DEFAULT_EVALUATION_REASON request_id:"2502e79a-6343-45f3-9404-0462e3893d0b" flag_key:"enable-decider"”}
2024-05-30T15:53:32Z INFO finished unary call with code OK {“server”: “grpc”, “grpc.start_time”: “2024-05-30T15:53:32Z”, “system”: “grpc”, “span.kind”: “server”, “grpc.service”: “flipt.evaluation.EvaluationService”, “grpc.method”: “Boolean”, “peer.address”: “127.0.0.1:60872”, “grpc.code”: “OK”, “grpc.time_ms”: 2.502}
2024-05-30T15:53:58Z DEBUG boolean {“server”: “grpc”, “request”: “request_id:"128fd5af-a0b6-4502-9c89-09df9bd28eaf" namespace_key:"dns-prod" flag_key:"enable-decider" entity_id:"all" context:{key:"targetingKey" value:"all"}”}
2024-05-30T15:53:58Z DEBUG boolean {“server”: “grpc”, “response”: “enabled:true reason:DEFAULT_EVALUATION_REASON request_id:"128fd5af-a0b6-4502-9c89-09df9bd28eaf" flag_key:"enable-decider"”}
2024-05-30T15:54:02Z DEBUG boolean {“server”: “grpc”, “request”: “request_id:"afcf9ec6-502c-46ae-86d0-79e992c76265" namespace_key:"dns-preprod" flag_key:"enable-event-publisher" entity_id:"all" context:{key:"targetingKey" value:"all"}”}
2024-05-30T15:54:02Z DEBUG boolean {“server”: “grpc”, “response”: “reason:DEFAULT_EVALUATION_REASON request_id:"afcf9ec6-502c-46ae-86d0-79e992c76265" flag_key:"enable-event-publisher"”}
2024-05-30T15:54:58Z DEBUG boolean {“server”: “grpc”, “request”: “request_id:"7006435c-4be3-453c-9795-5027348d2f89" namespace_key:"dns-prod" flag_key:"enable-decider" entity_id:"all" context:{key:"targetingKey" value:"all"}”}
2024-05-30T15:54:58Z DEBUG boolean {“server”: “grpc”, “response”: “enabled:true reason:DEFAULT_EVALUATION_REASON request_id:"7006435c-4be3-453c-9795-5027348d2f89" flag_key:"enable-decider"”}

But at the same time, I see this at the Ingress level:

[30/May/2024:15:51:37 +0000] “POST /evaluate/v1/boolean HTTP/1.0” 502 150 “-” “failover-manager/0.0.0-dev/20240513T092624Z-83f379e8adef9e9081c2d707ac0fa28469830c5b-go1.20.7” 469 0.001 flipt-ingress [some-sensitive-info] 10.244.10.189:8080 0.000 502 4482f150afb69bf01677f4fa9c876097 “00-ee462832ad18ad289a330e6418cfa006-260c90b37d8332e1-01”

So… I cannot catch the error and do not understand at what level it is occurring.

We have many Ingresses, all configured with the same template/pattern, yet we only see 502 errors with this particular application. This leads me to think the issue might be at the application level, but I am not certain.
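One caveat when reading the pod logs above: grepping for the bare string `502` also matches request IDs (e.g. `4502edd3-…`) and timings (`grpc.time_ms: 2.502`), not actual HTTP statuses. A minimal sketch (hypothetical helper name, assuming the nginx access-log format shown above) that counts `502` only when it appears as the status field right after the quoted request line:

```shell
# count_502: count lines where 502 is the HTTP status field
# (immediately after the quoted request line, e.g. ... HTTP/1.0" 502 ...),
# avoiding false positives from request IDs like 4502edd3-... .
count_502() {
  grep -cE 'HTTP/1\.[01]" 502 ' "$1"
}
```

If this returns zero against the Flipt pod logs but nonzero against the ingress logs, that would suggest Nginx itself is producing the 502 (e.g. a failed or reset upstream connection) rather than Flipt returning one.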