Hi,
We are using round-robin DNS to distribute requests to three servers
all running identically configured nginx. Connections then go upstream
to HAProxy and then to our Rails app.
About two weeks ago, users began to experience intermittent SSL
handshake errors. Users reported that these appeared as
“ssl_error_no_cypher_overlap” in the browser. Most of our reports have
come from Firefox users, although we have seen reports from Safari and
stock Android browser users as well. In our nginx error logs, we began
to see the same error on all three servers. The errors started at
around the same time, and no recent modifications had been made to
hardware or software:
…
2015/01/13 12:22:59 [crit] 11871#0: 140260577 SSL_do_handshake()
failed (SSL: error:1408A0D7:SSL
routines:SSL3_GET_CLIENT_HELLO:required cipher missing) while SSL
handshaking, client: ..., server: 0.0.0.0:443
2015/01/13 12:23:09 [crit] 11874#0: 140266246 SSL_do_handshake()
failed (SSL: error:1408A0D7:SSL
routines:SSL3_GET_CLIENT_HELLO:required cipher missing) while SSL
handshaking, client: ..., server: 0.0.0.0:443
2015/01/13 12:23:54 [crit] 11862#0: 140293705 SSL_do_handshake()
failed (SSL: error:1408A0D7:SSL
routines:SSL3_GET_CLIENT_HELLO:required cipher missing) while SSL
handshaking, client: ..., server: 0.0.0.0:443
2015/01/13 12:23:54 [crit] 11862#0: 140293708 SSL_do_handshake()
failed (SSL: error:1408A0D7:SSL
routines:SSL3_GET_CLIENT_HELLO:required cipher missing) while SSL
handshaking, client: ..., server: 0.0.0.0:443
2015/01/13 12:25:18 [crit] 11870#0: 140342155 SSL_do_handshake()
failed (SSL: error:1408A0D7:SSL
routines:SSL3_GET_CLIENT_HELLO:required cipher missing) while SSL
handshaking, client: ..., server: 0.0.0.0:443
…
Suspecting that this may be related to our SSL configuration in nginx
and a recent update to a major browser, I decided to get us up to
date. Previously we were on CentOS 5 and could only use an older
version of OpenSSL with the latest security patches. This meant we
could only support TLSv1.0 and a few of the recommended secure
ciphers. After upgrading to CentOS 6 and implementing Mozilla’s
recommended configurations for TLSv1.0, TLSv1.1, and TLSv1.2 support,
I am confident that we are following best practices for SSL browser
compatibility and security. Unfortunately this did not fix the issue.
Users began to report a new error in their browser:
“ssl_error_inappropriate_fallback_alert”, and this is currently
reflected in our nginx error logs across all three servers:
…
2015/01/31 03:24:33 [crit] 30658#0: 57298755 SSL_do_handshake()
failed (SSL: error:140A1175:SSL
routines:SSL_BYTES_TO_CIPHER_LIST:inappropriate fallback) while SSL
handshaking, client: ..., server: 0.0.0.0:443
2015/01/31 03:24:35 [crit] 30661#0: 57299105 SSL_do_handshake()
failed (SSL: error:140A1175:SSL
routines:SSL_BYTES_TO_CIPHER_LIST:inappropriate fallback) while SSL
handshaking, client: ..., server: 0.0.0.0:443
2015/01/31 03:24:41 [crit] 30657#0: 57300774 SSL_do_handshake()
failed (SSL: error:140A1175:SSL
routines:SSL_BYTES_TO_CIPHER_LIST:inappropriate fallback) while SSL
handshaking, client: ..., server: 0.0.0.0:443
2015/01/31 03:24:41 [crit] 30657#0: 57300783 SSL_do_handshake()
failed (SSL: error:140A1175:SSL
routines:SSL_BYTES_TO_CIPHER_LIST:inappropriate fallback) while SSL
handshaking, client: ..., server: 0.0.0.0:443
2015/01/31 03:24:41 [crit] 30661#0: 57300785 SSL_do_handshake()
failed (SSL: error:140A1175:SSL
routines:SSL_BYTES_TO_CIPHER_LIST:inappropriate fallback) while SSL
handshaking, client: ..., server: 0.0.0.0:443
…
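To compare the two error types across servers, the log entries can be
tallied by OpenSSL reason. This is only a minimal sketch: it assumes
each entry sits on a single line in the actual log file (the excerpts
above are wrapped by the mail client), and the function name
tally_handshake_failures is made up for illustration.

```python
import re
from collections import Counter

# Matches the OpenSSL part of an nginx error-log entry, for example:
#   SSL_do_handshake() failed (SSL: error:140A1175:SSL
#   routines:SSL_BYTES_TO_CIPHER_LIST:inappropriate fallback)
# Assumption: in the real log file the entry is on one line.
HANDSHAKE_RE = re.compile(
    r"SSL_do_handshake\(\) failed \(SSL: error:(?P<code>[0-9A-Fa-f]+):"
    r"SSL routines:(?P<func>\w+):(?P<reason>[^)]+)\)"
)


def tally_handshake_failures(lines):
    """Count handshake failures per (error code, reason) pair."""
    counts = Counter()
    for line in lines:
        match = HANDSHAKE_RE.search(line)
        if match:
            counts[(match.group("code"), match.group("reason"))] += 1
    return counts
```

Passing an open file object for /var/log/nginx/error.log on each server
and printing counts.most_common() would show whether the mix of
"required cipher missing" and "inappropriate fallback" differs between
hosts or over time.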
Thinking that I had ruled out a faulty SSL stack or nginx
configuration, I focused on monitoring the network connections on
these servers. ESTABLISHED connections are currently at 13k and
TIME_WAIT is at 94k on one server, if that gives any indication of the
type of connections we are dealing with. The other two servers have
very similar numbers, and this is typical for our peak traffic hours.
I tried tuning kernel parameters: lowering tcp_fin_timeout, increasing
tcp_max_syn_backlog, widening ip_local_port_range, turning on
tcp_tw_reuse, and other commonly recommended tweaks. Nothing has
helped so far, and more users continue to contact us about problems
using our site.
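For reference, connection counts like the ones above can be gathered
with a short script. This is a rough sketch, not necessarily how the
numbers in this post were collected: it assumes `netstat -ant` is
available and prints the state in the sixth column, and both function
names are made up for illustration.

```python
import subprocess
from collections import Counter


def count_tcp_states(netstat_lines):
    """Count TCP connections per state from `netstat -ant` style output."""
    counts = Counter()
    for line in netstat_lines:
        fields = line.split()
        # Data rows look like: tcp 0 0 <local> <foreign> <STATE>
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            counts[fields[5]] += 1
    return counts


def current_tcp_states():
    """Run netstat (assumed to be installed) and tally its output."""
    output = subprocess.check_output(["netstat", "-ant"]).decode()
    return count_tcp_states(output.splitlines())
```

Calling current_tcp_states() on each server at the same moment gives
comparable ESTABLISHED and TIME_WAIT figures.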
I’ve exhausted my ideas and I’m not quite sure what’s gone wrong. I
would be extremely appreciative of any guidance list members could
provide. Below are more technical details about our installation and
configuration of nginx.
nginx -V output:
nginx version: nginx/1.6.2
built by gcc 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC)
TLS SNI support enabled
configure arguments: --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx
--conf-path=/etc/nginx/nginx.conf
--error-log-path=/var/log/nginx/error.log
--http-log-path=/var/log/nginx/access.log
--pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock
--http-client-body-temp-path=/var/cache/nginx/client_temp
--http-proxy-temp-path=/var/cache/nginx/proxy_temp
--http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp
--http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp
--http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx
--group=nginx --with-http_ssl_module --with-http_realip_module
--with-http_addition_module --with-http_sub_module
--with-http_dav_module --with-http_flv_module --with-http_mp4_module
--with-http_gunzip_module --with-http_gzip_static_module
--with-http_random_index_module --with-http_secure_link_module
--with-http_stub_status_module --with-http_auth_request_module
--with-mail --with-mail_ssl_module --with-file-aio --with-ipv6
--with-http_spdy_module --with-cc-opt='-O2 -g -pipe
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector
--param=ssp-buffer-size=4 -m64 -mtune=generic'
nginx config files:
--- /etc/nginx/nginx.conf ---
user nginx;
worker_processes 12;
error_log /var/log/nginx/error.log;
pid /var/run/nginx.pid;
events {
worker_connections 50000;
}
http {
include /etc/nginx/mime.types;
default_type application/octet-stream;
log_format with_cookie '$remote_addr - $remote_user [$time_local] '
'"$request" $status $body_bytes_sent '
'"$http_referer" "$http_user_agent" "$cookie_FL"';
access_log /var/log/nginx/access.log;
sendfile on;
tcp_nopush on;
tcp_nodelay on;
keepalive_timeout 65;
gzip on;
gzip_http_version 1.0;
gzip_comp_level 2;
gzip_proxied any;
gzip_types text/plain text/html text/css application/x-javascript
text/xml application/xml application/xml+rss text/javascript
application/json;
gzip_vary on;
server_names_hash_bucket_size 64;
set_real_ip_from ...;
real_ip_header X-Forwarded-For;
include /etc/nginx/upstreams.conf;
include /etc/nginx/sites-enabled/*;
}
--- /etc/nginx/sites-enabled/fl-ssl.conf ---
server {
root /var/www/fl/current/public;
listen 443;
ssl on;
ssl_certificate /etc/nginx/ssl/wildcard.fl.pem;
ssl_certificate_key /etc/nginx/ssl/wildcard.fl.key;
ssl_session_timeout 5m;
ssl_session_cache shared:SSL:50m;
ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
ssl_ciphers
'ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:AES:CAMELLIA:DES-CBC3-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!aECDH:!EDH-DSS-DES-CBC3-SHA:!EDH-RSA-DES-CBC3-SHA:!KRB5-DES-CBC3-SHA';
ssl_prefer_server_ciphers on;
server_name **********.com;
access_log /var/log/nginx/fl.ssl.access.log with_cookie;
client_max_body_size 400M;
index index.html index.htm;
if (-f $document_root/system/maintenance.html) {
return 503;
}
# Google Analytics
if ($request_filename ~* ga.js$) {
rewrite .* http://www.google-analytics.com/ga.js permanent;
break;
}
if ($request_filename ~* /adgear.js/current/adgear_standard.js) {
rewrite .* http://**********.com/adgear/adgear_standard.js
permanent;
break;
}
if ($request_filename ~* /adgear.js/current/adgear.js) {
rewrite .* http://**********.com/adgear/adgear_standard.js
permanent;
break;
}
if ($request_filename ~* __utm.gif$) {
rewrite .* http://www.google-analytics.com/__utm.gif permanent;
break;
}
if ($host ~* "www") {
rewrite ^(.*)$ http://********.com$1 permanent;
break;
}
location / {
location ~* .(eot|ttf|woff)$ {
add_header Access-Control-Allow-Origin *;
}
if ($request_uri ~* ".(ico|css|js|gif|jpe?g|png)\?[0-9]+$") {
expires max;
break;
}
# needed to forward user's IP address to rails
proxy_set_header X-Real-IP $remote_addr;
# needed for HTTPS
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header Host $http_host;
proxy_set_header X-FORWARDED_PROTO https;
proxy_redirect off;
proxy_max_temp_file_size 0;
if ($request_uri ~* /polling) {
proxy_pass http://ssl_polling_upstream;
break;
}
if ($request_uri = /upload) {
proxy_pass http://rest_stop_upstream;
break;
}
if ($request_uri = /crossdomain.xml) {
proxy_pass http://rest_stop_upstream;
break;
}
if (-f $request_filename/index.html) {
rewrite (.*) $1/index.html break;
}
# Rails 3 is for old testing stuff... We don't need this anymore
#if ($http_cookie ~ "rails3=true") {
# set $request_type '3';
#}
if ($request_uri ~* /polling) {
set $request_type '${request_type}P';
}
if ($request_type = '3P') {
proxy_pass http://rails3_upstream;
break;
}
if ($request_uri ~* /polling) {
set $request_type '${request_type}P';
}
if ($request_type = '3P') {
proxy_pass http://rails3_upstream;
break;
}
if ($request_type = 'P') {
proxy_pass http://ssl_polling_upstream;
break;
}
if (!-f $request_filename) {
set $request_type '${request_type}D';
}
if ($request_type = 'D') {
proxy_pass http://ssl_fl_upstream;
break;
}
if ($request_type = '3D') {
proxy_pass http://rails3_upstream;
break;
}
}
error_page 500 502 503 504 /50x.html;
location = /50x.html {
root html;
}
}