Author | Topic: Anatomy of a disaster (at Alaska Software)
---|---
Steffen F. Pirsig | Anatomy of a disaster (at Alaska Software) on Tue, 10 May 2011 16:14:56 +0200

Hi,

here we go with a technical write-up of what really happened at Alaska Software last week...

The disaster prelude, part #1: It all started to develop approximately 6 months ago, when some of our external servers moved from one data center to another. As a side effect of that move, which by itself went perfectly, our primary and secondary DNS servers were "replaced" by a single physical machine. We know this is not a good idea, but for whatever reason that little detail went unnoticed by the admin when the move was made. Of course, everything worked fine, so nobody cared about it.

Intermezzo: To better understand how the above relates to the disaster that happened last week, a little look into the IT infrastructure of Alaska Software is required. First of all, there are 3 external server systems at a data center. Then there are 4 application servers hosted inside our intranet. These systems communicate using message queues, which handle table replication, web services over HTTP using REST, and of course the Web Application Adaptors. This system runs very reliably, and any part of the infrastructure can normally go down without hampering the other parts, thanks to many-to-many table replication between DBF tables and SQL servers. But as always, the system has one single point of failure, and that is DNS. If DNS fails to work, none of the server systems or message queue transport agents can connect to any other party. Furthermore, none of our customers can use any of the public Internet services of Alaska Software, such as www, email or whatever. So DNS needs to work at all times. That's why there is a primary and a secondary DNS server!

The real disaster, part #2: Having said that, the disaster started last week with a distributed brute-force attack against the external server hosting the primary and secondary DNS. Again, primary and secondary DNS were on the same physical machine by accident. This brute-force attack aimed at spying out valid account names and passwords on that machine. But the attack also brought down the DNS server - which by itself is not a big issue, as the secondary DNS normally takes over when the primary DNS times out. Unfortunately, primary and secondary were located on the same machine, leading to the nice effect that when the primary DNS timed out, the secondary did, too! In addition, the massive brute-force rate led to a complete malfunction of the Windows authentication system after some time. This had the even nicer effect that the machine was no longer remotely administrable. Only a hard reboot brought it back for a few minutes, until the brute-force attack took it down again.

Lesson learned: The primary and secondary DNS are now on two physical machines, located at two different sites. And a tertiary DNS is planned.

Hope you can draw something useful from it...

regards
Steffen F. Pirsig
Alaska Software Inc.
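The failure mode above can be sketched in a few lines: a stub resolver walks its configured nameserver list in order, so the secondary only adds resilience if it can fail independently of the primary. This is a minimal simulation (the server names and the `reachable_hosts` set are illustrative, not Alaska Software's actual configuration):

```python
# Minimal sketch of resolver fallback: try each configured nameserver
# in order; the secondary only helps if it lives on different hardware.

def resolve(name, nameservers, reachable_hosts):
    """Return the first nameserver that answers for `name`,
    or None if every configured server times out."""
    for server in nameservers:
        if server in reachable_hosts:   # simulated "server responded"
            return server
    return None                         # all servers timed out

# Healthy setup: primary and secondary on different physical hosts.
# If the primary ("ns1") goes down, the secondary still answers.
assert resolve("www.example.com", ["ns1", "ns2"], {"ns2"}) == "ns2"

# The accident described above: both DNS entries point at the same
# physical machine. When the attack takes that one host down, the
# "fallback" dies with it.
assert resolve("www.example.com", ["ns1", "ns1"], set()) is None
```

The second assertion is exactly the incident: two nameserver entries, one machine, zero redundancy.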
Maurizio | Re: Anatomy of a disaster (at Alaska Software) on Sun, 15 May 2011 11:52:03 +0200

Suggestion: putting a programmable firewall between the Internet and your server(s) could be a good idea. You could block all IPs except your own for administering the server, then set up a selective access list and add attacker IPs one by one using a script (detecting them by repeated requests within a unit of time). Then you could reopen your server to the cloud and let the script take care of the bad boys. Unfortunately, you still have to pay for the bandwidth wasted by the attack...

Hi.
Maurizio

> Hi,
>
> here we go with a technical write-up of what really happened at
> Alaska Software last week...
> [...]
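The detection step Maurizio describes - flag an IP once its request rate inside a time window crosses a threshold - can be sketched as a sliding-window counter. The class name, thresholds, and IPs below are illustrative only; in practice the flagged IPs would be fed to the firewall's block list:

```python
# Hedged sketch of rate-based attacker detection: count requests per
# source IP inside a sliding time window and flag the IP once it
# exceeds a threshold. Thresholds and names are illustrative.

from collections import defaultdict, deque

class BruteForceDetector:
    def __init__(self, max_requests=20, window_seconds=10.0):
        self.max_requests = max_requests
        self.window = window_seconds
        self.hits = defaultdict(deque)   # ip -> recent request timestamps
        self.blocked = set()

    def observe(self, ip, now):
        """Record one request from `ip` at time `now` (seconds).
        Return True if the IP should go on the firewall block list."""
        if ip in self.blocked:
            return True
        q = self.hits[ip]
        q.append(now)
        # Drop timestamps that have fallen out of the sliding window.
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) > self.max_requests:
            self.blocked.add(ip)         # hand this IP to the firewall
            return True
        return False

detector = BruteForceDetector(max_requests=5, window_seconds=1.0)
# A well-behaved client stays allowed.
assert detector.observe("10.0.0.1", now=0.0) is False
# An attacker hammering the server inside one second gets flagged.
flagged = [detector.observe("192.0.2.7", now=0.01 * i) for i in range(10)]
assert flagged[-1] is True
```

Note this only stops the login attempts, not the traffic itself - as Maurizio says, the wasted bandwidth still arrives at your uplink unless the provider filters upstream.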