Recently in network Category

IPv6 Adoption?

Leo Vegoda posted an interesting article today on CircleID about IPv6 deployment and adoption.

While Leo admits that measuring IPv6 isn't an exact science, he was able to identify that the African region (AfriNIC) is announcing more IPv6 space than any of the other regions, including Europe.

So what about Ireland?

SixXS maintains statistics of networks announcing IPv6 addresses globally on a per-country basis.

The table for Ireland is quite interesting.

Of the 24 Irish networks with an IPv6 allocation, only 13 are active and of those only 10 are being announced globally (if my interpretation of the table is correct).
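
For anyone who wants those figures as proportions, here is a rough Python sketch. It uses only the three counts quoted above (which depend on my reading of the table being right); the names `share`, `allocated`, `active` and `announced` are just for illustration.

```python
# Counts quoted in the post (subject to the table being read correctly).
allocated = 24   # Irish networks with an IPv6 allocation
active = 13      # allocations listed as active
announced = 10   # allocations actually announced globally


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


active_pct = share(active, allocated)
announced_pct = share(announced, allocated)
announced_of_active_pct = share(announced, active)

print(f"Active: {active_pct}% of allocations")
print(f"Announced: {announced_pct}% of allocations")
print(f"Announced: {announced_of_active_pct}% of active allocations")
```

In other words, fewer than half of the Irish allocations appear to be visible on the global routing table.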

The Irish government has an allocation, but doesn't seem to have announced it ever!

Of course HEAnet is making use of theirs, but they're not a commercial ISP.

The problem for widespread adoption of IPv6 is going to lie in encouraging commercial ISPs to make IPv6 available.

When this is likely to happen, however, is anyone's guess.

As I mentioned previously, I was asked to give a presentation on IPv6, or more specifically, our experience with deploying IPv6, at the recent ICANN meeting in Paris.

Leo Vegoda has posted an excellent followup of the session on the ICANN blog, summarising the speakers' various comments and thoughts.

While we may have some experience with IPv6, the session in Paris definitely opened my eyes to a whole set of issues that we will need to address moving forward.

Blacknight Technical Blog Now Live

If you want to know about any service affecting maintenance, technical updates or anything else of a technical nature, we recommend that you check out our new Technical Blog.

The site is hosted outside our core network (we don't even use our own nameservers just to be 100% safe!) and is part of our backup / contingency plans for emergency situations.

While our network uptime has been and hopefully will continue to be exemplary, there's no reason to be lazy. We need to make sure that we have a system in place before an issue arises, NOT after.

You can subscribe to the site's RSS feed OR to the email alerts.

Your choice :)

Blacknight On WebmasterRadio.fm

Journalists call from time to time asking me to talk about various internet related topics. Most of the time the publications or shows are "general interest", so you can only talk about very general things.

Last night, however, was quite different, as I was one of the guests on "Domain Masters", which is broadcast and streamed weekly at 7pm EST (11pm in Ireland, midnight CET).

The show's host last night was my good friend Jothan Frakes who is one of the domain name industry's gurus.

Although I was very nervous (which probably showed!) we had a nice chat about Blacknight, domains and the internet industry.

If anyone wants to hear the show there should be an mp3 version available on the WebmasterRadio site at some time over the next couple of days.

UPDATE: The MP3 from last night is now available on the site: http://www.webmasterradio.fm/Internet-Marketing/Domain-Masters/Geo-Domain-Expo-and-BlackKnight.htm

UPDATE 2: Of course if I provided proper hyperlinks people might be actually able to use them!
So here you go: Show details including podcast

INEX connectivity Upgrade

When: INEX LAN#1 connection being upgraded @ 23:00 on Monday 10th of March.

What: We currently have 2 x 100M connections to INEX. Our LAN#1 connection carries a lot of our INEX traffic and as such we're upgrading it to 1000M to prevent it being a bottleneck for traffic originating in Ireland. This is a simple software configuration change for the port speed on one of our routers and requires INEX operations staff to do the same on their end.

There'll be a brief hit as all our INEX peerings on LAN#1 go down and traffic re-routes over LAN#2 and transit. This should only be temporary and peerings should re-establish automatically after a few minutes.

Summary: @ 23:00 on March 10th we're upgrading our primary Connection to INEX to 1000M. Traffic reaching us via INEX peerings on LAN#1 should be re-routed via LAN#2 and transit with minimal downtime being incurred, just the time it takes for BGP to reconverge.

Update: 23:25 March 10th 2008

This upgrade went ahead without a hitch. We're now running at GE @ INEX on LAN#1. We'll upgrade LAN#2 later in the year as necessary.

Inter DataCentre connectivity testing

When: Monday 10th of March @ 22:00 hours

What: Firstly we've recently lit our own protected wavelength between DEG and InterXion. It has been in place and in testing for a few weeks now. We need to test the failover on both the long and short legs of this new connectivity and also check the failover to the backup layer 2 paths in the event of both the short and long legs getting damaged at the same time.

We don't expect any downtime during this testing as our layer 2 network normally fails over within a few milliseconds.

Secondly we're moving the InterXion firewalls to the new Distribution routers in this location. This change should take 30 seconds or so to propagate within our network as it's a logical Layer 3 change.

This will mean the firewalled networks in DEG and InterXion will be separated from each other, so traffic originating in and destined for each data centre doesn't need to traverse our metro network.

For complete testing we'll allocate 2 hours to perform these tests. We don't envisage anything more than a few hits of 10-30 seconds on metro traffic (so it won't affect everyone); these would only cause slow loading times for some websites and not others.

Summary: Works begin @ 22:00 hours on Monday 10th and end at midnight on Monday 10th. There should only be a few short hits on our metro links as they failover while we simulate fibre cuts, switch failures, port failures etc.

Update: 23:58 March 10th:

These tests have been completed. The inter DC links have been tested in several scenarios and we're happy that they're quite resilient now. We also moved the InterXion firewalls from the DEG routers to a new distribution router pair in InterXion. This took a little longer than expected, around 3 minutes and 40 seconds, because of a slight OSPF glitch in the config which took a minute or two to find. All went to plan except the initial stages of that firewall move.

Unix Engineer Position Available

We're hiring staff at the moment ... I've already mentioned some of the roles we need to fill and today I've got yet another one...

This time round it's for an "experienced" *nix admin position.

Basically we've got loads of servers. They need lots of TLC ...

Here are the details:

Position: Unix Engineer

The Unix Engineer has a specific area of responsibility within the systems that comprise the Blacknight core service delivery infrastructure. He/she reports to the Support Supervisor. This role may require leadership of junior engineers. Close co-operation with the core systems engineers is essential.

Responsibilities associated with this role include, but are not limited to the following:

* Ensure that all Blacknight services run optimally; continuously monitor performance against published SLA targets; consistently strive for excellence and highest system uptimes
* Assume special responsibility for Unix/Linux-based service delivery systems:
  * Shared Unix web servers and mail servers
  * Unix/Linux-based database servers
  * Support to dedicated or co-located customers on Unix/Linux systems who have bought managed services
  * Advanced dedicated server development
  * Unix/Linux-specific backups
* Plan and implement future developments, such as systems upgrades, using detailed plans and scheduled upgrade or implementation cycles
* Maintain/upgrade current anti-virus, backup, and other management software on Unix/Linux systems;
* Support deployment of hardware, software, service, and maintenance of all information technology assets
* Maintain full security on all systems
* Some development of in house applications may be required
* Assist in resolving systems outages out-of-hours through participation in an engineer call-out roster

Skills / Knowledge

* Deep knowledge of networking protocols (DNS/IPsec/POP3/SMTP and TCP/IP in mixed operating system environments)
* Experience managing Linux and other open-source operating systems essential, scripting ability essential, experience with commercial Unix, such as Solaris, an advantage
* Familiarity with Quagga the open-source routing suite desirable
* Software development experience in PHP and/or Perl desirable
* Working knowledge of Windows-based server operating systems
* Ability to build, run, and scale most ISP services from source
* Experience with Java, Tomcat, JSP deployments, ability to deploy and support such deployments
* Extensive knowledge of project-based software upgrade plans, ability to plan, write, communicate, and execute project and/or test plans
* Excellent understanding of firewalls and network security essential; experience with management of and planning of deployments of Cisco PIX or Fortinet firewalls essential
* Disaster recovery planning/testing
* Hands-on experience of backup software essential
* Hardware diagnostic skills would be desirable: be able to troubleshoot problems with servers to determine if a fault is software or hardware based and be able to use your own initiative to contact vendors for replacements where applicable

PERSONAL ATTRIBUTES:

* Excellent communication skills
* Must be forward thinking and have good planning skills
* Excellent troubleshooting, judgment, and decision-making skills
* High level of self-motivation and proven ability for self-study and learning on the job
* Flexible and dependable, On-call work and Out of hours duties may be required from time to time (this is paid extra)
* Team player

QUALIFICATIONS/ EXPERIENCE

* 3 years experience working in a similar role

Salary

* Negotiable, between €28,000 and €35,000 plus benefits and depending on experience

EDIT: The job is based in our offices in Carlow ...

help wanted

When I first set up Blacknight we were a very very very small operation without any staff, offices or anything else.

Nowadays we have a growing team and are constantly expanding.

Our offices are quite comfortable, though we're going to be extending them very soon.

We're currently looking for more staff to help us expand and grow and offer our clients quality service and support.

There are currently 4 positions open and it would be really nice if we could fill them.

On the technical side there are two roles.

One is technical support L1. In simple terms the L1 technical support role is the "frontline". You'd be dealing with our clients via email, phone and other methods (we're not using Livechat at present, but we may do...). It would be a junior position and in order to qualify you'd need to have a good balance of humour, patience and an interest in IT. You don't need to have a degree in IT. In fact you don't need a degree at all, though it would be helpful...

The other technical role is as "Data Centre Technician". I'll post the full job spec further down, but the basic outline is that you'd be working primarily in Dublin (Clondalkin and Park West) and liaising with clients and the rest of the team.

The other roles are on the sales team.

It's 2008. Ireland is part of the EU. It would be downright silly not to want staff capable of communicating with our European neighbours. So we're looking for multilingual sales staff. The languages we'd be interested in would be (in no particular order): Spanish, French, German or Italian. However if you have other EU languages then do feel free to contact us.

The other sales role is "straight" sales.

Both sales roles are based in our offices in Carlow and the candidates would report directly to our sales manager.

So, without further ado, here are the details of the Data Centre Technician role:

Position: Data Centre Technician

The Data Centre Technician role would be suited to someone who has previously worked for a Data Centre/large corporate with 500+ server deployments. He/she reports to the Technical Director. This role will involve close co-operation with the deployment and sales teams regarding new, current and future customer deployments.

Responsibilities associated with this role include, but are not limited to the following:

* Ensure that all Blacknight customer installs go smoothly and that there is ample space, power and hardware on site for each deployment; liaise with hardware vendors regarding delivery of DC specific hardware; consistently strive for excellence.
* Provide hands and eyes support on-site when necessary during business hours
* Ensure that all customer subscribed services are setup within SLAs outlined in contracts.
* Plan and implement future developments, such as systems upgrades, using detailed plans and scheduled upgrade or implementation cycles
* Keep track of all deliveries to and from our two Dublin POPs.
* Support deployment of hardware, software, service, and maintenance of all information technology assets
* Assist in resolving systems outages out-of-hours through participation in an engineer call-out roster

Skills / Knowledge

* Basic knowledge of networking protocols (DNS/IPsec/POP3/SMTP and TCP/IP in mixed operating system environments)
* Experience managing Linux and other open-source operating systems ideal, scripting ability essential, experience with commercial Unix, such as Solaris, an advantage
* Cisco IOS experience, switch configuration and general Layer 2 topology awareness
* Hardware skills with Intel based servers from vendors such as Dell, HP and Intel.
* Meticulous care in deploying servers and all Blacknight, customer equipment and all network cabling.
* Attention to detail regarding keeping records up to date
* Disaster recovery planning/testing
* Hardware diagnostic skills would be desirable: be able to troubleshoot problems with servers to determine if a fault is software or hardware based and be able to use your own initiative to contact vendors for replacements where applicable

PERSONAL ATTRIBUTES:

* Excellent communication skills
* Must be forward thinking and have good planning skills
* Excellent troubleshooting, judgment, and decision-making skills
* High level of self-motivation and proven ability for self-study and learning on the job
* Flexible and dependable, On-call work and Out of hours duties may be required from time to time
* Team player

QUALIFICATIONS/ EXPERIENCE

* 1 year experience working in a similar role

Salary

* Negotiable, between €23,000 and €27,500 plus benefits and depending on experience

If interested please send CV in RTF or PDF format with covering email to management@blacknight.com.

Recruitment agencies NOT welcome. Cold calls / emails from recruitment agencies in relation to these or any other vacancies will not be welcome and will be treated with contempt.

Overview:

At 10:04am on 4/2/2008 an Ethernet card failed in one of our metro-e provider's Layer 2 connectivity devices in DEG. Immediately (within 50ms) our kit failed over to our backup route into DEG. There was no service disruption during this window due to our resilient network design. At 12:00 the card was replaced, the link came back up and we flipped our traffic back over to our primary link. Again service was unaffected.

We received the RFO from our metro-e provider yesterday afternoon that basically said what I've described above. A card failed and it was replaced within 2 hours.

When: Starting Wednesday 9th @ 22:00 and ending Thursday 10th @ 01:00

What: Migration of Dedicated, Colocation and IP transit customers
to new Juniper network layer.

In December we bought a bunch of new Juniper routers to upgrade
our core network with. The ones that were there were almost 2 years
old and were due an upgrade.

We'll have the new Juniper router pair pre-configured with all prefixes
and BGP sessions. We'll slot it into place and clear the arp cache
on all affected layer 2 devices and shut down the old device. There will
be approx 10-30 minutes where routes to certain parts of our network
are unavailable.

This will also remove the need for our old IPv6 configuration. We'll now
have end to end native IPv6 core running on the Juniper platform. We're
the first hosting company in Ireland to build a native IPv4, IPv6 network
core on the Juniper platform and we're very proud of this fact.

Who will be affected:

Customers on our unfirewalled network (who have their own routers or
firewalls) or IP Transit customers.

This affects both customer groups in InterXion and DEG locations. If you
are unsure if this affects you or not, give us a call or drop an e-mail
into support@blacknight.com

Summary:

On Wednesday 9th starting @ 22:00 hours we'll be performing maintenance
on the routers that run our un-firewalled and IP Transit networks.

Connection Issues This Morning

As some of you know there were connectivity issues earlier this morning.

At this juncture all services should be restored. However, if anyone is still experiencing issues, please let us know.

We'll provide more details on what happened as soon as one of the technical team provides a report.

When: Thursday 27th Of December 2007 at 22:00

Duration: 2 Hours

What: Upgrade of two core routers within the Blacknight network. Each
core router will be swapped out one at a time. We do not anticipate any
downtime of network connectivity for customers during this maintenance
window, although customers will see routing path changes and/or brief
latency hits as traffic is rerouted.

Update: Friday Dec 28 @ 11:00am

This work was carried out successfully and without interruption to customers' traffic. We'll be scheduling one more maintenance window for the core network for sometime in Jan to swap out another one of the older routers for one of our new Juniper routers.

Connectivity Issues - BK2

There have been some issues with the connection into our second location this lunchtime.

We will post a more detailed explanation as soon as we have one.

Update: (13:57)

After investigating the issue with our Data Centre and Connectivity partners we've narrowed down the issue to being a faulty patch panel and/or patch leads.

To minimise further downtime in the next 24-48 hours we're going to hold off debugging this issue further. We'll schedule a network maintenance window in the coming days so we can test both links to DEG from InterXion fully.

The lunch time fault today occurred because both metro connections dropped at the same time as InterXion were carrying out cabling work in their patch room. We're working with InterXion to get a resolution on this, but as I said above we want to minimise downtime for our customers by scheduling a window in 24-48 hours to solve this issue fully.

If you have any queries about this please contact us via phone/email/forum or leave a comment on this post.

As part of upgrading our network we are changing access to our primary
name servers (217.114.173.6 and 82.96.97.64) so that they are authoritative only. If you have servers on our network that are set to use these, then you will need to update the DNS settings.

We have attempted to notify all customers whose servers are currently using our authoritative name servers. However, if you have not been contacted and believe your server(s) may be using them, then you can contact us directly for more information. Any customers who put in a server within the last six months should be already using the new servers.

On Friday the 7th of December, the servers will be made authoritative only. If your name servers are not updated by then this may cause issues with connectivity.
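
If you're not sure whether a server is affected, the check can be scripted. Below is a minimal Python sketch, assuming a Unix/Linux box whose resolver settings live in the usual `resolv.conf` format. The two addresses are the ones quoted in this notice; the function name `affected_resolvers` and the sample file contents are made up for illustration.

```python
# Name servers from the notice that are becoming authoritative only.
AUTHORITATIVE_ONLY = {"217.114.173.6", "82.96.97.64"}


def affected_resolvers(resolv_conf_text):
    """Return the nameserver addresses in resolv.conf-style text that
    match the servers being made authoritative only."""
    hits = []
    for raw_line in resolv_conf_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments ('#' or ';' at the start of a line).
        if not line or line.startswith(("#", ";")):
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            if parts[1] in AUTHORITATIVE_ONLY:
                hits.append(parts[1])
    return hits


# Example input, made up for illustration (192.0.2.53 is a documentation address).
sample = """\
# example resolv.conf contents
nameserver 217.114.173.6
nameserver 192.0.2.53
"""

print(affected_resolvers(sample))
```

On a real server you would pass in the contents of `/etc/resolv.conf` instead of the sample text. An empty list means the server is not pointing at either of the two addresses. Windows servers keep their DNS settings elsewhere, so this sketch does not cover them.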

Cold Fusion Maintenance Cancelled

The maintenance window for Cold Fusion has been cancelled.

Our technical team found a resolution that did not require any upgrades.

Emergency Switch replacement

When: 21st Nov 2007 @ 22:00

We require a 2 hour maintenance window to replace a switch we think may be faulty (after the outage at 8:40am on 20th of November 2007).

We'll rack and config the switch and hope to start moving customers at 22:00 hours. We expect around 2-5 minutes per customer for the change over. However just to be safe, we'd like to use a 2 hour maintenance window.

Summary:

Starting at 22:00 hours on November 21st we'll be replacing a switch in a customer rack on the first floor.

A further notice to the affected customers will be sent out via e-mail.

Network Outage - Nov 20th 2007

Summary

At 08:40:03.514 am this morning access switch 15 in a customer cabinet on the first floor detected a loopback on both of its GigE connections and both switch ports were set to err-disable. Our syslog servers never got this information so we assumed the switch had physically failed.

Time Line:

08:40 alerts received by on-call engineer, engineer proceeded to troubleshoot the issue
08:50 issue was perceived to be a switch failure, DEG notified to reboot the switch and to cable test both cables connecting the switch to the core network.
09:15 DEG call back saying that the issue doesn't appear to be related to cabling as they had run the fluke test on both uplink cables.
09:16 DEG report both ge0/1 and ge0/2 are showing state down, both syslog servers checked for possible data that may explain this. Nothing found.
09:20 Blacknight ask DEG to connect another port on access switch 15 to our core network. DEG have to make up a cable.
09:35:02 Fa0/19 comes up (without config)
09:42 Config placed on Fa0/19 to carry trunk traffic to core network, network comes up 30 seconds later
09:43 Network in customer cab resumes and all machines come back online.

Provisions:

We're going to investigate this issue as this is not normal behavior and neither of the core access switches report any issues with loopbacks.

We'll swap out this switch today in case there is a fault with it.

Total downtime for this customer cab was 1 hour and 3 minutes.
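
The downtime figure follows directly from the first and last entries in the timeline (08:40 and 09:43). As a quick sanity check, here is a small Python sketch; the helper name `downtime_minutes` is hypothetical and it assumes both times fall on the same day.

```python
from datetime import datetime


def downtime_minutes(start, end, fmt="%H:%M"):
    """Whole minutes between two same-day clock times given as HH:MM strings."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


# First alert and full recovery, taken from the timeline above.
outage = downtime_minutes("08:40", "09:43")
hours, minutes = divmod(outage, 60)
print(f"{outage} minutes = {hours} hour(s) and {minutes} minute(s)")
```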

Inex Maintenance Window 28th November

We have been informed by INEX, the Irish Neutral Exchange, that they will be conducting maintenance on 28th November between midnight and two am Irish time.

Peering with Amazon Europe

We've always been one of the first companies to peer with new members of the INEX. Last week Nick Hilliard from INEX announced to the peering community that Amazon were connected and ready to start peering with other members.

Today we've added Amazon Europe on LAN#1 on the INEX. All Amazon services that are hosted in Europe will be available over this peering session. We've also heard that other services like S3 etc will be hosted in Europe shortly (read that as 6-12 months or maybe sooner) and as above will be available to us and our customers.

Please be aware that these services are not available on any other peering platform in Europe. The INEX is the first exchange in Europe that Amazon have connected to, so as a hosting provider we have unique, low latency, high bandwidth capacity access to Amazon's web services.

update:

I hear that Euro S3 is available already! So we're directly connected with the network that S3 is served from. This is very good news for our colo and dedicated customers that are using S3!

When: Monday 12th of November @ 22:00

What: Migration of CARs and Firewalls into a single vlan

We're moving the Customer Access Routers and firewalls into a single
vlan so that traffic exchanged between the firewalled and unfirewalled
network segments doesn't need to go into the distribution layer
of our network and hence is switched and not routed. This will
make these sections of our network more efficient and more reliable.
For people who care about network hops, it'll take an extra hop out
of traceroutes between some of our network segments.

Who will be affected:

Customers on our unfirewalled network (who have their own routers or
firewalls) will notice a momentary blip while OSPF reconverges. IPv6
for the same customers will see a slight blip of around 2-3 minutes
while we move some things around.

We estimate that there could be two or three 30-second blips to IPv4 traffic
during this window and 3 - 10 minutes of IPv6 downtime.

Summary:

Colocated and Dedicated customers with their own firewalls and routers
and those who are not behind our own dedicated High Availability firewall
solution will be affected by this maintenance window.


Blacknight Engineering / support@blacknight.com

fireworks

The scheduled maintenance for last night went ahead on time.

According to our engineering team most people would have been affected very briefly (less than one minute).

If anyone is experiencing issues please let us know ASAP. While everything has been tested thoroughly and we have not had any reports of issues to date there is always a possibility that someone was affected - let us know if you were.


Personally I'm overjoyed that the upgrade was finally completed, as it means that our network is a lot more resilient than previously, which means I get to sleep more soundly at night!

When: Wednesday 17th of October @ 22:30 hours

What:

Firewall Upgrade. We're moving our colocation and dedicated server
customers out from behind the current HA pair of firewalls. We've
indicated recently on our blog that we bought 4 new Cisco ASA firewalls
and the time has arrived to install them.

Who will be affected:

Both of our firewalled networks will be affected by this. Firstly
our shared hosting firewall will be moved to a new IP address on the
WAN side to facilitate VPN configurations for our colocation and dedicated customers.

Secondly the new ASAs will be put in place and they will replace the current
firewalls and access routers for these customers.

We estimate around 30 minutes to an hour to move the shared hosting firewall
and around another 30 minutes to an hour to facilitate the new firewall
install. This includes all the cabling work etc that will need to be done.

We will also allow a further hour for testing of both networks, so we're looking
at a maximum of 3 hours for this work to be completed.

Summary:

All colocation, dedicated and shared hosting customers will be affected by this outage.

Sorry for the delay in providing this report regarding the last network issue

Date: October 7th 2007
Timeline: 02:15 - 03:35am (Irish time)

Affected Customers: Any customer on the shared firewall that has a dedicated server or has colocation with us was affected during this incident.
This also included our shared hosting clients.

What happened?

At around 2:15am on Sunday 7th of October a segment of our main network was sluggish and people would have experienced latency and packet loss.

Why?

As you may know our main network is firewalled. We have a pair of firewalls setup in HA (high availability) to protect the bulk of our clients, which includes all our shared hosting clients on both windows and linux, as well as a large number of clients on dedicated servers or with colocated machines.

Similar to the events of September 11 this year, this was mostly because the firewalls we're using have 100meg ports and as such are easily flooded by simple attacks like this one. We've already put the wheels in motion to upgrade these and we hope to announce the upgrade at the end of this week.

A brief timeline of events is shown below.

02:15: Alerted that sites on the network are not reachable.
02:20: A check reveals that any site behind the shared firewall is
not accessible.
02:30: A reboot of the firewalls is not successful in getting a
response, so an engineer is dispatched to Dublin.
03:30: A check by the engineer on site indicates that one customer
box is sending out masses of UDP traffic. The firewall is attempting to stop all the traffic at the cost of bringing down everything else, including the local console.
03:35: The customer port is disabled and the firewall becomes responsive
again.

Network Issue Last Night

Last night around 2am (Irish time) we experienced a network issue that affected some parts of our network.

All services were returned to normal as soon as possible.

We will issue a full report on last night's incident as soon as I have had an opportunity to discuss it with our network engineers.

In the meantime I would just like to apologise for any inconvenience caused to any affected clients.

New Cisco Firewalls

Following on from last Tuesday's incident we are following through on our promises.

Our technical team had been discussing the finer points of various firewalls for some time. When it comes to choosing equipment they always spend quite a bit of time evaluating the options. They have to take into account a lot of different factors.
How well will it work with existing equipment?
Will it scale?
How long before we have to replace it?
How much does it cost?
Do we have staff who know how to use it?
Does it support ipv6?
How much traffic can it handle?
How many concurrent connections can it handle?
How much RAM does it need?

The list goes on and on...

In the end we decided to go with Cisco ASA 5500 series.

And since we love our camera phones here are a couple of snaps of the new firewalls. Before anyone asks - I'm not 100% sure when they'll be installed.

cisco-asa-firewall-frontview.jpg


And from behind:

cisco-asa-firewall-rearview.jpg

And a slightly further away shot:

cisco-firewalls-longview.jpg

Summary:

An internal routing issue developed in our network between our edge routers and our core distribution routers.

Diagnosis and Resolution:

During regular network maintenance Blacknight staff were moving a customer from the shared VLAN to their own VLAN. During this move we were forwarding IP packets from their old IPs to their new IPs. Normally this should not be a problem. However in this case, the rules caused OSPF on the primary distribution router to flap.
As a result of the session flapping the secondary router was not able to take over correctly. We manually failed over to the secondary router and at this point the network stabilised.

We are still investigating this issue, but we believe a router upgrade that is planned for later in the year will fix this issue permanently.

Yesterday at lunchtime there were some issues on our network.

I'll try to explain what happened in simple terms and also explain what we are going to do to avoid this type of issue arising in the future.

If anyone has any queries about the explanation please feel free to ask via comments or email us directly.

Timeline: 13:55 - 14:18
Affected Customers: Any customer on the shared firewall that has a dedicated server or has colo with us was affected during this incident. This also included our shared hosting clients.
What happened?

At around 2pm yesterday afternoon a segment of our main network was sluggish and people would have experienced latency and packet loss.

Why?

As you may know our main network is firewalled. We have a pair of firewalls setup in HA (high availability) to protect the bulk of our clients, which includes all our shared hosting clients on both windows and linux, as well as a large number of clients on dedicated servers or with colocated machines.

Firewalls are basically computers. Depending on how much money you want to spend on them you get different capabilities. While our firewalls are perfectly adequate under most conditions they have limits.

When a server behind the firewall was compromised and started pumping out large amounts of traffic the firewalls were pushed to capacity. While the network was up at all times it would have been slow and unresponsive until our engineering team were able to take action.

What action was taken?

The server that had been compromised was disconnected from the network until the issue had been resolved / removed.

How can we avoid this in the future?

We had been planning to upgrade the firewalls in any case, and this is now being moved forward. The new firewalls will be able to carry larger amounts of traffic, so this kind of issue will have a lower impact should it arise again.

For the last few months we have also been actively encouraging clients to opt for their own firewall(s).

And now for the more detailed breakdown:

Outage Information with Timeline of Events

13:53 C program downloaded onto a customer's machine via a hole in their programming code.
13:55 Code compiled and executed. This resulted in 80 Mbit/s of additional traffic heading towards the shared firewall service during peak lunchtime traffic.
14:05 Our engineering team noticed that SSH and terminal services connections to machines on the network behind the firewall were laggy or intermittent.
14:06 Senior onsite engineers begin to investigate the issue.
14:08 One of our external traffic links was carrying approx 50 Mbit/s more traffic than normal (some traffic from the affected host never made it past the firewalls), and the engineers begin to check access switches to find which equipment cabinet has the infected host.
14:15 The host responsible for this increase in traffic was identified and its switch port was shut down by a network engineer.
14:16 Services begin to return to normal and the load on the firewalls' CPU drops back to acceptable limits.
14:18 All services are back to normal.
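
The intervals in the timeline above can be worked out with a short calculation. This is only a sketch in Python: the timestamps come from the timeline, while the shortened event labels and the function names are ours, chosen for illustration.

```python
from datetime import datetime

# Timestamps copied from the timeline above; the labels are shortened by us.
EVENTS = [
    ("13:55", "malicious code executed"),
    ("14:05", "latency noticed by engineering"),
    ("14:15", "switch port shut down"),
    ("14:18", "all services back to normal"),
]


def minutes_between(start: str, end: str) -> int:
    """Whole minutes between two HH:MM times on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def intervals_from_start(events):
    """Map each later event label to minutes elapsed since the first event."""
    start = events[0][0]
    return {label: minutes_between(start, t) for t, label in events[1:]}


if __name__ == "__main__":
    for label, mins in intervals_from_start(EVENTS).items():
        print(f"{mins:2d} min until {label}")
```

Measured from the moment the code was executed, that gives 10 minutes until the problem was noticed, 20 minutes until the port was shut down and 23 minutes until everything was back to normal.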

Server Issues - Morgana

The shared server "morgana" has been having intermittent issues for most of today. Our engineering team are working on a resolution. More information on the issue has been provided here

There are several reasons why this blog exists and one of them is to get feedback from clients.

It may come as a surprise to people, but we actually do pay attention to what they say to us and about us.
I'd love to think that we do a good job all of the time, but there may be aspects of our service that fail to meet your expectations and if that's the case I'd like to know about it. (If you don't want to comment in public you can always email me directly: michele@blacknight.eu ). It might be something as simple as the way we worded our product or service offering ... If people don't let us know we have no way of knowing!

We are currently working on rolling out a new suite of websites and we will be unveiling a whole range of new products and services over the coming months. I'll be teasing you all with little details as we finalise the details, but now is also the ideal time for us to take your feedback. If you want us to offer something that is feasible then we might just do that. Of course we might think your idea is crazy ... but if you don't talk to us we will never know.

What kind of services would you like to see hosting providers like us providing in the future?

What elements of our current hosting plans would you like us to change? (I'm not saying we will change them, but I am more than willing to listen)

Which technologies would you like to see us offering in the future?

Network Maintenance Completed

The scheduled network maintenance tonight proceeded as planned and has now been completed.

All systems should be fully operational, however if anyone has any issues please let us know ASAP.

One of our network engineers has been onsite for the last few hours conducting the upgrades, reboots and checks.

Our team of networking engineers like to keep our network running smoothly and I am happy to say that they do a very good job of it overall.

Of course this means that from time to time they have to upgrade and patch things.

So between 27th and 28th of August we will be doing maintenance on our Cisco switches, which involves upgrading the IOS on some of the devices.

The affected switches will have to be rebooted, so there could be a loss of connectivity for up to a minute as the devices reboot.

Since we're doing this in the middle of the night it should not affect many clients as it is set to happen between 2200 and 0200 GMT. If you are located in the US for example that would be 1600 to 2000...
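
The US spans several time zones, so the local time of the window depends on where you are. A rough sketch of the conversion in Python, treating GMT as UTC and using fixed offsets (the offsets in the loop are the usual US summer values and are our assumption, as is the function name):

```python
from datetime import datetime, timedelta, timezone

GMT = timezone.utc  # GMT treated as UTC for this sketch


def window_in_offset(start_hhmm: str, end_hhmm: str, utc_offset_hours: int):
    """Convert a GMT window given as HHMM strings to a fixed UTC offset."""
    # The date is arbitrary; only the clock time matters here.
    base = datetime(2000, 1, 1, tzinfo=GMT)
    target = timezone(timedelta(hours=utc_offset_hours))
    converted = []
    for hhmm in (start_hhmm, end_hhmm):
        t = base.replace(hour=int(hhmm[:2]), minute=int(hhmm[2:]))
        converted.append(t.astimezone(target).strftime("%H%M"))
    return tuple(converted)


if __name__ == "__main__":
    # Assumed summer offsets: Eastern -4, Central -5, Mountain -6, Pacific -7.
    for name, offset in [("Eastern", -4), ("Central", -5),
                         ("Mountain", -6), ("Pacific", -7)]:
        start, end = window_in_offset("2200", "0200", offset)
        print(f"{name:8s} {start} to {end}")
```

The 1600 to 2000 figure quoted above corresponds to an offset of six hours behind GMT.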

In any case there's more info on the forum

Fibre Issue Update

We have received a detailed report from our fibre provider regarding last Friday night's outage.

As the report is very long and highly technical I won't be publishing it here.

If anyone affected by last Friday night's issue would like more information about the steps that both ourselves and our fibre providers are taking to avoid future issues please let us know.

For about 20 minutes this morning users may have noticed that connection speeds / response times from some servers were slower than normal.

This was due to a denial of service attack, the details of which are outlined below.

Timeline: 08:15am till 08:38am

Location: DEG, Blacknight Dub1 data centre

Problem and Resolution:

At approx 4am this morning a client machine started spewing data out of our network. At this time the traffic was not significant enough to trigger any alarms or cause any downtime.

At approx 8:15am this morning, a second attack started from the same machine with a significant increase in traffic. This traffic consisted of tiny UDP datagrams aimed at an external host. The sheer volume of packets overloaded the CPU in the primary firewall, so it was dropping large numbers of packets.
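
To see why tiny datagrams hurt more than their bandwidth suggests, it helps to count packets rather than bits. The sketch below is illustrative only: the 10 Mbit/s rate and both packet sizes are made-up numbers, not measurements from this incident, and Ethernet framing overhead is ignored.

```python
def packets_per_second(bits_per_second: float, packet_bytes: int) -> float:
    """Packets per second needed to fill a given bit rate with equal-sized packets."""
    return bits_per_second / (packet_bytes * 8)


if __name__ == "__main__":
    rate = 10_000_000  # hypothetical 10 Mbit/s, not a figure from the incident
    small = packets_per_second(rate, 64)    # assumed size of a "tiny" packet
    large = packets_per_second(rate, 1500)  # assumed size of a full-sized packet
    print(f"64-byte packets:   {small:,.0f} per second")
    print(f"1500-byte packets: {large:,.0f} per second")
    print(f"ratio: {small / large:.1f}x")
```

At the same bit rate the small packets arrive roughly 23 times as often, and every one of them is a packet the firewall has to handle individually.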

We disabled the switch port that this machine was attached to and network flow resumed. We took preventative measures on the routers facing the customer machine to filter traffic from hitting the firewalls. We then re-enabled this customer port and logged into the machine to diagnose the issue.

The machine has since been removed from the network and is being examined by our security team.

Unscheduled Outage - Fibre Issue

There was an issue with connectivity into our second location in Dublin on Friday night.

Timeline: 11:17pm till 02:11am

Location: InterXion, Blacknight Dub2 data centre

Problem and Resolution:

The problem was identified at 11:17pm when we were unable to reach any equipment over our primary link into InterXion (IX).
This link is provided by Broighter Networks.
We dispatched an engineer on-site to diagnose the problem and to eliminate our own hardware as the source of the problem.
We had completed this by 00:30 and we had switched both ends of the link to alternative hardware in DEG and IX.

We then notified Broighter that we had diagnosed the fault to be on their end. They in turn tried some fault diagnosis with no success, including a reboot of their fibre switch which impacted other customers of theirs. They then dispatched an engineer with a new switch and line cards to IX at around 01:00. He arrived on site and had to migrate customers to the new switch, which took a bit of time.
At approx 02:11 packets started routing again into IX and the issue was resolved.

We are awaiting a detailed explanation from Broighter regarding this outage, as we have a protected fibre ring which should be fault tolerant.
The main problem with this outage was that the physical layer (layer 1) never dropped, so no automatic failover kicked in and it took significantly longer to fix than we would have liked.

Future protection against such outages:

We're provisioning another protected circuit between DEG and InterXion with an alternative carrier.
Unfortunately even if we had had this on Friday night, it would have been no use to us as the physical layer never went down and any automated switchover as a result of a failure would not have occurred.

In the future, if a similar issue occurs we can simply disable one of the rings manually.
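
The general lesson from this outage is that "link up" is not the same as "traffic is passing". The sketch below illustrates that idea in Python and nothing more: it is not how our equipment or our carrier's equipment works, and the class name, the threshold of three failed probes and the notion of a "probe" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class PathMonitor:
    """Declares a path down after several consecutive failed observations."""

    failure_threshold: int = 3  # hypothetical value chosen for illustration
    consecutive_failures: int = 0
    failed_over: bool = False

    def record(self, link_up: bool, probe_ok: bool) -> bool:
        """Feed one observation; return True once the path counts as down."""
        if link_up and probe_ok:
            # Link is up AND a test packet made it across: path is healthy.
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            self.failed_over = True
        return self.failed_over


if __name__ == "__main__":
    monitor = PathMonitor()
    # The link stays up throughout, but no test traffic gets through.
    for attempt in range(1, 4):
        down = monitor.record(link_up=True, probe_ok=False)
        print(f"probe {attempt}: path considered down = {down}")
```

A check based only on link state would never have fired here, whereas a check that also requires test traffic to get through flags the path as down after the third failed probe.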

Network Upgrade (again!)

Just to let people know that we are doing yet another network upgrade next weekend.

When?
Friday 6th, Saturday 7th of July
Time: 22:00 - 02:00

For full gory details see this post on our forum

Carlow.pl Moves to Blacknight

carlow.pl logo

As a company based in Carlow it's nice to see sites that serve the local community being hosted by us.

One such site is Carlow.pl, which completed its migration from a server in Poland this weekend.

Carlow.pl was set up to serve both the Irish and Polish communities in the greater Carlow area and is available in both Polish and English.

The site's administrator, Krystian Kozerawski, moved to Carlow about eighteen months ago and started work on the site before Christmas of last year.

As the non-English speaking population in Irish regional towns grows it's nice to see new sites and services launching to cater to their needs, so we were delighted to offer Krystian and his community some space on our servers.

I'll be looking forward to seeing what he does with some of the other regional portals he has planned!

Network Maintenance Followup

Just to follow up on the network maintenance from the other evening.

The work went ahead as scheduled and we have not had any reports of issues from clients.

We will be announcing the next phase in our network upgrades and maintenance plans in the coming days.

To keep abreast of these changes I'd recommend you subscribe to our RSS feeds :)

We have published 547 articles so far.

About this Archive

This page is an archive of recent entries in the network category.
