Create tar archive per subdirectory

The following bash script will generate a tar archive for each subdirectory found within a given directory.
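
The original script is not reproduced here, so what follows is a minimal sketch of such a script, assuming gzip-compressed archives named after each subdirectory and a root directory passed as the first argument (defaulting to the current directory):

    #!/bin/bash
    # Create one gzip-compressed tar archive per subdirectory of the given
    # root directory (defaults to the current directory if no argument is given).
    ROOT_DIR="${1:-.}"

    for dir in "$ROOT_DIR"/*/; do
        # basename strips the path and the trailing slash, e.g. ./photos/ -> photos
        name="$(basename "$dir")"
        echo "Archiving ${name}..."
        tar -czf "${name}.tar.gz" -C "$ROOT_DIR" "$name"
    done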

Save this code in a .sh file (e.g. archive.sh), put it in your desired root directory, make it executable (chmod +x archive.sh), and run it with the ./archive.sh command.

Recover lost web content using Google Cache

Last week I received this mail from my hosting provider (Scaleway), mentioning that they had a critical disk failure and couldn't recover any of the files. Sadly, I lost most of the files I kept on that volume, including my blog content :(.

Hello,

Your instance ‘xxxxxx’ is running on a blade that had a critical disk failure. We were not able to recover any of the files located on that disk.
Your node has been stopped, the volume located on the disk that had an issue has been detached from your server and is now available on the volume view, if you made snapshot of this disk or if you halted your node recently, you will recover your disk to the latest stop state. If you had another volume located on another physical disk, it will stay attached to your server.

We are sorry for the inconvenience.

Scaleway Team

The files were gone, but I couldn't let the years of effort I had put into my blog go in vain. So I tried different approaches, and this is how I finally recovered all of my blog content.

1. Log into Google Webmaster Tools and list the URL errors under Crawl > Crawl Errors.
2. Under the Not found tab, you'll be able to see the list of web pages that went missing after the incident.
3. Take one item and search Google for that URL (make sure to surround the URL with "" quotation marks).
4. If it returns a search result, there's a good chance that Google has already indexed the page and has it cached on their servers.
5. Go to cachedview.com, type in the URL of your missing web page (exactly as it appeared in Google Webmaster Tools), and click the Google Web Cache button.
6. It will then load the cached content of your missing web page.
7. Now create a new post on your blog and copy the content from the cached web page. (Make sure to keep the same URL, keywords, category, etc.)
8. Repeat the same steps for each Not found item in Google Webmaster Tools.
9. That's it.
10. Now that we've all learned the lesson: it's always good to have a periodic backup strategy, to be on the safe side in such incidents. 🙂

Configuring Shibboleth as a SAML2 Identity Provider

In this post I am going to share the steps to configure Shibboleth as a SAML2 IdP, which I hope will be useful for you as well. I'm using Ubuntu 14.04 LTS as my operating system; however, it should work on other systems too.

  1. Download the Shibboleth IdP: Link
  2. Once you have downloaded the file, extract it into your local file system.
  3. Go to the <SHIBBOLETH_HOME>/bin directory and run the install.sh script (run install.bat if you are on Windows). This installs Shibboleth into the location you specify on your file system. You will be prompted with a few questions during the installation. Note: if you do not provide a fully qualified host name during installation, an error may occur; it should exactly match the format Shibboleth suggests, i.e., idp.example.org (there is a regex pattern in the build.xml file, which you can modify to suit your requirements).
  4. We will refer to the installation path that you provide as <SHIBBOLETH_HOME>. The installation also creates a keystore and an idp.war file, which can be found in the <SHIBBOLETH_HOME>/credentials and <SHIBBOLETH_HOME>/war directories respectively.
  5. Configure a user store with Shibboleth. An existing LDAP-based user store can be used for this.
  6. Open the login.config file found in the <SHIBBOLETH_HOME>/conf/ directory and configure your LDAP user store details. A sample configuration for an LDAP user store (the LDAP used in WSO2 IS) is shown after this list.
  7. Enable the username/password login handler by un-commenting the relevant section of the <SHIBBOLETH_HOME>/conf/handler.xml file (shown after this list).
  8. Configure the logging level in the <SHIBBOLETH_HOME>/conf/logging.xml file. All log files are saved in the <SHIBBOLETH_HOME>/logs directory, which is helpful when troubleshooting issues.
  9. Deploy the idp.war found in <SHIBBOLETH_HOME>/war/ in a web application server (e.g. copy idp.war to <TOMCAT_HOME>/webapps).
  10. Enable HTTPS in Apache Tomcat. To do this, edit the <TOMCAT_HOME>/conf/server.xml file and configure the HTTPS connector (a sample connector is shown after this list).
  11. Copy the /endorsed directory and its contents from the previously extracted Shibboleth setup to CATALINA_HOME/endorsed (e.g. /usr/share/tomcat7/endorsed).
  12. Restart the Apache Tomcat server.
  13. Check the status of the server by visiting https://localhost:8443/idp/status
  14. Now Shibboleth is configured. However, there are some additional steps that might come in handy. Please note that, by default, Shibboleth adds a Transient ID as the NameID in the Subject element of the SAML Assertion. (The Transient ID attribute definition exposes a randomly generated, short-lived, opaque identifier that can later be mapped back to the user by a Transient principal connector.) If you want to add the login name to the SAML Assertion instead, you need to do the following configuration.
  15. To configure the principal ID as the NameID in the SAML Assertion, comment out <resolver:AttributeDefinition id="transientId"> in <SHIBBOLETH_HOME>/conf/attribute-resolver.xml and add the attribute definition shown after this list instead.
  16. To configure a new release policy for the principal ID, comment out <afp:AttributeFilterPolicy id="releaseTransientIdToAnyone"> in <SHIBBOLETH_HOME>/conf/attribute-filter.xml and add the policy shown after this list instead.
  17. That’s it, Shibboleth is now configured as a SAML2 Identity Provider.
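
For reference, here are sketches of the configuration snippets referenced in the steps above. For step 6, a minimal login.config entry for an LDAP user store might look like the following; the URL, DNs, and credentials shown are only assumptions (they match the defaults of the embedded LDAP that ships with WSO2 IS), so adjust them to your own user store:

    // Values below assume the embedded LDAP of WSO2 IS; change them for your store.
    ShibUserPassAuth {
       edu.vt.middleware.ldap.jaas.LdapLoginModule required
          ldapUrl="ldap://localhost:10389"
          baseDn="ou=Users,dc=wso2,dc=org"
          bindDn="uid=admin,ou=system"
          bindCredential="admin"
          userFilter="uid={0}";
    };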
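
For step 7, the username/password login handler that ships commented out in handler.xml looks roughly like this (the path to login.config is an example; point it at your own <SHIBBOLETH_HOME>):

    <!-- Username/password login handler backed by the JAAS configuration above. -->
    <ph:LoginHandler xsi:type="ph:UsernamePassword"
                     jaasConfigurationLocation="file:///opt/shibboleth-idp/conf/login.config">
        <ph:AuthenticationMethod>urn:oasis:names:tc:SAML:2.0:ac:classes:PasswordProtectedTransport</ph:AuthenticationMethod>
    </ph:LoginHandler>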
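
For step 10, a typical HTTPS connector in Tomcat's server.xml is sketched below; the keystore path and password are assumptions (the keystore generated under <SHIBBOLETH_HOME>/credentials can be used here):

    <!-- keystoreFile/keystorePass are placeholders; point to the keystore created by the Shibboleth installer. -->
    <Connector port="8443" protocol="HTTP/1.1" SSLEnabled="true"
               maxThreads="150" scheme="https" secure="true"
               clientAuth="false" sslProtocol="TLS"
               keystoreFile="/opt/shibboleth-idp/credentials/idp.jks"
               keystorePass="changeit"/>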
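
For step 15, an attribute definition that exposes the login name (the principal) and encodes it as a SAML2 NameID could look like the following sketch; the id and nameFormat are assumptions you may want to change:

    <!-- Exposes the authenticated principal name; encoder emits it as the SAML2 NameID. -->
    <resolver:AttributeDefinition id="principal" xsi:type="ad:PrincipalName">
        <resolver:AttributeEncoder xsi:type="enc:SAML2StringNameID"
            nameFormat="urn:oasis:names:tc:SAML:2.0:nameid-format:unspecified"/>
    </resolver:AttributeDefinition>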
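
For step 16, a matching release policy in attribute-filter.xml, modelled on the default releaseTransientIdToAnyone policy, might be:

    <!-- Releases the "principal" attribute defined above to any relying party. -->
    <afp:AttributeFilterPolicy id="releasePrincipalToAnyone">
        <afp:PolicyRequirementRule xsi:type="basic:ANY"/>
        <afp:AttributeRule attributeID="principal">
            <afp:PermitValueRule xsi:type="basic:ANY"/>
        </afp:AttributeRule>
    </afp:AttributeFilterPolicy>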

DDoS Detection Using HTTP Communication Flow Analysis

Abstract

Over the past few years, Application Layer DDoS attacks have become increasingly popular due to the minimalistic nature of application layer security. This type of attack tries to exhaust a web server's resources by overloading it with a massive number of HTTP requests. As long as the content of the requests is in legitimate form and the request rate adheres to protocol limits, an intrusion detection system (IDS) can hardly detect such attacks. In that case, the only factor that can distinguish attackers from legitimate users is their browsing behaviour, since an attacker's browsing behaviour differs significantly from that of a legitimate user. Exploiting that factor, this research introduces a novel approach to accurately distinguish between attackers and legitimate users. The approach observes the HTTP communication flow, extracts characteristics that describe a user's browsing behaviour (i.e. page request sequence, request rates, and request and content distribution), and models them into a form that can be analysed by a machine-learning algorithm. The probabilistic model generated by that machine-learning algorithm is then used to distinguish between attackers and legitimate users. Evaluation results based on a collected data set demonstrate that this approach is accurate and effective in detecting Application Layer Distributed Denial of Service attacks.

Subject Descriptors:

  • 1998 ACM Computing Classification System
    1. C.2.0 Computer-Communication Networks (Security and protection)
  • 2012 ACM Computing Classification System
    1. Intrusion detection systems
    2. Denial-of-service attacks

Key Words:

  • Intrusion Detection
  • Distributed Denial of Service
  • Machine Learning
  • Random Forests
  • Complete report : (contact me)
  • Source code : github

Solving WSO2 Virtual15 hackathon query using WSO2 CEP 4.0.0

This post discusses a possible solution to the first query of the WSO2 Virtual15 hackathon, which was held on 26th September 2015. It was written as a follow-up to "Querying complex event streams with WSO2 CEP 4.0.0" to provide more in-depth knowledge of querying complex event streams using a real-world example.

The hackathon took the same format as the DEBS 2014 Grand Challenge, posing a real-world data set and a set of problems. The data set originated from 2125 smart plugs deployed across 40 houses, with events emitted to the stream roughly every second for each sensor in each smart plug. At the hackathon, participants were expected to come up with solutions to two queries. The first was a load prediction query: predict the load based on current load measurements and recorded historical data. This post discusses a possible query (or solution) to predict the load according to that specification.
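
As an illustration only (the full solution is not reproduced here), a first building block of such a query in Siddhi, the query language of WSO2 CEP 4.0.0, could compute the current average load per house over a sliding window. The stream and attribute names below are assumptions based on the DEBS 2014 smart plug schema:

    /* Events from the smart plugs; property == 1 denotes a load measurement
       in the DEBS 2014 data set (0 denotes cumulative work). */
    define stream SmartPlugStream (id long, ts long, value float, property int,
                                   plugId string, householdId string, houseId string);

    /* Current average load per house over the last minute; a full prediction
       would also combine this with averages derived from historical data. */
    from SmartPlugStream[property == 1]#window.time(1 min)
    select houseId, avg(value) as currentAvgLoad
    group by houseId
    insert into AverageLoadStream;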