Building Hadoop clusters review


If you are interested in Hadoop technology, this is a video course you should probably evaluate. As you probably know, Apache Hadoop is an open-source software framework for storage and large-scale processing of data sets on clusters of commodity hardware. All the modules in Hadoop are designed with the assumption that hardware failures are common and thus should be automatically handled in software by the framework.

As for the video course itself, the content can be divided into three main sections:
1. how to create and set up a three-machine cluster using Amazon EC2,
2. how to install a Hadoop cluster using Apache Ambari,
3. how to start using the Hadoop cluster, in particular with Hue (Hadoop User Experience).

The description of all the topics is clear and well done (Sean Mikha, the author, did a good job). Each relevant topic is first explained in terms of its logical structure and approach, and only then demonstrated in practice.

The creation of the virtual machines on Amazon EC2 is useful for other purposes too. The practical, step-by-step description is not limited to creating the servers: it also covers security and connection details using, for example, the PuTTY SSH client.

In my opinion the most relevant value of this video course lies in the hidden details of the Hadoop cluster installation process. As you will see if you decide to follow it, the tasks are quite easy to do (probably this is Sean’s merit), but the configuration details and settings are very important if you want to make it work in practice. Following the hints, I’m sure every neophyte will save days of work and a lot of nights spent googling. 😉

Enjoy your Hadoop Cluster video course… as usual by Packt Publishing.

Francesco Corti

Solr doesn’t return more than 1,000 objects in Alfresco.

Once upon a time Alfresco used Apache Lucene as its search engine…

This was great until you had particular needs like, for example, a long-running query or a query that retrieves a huge number of objects. More than a year ago I wrote a post about how Alfresco retrieves a maximum of 1,000 results or runs a query for a couple of minutes at most.

As you can read in that post, the most commonly suggested solution to the problem was to migrate the indexing engine to Apache Solr. At that time, Alfresco supported both engines and considered Solr its future.

Today Lucene and Solr are both still supported and Solr is probably the most used but, regarding the same issue, something seems to be coming back again.

As you can read in the JIRA issue (*), in Alfresco 4.2.e Solr also returns a maximum of 1,000 results; to solve the issue, it is suggested to set the parameters below in the file.


This could have a high impact on “big” queries or “long” queries, so I would like to share this information with all of you to prevent problems or nights spent in the debugger. 😉

I hope this will help you.

Francesco Corti

(*) Thanks to Francesco Fornasari and Christian Tiralosi for the hint.

Yet another Alfresco Community upgrade tutorial: from 4.0.d to 4.2.f.

Upgrading Alfresco (Community or Enterprise) from one version to a more recent one has to follow a clear and precise path. It is always a critical task and in some cases it can be a serious problem for organizations (of course this is more critical for Community Editions). In some cases the only possible solution is an Alfresco-to-Alfresco migration instead of an upgrade… but this is another scenario.

This tutorial describes a step-by-step approach to upgrading Alfresco Community Edition from v4.0.d to v4.2.f in a single upgrade step. Even if the versions involved are different, the approach is always the same as the one discussed here.

Needless to say: I am not responsible for any damage that may happen after following the given instructions, which hopefully will not happen.

The (only) correct approach

Before starting I would like to share the (only) correct approach: please remember that the upgrade process for the Alfresco Community Editions is tested (and not guaranteed) only between the closest versions (for Alfresco Enterprise you can take a look here). This means that the only path you can follow to upgrade a very old version to a recent one is to perform multiple upgrades.

For example, if you come from v4.0.d and want to go to the recent v5.0.a, it’s only written in the stars whether the direct upgrade will work. The most verified approach is to perform the upgrade with the steps described below:
– Upgrade from v4.0.d to v4.0.e,
– Upgrade from v4.0.e to v4.2.a,
– Upgrade from v4.2.a to v4.2.b,
– Upgrade from v4.2.b to v4.2.c,
– Upgrade from v4.2.c to v4.2.d,
– Upgrade from v4.2.d to v4.2.e,
– Upgrade from v4.2.e to v4.2.f,
– Upgrade from v4.2.f to v5.0.a.
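
The chain above can also be printed from an ordered list of versions, which may be handy when planning a different starting or target release. This is only a minimal sketch: the function name print_upgrade_path is hypothetical, and the version list is simply the one given in this post.

```shell
#!/bin/sh
# Hypothetical helper: print every consecutive upgrade step
# for an ordered list of Alfresco Community versions.
print_upgrade_path() {
  prev=""
  for v in "$@"; do
    if [ -n "$prev" ]; then
      echo "Upgrade from v$prev to v$v"
    fi
    prev="$v"
  done
}

# Version list taken from the steps listed in this post.
print_upgrade_path 4.0.d 4.0.e 4.2.a 4.2.b 4.2.c 4.2.d 4.2.e 4.2.f 5.0.a
```

Removing versions from the argument list shows which “jumps” you would be taking.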

You can take your own risks by “jumping” some steps, and in some cases it will work, but nothing is guaranteed. In this tutorial I decided to take a reasonable risk, often discussed in the forums and tutorials, and “jump” with a single upgrade process.

Preparing the upgrade

To perform the upgrade I need the Alfresco backup of my v4.0.d production installation. If you don’t know what an Alfresco backup is and how to obtain it, I strongly recommend taking a look here.

In this tutorial I chose to set up a brand new server with the recent Alfresco installation (in our case v4.2.f), but you could choose to use the same server. Of course, in that case the task is even more critical; the steps are the same but performed in different folders from the “old” version of Alfresco.

The new Alfresco installation

As introduced before, in this tutorial I work on a vanilla server running Ubuntu 14.04 LTS. Oracle Java v1.7.60u is installed on the server, as described here.

To install Alfresco you can follow this tutorial even if it describes one specific version (the installation steps don’t change too much). Alternatively you can choose to install it using the easier wizard. In every case you will install the target version of Alfresco, in our case: Alfresco Community v4.2.f.

For the purpose of this post, the way you install Alfresco is not relevant, but remember that it will be your brand new server, so it’s always advisable to make it as robust and stable as possible. 😉

If you have some customizations (custom models, behaviors, actions or something else), now it’s time to install them on the new server. The task is always the same: stop Alfresco, deploy the customizations in the way you always do (AMP, Maven, manually) and start Alfresco again.

As a final step, it is always advisable to switch off indexing. In our case we assume Solr is used, but with Lucene it is the same. To do so, please follow the steps below:

cd <alfresco>
./ stop
nano tomcat/shared/classes/

 #solr.port.ssl=8443 (comment it)

Save and exit.

Database restore

Now it’s time to restore the Alfresco database from the backup. To do it, please be sure that PostgreSQL (or the database you use) is running. If you installed Alfresco with the wizard you can use the command below.

./ start postgresql

To delete the current Alfresco database and create an empty one, use the commands below.

cd <postgresql>/bin
./psql -h localhost -U postgres -d postgres

  DROP DATABASE alfresco;
  CREATE DATABASE alfresco WITH owner = alfresco;

To restore the database dump you can use:

./pg_restore -h localhost -U postgres -d alfresco <file.dump>
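
Since DROP DATABASE is destructive, it may be worth checking that the dump is readable before running the commands above. The sketch below assumes the dump is in a format pg_restore can read (for example one created with pg_dump -Fc) and that pg_restore is on the PATH; the function name check_dump and the example path are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the dump file exists and
# pg_restore can list its table of contents (nothing is restored).
check_dump() {
  dump="$1"
  if [ ! -r "$dump" ]; then
    echo "dump not readable: $dump" >&2
    return 1
  fi
  pg_restore --list "$dump" > /dev/null
}

# Example usage (the path is a placeholder):
# check_dump /path/to/file.dump && echo "dump looks readable"
```

A plain-SQL dump cannot be read by pg_restore; in that case it has to be loaded with psql instead.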

Filesystem restore

Once the database is restored you have to restore the documents on the file system from the backup.

cd <alfresco>/alf_data
rm -rf contentstore
rm -rf contentstore.deleted

Now it’s time to copy the ‘contentstore’ and ‘contentstore.deleted’ folders from the backup directly into ‘alf_data’.
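
The copy can be sketched as below, assuming the backup folder contains ‘contentstore’ and ‘contentstore.deleted’ at its top level. The function name restore_contentstore and both paths in the usage line are hypothetical; cp -a is used to preserve permissions and timestamps.

```shell
#!/bin/sh
# Hypothetical helper: replace the content store folders in alf_data
# with the copies found in the backup folder.
restore_contentstore() {
  backup_dir="$1"
  alf_data="$2"
  for d in contentstore contentstore.deleted; do
    if [ -d "$backup_dir/$d" ]; then
      # Remove any existing folder, then copy the backup in its place.
      rm -rf "$alf_data/$d"
      cp -a "$backup_dir/$d" "$alf_data/$d"
    else
      echo "warning: $backup_dir/$d not found, skipped" >&2
    fi
  done
}

# Example usage (both paths are placeholders):
# restore_contentstore /path/to/backup /path/to/alfresco/alf_data
```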

Did you notice that the indexes are not restored? If possible, it’s always preferable to rebuild the indexes from scratch. Otherwise we suggest restoring them from the backup, hoping nothing changed in the structure. 🙂

Alfresco bootstrap

Now everything is ready to start Alfresco again.

cd <alfresco>
./ start
tail -f tomcat/logs/catalina.out

You will see that the startup process updates the database and everything else necessary to upgrade the system. Errors or problems will be listed here…

Indexes rebuild

As you read before, the Alfresco upgrade has been performed without the indexes.
Now it’s time to rebuild them following what you read here.

./ stop
nano <alfresco>/tomcat/shared/classes/


cd <alfresco>/alf_data/solr
rm -rf workspace/SpacesStore/*
rm -rf archive/SpacesStore/*
rm -rf workspace-SpacesStore/alfrescoModels/*
rm -rf archive-SpacesStore/alfrescoModels/*
cd <alfresco>
./ start
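
The four rm commands above can be wrapped in a small guard so that nothing is deleted if the Solr folder is not where you expect it. This is a sketch only: the function name clear_solr_indexes is hypothetical, and it should be run with Alfresco stopped, as in the steps above.

```shell
#!/bin/sh
# Hypothetical helper: empty the four Solr index/model folders
# named in this post, leaving the folders themselves in place.
clear_solr_indexes() {
  solr_dir="$1"
  # Refuse to run with an empty path, to avoid touching "/".
  [ -n "$solr_dir" ] || return 1
  for sub in workspace/SpacesStore archive/SpacesStore \
             workspace-SpacesStore/alfrescoModels \
             archive-SpacesStore/alfrescoModels; do
    if [ -d "$solr_dir/$sub" ]; then
      rm -rf "$solr_dir/$sub"/*
    else
      echo "skipped (not found): $solr_dir/$sub" >&2
    fi
  done
}

# Example usage (the path is a placeholder):
# clear_solr_indexes /path/to/alfresco/alf_data/solr
```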

Enjoy your brand new Alfresco installation…

Francesco Corti

Alfresco roadmap for the next 12 months

After some requests from users, the new Alfresco roadmap has been released in the official wiki. This roadmap doesn’t seem to be like the ones of the past.

I see that the number of topics is smaller than in the past. On the other hand, each topic seems to be more detailed and “complete” (in the past most of the items were less specific than this). Compared with past roadmaps, I can read a lot of “Enterprise only” on some important new features.

Form your own opinion by reading the complete roadmap below.

Francesco Corti

Review of the Alfresco CMIS book by Martin Bergljung

As you probably know (or you are reading it now for the first time), CMIS is an open standard that allows different ECMs to interoperate over the Internet through the definition of a collection of services and a powerful query language (CMIS-QL), modeled on a subset of SQL.

The goal of this book is to share and explain all the basics of CMIS, using a practical and technical approach that starts from the history (why CMIS was born), goes through the definition of the (several) services and the query language, and ends with a collection of examples describing how to use CMIS in practice.

OK, CMIS is meant to make different ECMs interoperate, but the number of different languages and examples described in this book is interesting and well done: Java (with the Apache Chemistry libraries), JavaScript + jQuery, Groovy and (the basics of) PHP. Yes, I agree with you if you are thinking that there are many more CMIS libraries than these, but the description (and explanation) of the CMIS services (and examples) is all you need to understand how to approach development in all the other supported languages (.NET, Python, etc.).

As you can read from the title, Martin Bergljung focuses his description on Alfresco. This is true in that all the examples are developed using an Alfresco repository as the reference architecture. But inside the book you can find something more about Alfresco. Personally I found the description of Alfresco Surf together with the CMIS standard very interesting. Probably this topic is less useful for most readers (and practical cases), but it is an interesting example related to the basics of the Alfresco Share application. The example on how to make Alfresco and Drupal interact using CMIS is also quite interesting.

Last but not least, I read the book very easily in the first part (the more descriptive one) and in the last (full of practical examples in the different languages). I think I will also use the book as a manual for the several CMIS services when I develop something, because I suggest you remember that…

Standard is good!


Francesco Corti

Alfresco Hack-a-thon 2014 – Brussels

The 16th of May was the first Alfresco Global Virtual Hack-a-thon day. One of the physical locations was Brussels, more precisely the CIRB-CIBG (here the post about the event).

I was there with the AAAR project and, as usual, lots of “old” and brand “new” friends. Thank you to Boriss Mejiass, Lanre Abiwon (DarkStar1), Cristina Martín Ruiz, Ole Hejlskov and all the other attendees.

Below a short video about the nice time together.

Francesco Corti

Win your free copy of the Pentaho Reporting video course

Get a chance to win a free download link for the Pentaho Reporting video course, just by commenting on this post!


For the contest we have 2 download copies of the Pentaho Reporting video course, to be given away to 2 lucky winners.

How you can win:

To win your copy of this video course, all you need to do is come up with a comment below highlighting the reason “why you would like to win this video course”.

Duration of the contest & selection of winners:

The contest is valid for 2 weeks, until the 27th of May, and is open to everyone. Winners will be selected by the author on the basis of the comments posted… yes, it’s me! 🙂

Many thanks to Packt Publishing for the opportunity!

About the video course:

If you are a Java developer or IT professional who wants to assemble custom reporting solutions with Pentaho Reporting, this video course is ideal for you. Master the advanced concepts within Pentaho Reporting such as sub-reports, cross-tabs, data source configuration, and metadata-based reporting.

 A practical video guide, which dives directly into report generation using various techniques, offering you all of the tips and tricks needed to understand Pentaho Reporting. Learn how to create, modify, implement code, and publish professional reports that will boost your business enterprise to a completely new level.


So, don’t be shy: leave a comment here below!

Francesco Corti

ffmpeg for Ubuntu 14.04 LTS (mandatory for Alfresco)

In a past tutorial (one of the most accessed posts of the blog) I shared a step-by-step tutorial describing the installation of Alfresco using a more controlled and “enterprise” approach compared to the bundle. The tutorial refers to Alfresco Community Edition 4.2.c on Ubuntu 12.04 LTS, and a few days ago the brand new Ubuntu 14.04 LTS operating system was released.

Installing Alfresco as usual, I found that FFmpeg is not included in the Ubuntu repositories and has been replaced by libav (a fork of the ffmpeg project). For further details, you can read something here:

Waiting for an official answer from Alfresco, I found how to solve the problem: in particular, how to install ffmpeg on Ubuntu 14.04 LTS, which is mandatory to make Alfresco work.

Other instructions on how to install Alfresco 4.2.f on Ubuntu 14.04 LTS are similar to the tutorial, so I don’t want to repeat them here (probably I’ll refresh the post in the future).

Coming to the solution…

To install ffmpeg on Ubuntu 14.04 LTS you simply have to open a terminal and execute the commands below.

sudo apt-add-repository ppa:mc3man/trusty-media
sudo apt-get update
sudo apt-get install ffmpeg gstreamer0.10-ffmpeg
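
After the installation, a quick check that the ffmpeg command is actually available can save some debugging later. The snippet below is a minimal sketch; the helper name have_cmd is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given command is on the PATH.
have_cmd() {
  command -v "$1" > /dev/null 2>&1
}

if have_cmd ffmpeg; then
  # Print only the first line of the version banner.
  ffmpeg -version | head -n 1
else
  echo "ffmpeg not found on PATH"
fi
```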

I hope this information will help you to install Alfresco successfully in your environment. 😉

Francesco Corti

Your Alfresco custom theme deployed as AMP or ZIP file

Among the various posts of this personal blog, the one on developing a custom theme for the Alfresco Share front end is one of the most relevant and accessed. In a past tutorial, I shared a simple way to develop your own theme for the Alfresco v4.2.c distribution.

In this post I would like to reach the same goal, but with an important improvement: releasing the theme as an AMP file or a ZIP file, all using an automatic solution based on a Java project managed by Eclipse IDE and Apache ANT.


To understand how this solution works, you should be familiar with Eclipse IDE, Java projects, Git, Apache ANT and of course Alfresco (in particular with the AMP format). If you are not familiar with all of these, this use case could be a way to improve your skills but, please, refer to the official documentation to learn how to use them before trying to test the content of this post.

All the source code described in this post has been tested on an Ubuntu 12.04 LTS operating system with Alfresco Community Edition v4.2.f, installed as a bundle distribution. All the content should be valid for all the Alfresco distributions of the v4.2 family, both Community and Enterprise.

The git project my-first-alfresco-theme (on GitHub)

At the link below you will find a Git repository called ‘my-first-alfresco-theme’, which develops a copy of the ‘Green theme’ contained in the Alfresco vanilla installation. The developed theme is called ‘Alfresco themes – My first theme’.

The first task is to import the project into Eclipse IDE from the repository at the link below:

If you are not sure how to import a project from Git, please refer to the huge number of tutorials and documentation available on the web. 😉

Once the project has been imported with success in your Eclipse IDE, in the ‘build’ folder you will find the two packages containing the theme: one in AMP format (‘alfrescoThemes_myFirstTheme.amp’) and one in ZIP format (‘’).

If you want to rebuild the packages after customizations, the build file (‘build.xml’) in the ANT view is what you need. If you are not sure how to build a project with Apache ANT, please refer to the huge number of tutorials and documentation available on the web. 😉

How to deploy the theme in AMP format

The deployment using the AMP format should be preferred to the ZIP format. Below the step by step description of the task.

  • Open a terminal and go to the folder where Alfresco is installed (for example ‘/opt/alfresco-4.2.f’ on a Linux platform).
  • Stop Alfresco (for example ‘./ stop’ on a Linux platform).
  • Copy the AMP file from the ‘build’ folder of the project into the ‘amps-share’ folder (if you use an Alfresco bundle installation).
  • Go to the ‘bin’ subfolder and run the ‘apply_amps’ script.
  • Go back to the Alfresco installation folder and start Alfresco again (for example ‘./ start’ on a Linux platform).
  • Once Alfresco is started, open Alfresco Share in a browser, log in as administrator and open the ‘Admin tools’ item in the menu.
  • In ‘Applications’, change the theme to your custom theme.
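
The file-copy step of the list above can be sketched as below. The function name stage_amp is hypothetical and the paths in the usage line are the examples used in this post; stopping Alfresco, running ‘apply_amps’ and restarting remain manual steps, exactly as described.

```shell
#!/bin/sh
# Hypothetical helper: copy an AMP file into the 'amps-share' folder
# of an Alfresco bundle installation, checking both paths first.
stage_amp() {
  alf_home="$1"
  amp_file="$2"
  [ -f "$amp_file" ] || { echo "AMP not found: $amp_file" >&2; return 1; }
  [ -d "$alf_home/amps-share" ] || { echo "no amps-share folder in $alf_home" >&2; return 1; }
  cp "$amp_file" "$alf_home/amps-share/"
}

# Example usage (run with Alfresco stopped):
# stage_amp /opt/alfresco-4.2.f build/alfrescoThemes_myFirstTheme.amp
```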


That’s all!

How to deploy the theme in ZIP format

The deployment using the AMP format should be preferred to the ZIP format. Below the step by step description of the task.

  • Open a terminal and go to the folder where Alfresco is installed (for example ‘/opt/alfresco-4.2.f’ on a Linux platform).
  • Stop Alfresco (for example ‘./ stop’ on a Linux platform).
  • Unzip the ZIP file from the ‘build’ folder directly into the Alfresco installation folder (you can merge the folders/subfolders/files to correctly install the theme).
  • Start Alfresco again (for example ‘./ start’ on a Linux platform).
  • Once Alfresco is started, open Alfresco Share in a browser, log in as administrator and open the ‘Admin tools’ item in the menu.
  • In ‘Applications’, change the theme to your custom theme, exactly as described for the AMP format.
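
The unzip step can be sketched as below, assuming the unzip command is installed. The function name unpack_theme_zip is hypothetical, and the ZIP path in the usage line is a placeholder to be replaced with the package found in the ‘build’ folder.

```shell
#!/bin/sh
# Hypothetical helper: unpack the theme ZIP into the Alfresco
# installation folder, merging with the existing folders.
unpack_theme_zip() {
  zip_file="$1"
  alf_home="$2"
  [ -f "$zip_file" ] || { echo "ZIP not found: $zip_file" >&2; return 1; }
  [ -d "$alf_home" ] || { echo "folder not found: $alf_home" >&2; return 1; }
  # -o overwrites existing files without prompting, -d sets the target folder.
  unzip -o "$zip_file" -d "$alf_home"
}

# Example usage (run with Alfresco stopped; the ZIP path is a placeholder):
# unpack_theme_zip /path/to/build/theme.zip /opt/alfresco-4.2.f
```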

That’s all!

Francesco Corti