A new version of the OpenEMPI community edition is now available as an Amazon AMI, so it is time to update the instructions for getting started with your evaluation of OpenEMPI using this edition.
AMI stands for Amazon Machine Image. You can think of it as a template that defines the configuration of the operating system and applications that make up a given environment. The template can then be used to create virtual machine instances automatically. Many public AMIs are available in the Amazon EC2 cloud. Some are plain instances that use a specific version of the Windows or Linux operating system, while others are more task-specific and combine an operating system with a collection of applications such as a web server, a programming platform and database software.
We have made two Amazon images available with OpenEMPI pre-installed, each with a reasonable blocking and matching algorithm configuration for the synthetic dataset that is loaded onto the instances. There are two AMIs because one is configured to use the deterministic matching algorithm and the other uses the probabilistic algorithm. When you start an EC2 instance, you must specify the AMI template that will be used to initialize the virtual machine, and you can look up the AMI by either its name or its AMI ID. The AMI names and IDs for the images we have made available are shown below in the screen capture. These images are available in the East Coast region of AWS.
For this blog post I assume that you have an Amazon Web Services (AWS) account and some familiarity with the EC2 service. If that is not the case, Amazon provides very good documentation for its web services and you can learn more about them here. Once you choose to create a new EC2 instance, the first step is to select the AMI you want to use. Search for the AMI using the name openempi and it should come up right away.
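If you prefer the command line to the EC2 console, the AWS CLI can perform the same lookup. The helper below, find_openempi_amis, is a hypothetical sketch: it assumes the AWS CLI is installed and configured with your credentials, and it assumes the "East Coast region" is us-east-1 (N. Virginia), so pass a different region if the images do not show up there. EC2 name filters are case-sensitive, which is why both spellings are listed as filter values.

```shell
# find_openempi_amis [REGION]
# Lists AMIs whose name contains "openempi" or "OpenEMPI", one per line,
# as "<AMI ID> <name>". us-east-1 is an ASSUMED default region.
find_openempi_amis() {
  local region="${1:-us-east-1}"
  aws ec2 describe-images \
    --region "$region" \
    --filters "Name=name,Values=*openempi*,*OpenEMPI*" \
    --query 'Images[].[ImageId,Name]' \
    --output text
}
```

Running find_openempi_amis with no arguments should print the deterministic and probabilistic images, which you can compare against the names and IDs in the screen capture.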
The next step is to choose an instance type. The instance type specifies the hardware configuration (CPU, memory, storage and networking capacity) of the instance you want to create, and Amazon EC2 offers many instance types to choose from. If you just want to try out OpenEMPI to see what it offers, a fairly minimal instance type should be sufficient, but we recommend an instance with at least 2 GB of memory and preferably 4 GB. You can learn more about instance types, their relative performance characteristics and their cost here.
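The instance type can also be chosen when launching from the AWS CLI with aws ec2 run-instances. The function below is a hypothetical sketch: the AMI ID, key pair name and security group ID are placeholders you supply, us-east-1 is an assumed default region, and t3.medium (2 vCPUs, 4 GiB of memory) is only one example of a type that meets the 4 GB recommendation, not a configuration we have tested.

```shell
# launch_openempi_instance AMI_ID KEY_NAME SG_ID [INSTANCE_TYPE] [REGION]
# Launches one instance from the given AMI and prints its instance ID.
# Defaults (t3.medium, us-east-1) are ASSUMPTIONS; override them as needed.
launch_openempi_instance() {
  if [ "$#" -lt 3 ]; then
    echo "usage: launch_openempi_instance AMI_ID KEY_NAME SG_ID [TYPE] [REGION]" >&2
    return 1
  fi
  local ami_id="$1" key_name="$2" sg_id="$3"
  local instance_type="${4:-t3.medium}" region="${5:-us-east-1}"
  aws ec2 run-instances \
    --region "$region" \
    --image-id "$ami_id" \
    --instance-type "$instance_type" \
    --key-name "$key_name" \
    --security-group-ids "$sg_id" \
    --count 1 \
    --query 'Instances[0].InstanceId' \
    --output text
}
```

For example, launch_openempi_instance ami-0123456789abcdef0 openempi-key sg-0123456789abcdef0 (all placeholder values) would print the ID of the new instance, assuming the key pair and security group already exist.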
After you select an instance type, you can skip forward to Step 6 to select a security group. You need to create a new security group that allows SSH access to the instance (so that you can connect to it remotely using an SSH client) and TCP access to port 8080, so that you can reach the OpenEMPI administrative console at http://<EC2-instance-hostname>:8080/openempi-manager once the OpenEMPI instance is up and running. Alternatively, you can open only the SSH port, which is the default behavior, and use SSH tunneling to forward port 8080 on the VM to your local machine. With this approach, the OpenEMPI administrative console is available at http://localhost:8080/openempi-manager.
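Both options can also be set up from a terminal. The sketch below assumes the AWS CLI and an OpenSSH client are installed; the function names, the security group name and the CIDR range you pass are hypothetical. The security group is created in your default VPC, and restricting the CIDR to your own address (a /32) is safer than opening port 8080 to 0.0.0.0/0.

```shell
# create_openempi_sg NAME CIDR [REGION]
# Creates a security group (in the default VPC) that allows inbound TCP on
# port 22 (SSH) and port 8080 (OpenEMPI console) from CIDR; prints its ID.
create_openempi_sg() {
  local name="$1" cidr="$2" region="${3:-us-east-1}"
  local sg_id port
  sg_id="$(aws ec2 create-security-group \
    --region "$region" \
    --group-name "$name" \
    --description "SSH and OpenEMPI console access" \
    --query 'GroupId' --output text)" || return 1
  for port in 22 8080; do
    aws ec2 authorize-security-group-ingress \
      --region "$region" \
      --group-id "$sg_id" \
      --protocol tcp --port "$port" --cidr "$cidr" > /dev/null || return 1
  done
  printf '%s\n' "$sg_id"
}

# open_openempi_tunnel KEY_FILE HOST [USER]
# SSH-only alternative: forwards local port 8080 to port 8080 on the instance.
# -N runs no remote command, so the tunnel stays open until you press Ctrl-C.
open_openempi_tunnel() {
  local key_file="$1" host="$2" user="${3:-ubuntu}"
  ssh -i "$key_file" -N -L 8080:localhost:8080 "${user}@${host}"
}
```

With the tunnel running in one terminal, http://localhost:8080/openempi-manager in your local browser reaches the console on the instance.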
Before launching the instance you will be asked to create a key pair. A key pair is a secure authentication mechanism that allows you to log in to the instance over SSH without providing a password. If you don’t already have a key pair you can use, you will need to create one. Once you launch the instance, it should be ready to go within seconds. In the instance monitoring screen, select the instance you created (if you have more than one running), and the instance detail window will show the hostname assigned to the virtual machine. To connect to the instance, use the ssh command on a Unix platform or a client such as PuTTY on Windows. For the hostname of the instance, you can use either the value shown next to the Public IP entry or the value shown next to the Public DNS entry.
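The key pair and connection steps can be sketched with the AWS CLI and OpenSSH as follows. The helper names and the key name are hypothetical, us-east-1 is an assumed default region, and instance_public_dns assumes you already know the instance ID (for example from the instance monitoring screen).

```shell
# create_openempi_key KEY_NAME [REGION]
# Creates an EC2 key pair and saves the private key as KEY_NAME.pem.
create_openempi_key() {
  local key_name="$1" region="${2:-us-east-1}"
  local key_file="${key_name}.pem"
  if ! aws ec2 create-key-pair --region "$region" --key-name "$key_name" \
      --query 'KeyMaterial' --output text > "$key_file"; then
    rm -f "$key_file"   # do not leave an empty or partial key file behind
    return 1
  fi
  chmod 400 "$key_file"   # ssh refuses private keys that others can read
  printf '%s\n' "$key_file"
}

# instance_public_dns INSTANCE_ID [REGION]
# Prints the value shown next to Public DNS in the instance detail window.
instance_public_dns() {
  local instance_id="$1" region="${2:-us-east-1}"
  aws ec2 describe-instances \
    --region "$region" \
    --instance-ids "$instance_id" \
    --query 'Reservations[0].Instances[0].PublicDnsName' \
    --output text
}

# connect_openempi KEY_FILE HOST
# Opens an interactive SSH session as the ubuntu user.
connect_openempi() {
  local key_file="$1" host="$2"
  ssh -i "$key_file" "ubuntu@${host}"
}
```

A typical sequence is create_openempi_key openempi-key before launching, then connect_openempi openempi-key.pem "$(instance_public_dns i-0123456789abcdef0)" once the instance is running (the instance ID is a placeholder).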
If you need root access to the instance, connect either as the user ubuntu, using the private key you created when launching the instance, or as the user openempi. The user that owns the OpenEMPI software on the image is openempi, with the password openempi, so you can log in with a command like the following, once again using either the IP address or the hostname assigned to the instance after the @ symbol:
ssh openempi@<EC2-instance-hostname>
If you prefer to load the instance with your own data, the easiest approach is to delete the data currently loaded on the instance by dropping the graph database. To do that, first make sure that the OpenEMPI server is stopped, and then remove the directory person-db under /sysnet/openempi/openempi-4.0.0c/data. To stop and start the server, use the following commands:
sudo systemctl stop openempi # to stop the server
sudo systemctl start openempi # to start the server
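The stop, delete and start sequence can be wrapped in a small script. This is a hypothetical helper, not something shipped with OpenEMPI: it assumes the systemd service is named openempi (as in the commands above) and uses the data directory given above by default. It checks that the service has actually stopped before deleting anything, since removing the database while the server still has it open could leave it in an inconsistent state.

```shell
# reset_openempi_data [DATA_DIR]
# Stops OpenEMPI, deletes DATA_DIR/person-db, and starts OpenEMPI again so it
# re-creates an empty database. WARNING: permanently deletes all loaded records.
reset_openempi_data() {
  local data_dir="${1:-/sysnet/openempi/openempi-4.0.0c/data}"
  local db_dir="${data_dir}/person-db"
  sudo systemctl stop openempi || return 1
  # Refuse to delete the database if the service is somehow still active.
  if systemctl is-active --quiet openempi; then
    echo "openempi is still running; not deleting ${db_dir}" >&2
    return 1
  fi
  if [ -d "$db_dir" ]; then
    sudo rm -rf -- "$db_dir"
  fi
  sudo systemctl start openempi
}
```

Run it while logged in to the instance with reset_openempi_data; passing a directory argument is only needed if your installation lives somewhere other than the default path.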
When you start the OpenEMPI server again, the database will be re-created automatically, but it will not contain any records. You will then need to add your own data to the instance using the flexible file loader. To use the flexible file loader, upload your data file to the server using the functionality under “User Files” and then import the data using a mapping file that you define. A couple of blog posts provide detailed instructions on how to do that, so you can search this blog for those entries. You can also use the REST API to add records to the instance and test your integration scenarios against the community edition. You can access the detailed REST API Programmer’s Guide here.
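After a restart the server may need some time before the console responds. The hypothetical helper below polls the console URL with curl until it returns an HTTP status from 200 to 399 (so a redirect counts as up), or gives up after a fixed number of attempts. The default URL assumes you are using the SSH tunnel; pass http://<EC2-instance-hostname>:8080/openempi-manager instead if you opened port 8080 in the security group.

```shell
# wait_for_openempi_console [URL] [ATTEMPTS] [DELAY_SECONDS]
# Polls URL until it answers with a 2xx or 3xx status; returns 1 on timeout.
wait_for_openempi_console() {
  local url="${1:-http://localhost:8080/openempi-manager}"
  local attempts="${2:-30}" delay="${3:-5}"
  local i=1 code
  while [ "$i" -le "$attempts" ]; do
    # curl reports 000 as the status when it cannot connect at all.
    code="$(curl -s -o /dev/null -w '%{http_code}' "$url")"
    code="${code:-000}"
    if [ "$code" -ge 200 ] && [ "$code" -lt 400 ]; then
      echo "console is up (HTTP ${code}): ${url}"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "console did not respond at ${url}" >&2
  return 1
}
```

With the defaults, the helper waits up to about two and a half minutes (30 attempts, 5 seconds apart) before giving up.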
You can access the OpenEMPI web administrative application by going to the URL shown above and logging in with the default username ‘admin’ and password ‘admin’. In the example below, we searched using the first name and last name of a record and found two such records in the system. These records are duplicates that the system identified and linked together; we know this because OpenEMPI assigned them the same global identifier (the long identifier assigned from the OpenEMPI domain).
If you run into any issues creating the instance this way, let us know either by email at email@example.com or by posting a question on the OpenEMPI user forum, and we will try to help you through the process.