Simply IT, Inc: 2015

Tuesday, June 23, 2015

Setting Up SolrCloud in Solr 5.x

While there is a lot of documentation on the Solr Confluence Wiki, it can be hard to find all of the right levers to pull to start a multi-node SolrCloud instance without using the provided script, which creates an example SolrCloud for you via a command-line wizard. This post is a step-by-step guide to manually creating a SolrCloud cluster without the use of the example script.

Installing a Zookeeper Ensemble

In order for your Solr instances to automatically receive configuration and participate in the cluster, you need to install a ZooKeeper ensemble. It is possible to run your SolrCloud with only one ZooKeeper instance, but at least three are recommended. Why three and not two? ZooKeeper requires a quorum, meaning a majority of the ensemble, to be considered up and running. With two instances a majority is two, so if one goes down the remaining instance is not a quorum. With three instances, if one goes down you still have two out of three running, which is a majority, so the ensemble keeps running.

Let's get started.

Create a directory on the server named "solrcloud". We'll refer to this as <BASE_INSTALL_DIR>.

First, download Zookeeper from the Apache project website at http://zookeeper.apache.org/releases.html. At the time of this writing SolrCloud uses version 3.4.6.

Once the distribution is downloaded, unzip/untar it to <BASE_INSTALL_DIR>. A folder named "zookeeper-3.4.6" will be extracted. We'll refer to this as <ZOOKEEPER_HOME> from now on.

We'll be creating three ZooKeeper instances, and each will need a data directory. In <BASE_INSTALL_DIR>, create a directory named "zdata". Under the zdata directory, create a directory for each instance named "1", "2", and "3". In a production environment, each ZooKeeper instance would be on a different server, so you would just create a data directory in whatever location makes sense on each server. To keep things simple, we'll be creating three instances on one machine, so all three data directories live on the same machine.

Within each of the data directories, a file named "myid" must be created. The only thing that goes in the "myid" file is the instance id. For now, we'll just use "1", "2", and "3" as the ids for the ZooKeeper instances. So add a "myid" file to each of the instance data directories and put the matching id in each.
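The directory and "myid" steps above can be scripted. The following is a minimal sketch, assuming it is run from the directory that contains "solrcloud"; the BASE_INSTALL_DIR variable and its default value are illustrative names, so adjust the path to match your own install location.

```shell
#!/bin/sh
# Sketch: create one data directory per ZooKeeper instance and write its id
# into a "myid" file. BASE_INSTALL_DIR is an assumed variable name; it
# defaults to ./solrcloud when it is not already set.
BASE_INSTALL_DIR="${BASE_INSTALL_DIR:-$PWD/solrcloud}"

for id in 1 2 3; do
    mkdir -p "$BASE_INSTALL_DIR/zdata/$id"
    # The myid file contains only the instance id.
    echo "$id" > "$BASE_INSTALL_DIR/zdata/$id/myid"
done

# Show what was written for each instance.
for id in 1 2 3; do
    echo "$BASE_INSTALL_DIR/zdata/$id/myid: $(cat "$BASE_INSTALL_DIR/zdata/$id/myid")"
done
```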

Your directory structure should look like the following:

<BASE_INSTALL_DIR>/
    zdata/
        1/myid
        2/myid
        3/myid
    zookeeper-3.4.6/

With ZooKeeper on a single machine, it is not necessary to create multiple installation directories in order to have multiple instances. You just need to create a ZooKeeper configuration file for each instance. To create the configuration files, go to <ZOOKEEPER_HOME>/conf. Copy zoo_sample.cfg and name the new configuration file zoo.cfg for instance 1, zoo2.cfg for instance 2, and zoo3.cfg for instance 3. Open the config files and update the "clientPort" property in each file. For the purposes of this guide, we'll increment the port number by one for each instance. In production the instances will be on different servers, so you can either leave the default port "2181" or change it to the port you wish to use. You will also need to configure the ports that the ZooKeeper instances use to communicate with each other (the "server.N" entries).

The configuration file for instance 1 should look similar to the following:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=<BASE_INSTALL_DIR>/zdata/1
clientPort=2181
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890

The only differences in the configuration files for the other two ZooKeeper instances should be the "clientPort" and "dataDir" values. Set "clientPort" to "2182" for instance 2 and "2183" for instance 3, and point "dataDir" at the data directory we created previously for each instance (<BASE_INSTALL_DIR>/zdata/2 and <BASE_INSTALL_DIR>/zdata/3).
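If you would rather generate the three files than edit them by hand, something like the following sketch may help. The helper name write_zoo_cfg is hypothetical (it is not part of ZooKeeper), and the sketch assumes the same ports and file names used in this guide.

```shell
#!/bin/sh
# Sketch: write one ZooKeeper config file per instance.
# write_zoo_cfg is a hypothetical helper, not part of ZooKeeper.
#   $1 = conf directory (<ZOOKEEPER_HOME>/conf)
#   $2 = base install directory (the parent of zdata)
#   $3 = instance id (1, 2, or 3)
write_zoo_cfg() {
    conf_dir="$1"; base_dir="$2"; id="$3"
    # Instance 1 uses zoo.cfg; instances 2 and 3 use zoo2.cfg and zoo3.cfg.
    if [ "$id" -eq 1 ]; then name="zoo.cfg"; else name="zoo$id.cfg"; fi
    # Client ports are 2181, 2182, and 2183 for ids 1, 2, and 3.
    client_port=$((2180 + id))
    cat > "$conf_dir/$name" <<EOF
tickTime=2000
initLimit=10
syncLimit=5
dataDir=$base_dir/zdata/$id
clientPort=$client_port
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890
EOF
}
```

Calling write_zoo_cfg once for each of the ids 1, 2, and 3, with your conf directory and base install directory as the first two arguments, should produce zoo.cfg, zoo2.cfg, and zoo3.cfg.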

Your ZooKeeper install should now contain zoo.cfg, zoo2.cfg, and zoo3.cfg in <ZOOKEEPER_HOME>/conf.

Now you are ready to start your ZooKeeper Ensemble, but before we do that, let's create a helper script to start them all without having to type the startup command for each instance every time.

In your <BASE_INSTALL_DIR>, create a file named "startZookeeper.sh". In the file add the following:

#!/bin/sh

cd ./zookeeper-3.4.6
bin/zkServer.sh start zoo.cfg
bin/zkServer.sh start zoo2.cfg
bin/zkServer.sh start zoo3.cfg

Ensure you give the script execute permission, then run it from the command line. You should see output similar to the following:

JMX enabled by default
Using config: <BASE_INSTALL_DIR>/zookeeper-3.4.6/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

JMX enabled by default
Using config: <BASE_INSTALL_DIR>/zookeeper-3.4.6/bin/../conf/zoo2.cfg
Starting zookeeper ... STARTED

JMX enabled by default
Using config: <BASE_INSTALL_DIR>/zookeeper-3.4.6/bin/../conf/zoo3.cfg
Starting zookeeper ... STARTED
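To confirm that the instances actually joined the ensemble, you can ask each one for its status. The sketch below assumes zkServer.sh accepts the config file name as its second argument for "status" the same way it does for "start"; the function name zk_status_all is hypothetical.

```shell
#!/bin/sh
# Sketch: report the status of each ZooKeeper instance in the ensemble.
# zk_status_all is a hypothetical helper name.
#   $1 = ZooKeeper home (the extracted zookeeper-3.4.6 directory)
zk_status_all() {
    zk_home="$1"
    for cfg in zoo.cfg zoo2.cfg zoo3.cfg; do
        echo "== $cfg =="
        # Print a short note if the status command reports a failure.
        "$zk_home/bin/zkServer.sh" status "$cfg" || echo "$cfg: status check failed"
    done
}
```

Run it as zk_status_all <BASE_INSTALL_DIR>/zookeeper-3.4.6 after the start script has finished.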

Creating a configset

Before we create the Solr instances, we'll need to create a configset in order to create a collection to shard and replicate across multiple instances. Creating a configset is very specific to your own collection, so it is out of the scope of this guide to create a configset, however I will add a couple of pointers.

If you use one of the pre-built configsets that come with Solr 5, they are located in solr-5.2.1/server/solr/configsets and you don't have to do anything. However, if you do roll your own, keep in mind the following:

Any paths referenced in your solrconfig.xml must be updated to reflect paths relative to your Solr instance directories (which we will create in a later section).
If you have a need for additional jar files such as jdbc drivers, you can add a "lib" directory inside your solr instance collection specific directories and they will automatically be picked up by Solr, so you do not have to modify solrconfig.xml in order to use them.

Note: The collection directories will be created by Solr once you create your collection, so you will have to add the lib directory and jars after you have completed the "Adding a Collection" section later in this guide. The directory will be at <BASE_INSTALL_DIR>/solr-5.2.1/server/<instance>/<collection>. Ex. <BASE_INSTALL_DIR>/solr-5.2.1/server/solr/mycollection_shard1_replica1. Restart the Solr instances once the jars are in place.

Ensure the appropriate <lib> tags are added to solrconfig.xml for any libraries you need in addition to the ones that may already be there. For example, to use the DataImportHandler, you need to add the following lines if they don't already exist:

<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar" />
<lib dir="${solr.install.dir:../../../..}/contrib/dataimporthandler-extras/lib" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.jar" />

Create/update the schema.xml as necessary to map data from the source to a Solr document.

Uploading a configset to Zookeeper

Note: This section is only relevant if you want to upload your configuration ahead of time instead of specifying the configuration to use in the "create" command used in the "Adding a Collection" section or if you are using the Collections API to issue a "create" command via the REST interface. When doing the creation of a collection via the REST interface you cannot specify a configset directory like you can using the solr script from the command line. Feel free to skip this section unless you plan on using the Collections API instead of the bin/solr script to create a collection.

In order for a configset to be used in SolrCloud, it needs to reside within Zookeeper. Zookeeper uses this configuration to automatically propagate configuration to the Solr instances and create your collection on each instance.

To upload the configset, you will need to use zkcli.sh which is in <BASE_INSTALL_DIR>/solr-5.2.1/server/scripts/cloud-scripts. So go to that directory and issue the following command:

./zkcli.sh -zkhost localhost:2181,localhost:2182,localhost:2183 -cmd upconfig -confname <your conf name> -confdir <BASE_INSTALL_DIR>/solr-5.2.1/server/solr/configsets/<your conf dir>/conf

The above assumes you have put your configset in the configsets directory, however it doesn't have to be there. Also, in a production system, you won't be using localhost and the ports may be different, but you'll just need to update the host and ports as necessary for your environment.

After running the command, your configset should be uploaded to Zookeeper, but we don't have a Solr instance up and running yet so we won't be able to check it via the web interface quite yet.

Creating Solr Instances

In a production environment each instance will be on a separate server, so just like the ZooKeeper instances they will likely have the same port but different hosts. For the purposes of this guide, however, we will create multiple instances on the same machine. This is almost as easy as it was for ZooKeeper: instead of adding configuration files, you create additional directories that will serve as the Solr home directories for each instance. The current Solr home is <BASE_INSTALL_DIR>/solr-5.2.1/server/solr, so we'll add three more directories for a total of four instances. You can add as many as necessary, but four is enough to demonstrate sharding and replication.

Under <BASE_INSTALL_DIR>/solr-5.2.1/server add directories named "solr2", "solr3", and "solr4" to represent our additional instances. Copy solr.xml from the original Solr home directory into each of the newly created directories. Then open each file and update the ports. Updating the ports is only necessary because all of the instances are on the same machine, and multiple instances cannot listen on the same port. Use the following port numbers for the purposes of this guide:

Instance 1: 8983

Instance 2: 8984

Instance 3: 8985

Instance 4: 8986
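Creating the directories and copying solr.xml can be scripted along these lines. This is only a sketch: make_solr_homes is a hypothetical helper name, and it does not edit the ports in the copied files, which you still need to do as described above.

```shell
#!/bin/sh
# Sketch: create the extra Solr home directories and seed each with solr.xml.
# make_solr_homes is a hypothetical helper name.
#   $1 = Solr server directory (<BASE_INSTALL_DIR>/solr-5.2.1/server)
make_solr_homes() {
    server_dir="$1"
    for n in 2 3 4; do
        mkdir -p "$server_dir/solr$n"
        # Copy solr.xml from the original Solr home into the new home.
        cp "$server_dir/solr/solr.xml" "$server_dir/solr$n/solr.xml"
    done
}
```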

Your directory structure with the new instances should look similar to the following:

<BASE_INSTALL_DIR>/solr-5.2.1/server/
    solr/solr.xml
    solr2/solr.xml
    solr3/solr.xml
    solr4/solr.xml

That's all you need to do to create additional Solr instances. Simple right? Of course they don't do much now since they have no collections configured, but that's what we'll do in a minute. However, first we need to start up our Solr instances.

Starting Solr Instances

In order to start a Solr instance as part of the cloud and connected with the Zookeeper ensemble, issue the following commands from <BASE_INSTALL_DIR>/solr-5.2.1.

bin/solr start -cloud -s server/solr -p 8983 -z localhost:2181,localhost:2182,localhost:2183 -noprompt
bin/solr start -cloud -s server/solr2 -p 8984 -z localhost:2181,localhost:2182,localhost:2183 -noprompt
bin/solr start -cloud -s server/solr3 -p 8985 -z localhost:2181,localhost:2182,localhost:2183 -noprompt
bin/solr start -cloud -s server/solr4 -p 8986 -z localhost:2181,localhost:2182,localhost:2183 -noprompt

As with the Zookeeper instances, you can create a script that contains these commands as well so that you don't have to type them one by one each time you want to start your instances. Name it something like "startSolr.sh" and put it in the <BASE_INSTALL_DIR> and make sure you give it execute permission.

#!/bin/sh

cd solr-5.2.1
bin/solr start -cloud -s server/solr -p 8983 -z localhost:2181,localhost:2182,localhost:2183 -noprompt
bin/solr start -cloud -s server/solr2 -p 8984 -z localhost:2181,localhost:2182,localhost:2183 -noprompt
bin/solr start -cloud -s server/solr3 -p 8985 -z localhost:2181,localhost:2182,localhost:2183 -noprompt
bin/solr start -cloud -s server/solr4 -p 8986 -z localhost:2181,localhost:2182,localhost:2183 -noprompt

Upon successful execution of the startup commands you should see the following output:

Waiting to see Solr listening on port 8983 [/]
Started Solr server on port 8983 (pid=37286). Happy searching!

Waiting to see Solr listening on port 8984 [/]
Started Solr server on port 8984 (pid=37386). Happy searching!

Waiting to see Solr listening on port 8985 [/]
Started Solr server on port 8985 (pid=37489). Happy searching!

Waiting to see Solr listening on port 8986 [/]
Started Solr server on port 8986 (pid=37591). Happy searching!

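Once the instances are up, you can open a web browser and go to the Solr web pages at http://localhost:8983/solr, http://localhost:8984/solr, http://localhost:8985/solr, and http://localhost:8986/solr.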

Adding a Collection

First let's verify the configuration we uploaded earlier for our collection is in Zookeeper, otherwise we won't be able to create the collection. So, fire up a browser and go to http://localhost:8983/solr.

Navigate to the "Cloud" tab and open the "Tree" tab underneath it. You should see a tree containing the files in your Zookeeper ensemble. Within that set of files is a directory named "configs". Open that up and you should see your configuration there.

Now that you have a running ZooKeeper ensemble along with four Solr instances, we can easily add your custom Solr collection. To do this, we'll use the solr utility in <BASE_INSTALL_DIR>/solr-5.2.1/bin. You could also use the Collections API directly and issue commands via the REST interface running on your Solr instances. See https://cwiki.apache.org/confluence/display/solr/Collections+API for more details on the Collections API. Note that when using the Collections API to issue a "create" command, the configuration must already be in ZooKeeper. Please refer to the "Uploading a configset to Zookeeper" section above for how to upload your configset.

Issue the following command to create your collection:

bin/solr create -c <collection name> -d <config directory> -n <config name> -p 8983 -s 2 -rf 2

The above command creates a collection with the name you specify in the -c argument. This can be anything you want to name your collection. The -d argument specifies the config directory where your configset resides. It is looked up in <BASE_INSTALL_DIR>/solr-5.2.1/server/solr/configsets under the directory name you specify, and the command automatically adds the config to ZooKeeper. The -n argument specifies the name you wish to give this configuration in ZooKeeper. Name it something meaningful so you can find it in the Solr admin console later on. The -p option specifies the port of the Solr instance you are creating this collection on. Since you are using ZooKeeper, even though you specify only one of the Solr instances, the collection will be propagated as necessary to the other instances. The -s option specifies the number of shards, and the -rf option specifies the replication factor, or how many copies of each shard you want. Since this example specifies two shards and a replication factor of two, ZooKeeper and Solr will automatically create a leader for each shard on a separate instance and a replica of each of those shards on the other instances. All four instances we configured are used, and the only work required is creating the collection on one of them.

If everything worked, you should see a new directory within each of your Solr instance directories with the name of your collection followed by shard and replica labels.

<collection_name>_shard1_replica1
<collection_name>_shard1_replica2
<collection_name>_shard2_replica1
<collection_name>_shard2_replica2

The instance that each of these appears in may be different for each installation, since we had all of the Solr instances up before we created the collection. If you want to, you can forgo starting all of the Solr instances at once and only bring one up to start with. If you do this, the first shard will be put on the running instance. Then you can start the next server and the other half of the first shard will be put on the new instance. Keep repeating this until they are all up and you will end up with specific portions of the shards, as well as the replicas, on specific servers.
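For the Collections API route mentioned earlier, the equivalent request is a single HTTP call. The sketch below only builds and prints the URL; the helper name collections_create_url and the config name "myconf" are hypothetical, and the configset must already have been uploaded to ZooKeeper under that name.

```shell
#!/bin/sh
# Sketch: build a Collections API CREATE request URL.
# collections_create_url is a hypothetical helper name.
#   $1 = collection name
#   $2 = name of a configset that is already in ZooKeeper
collections_create_url() {
    printf 'http://localhost:8983/solr/admin/collections?action=CREATE&name=%s&numShards=2&replicationFactor=2&collection.configName=%s\n' "$1" "$2"
}

# Print the URL. To actually issue the request, pass it to curl, e.g.
#   curl "$(collections_create_url mycollection myconf)"
collections_create_url mycollection myconf
```

The numShards and replicationFactor values mirror the -s 2 and -rf 2 arguments used with bin/solr create above.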

To view your SolrCloud go to http://localhost:8983/solr/#/~cloud which will show a diagram of all the instances in your cloud.

Stopping Zookeeper and Solr Instances

Stopping the Solr instances is very easy. Just issue the following command from the <BASE_INSTALL_DIR>/solr-5.2.1 directory:

bin/solr stop -all

If you want to stop a particular instance remove the "-all" argument and supply the "-p" argument and specify the port of the instance you want to stop.

bin/solr stop -p 8984

Stopping Zookeeper instances is also very easy. From the <BASE_INSTALL_DIR>/zookeeper-3.4.6 directory run the following command:

bin/zkServer.sh stop zoo.cfg

Replace zoo.cfg with the appropriate instance configuration as necessary, e.g. zoo2.cfg or zoo3.cfg.
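As with startup, you may want a helper that stops all three ZooKeeper instances in one go. This is a minimal sketch that mirrors startZookeeper.sh; the function name zk_stop_all is hypothetical.

```shell
#!/bin/sh
# Sketch: stop every ZooKeeper instance in the ensemble.
# zk_stop_all is a hypothetical helper name.
#   $1 = ZooKeeper home (the extracted zookeeper-3.4.6 directory)
zk_stop_all() {
    zk_home="$1"
    for cfg in zoo.cfg zoo2.cfg zoo3.cfg; do
        # Same stop command as above, once per instance configuration.
        "$zk_home/bin/zkServer.sh" stop "$cfg"
    done
}
```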

Hopefully this guide was helpful to you. That's all for now!

Thursday, May 14, 2015

Issue with ins_ctx.mk during Oracle 11g install on CentOS 7

Installing Oracle 11g Enterprise on CentOS 7 didn't go quite as smoothly as planned. However, by combining knowledge across several articles I was finally able to make it work.

During the install I received an error that wasn't mentioned in the Oracle install procedures or the article I used to guide the install at "Link 1" below. The error was when the installer was trying to call a target in "ins_ctx.mk". The message was:

INFO: /lib64/libstdc++.so.5: undefined reference to `memcpy@GLIBC_2.14'

The solution essentially involved installing glibc-static and making the necessary updates to the ins_ctx.mk file. See "Link 3" below for details on how to resolve the error.

To install Oracle 11g Enterprise, first follow the steps in the post at "Link 1" below, but you will likely encounter the error described above during the "Link Binaries" phase of the install. If you do, then follow the steps in "Link 3" to resolve the issue with "ins_ctx.mk".

Link 1: http://dbaora.com/install-oracle-11g-release-2-11-2-on-centos-linux-7/

Link 2: http://oracle-base.com/articles/11g/oracle-db-11gr2-installation-on-oracle-linux-7.php

Link 3: https://web.archive.org/web/20140927033722/http://www.habitualcoder.com/?p=248

Tuesday, May 12, 2015

Oracle Enterprise Manager 11g Installation Error - Listener is not up or database service is not registered with it.

While attempting to install Oracle Enterprise Manager for Oracle 11g during the last phase of the Oracle 11g installer, I encountered the following error: "Listener is not up or database service is not registered with it."

After checking to ensure the listener was up and doing a tnsping to the instance via the cmd window, I was at a loss so proceeded to Google to try to find the answer. After some digging, it seemed to be a fairly common problem. However, none of the solutions worked for me. At least none of them individually. After piecing together several solutions that others posted I finally was able to get the installation to succeed. Hopefully this helps you if you are stuck in a similar situation.

Note that this installation is on a machine without a static ip or domain.

If you haven't done so already, make sure to add the <ORACLE_HOME>\bin directory to your path so that you can run the Oracle utilities without having to be within the <ORACLE_HOME>\bin directory itself.

Let's get started!

First, install the Microsoft Loopback Adapter. This will allow you to specify a dummy host/domain on the loopback ip. See the following Microsoft TechNet post for details:
https://social.technet.microsoft.com/Forums/windows/en-US/259c7ef2-3770-4212-8fca-c58936979851/how-to-install-microsoft-loopback-adapter

Once you have the loopback adapter created and have updated your hosts file, stop the Oracle listener via the command line using the command "LSNRCTL.EXE stop". Use the "Net Configuration Assistant" to remove the listener and add a new one with all the default values and the same name.

Next, you will need to update your listener.ora and tnsnames.ora files to set the host as the dummy host/domain you specified in the hosts file.

The listener.ora file located at <ORACLE_HOME>\network\admin will contain something similar to the following:

LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = <YOUR HOSTNAME>)(PORT = 1521))
      (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))
    )
  )

The tnsnames.ora file located at <ORACLE_HOME>\network\admin will contain something similar to the following:

<DB_SID> =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = <YOUR HOSTNAME>)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = <GLOBAL DB NAME>)
    )
  )

Now you will need to start the listener back up again using the command "LSNRCTL.EXE start".

Ensure the listener is up by using "tnsping <db sid>".

Next run the command "emca -config dbcontrol db -repos recreate" as Administrator and follow the configuration prompts displayed. If it completes successfully it will also list the URL you need to go to in order to view the Enterprise Manager page.

Thursday, March 12, 2015

Parsing Java Source Files Using Reflection

Have you ever needed to parse a Java source file, but didn't want to write a parser for it? Well, you can by invoking the Java compiler programmatically to compile the source files into class files, then using a URLClassLoader to load each class into memory and using reflection to get the information you need. Let's take a look at how this works.

First you need to get a collection of all of the Java source files you wish to compile so that you can pass it to the compiler.

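import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Base directory of the source tree, and the list that will hold every .java file found.
File packageBaseDir = new File("path/to/the/base/dir/of/the/source/files");
List<File> sourceFiles = new ArrayList<>();

// Recursively walks packageBaseDir and adds every .java file to sourceFiles.
public void collectSourceFiles(File packageBaseDir, List<File> sourceFiles) {
    File[] filesInCurrDir = packageBaseDir.listFiles();
    if ( filesInCurrDir == null ) {
        return; // not a directory, or it could not be read
    }

    for ( File file : filesInCurrDir ) {
        if ( file.isDirectory() ) {
            collectSourceFiles(file, sourceFiles);
        }
        else if ( file.getName().endsWith(".java") ) {
            sourceFiles.add(file);
        }
    }
}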

Now that you have all of the source files, you need to access the Java compiler to compile them.

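import java.io.IOException;
import javax.tools.*;

void compileSourceFiles(List<File> sourceFiles) throws IOException {
    // getSystemJavaCompiler() returns null when no compiler is available (e.g. on a JRE), so run this on a JDK.
    JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
    try ( StandardJavaFileManager fileManager = compiler.getStandardFileManager(null, null, null) ) {
        Iterable<? extends JavaFileObject> compilationUnits = fileManager.getJavaFileObjectsFromFiles(sourceFiles);
        JavaCompiler.CompilationTask task = compiler.getTask(null, fileManager, null, null, null, compilationUnits);
        task.call(); // returns true if every file compiled without errors
    }
}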

The above code accesses the Java compiler programmatically, using the StandardJavaFileManager to get the Java sources as JavaFileObjects to pass to the compiler. A CompilationTask is created and then run on the source files. The class files are written to the same directory as the Java source. Now that the source is compiled, you can use a URLClassLoader to load the classes into your program.

URLClassLoader urlClassLoader = new URLClassLoader(
                    new URL[]{packageBaseDir.toURI().toURL()},
                    null);

Then you can simply load the class and start using the standard reflection methods on it.

// requires: import java.lang.reflect.Method;
Class<?> clazz = urlClassLoader.loadClass(binaryClassName);
Method[] methods = clazz.getDeclaredMethods(); // or whatever else you're interested in

Happy coding!

Wednesday, January 7, 2015

Setup UPS with Synology Disk Station and CentOS Linux Server via USB and Network

Setting up a UPS that automatically causes a Synology DiskStation to enter safe mode and a CentOS 7 server to halt was fairly straightforward. However, there was not a lot of information showing how to do this, so I decided to write this post.

I will describe how to set up a UPS connected to a Synology DiskStation via USB. The DiskStation notifies a machine running CentOS and the upsmon service when UPS events occur, so that both can respond to a power outage accordingly.

DiskStation Setup

Connect your UPS to your Synology NAS using a USB cable.

Using built in support for a UPS Network Server on the Synology NAS (which uses NUT from www.networkupstools.org under the covers) setup is very easy.

Log in to your DiskStation and go to the Control Panel. Select the "Hardware and Power" icon and go to the "UPS" tab.

Select "Enable UPS Support" to enable communication with the UPS via the USB cable.

You can optionally set a period of time before the NAS enters Safe Mode or leave the default which will cause the NAS to enter Safe Mode when the UPS battery reaches a low status. Safe mode un-mounts all disks and stops all services to prevent data loss on your NAS.

Next, check the "Enable network UPS server" box. Then click "Permitted DiskStations". Even though it says "Permitted DiskStations", it will work with any machine running the NUT upsmon service. Once you click the "Permitted DiskStations" button, you will be presented with a form for entering the IPs of the servers you want to notify when the NAS you're on receives UPS events.

Enter the IP of the server that you want to receive the UPS events and click "OK". Then "Apply" on the main UPS page.

Linux Server Setup (CentOS)

First you'll need to install nut via yum.

If you don't already have the epel repository in yum, you will need to install it.

yum install epel-release

Then you will need to install nut.

yum install nut

Once nut is installed you should have a nut user and group created by the installer.

Open /etc/ups/upsmon.conf. We will need to update the configuration to allow it to listen for events from the Synology server. Search for the "MONITOR" section. You will need to update or add a line that looks like the following:

MONITOR ups@<ip of synology server>:3493 1 <user> <pass> slave

To get the user and pass values, SSH to your Synology NAS. In the file located at /usr/syno/etc/ups/upsd.users it should specify the username and password. Use those values in the MONITOR line on the Linux server.
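Before relying on upsmon, it can be worth checking that the Linux server can reach the UPS server on the NAS at all. The sketch below assumes the upsc client from NUT was installed along with the nut package and that the UPS is named "ups", as in the MONITOR line above; check_ups is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: query the UPS status from the NUT server on the Synology NAS.
# check_ups is a hypothetical helper name.
#   $1 = IP of the Synology NAS
check_ups() {
    # "ups" is the UPS name used in the MONITOR line.
    upsc "ups@$1" ups.status
}
```

Run it as check_ups <ip of synology server>. If the connection works, it should print a status code for the UPS rather than a connection error.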

Also, be sure to look at the "SHUTDOWNCMD" in upsmon.conf and ensure it halts the system instead of shutting it completely down so that it will come up automatically after a power outage. This is the default in the file so you shouldn't have to change anything. In your Linux server's BIOS, you also need to ensure it is set up to power on automatically after power is restored.

On your Linux machine you'll need to create a directory /var/run/nut. Change ownership of the directory to user nut and group nut.

chown nut:nut /var/run/nut

Edit the nut-monitor.service file in /lib/systemd/system and remove the nut-server.service entry from it if you are not running a nut server on the Linux box, as the entry will prevent upsmon from starting. Since this machine is set up as the slave, you probably won't be running a nut server, so make sure you take the entry out.
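If you prefer to make that edit from the command line, a sed one-liner along these lines may do it. This is a sketch that assumes nut-server.service appears as a plain entry on a dependency line in the unit file; strip_nut_server is a hypothetical helper name, and a backup copy with a .bak suffix is kept next to the original.

```shell
#!/bin/sh
# Sketch: remove references to nut-server.service from a systemd unit file.
# strip_nut_server is a hypothetical helper name.
#   $1 = path to the nut-monitor.service file
strip_nut_server() {
    # Keep a backup (<file>.bak), then delete the entry together with any
    # whitespace immediately before it.
    sed -i.bak 's/[[:space:]]*nut-server\.service//g' "$1"
}
```

Run it as strip_nut_server /lib/systemd/system/nut-monitor.service, then review the file before reloading systemd.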

Next you will need to add upsmon to startup when your server starts. Go to /etc/systemd/system. Create a symbolic link to the nut-monitor.service.

ln -s /lib/systemd/system/nut-monitor.service nut-monitor.service

Finally, run the following commands to enable the service to be run by systemctl.

systemctl daemon-reload

systemctl enable nut-monitor.service

That's it! When you experience a power outage, your UPS should kick in and tell your DiskStation to enter safe mode, and the DiskStation should tell the Linux server to halt. When the power comes back, the DiskStation and Linux server should automatically start back up.