Saturday, July 25, 2009

ATA over Ethernet vs. iSCSI

I read the ATA over Ethernet (AoE) specification and a number of related documents about AoE and iSCSI, and came up with the following feature comparison:

1. Routability. AoE is built directly on top of Ethernet, so its frames are not routable and it can only be used within a LAN or VLAN, not across the Internet, since it is practically impossible to extend a VLAN over the Internet. iSCSI can be deployed across the Internet because it is built on IP, with the initiator and target communicating over TCP connections.

2. Performance. AoE is more lightweight and has less network and CPU overhead than iSCSI since it runs directly over Ethernet, but a test done by VMware shows that both can reach wire-speed throughput when properly configured. See http://www.vmware.com/files/pdf/storage_protocol_perf.pdf and http://www.coraid.com/site/co-pdfs/AoE_Performance_Comparison_3.pdf.

3. Sharing of devices, or targets. The AoE specification defines AoE as a connectionless protocol and provides mechanisms such as the reserve/release command and the config string to coordinate concurrent access from different hosts, but this is not a real mechanism for sharing a target among multiple hosts. For example, if a target is reserved by one specific host and that host goes down, there is no normal way for another host to take over and resume using the target device. The only way to handle this kind of fail-over is to have the administrator force-release the target so that it becomes available to other hosts. Commercial products usually rely on higher-level shared-disk file systems to coordinate access from multiple clients and to handle node failures (see http://www.sourceteksystems.com/Uploads/AoE_Tutorial.pdf).

In contrast, the iSCSI specification explicitly supports sharing of both devices and targets. That is, you can either export one device as multiple targets, or share one target among multiple initiators. So sharing volumes is easier with iSCSI. For read-only sharing, we can simply have multiple instances attached to the same volume, either by creating multiple targets on the same physical or logical volume, or by connecting multiple initiators to a single device target. And if one attached instance fails, a backup instance can directly take over and resume using the volume.

Friday, July 24, 2009

Test version of VbsNimbusService deployed

We now have a basically working version of VBS integrated with Nimbus,
and I have deployed a test version of the integrated web service,
VbsNimbusService, to
http://cglc.uits.iu.edu:8080/axis2/services/VbsNimbusService?wsdl.
Its basic features include:

1) It is deployed under axis2, not globus, and has no security
mechanisms or user management yet. So the EPR
http://cglc.uits.iu.edu:8080/axis2/services/VbsNimbusService/
can currently be called freely.

2) It provides the same interface as EBS with SOAP binding. Note that
it accepts Nimbus workspace instance id numbers, such as 39 or 44, not
the handle names returned by the Nimbus cloud client.

3) The "progress" property of snapshot creation and "force" argument
of volume detach operation are not implemented yet. The snapshot
creation process is usually quick so you should be able to see the
status of a snapshot updated to "created" soon after the
createSnapshot operation returns. The "availabilityZone" property of a volume is meaningless in the current VBS implementation, but we still keep the information in the volume metadata database.

4) Volumes newly created from snapshots have four possible statuses: "available", "in-use", "pending", and "failed". "available" means that the volume has been created successfully and is available for attachment. "in-use" means that the volume is currently attached to some VM instance. "pending" means that the new volume is still being created by copying data from the given snapshot, and "failed" means that the volume has been created but the system did not successfully copy the data from the given snapshot to it. Note that a volume in the "pending" or "failed" status can actually also be attached; these status names only carry information about the snapshot-copy operation. Attaching a "pending" volume to a VM instance is not recommended, because operations from the instance may interfere with the copy process and lead to incorrect data being written or read.
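The status rules in item 4 can be sketched as a small Java enum. This is only an illustration: the names VolumeStatusSketch, acceptsAttachment and attachmentDiscouraged are hypothetical and not part of VbsNimbusService, and treating "in-use" as not attachable is an assumption based on the usual EBS semantics rather than something stated above.

```java
// Hypothetical model of the four volume statuses; not code from VbsNimbusService.
public class VolumeStatusSketch {

    enum VolumeStatus {
        AVAILABLE("available"),
        IN_USE("in-use"),
        PENDING("pending"),
        FAILED("failed");

        private final String wireName;

        VolumeStatus(String wireName) {
            this.wireName = wireName;
        }

        // Map a status string, as reported by the service, to the enum constant.
        static VolumeStatus fromWireName(String name) {
            for (VolumeStatus s : values()) {
                if (s.wireName.equals(name)) {
                    return s;
                }
            }
            throw new IllegalArgumentException("unknown status: " + name);
        }

        // "pending" and "failed" volumes can be attached just like "available" ones.
        // Excluding "in-use" is an assumption (one attachment at a time, as in EBS).
        boolean acceptsAttachment() {
            return this != IN_USE;
        }

        // Attaching while the snapshot copy is still running is discouraged.
        boolean attachmentDiscouraged() {
            return this == PENDING;
        }
    }

    public static void main(String[] args) {
        for (VolumeStatus s : VolumeStatus.values()) {
            System.out.println(s.wireName
                    + ": acceptsAttachment=" + s.acceptsAttachment()
                    + ", attachmentDiscouraged=" + s.attachmentDiscouraged());
        }
    }
}
```

A client could poll the volume status and wait until attachmentDiscouraged() is false before calling the attach operation.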

Two problems when writing the VbsNimbusService

Man, globus is not easy to use...

I encountered these two problems when writing the VbsNimbusService web service which integrates VBS with Nimbus:

1. We try to deserialize a workspace EPR file while the service executes, but it reports a "no deserializer for {http://schemas.xmlsoap.org/ws/2004/03/addressing}AttributedURI" type of error when it reaches the XML type http://schemas.xmlsoap.org/ws/2004/03/addressing:AttributedURI. The problem is that the service finds no type mapping defined for this type. After a lot of searching I finally found a tip here: http://www.globus.org/toolkit/docs/4.0/common/javawscore/developer-index.html. It contains the following note about Java WS Core based client development:

"Any program that is based on Java WS Core should contain as a first entry in its classpath the directory of the Java WS Core installation. This is to ensure that the right client-config.wsdd is used by the client. That configuration file contains important client-side information such as handlers, type mappings, etc. "

That's it. There is a client-config.wsdd file under $GLOBUS_LOCATION which specifies the deserialization classes for certain XML types. When you run clients deployed in globus, e.g., the Nimbus workspace client, their scripts make sure that the commands running the client code put $GLOBUS_LOCATION at the beginning of the CLASSPATH. But when you run your own Java WS Core based clients, you need to add the directory containing that .wsdd file to the beginning of your CLASSPATH yourself. And if that client is itself a web service, you need to do this in the scripts that start your service's container. For example, if you deploy your service in tomcat, you need to modify the catalina.sh file to specify the correct CLASSPATH.
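One way to check whether the CLASSPATH ordering took effect is to ask the class loader which copies of client-config.wsdd it can see and in what order. The sketch below uses only the standard ClassLoader.getResources call; the class name WsddLocator is hypothetical, and it assumes that the configuration file is resolved through the class loader, which I have not verified against the Java WS Core sources.

```java
import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Hypothetical diagnostic helper; not part of Globus or Nimbus.
public class WsddLocator {

    // Return every copy of the named resource visible to the loader, in lookup order.
    static List<URL> findAll(String resourceName, ClassLoader loader) throws IOException {
        List<URL> found = new ArrayList<>();
        Enumeration<URL> urls = loader.getResources(resourceName);
        while (urls.hasMoreElements()) {
            found.add(urls.nextElement());
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = WsddLocator.class.getClassLoader();
        }
        List<URL> copies = findAll("client-config.wsdd", loader);
        if (copies.isEmpty()) {
            System.out.println("client-config.wsdd is not on the classpath");
        }
        // The first entry is the copy that a plain lookup by name would return.
        for (int i = 0; i < copies.size(); i++) {
            System.out.println((i + 1) + ". " + copies.get(i));
        }
    }
}
```

Running this from inside the container, for example from a test servlet, shows whether the copy under $GLOBUS_LOCATION comes first.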

2. VbsNimbusService reports an "Unknown CA" fault when authenticating with its own certificate. VbsNimbusService uses a NimbusRPQueryClient to interact with the Nimbus workspace service. Like the Nimbus workspace client, NimbusRPQueryClient extends the org.globus.workspace.client_common.BaseClient class and thus also uses the credential obtained from GlobusCredential.getDefaultCredential() for authentication and authorization to the workspace service. The problem here is the trusted CA directory. For clients deployed in globus, the scripts running them specify the trusted CA directory through the $X509_CERT_DIR environment variable, and that directory contains the right trusted CAs. But when you run your own Nimbus client outside a globus container environment, the client code loads the trusted CAs from some default directory. This "default directory" is just my guess; I never managed to print it out so that I could copy the trusted CAs into it. Fortunately, I found the bug report here http://bugzilla.mcs.anl.gov/globus/show_bug.cgi?id=3843 and followed its instructions to set the trusted CA directory dynamically in the source code of NimbusRPQueryClient, which finally made VbsNimbusService work.

The instructions in the bug report above are not totally accurate. What we should actually do is override the setOptions(Stub stub) method in the class that extends org.globus.workspace.client_common.BaseClient (or org.globus.wsrf.client.BaseClient if you are not writing a Nimbus client), and add the following code:

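// Load the trusted CA certificates from an explicit location instead of the default one.
TrustedCertificates trustedCerts = TrustedCertificates.load("path to trusted certificates");
stub._setProperty(GSIConstants.TRUSTED_CERTIFICATES, trustedCerts);
// Let BaseClient apply its usual options afterwards.
super.setOptions(stub);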

With this modification, your client should be able to set the right trusted CA directory without affecting the original execution of the BaseClient.
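When chasing this kind of problem, it can also help to see which trusted CA directories exist on the machine at all. The probe below is a rough sketch, not taken from the Globus code: the class name TrustedCaDirProbe is hypothetical, and the candidate list (the X509_CERT_DIR value, ~/.globus/certificates and /etc/grid-security/certificates) is an assumption based on common Globus conventions, not a confirmed description of what BaseClient actually loads.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper; the candidate directories are assumptions.
public class TrustedCaDirProbe {

    // Build the list of directories to inspect, environment setting first.
    static List<File> candidates(String envDir, String userHome) {
        List<File> dirs = new ArrayList<>();
        if (envDir != null && !envDir.isEmpty()) {
            dirs.add(new File(envDir));
        }
        dirs.add(new File(userHome, ".globus/certificates"));
        dirs.add(new File("/etc/grid-security/certificates"));
        return dirs;
    }

    public static void main(String[] args) {
        String envDir = System.getenv("X509_CERT_DIR");
        String userHome = System.getProperty("user.home");
        for (File dir : candidates(envDir, userHome)) {
            // list() returns null when the path is not a readable directory.
            String[] entries = dir.isDirectory() ? dir.list() : null;
            System.out.println(dir + " -> "
                    + (entries == null ? "missing or unreadable" : entries.length + " entries"));
        }
    }
}
```

If none of the listed directories holds the expected CA files, setting the trusted certificates explicitly in setOptions, as described above, avoids depending on any of them.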

Tuesday, July 7, 2009

Three problems with the Nimbus testbed on cglc

We ran into three problems when trying to make Nimbus work on cglc.

1. The Ethernet interface ids on cglc5 start from eth1 instead of eth0. So the default network script setting of Xen causes some strange effects on the network layout: no "peth" interfaces, xenbr0 having no interfaces attached, etc. The solution is to replace "(network-script network-bridge)" with "(network-script 'network-bridge netdev=eth1')" in the file /etc/xen/xend-config.sxp. cglc6 to cglc8 have the same Ethernet interface id sequence and thus face the same problem as cglc5. If any networking problems come up on these machines in the future, we should remember to check this configuration.

2. Ebtables versions later than 2.0.6 have some bugs on Xen x86_64. Since Nimbus uses ebtables, versions such as 2.0.8 or 2.0.9 cause exceptions with the following message when new Nimbus workspaces are created:

"The kernel doesn't support a certain ebtables extension, consider recompiling your kernel or insmod the extension."

Accompanied by this dmesg error: "kernel msg: ebtables bug: please report to author: entries_size too small"

The solution is to use ebtables 2.0.6 on the VMMs in Nimbus. A Nimbus user, Matt, has solved this before and provided an RPM on the Nimbus site:

http://workspace.globus.org/downloads/ebtables-2.0.6-3.rf-mm-el5-x86_64.rpm
http://workspace.globus.org/downloads/ebtables-2.0.6-3.rf-mm-el5.src.rpm

sha1sums:
45982dacfaddfe8f828a720c94d7435a65a4bbc2 ebtables-2.0.6-3.rf-mm-el5-x86_64.rpm
df61bb2a43faa4fdc8e592c04b65242c71db790f ebtables-2.0.6-3.rf-mm-el5.src.rpm
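The downloaded files can be checked against the sha1sums above with the sha1sum command, or with a few lines of Java as sketched here. The class name Sha1Check and its command-line arguments are made up for this example; only the standard MessageDigest API is used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for comparing a file against a published SHA-1 checksum.
public class Sha1Check {

    // Compute the SHA-1 digest of a stream and return it as lowercase hex.
    static String sha1Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java Sha1Check <file> <expected-sha1>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            String actual = sha1Hex(in);
            System.out.println(actual.equalsIgnoreCase(args[1]) ? "OK" : "MISMATCH: " + actual);
        }
    }
}
```

For example, the binary RPM would be checked by passing its file name and the first checksum listed above as the two arguments.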

And if you really want to build it manually, follow the steps below (these instructions were provided by Luna on the Nimbus user mailing list):
1) Download the source code of ebtables 2.0.6 from https://launchpad.net/ubuntu/dapper/+source/ebtables/2.0.6-3ubuntu2/+files/ebtables_2.0.6.orig.tar.gz;
2) Untar the source code;
3) Download the patches from http://svn.exactcode.de/t2/trunk/package/security/ebtables/, and apply them to the source code;
4) Back up your previous ebtables installation (this is optional, just in case you need to use it in the future);
5) Run make and make install in the source directory of ebtables 2.0.6.

3. The default network association in worksp.conf provided by the Nimbus site is "association_0: private; xenbr0; vif0.0 ; none; 192.168.0.0/24", but this caused an exception on our testbed when I tried to deploy a new workspace: Nimbus looks for an association named "public" instead of "private" when trying to configure the new virtual machine. So the solution is to replace "private" with "public" in this line. I don't know whether this is a general problem with the document on the Nimbus site http://workspace.globus.org/vm/TP2.2/admin/quickstart.html, but the change works on our testbed. Maybe it is just because of some specific configuration on cglc5. Of course, you can always check the logs under /opt/workspace/logs to investigate problems with workspace creation and termination.