Sunday, December 30, 2007

Images and Mysterious Gaps

While rewriting a page on Sepia Mutiny, I found yet another reason why being a site designer is a highly-paid full-time job. It's impossible for anyone else to keep up with browser standards and implementations.

Consider the simple HTML below:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Sepia Mutiny</title>
</head>
<body>
<div style="border: 1px solid black; background-color: red;">
<img style="" src="/sepia/images/SMB3.jpg"/></div>
</body>
</html>

This is a well-formed strict HTML document that validates. You'd be surprised to see what it produces. See below.



If you haven't spotted what's wrong, there is a gap below the banner image, before the div ends. Why? Anybody's first guess would be stupid IE doing its own thing. Actually, this is Firefox 2. Huh? I tried setting margin, padding and every other property I could think of to 0. No effect. What gives? The same document renders differently in Internet Explorer 7: no gap below the image. Weird. After further tests, I made another HTML document.

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Sepia Mutiny</title>
</head>
<body>
<div style="border: 1px solid black; background-color: red;">
<img style="" src="/sepia/images/SMB3.jpg"/></div>
</body>
</html>

What's the difference? Only the Doctype is missing, so of course this is no longer a valid strict HTML document. Here's how it renders in Firefox:



I spent the better part of this Sunday scratching my head over this. Is Firefox buggy and IE7 getting things right now? Traditionally, sites are designed for non-IE browsers first because they are standards-compliant, and exceptions/hacks are then coded for IE because of all its quirks. Has the situation reversed? After much Googling, I finally found the story behind this mysterious gap. Apparently, Firefox is too good at being standards-compliant.

Lesson: Because the HTML element <img> is an inline element by default, it sits on the text baseline of its line, and the space left below that baseline for descenders (the tails of letters like g and y) depends on the font applied to the container. That space is the gap. To get rid of it, set the image's display property to block (for example, style="display: block" on the <img>), assuming no other content in the same container, such as inline text, needs the baseline. Thus, rewriting legacy HTML to conform to today's standards can break a template design. This is because a well-formed, standards-compliant HTML document is rendered by today's A-grade browsers in 'standards mode', whereas the badly written HTML documents of yesterday are rendered in 'quirks mode'. Those of us who don't have enough background in designing sites usually just wing it, so this is going to be a problem because of all the bad habits we've learned over the years working in browsers' quirks mode. Doing it the standards way is rather hard because we have to relearn, or rather learn correctly, the HTML and CSS standards.

References:
  1. Eric A. Meyer, Images, Tables, and Mysterious Gaps. Mar 21, 2003.
  2. Eric A. Meyer, Images, Tables, and Mysterious Gaps. Mar 3, 2002.

Thursday, December 13, 2007

perl xml parser and dependency hell

Was trying to install a perl module from CPAN (Frontier::Daemon) which needed XML::Parser which just wouldn't install. Perl's package manager kept complaining about a missing expat.h file, followed by many lines of errors. yum said expat is up to date. Removing expat (in an effort to reinstall it) removed yum as well. Installing yum wasn't easy. Finally got the rpm for yum to work and installed CentOS's precompiled perl xml parser. perl was satisfied with this and installed Frontier::Daemon without further complaints.
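
In hindsight, the missing expat.h was the real clue. On Red Hat style systems the C headers ship in a separate expat-devel package, so yum can report expat as up to date while the header is still absent. A minimal sketch of the check, assuming the header lives at the usual /usr/include/expat.h (both function names are made up for illustration):

```shell
# Succeeds (exit 0) when the expat header is NOT present.
# The default path is an assumption; pass another path to override it.
expat_header_missing() {
    header="${1:-/usr/include/expat.h}"
    [ ! -f "$header" ]
}

# Install the development package only when the header is absent.
# Defined here but not called; run it as root.
ensure_expat_devel() {
    if expat_header_missing; then
        yum install expat-devel
    fi
}
```

With the header in place, XML::Parser should be able to build from CPAN without removing expat at all.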

Tuesday, December 11, 2007

sendmail Doesn't Listen

The sendmail service on a default CentOS installation listens only on the loopback interface, for obvious security reasons. To make it listen on all interfaces, change this line in /etc/mail/sendmail.mc

DAEMON_OPTIONS(`Port=smtp,Addr=127.0.0.1, Name=MTA')dnl

to

dnl DAEMON_OPTIONS(`Port=smtp,Addr=127.0.0.1, Name=MTA')dnl

As usual with any configuration change, regenerate sendmail.cf and restart the service:

> m4 /etc/mail/sendmail.mc > /etc/mail/sendmail.cf
> /sbin/service sendmail restart

Ensure sendmail.cf has rw-r--r-- permissions only, or it will complain about "dangerous write permissions".
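
The permission check can be scripted. This is a rough sketch, assuming GNU stat (as on CentOS); mode_is_safe is a hypothetical helper name, not something sendmail provides:

```shell
# Succeeds when the file has no group-write or world-write bit set.
mode_is_safe() {
    mode=$(stat -c '%a' "$1") || return 2
    # Prefix a 0 so the shell reads the mode as octal; 022 masks the
    # group-write (020) and world-write (002) bits.
    [ $(( 0$mode & 022 )) -eq 0 ]
}

# Reset the file to rw-r--r-- if the check fails.
# Defined here but not called; run it as root.
fix_sendmail_cf_mode() {
    mode_is_safe /etc/mail/sendmail.cf || chmod 644 /etc/mail/sendmail.cf
}
```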

When Yum Corrupts Its Own rpm Database

I was using yum the other day to update a CentOS 5 server that hadn't been updated in a long time. It got stuck on a long download and still hadn't finished after a long wait. Tried ending the process using

> kill pid
> kill -INT pid

This did nothing and the process was still hung. Finally used the trusty but dangerous kill -9 which terminated the process. But this made matters worse. Now yum wouldn't even start. A quick google search led to someone speculating that such forceful killing of the process could have corrupted the rpm database. Per their suggestion,

> rm /var/lib/rpm/__db*
> rpm --rebuilddb
> yum clean all

This fixed everything. Not sure what the above does exactly, but hopefully it's ok. This was not a production box, so it was fine. Yum works!
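
For the record, the __db* files are Berkeley DB environment and lock files that can be left stale when a process is killed with -9, and rpm --rebuilddb regenerates the database indices from the installed package headers. On a box that matters, taking a copy first seems prudent. A sketch under assumptions: the function names and the /root/rpmdb-backup location are made up for illustration.

```shell
# Archive the rpm database directory and print the archive path.
# Arguments: source directory, backup directory (both have defaults).
backup_rpmdb() {
    src="${1:-/var/lib/rpm}"
    dest="${2:-/root/rpmdb-backup}"
    mkdir -p "$dest" || return 1
    archive="$dest/rpmdb-$(date +%Y%m%d-%H%M%S).tar.gz"
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    echo "$archive"
}

# Same three steps as above, but only after a successful backup.
# Defined here but not called; run it as root.
rebuild_rpmdb() {
    backup_rpmdb /var/lib/rpm /root/rpmdb-backup || return 1
    rm -f /var/lib/rpm/__db*
    rpm --rebuilddb && yum clean all
}
```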

Tuesday, September 18, 2007

Automatic Logon in Intranet Zone

Internet Explorer has a feature where it automatically tries to log on to websites requiring authentication, using the local Windows credentials, when the website is within the Intranet zone. This occasionally fails, and going to the website prompts for a username and password, even though they are the same as the Windows logon credentials.

This is the case for Windows SharePoint Services websites set up within the local intranet that use the local Active Directory for authentication. This authentication can happen seamlessly when using a computer that is logged on with a valid user from the same Active Directory.

When IE's automatic logon fails, the stored password for that website needs to be reset or removed. This can be done by opening the 'Stored User Names and Passwords' dialog using the Run command "control keymgr.dll" and deleting the record corresponding to the website. If necessary, add the website to 'Trusted Sites' and change the security settings for that zone to 'Automatic logon with current user name and password'. Obviously, if there are other websites in the same zone, this is a security risk. Further troubleshooting may be necessary to fix the problem without adding the local website to 'Trusted Sites'.

Wednesday, August 01, 2007

McProxy.exe

Internet Explorer stopped working on this machine. Opening any website would show the status bar message 'Website Found. Waiting for reply...', but the page never loads. Network access was fine. DNS was working. An alternate browser was working on the same machine. Clearly some module of IE was causing this problem.

Used SysInternals' Process Explorer for Windows to terminate different processes and finally found the bad program. McAfee AntiVirus Suite's McProxy.exe is supposed to filter web pages for bad content and pass them on to the browser. This process became corrupt for whatever reason. Removed it from startup and IE works fine.

Tuesday, July 31, 2007

SELinux

I was struggling to install MT4 on a fresh CentOS installation. Building a LAMP Server was a decent guide, but I got stuck at setting up MT4. The 'mt-wizard.cgi' could not find its own 'mt-static' directory, even though the directory could be browsed just fine from a browser client. Was MT4-RC1 broken?

I went to the MT::App::Wizard Perl module and edited the function that tests for the {mt-dir}/mt-static/styles.css file, which is how MT determines the location of the 'mt-static' folder within its own installation. Adding simple print statements gave the state of the LWP::UserAgent response.

print STDERR $response->status_line, "\n";

This wrote (to the Apache error log of this VirtualHost) an error message similar to "500 Connection refused to {servername}:80. Permission denied". I tried writing my own test Perl scripts using LWP::UserAgent and LWP::Simple to check whether MT4 or Perl itself was broken within the environment.

Finally found the /var/log/messages log file, which recorded each attempt and the access denial. The log file was very helpful. Learnt about SELinux's role over Apache as a security measure. So I tried modifying the local policy as described, but it didn't work. So I finally had to change the SELinux mode from enforcing to permissive in its config file at /etc/selinux/config.

SELINUX=permissive

Have to learn more about SELinux and its role in Linux, Apache, etc.
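
A narrower fix than going permissive may exist. The targeted policy has a boolean, httpd_can_network_connect, that lets scripts running under Apache open outbound network connections, which looks like what mt-wizard.cgi was being denied. I have not tested it against this MT4 setup, so treat the following as a sketch; the function names are made up for illustration. Note also that editing /etc/selinux/config only takes effect at the next boot, whereas setenforce 0 switches the running system to permissive immediately.

```shell
# Print the mode set on the SELINUX= line of the config file.
# The pattern does not match the SELINUXTYPE= line.
selinux_config_mode() {
    sed -n 's/^SELINUX=\(.*\)$/\1/p' "${1:-/etc/selinux/config}"
}

# Allow Apache-confined scripts to make network connections.
# -P makes the change persist across reboots.
# Defined here but not called; run it as root.
allow_httpd_outbound() {
    setsebool -P httpd_can_network_connect 1
}
```

If the boolean is enough, SELINUX=enforcing can stay in the config file.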

Wednesday, February 07, 2007

Unable To View Email Attachments

Had this problem a while back. Documenting now ...

Problem: User would complain that he is unable to open attachments in Lotus Notes. On examining, the attachments are simply picture icons and not actual attachments.

Analysis: This affected Lotus Notes 6.x versions and is documented here. Basically, Lotus Notes creates a temporary file on the sender's machine when processing attachments and is supposed to delete this temporary file after completion. If, for whatever reason, the file system locks this file so that it cannot be deleted (e.g., when multiple instances of Lotus Notes are running or a previous instance quit abruptly), an email message cannot process attachments. Instead of showing an error message to the sender, Lotus Notes sends the email message anyway, with picture icons of the attachments instead of the actual files. This happens on the sender's machine, and the recipient cannot do anything once the mail has been sent.

Resolution: Sender must quit Lotus Notes and delete all temporary files created by Notes.

Monday, February 05, 2007

It's All About Power

Learnt a tough lesson today, that I wasn't able to figure out for months. Goes to show that despite all the lessons you learn from books, lessons from mistakes are the best teachers.

Problem: This important network server had a unique reboot problem. While it was running, it would function absolutely fine. The moment I had to restart it for maintenance, it would become a pain. It just would not power back up. Fans worked, hard drives spun, but there was no POST beep and no display at all, not even the BIOS message.

Diagnostic Analysis: The problem started several months ago (over 6 months, maybe even a year) when it refused to start after a reboot. It was an old Gateway server with a PIII, bought in 2001. I thought it was dying, time to replace it. But of course, the budget didn't accommodate it. Ok fine, let's stick to the technical stuff. The power connection seemed fine, the CPU fan was working, the hard drives were spinning, but no POST beep, no BIOS display on screen, nothing. Stumped. Called Gateway, and they were no help. Since it was out of warranty, all I got was some phone support, and they said that it was a problem with the motherboard. Which I figured as well. In a desperate attempt, I unplugged everything (memory, cables, power) and reseated it all. I did this a couple of times, thinking it was just a matter of a bad connection somewhere. After countless attempts it magically booted. I thanked my stars and thought I had fixed the problem. This was several months ago.

Around October 2006 or so, I had to restart the server and did it apprehensively. Same problem. It just won't boot back up. Again I took its guts apart and put it back together several times. The VGA connection seemed a bit less than snug. I had no idea why that could cause a problem, but I held it hard against its socket and powered up. Seemed to do the trick after several tries. Again I thought I figured out the problem. The VGA adapter connection on the motherboard has gone bad. So I ordered a new motherboard (~$200) to replace it, and waited until when I next had to restart it again.

Fast forward to today, several updates were due on this server and I really had to restart it. Bad idea. Just would not boot back. For the first time (I think), I noticed a blinking amber light on the front panel. Googled it and learned that a blinking amber light possibly meant a power problem.

Although power light codes (or system beeps) may differ from vendor to vendor, there is some uniformity. A solid amber light meant that the computer was receiving power, but there might be an internal power problem. I considered this might be the real problem and replaced the motherboard with the new one I had. I think this is the first time I replaced a motherboard by myself. Removing the heat sink and then the processor was most worrisome. Anyway, I digress. Put the connections back exactly as on the previous motherboard. Prayed and powered up. No POST beep. Darn! Called Gateway again, hoping they'd give me some checklist for replacing the motherboard. No, I did everything right. They thought that the new motherboard was DOA and that I should call the vendor for tech support. The support guy had no idea what the blinking amber light could mean. Useless guy. Typical response of blaming the other guy. What did I expect.

I was about to give up when I reconsidered the power light code. Blinking amber light meant that a device might be malfunctioning or not correctly installed. Not necessarily an internal motherboard problem (which clearly had to be ruled out after replacing it). Looked at all the devices I had. I had removed all cards from the motherboard that didn't need to be there. And then I realized an old lesson I learnt and, for whatever stupid reason, had been overlooking all this time.

I had installed an extra hard drive a long time ago (probably about the time the problem started) when the server was running low on disk space. I had no tools to ensure the power supply was enough to support this extra device. This is a very basic thing to check when you are building a computer from scratch. After you choose a motherboard, you find a power supply that is more than sufficient to support all the devices. I learnt this specifically when studying for the CompTIA A+ exam. This system came pre-built from Gateway, and most desktops in my experience came with enough supply to support hard drive upgrades. For whatever cheap reason, Gateway probably decided that they were going to use a power supply that was just sufficient. I removed the extra hard drive and presto, problem solved. It boots up immediately. Phew!
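
The check I skipped is simple arithmetic: add up what each device draws and compare it with the power supply's rating, leaving some margin. Hard drives typically draw their peak current while spinning up, which may be why the server ran fine once it was up but would not start. A sketch with made-up numbers (I never measured this server, so every wattage below is hypothetical):

```shell
# Print the watts left over after subtracting every device's draw from the
# power supply rating. A negative number means the supply is overcommitted.
# Arguments: supply rating in watts, then one wattage per device.
psu_headroom() {
    rating=$1
    shift
    total=0
    for watts in "$@"; do
        total=$(( total + watts ))
    done
    echo $(( rating - total ))
}

# Hypothetical figures: a 250 W supply feeding board+CPU, two drives, fans.
psu_headroom 250 120 25 25 15
```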

Side note: A side effect of replacing the motherboard is that the CMOS gets reset. The server, which was a Domain Controller, was unable to log in to its own network and gave the error message that the time was different from network time. Kerberos authentication prevents logons from a computer whose clock differs by more than 5 minutes (the default tolerance). It being a domain controller, I could not even log on locally to reset the time. Had to restart to set the date and time in BIOS. Wasn't sure if I should. After all, I was wrong twice. Oh well, took the risk, because this time I knew the solution without guessing and was pretty sure I had fixed the problem. And yes, it worked. Restarted just fine. Set the correct date and time in BIOS. And now I have a working network server again. :)
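
The Kerberos clock check boils down to one comparison. Active Directory's default maximum tolerance is 5 minutes (300 seconds), and it can be changed by policy. A minimal sketch, assuming the two times are supplied as Unix epoch seconds (for example from date +%s); clock_skew_ok is a hypothetical name:

```shell
# Succeeds when two clocks are within the allowed skew.
# Arguments: local epoch seconds, remote epoch seconds,
# optional maximum skew in seconds (defaults to 300).
clock_skew_ok() {
    local_epoch=$1
    remote_epoch=$2
    max_skew=${3:-300}
    diff=$(( local_epoch - remote_epoch ))
    # Take the absolute value so the direction of the drift does not matter.
    if [ "$diff" -lt 0 ]; then
        diff=$(( -diff ))
    fi
    [ "$diff" -le "$max_skew" ]
}
```

A freshly reset CMOS clock is usually off by years, not minutes, so it fails this check by a wide margin.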

Moral Of The Story: Sometimes the simplest of answers to the toughest puzzles lies in the assumptions you make by default.