1 of 75

Operation and Analysis of Coldspark Products

Q3 2007

2 of 75

Analysis of

Systems and Behavior

3 of 75

System and Behavioral Analysis

  • Problem solving is a mindset, not a finite set of solutions.
  • This mindset consists of:
    • Knowing how to home in on relevant info quickly
    • Recognizing patterns
    • Attaining a depth of understanding
    • Striving toward a goal

4 of 75

How to Assess a Situation

  • Identify:
    • symptoms that are or are not related to the problem at hand
    • chain of causality, e.g. primary and secondary causes.
  • Understand the relationship between observations and the problem.
    • Correlation is not causation

5 of 75

How to Assess a Situation

  • Identify what is unique about this situation
    • What isn’t unique about this situation? What potential causes does that rule out?
  • Why haven’t I seen this problem before?
  • Test to:
    • observe system behavior
    • reproduce the problem

6 of 75

Example Scenario

  • A customer reports that their production SparkEngine is logging a PMT breach, and they are unable to inject new messages.
    • A PMT breach is not a primary cause. Something causes the memory to be consumed.
    • Once the SparkEngine hits PMT, it rejects new injection attempts until memory usage comes back down, so the breach is a secondary cause.
    • Now, we have to figure out what is consuming the SparkEngine’s memory.

7 of 75

Example Scenario

  • There are a few legitimate causes of sustained PMT breaches, such as large queues or a large template cache.
    • Ask the question internally: “What are the known causes of high memory utilization?”
  • Additionally, there are illegitimate causes of sustained high memory utilization, such as memory leaks.
    • Ask others about historical causes; also search Jira
  • Focus on the potential causes for which you can find evidence.

8 of 75

Example Scenario

  • Let’s say the queue contained 15,000 merge messages.
    • While this can cause high memory utilization, it still is not the primary cause.
    • What caused the queue backup? If you don’t know the possible causes, ask others “What are the known legitimate reasons for queue backups? What is the history of defects that cause illegitimate queue backups?”

9 of 75

Example Scenario

  • For example, a queue backup can be caused by domain throttling.
  • You can check the queue contents and throttle settings.
  • The result indicates that the queue has grown due to throttling. Now make sure the throttling is working as expected.
    • Queue must drain according to the SparkEngine settings.
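
As a rough illustration of what "drain according to the settings" means, the sketch below estimates how long the 15,000-message queue from this scenario should take to empty. The throttle rate of 500 messages per minute is a made-up number for the arithmetic, not a SparkEngine default, and the class and method names are hypothetical.

```java
public class QueueDrainEstimate {
    // Estimates the minutes needed to drain a queue at a fixed throttle rate.
    static double minutesToDrain(int queuedMessages, int messagesPerMinute) {
        return (double) queuedMessages / messagesPerMinute;
    }

    public static void main(String[] args) {
        int queued = 15_000;          // queue size from the example scenario
        int throttlePerMinute = 500;  // hypothetical throttle setting
        System.out.println(minutesToDrain(queued, throttlePerMinute) + " minutes");
    }
}
```

If the observed drain time is far from the estimate, the throttling itself deserves a closer look.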

10 of 75

Example Scenario

  • This scenario is very simple. However, the concepts apply to more complex scenarios.
    • Figure out what questions to ask
    • Determine the chain of causality
    • Take a logical approach
    • Discard information when it no longer applies.

11 of 75

Example Scenario

  • Now that we know what caused the problem, we can make recommendations. You might just tell the customer to be patient and wait for the queue to drain, or to ease up on their throttle settings.
  • However, always think about improving the customer experience. The SparkEngine may be “working as designed”, but the customer is left without means to accomplish the level of throttling and the message throughput they desire. Is there a configuration solution? Is there a product feature that would improve this scenario?

12 of 75

How to Assess a Situation

  • Remember:
    • Every problem has a cause.
    • Causes are not random.
    • All product problems can be recreated. It is a matter of time and priority, not possibility.

13 of 75

Attain a Depth of Understanding

  • Strive to understand what is going on.
  • Do not be content with confusion.
  • Ask questions, but also answer your own questions.
  • Look at the big picture. Do not get stuck at the gate.
  • Do not limit yourself to superficial understanding.

14 of 75

Attain a Depth of Understanding

  • Synergy of knowledge
    • Synergy “refers to the phenomenon in which two or more discrete influences or agents acting together create an effect greater than that predicted by knowing only the separate effects of the individual agents” (Wikipedia).
  • Invest in knowledge acquisition over time

15 of 75

Attain a Depth of Understanding

  • Creating and contributing can deepen understanding.
  • Think outside of your job description.
  • Be idealistic
  • Don’t be afraid to “sound dumb”.

16 of 75

Why It is Important to Have a Depth of Understanding

  • Avoiding hack fixes
    • Usually implemented in desperation, or when a better solution is not yet known or thought possible.
    • Over time, hack fixes will hurt Coldspark overall.

17 of 75

Why It is Important to Have a Depth of Understanding

  • User Advocacy
    • Different organizations and roles at Coldspark have different priorities.
    • Professional Services must advocate for the user, and persist through casual dismissal.
    • Help Coldspark stay on top of customer satisfaction.
    • Deep understanding of the products and customer use-cases is essential to adequately advocate for the customer.

18 of 75

Reproducing Problems

19 of 75

Reproducing Problems

  • Set up an environment that captures some of the core elements of the customer scenario. An exact match is not necessary.
  • Create the type of load or traffic seen by the customer
    • Runtime behavior is different than static behavior.

20 of 75

Reproducing Problems

  • Try all sorts of combinations of configuration, traffic and environment. Let what seems most likely guide you rather than limit you.
  • Do a random walk.

21 of 75

Reproducing Problems

  • If you try for a long time and don’t get anywhere:
    • Be patient. This can be a very long process.
    • Search through Jira for defects of a similar nature, and use those steps of reproduction as ideas to seed your attempts.
    • Use tools. Create the right tools if they are not already available.
    • Brainstorm with other people in the company. Many people can help to point you in the right direction.
      • But, do not ask engineering to solve an undefined problem. Ops and QA must define the problem, and they are the most qualified to do so.

22 of 75

Efficient Use of Resources

  • People are the most valuable resource at Coldspark
  • Ask for help when you are stuck, but not prematurely.
    • Know your own capabilities; know the capability of others.
  • How to ask intelligent questions
    • Do the work to frame the question correctly
    • Do not ask others to do your job for you
    • Ask precise questions; know how to use the answer.

23 of 75

Efficient Use of Resources

  • People at Coldspark have deep knowledge because they have sought it out.
  • There isn’t a magic book with all the answers.
  • I’ve learned a lot of stuff through experimentation and research, often when I couldn’t get a satisfactory answer.
  • Nobody can simply fill your brain with all you need to know.

24 of 75

Patterns of Common Pitfalls

  • Environment/Configuration
  • Product Defects
  • Customer code defects
  • Usability defects
  • How to tell the difference between a configuration problem and a product problem.

25 of 75

Mitigating Risk

  • How to recognize imminent disasters
    • Many unanticipated and undiagnosed system behaviors
    • Large amounts of guesswork in proposed solutions
  • Preventing disasters
    • Build resilience into the system
    • Understand the weak points and possible risks
    • Substantiate intuition with facts and proof of potential problems

26 of 75

Striving Toward a Goal

  • The goal of our best intentions and efforts is success.
    • Success comes in many forms.
  • Use time effectively to optimize achievement of a goal or multiple goals.
  • Change strategies if your best efforts and intentions are leading to failure.

27 of 75

System and Behavioral Analysis

  • Summary:
    • Know how to narrow in on relevant information quickly
    • Understand patterns of failure and success
    • Attain a depth of understanding
    • Strive toward a goal

28 of 75

Advanced Email Concepts

29 of 75

Advanced Email Concepts

  • Summary
    • TLS
    • MIME, character sets, content types, 8BITMIME
    • LDAP
    • DKIM, DNSBL, SPF, greylisting, tarpitting
    • Brief RFC Overview

30 of 75

TLS

  • TLS
    • PKI overview:
      • While symmetric crypto uses the same key to both encrypt and decrypt data, public-key (asymmetric) crypto decrypts with a different key than the one used to encrypt.
      • The public key part may be safely exchanged, and may be in the form of a certificate, including various information about the organization to which it belongs.
      • PKI both encrypts data and establishes trust

31 of 75

TLS

32 of 75

STARTTLS

      • PKI is used to encrypt the symmetric key that is actually used for the session.
      • STARTTLS is an ESMTP command. Once issued, TLS negotiation takes place.
      • Encrypts the traffic, and may establish trust of the client and server.
      • Not universally adopted
      • STARTTLS is not SMTP-specific. It is used to negotiate an encrypted session over a previously unencrypted channel.
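
The idea that PKI protects the symmetric session key can be sketched with the standard Java crypto API. This is only an illustration of the concept on the slide, not how a TLS stack is implemented: a real handshake involves more steps. The class and method names are hypothetical.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

public class SessionKeyExchange {
    private static final String RSA_MODE = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";

    // Wraps an AES session key with an RSA public key, unwraps it with the
    // private key, and reports whether the recovered key matches the original.
    static boolean roundTrip() {
        try {
            KeyPairGenerator rsa = KeyPairGenerator.getInstance("RSA");
            rsa.initialize(2048);
            KeyPair serverKeys = rsa.generateKeyPair();

            KeyGenerator aes = KeyGenerator.getInstance("AES");
            aes.init(128);
            SecretKey sessionKey = aes.generateKey(); // symmetric key for the session

            Cipher wrapper = Cipher.getInstance(RSA_MODE);
            wrapper.init(Cipher.WRAP_MODE, serverKeys.getPublic());
            byte[] wrapped = wrapper.wrap(sessionKey); // safe to send over the wire

            Cipher unwrapper = Cipher.getInstance(RSA_MODE);
            unwrapper.init(Cipher.UNWRAP_MODE, serverKeys.getPrivate());
            SecretKey recovered =
                    (SecretKey) unwrapper.unwrap(wrapped, "AES", Cipher.SECRET_KEY);

            return Arrays.equals(sessionKey.getEncoded(), recovered.getEncoded());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("session key recovered: " + roundTrip());
    }
}
```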

33 of 75

MIME

  • MIME headers
  • MIME boundaries
  • Content definitions
  • RFCs:
    • RFC 2045
    • RFC 2047

MIME-version: 1.0
Content-type: multipart/mixed; boundary="frontier"

This is a message with multiple parts in MIME format.
--frontier
Content-type: text/plain

This is the body of the message.
--frontier
Content-type: application/octet-stream
Content-transfer-encoding: base64

PGh0bWw+CiAgPGhlYWQ+CiAgPC9oZWFkPgogIDxib2R5PgogICAgPHA+VGhpcyBpcyB0aGUg
Ym9keSBvZiB0aGUgbWVzc2FnZS48L3A+CiAgPC9ib2R5Pgo8L2h0bWw+Cg==
--frontier--

34 of 75

Character Encoding: Overview

  • A character encoding maps a character set to another set, such as octets, or decimal numbers.
  • There are many character encodings, and they use many different mapping techniques, which makes the subject confusing.
  • Mishandling character encodings can result in corrupted messages.
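
A minimal sketch of how such corruption happens: text is encoded with one character encoding and decoded with another. The class and method names are hypothetical; the charsets are the standard ones shipped with Java.

```java
import java.nio.charset.StandardCharsets;

public class Mojibake {
    // Encodes text as UTF-8, then wrongly decodes the bytes as ISO-8859-1.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "se\u00F1or"; // "señor", written with an escape for the n-tilde
        // The single n-tilde becomes two unrelated Latin-1 characters.
        System.out.println(original + " -> " + misdecode(original));
    }
}
```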

35 of 75

Character Sets: Ascii

  • US-ASCII
    • Uses only 7 of the 8 bits
    • 33 non-printable, mostly obsolete control characters
    • 94 printable characters: A-Za-z0-9!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
    • Plus the space character

  • Characters have a binary, decimal, and hex representation.
  • For example
    • The null character (\0) is
      • 0000000 in binary
      • 0 in decimal
      • 00 in hex
    • ! (exclamation) is
      • 0100001 in binary
      • 33 in decimal
      • 21 in hex
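
The three representations can be checked with a few lines of Java. This is a small sketch with a hypothetical helper name; it prints the 7-bit binary, decimal, and hex value of a character.

```java
public class AsciiTable {
    // Formats a character as 7-bit binary, decimal, and two-digit hex.
    static String describe(char c) {
        String binary = String.format("%7s", Integer.toBinaryString(c)).replace(' ', '0');
        return binary + " " + (int) c + " " + String.format("%02X", (int) c);
    }

    public static void main(String[] args) {
        System.out.println(describe('\0')); // 0000000 0 00
        System.out.println(describe('!'));  // 0100001 33 21
    }
}
```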

36 of 75

Character Sets: Unicode

  • The most common Unicode encoding is UTF-8
    • 1 to 4 bytes: “variable-width”
  • From Wikipedia:
    • “Unicode has the explicit aim of transcending the limitations of traditional character encodings, such as those defined by the ISO 8859 standard which find wide usage in various countries of the world but remain largely incompatible with each other. “

    • “In text processing, Unicode takes the role of providing a unique code point — a number, not a glyph — for each character. In other words, Unicode represents a character in an abstract way and leaves the visual rendering (size, shape, font or style) to other software, such as a web browser or word processor. “
    • Unicode defines a codespace of 1,114,112 code points in the range 0 to 10FFFF (hex).
    • UTF-8 is the default character encoding on most modern Linux distributions.
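
To make "variable-width" concrete, the sketch below counts the bytes UTF-8 needs for code points of different sizes. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Width {
    // Returns the number of bytes UTF-8 needs to encode the given text.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("A"));            // U+0041: 1 byte
        System.out.println(utf8Length("\u00F1"));       // U+00F1: 2 bytes
        System.out.println(utf8Length("\u20AC"));       // U+20AC: 3 bytes
        System.out.println(utf8Length("\uD83D\uDE00")); // U+1F600: 4 bytes
    }
}
```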

37 of 75

Other Character Sets Examples

  • ISO-8859
    • 1 byte characters, but uses all 8 bits
    • Does not include any control characters
    • Latin character sets
      • ISO-8859-1 is Latin-1, which supports Western European characters

  • SHIFT-JIS
    • Japanese characters
    • Combines single byte (JIS X0201) and double byte Japanese characters (JIS X0208)
    • For the double byte characters, the first byte is outside of the range used by the single byte characters.

38 of 75

Content-Transfer-Encoding

  • Quoted-printable
    • Good for text primarily composed of ASCII characters.
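    • Encodes only the bytes that need it (8-bit bytes and a few special characters such as “=”), leaving the text generally human-readable.
  • Base64:
    • Uses a 64-character alphabet (A-Z, a-z, 0-9, “+” and “/”), so each character carries 6 bits; “=” pads the end.
    • Used to transmit binary or 8-bit data through systems that are not “8-bit clean”, such as old SMTP servers.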

  • 7bit
    • US-ASCII
    • CRLF only allowed as line endings
    • At most 998 octets per line, not counting the CRLF
  • 8bit
    • At most 998 octets per line, not counting the CRLF; all 8 bits may be used
    • CRLF only allowed as a line ending.
  • Binary
    • Any sequence of octets
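
Base64 is easy to experiment with using the standard java.util.Base64 class. The sketch below decodes the base64 attachment from the earlier MIME example and round-trips arbitrary bytes. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class TransferEncodingDemo {
    // The MIME encoder inserts a CRLF after every 76 output characters.
    static String encode(byte[] data) {
        return Base64.getMimeEncoder().encodeToString(data);
    }

    // The MIME decoder ignores line breaks inside the encoded text.
    static byte[] decode(String text) {
        return Base64.getMimeDecoder().decode(text);
    }

    public static void main(String[] args) {
        String attachment =
            "PGh0bWw+CiAgPGhlYWQ+CiAgPC9oZWFkPgogIDxib2R5PgogICAgPHA+VGhpcyBpcyB0aGUg\r\n"
          + "Ym9keSBvZiB0aGUgbWVzc2FnZS48L3A+CiAgPC9ib2R5Pgo8L2h0bWw+Cg==";
        System.out.println(new String(decode(attachment), StandardCharsets.US_ASCII));
    }
}
```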

39 of 75

8BITMIME

  • 8BITMIME
    • ESMTP extension
    • Servers that advertise the extension must both support 8bit content, and be able to send to SMTP servers that do not support 8bit content
    • Downgrading means base64 encoding all MIME parts with a content-transfer-encoding of “8bit” or “binary”.
    • If this extension is not used, the receiving SMTP server may clear the 8th bit, which would mangle an 8bit message.
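
The mangling described in the last bullet can be simulated by clearing the high bit of every octet. This is a sketch of the effect only, with hypothetical names; it does not model any particular SMTP server.

```java
import java.nio.charset.StandardCharsets;

public class EighthBitStrip {
    // Simulates a relay that is not 8-bit clean: clears the 8th bit of every octet.
    static byte[] stripHighBit(byte[] in) {
        byte[] out = new byte[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = (byte) (in[i] & 0x7F);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] original = "se\u00F1or".getBytes(StandardCharsets.ISO_8859_1);
        String mangled = new String(stripHighBit(original), StandardCharsets.ISO_8859_1);
        System.out.println(mangled); // the n-tilde (0xF1) has become 'q' (0x71)
    }
}
```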

40 of 75

Encoded Words

  • Defined in RFC-2047
  • Encodes a string containing non-ASCII characters as base64 or Quoted-Printable, and records the original charset
    • Subject: =?utf-8?Q?=C2=A1Hola,_se=C3=B1or!?= is interpreted as "Subject: ¡Hola, señor!" (example from Wikipedia).
    • The form is: "=?charset?encoding?encoded text?=".
    • Relevant because non-ASCII (“high ASCII”) characters are not allowed in e-mail headers.
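
A minimal sketch of decoding a single encoded word by hand, using only the JDK. It handles the "Q" and "B" encodings for one word and skips much of what RFC 2047 covers (adjacent words, folding, error handling). The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EncodedWordDecoder {
    // Matches the form =?charset?encoding?encoded text?=
    private static final Pattern WORD =
            Pattern.compile("=\\?([^?]+)\\?([QqBb])\\?([^?]*)\\?=");

    static String decode(String word) {
        Matcher m = WORD.matcher(word);
        if (!m.matches()) {
            return word; // not an encoded word: return it unchanged
        }
        Charset charset = Charset.forName(m.group(1));
        String text = m.group(3);
        byte[] bytes;
        if (m.group(2).equalsIgnoreCase("B")) {
            bytes = Base64.getDecoder().decode(text);
        } else {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (int i = 0; i < text.length(); i++) {
                char c = text.charAt(i);
                if (c == '_') {
                    out.write(' ');                    // underscore stands for space
                } else if (c == '=' && i + 2 < text.length()) {
                    out.write(Integer.parseInt(text.substring(i + 1, i + 3), 16));
                    i += 2;                            // skip the two hex digits
                } else {
                    out.write(c);
                }
            }
            bytes = out.toByteArray();
        }
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        System.out.println(decode("=?utf-8?Q?=C2=A1Hola,_se=C3=B1or!?="));
    }
}
```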

41 of 75

Content Types

  • Text types:
    • text/plain
    • text/html
  • Application types:
    • application/pdf
  • Image types:
    • image/gif

  • Message types:
    • message/rfc822
  • Multipart:
    • multipart/mixed
    • multipart/alternative
    • multipart/related
    • multipart/digest

42 of 75

Content Types: Multipart

From Wikipedia:

Mixed

Multipart/mixed is used for sending files with different "Content-Type" headers inline (or as attachments). If sending pictures or other easily readable files, most mail clients will display them inline (unless otherwise specified with the "Content-disposition" header). Otherwise it will offer them as attachments. The default content-type for each part is "text/plain".

Defined in RFC 2046, Section 5.1.3

Message

A message/rfc822 part contains an email message, including any headers. Rfc822 is a misnomer, since the message may be a full MIME message. This is used for digests as well as for E-mail forwarding.

Defined in RFC 2046.

Digest

Multipart/digest is a simple way to send multiple text messages. The default content-type for each part is "message/rfc822".

Defined in RFC 2046, Section 5.1.5

Alternative

The multipart/alternative subtype indicates that each part is an "alternative" version of the same (or similar) content, each in a different format denoted by its "Content-Type" header. The formats are ordered by how faithful they are to the original, with the least faithful first and the most faithful last. Systems can then choose the "best" representation they are capable of processing; in general, this will be the last part that the system can understand, although other factors may affect this.

43 of 75

Example: multipart/alternative

From: Nathaniel Borenstein <nsb@bellcore.com>
To: Ned Freed <ned@innosoft.com>
Subject: Formatted text mail
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=boundary42

--boundary42
Content-Type: text/plain; charset=us-ascii

...plain text version of message goes here....

--boundary42
Content-Type: text/richtext

.... RFC 1341 richtext version of same message goes here ...

--boundary42
Content-Type: text/x-whatever

.... fanciest formatted version of same message goes here ...

--boundary42--

44 of 75

LDAP

  • How it is used in the Email world:
    • SMTP-AUTH
    • Aliasing and masquerading
    • Generic lookups for modifying, deleting, and rerouting messages.

45 of 75

LDAP

  • Entries are governed by a schema
  • The schema defines the attribute values that entries may contain.
  • ObjectClasses contain groups of required or optional attributes, and support hierarchy
    • Everything inherits from “top”
    • Standard ObjectClasses include
      • inetOrgPerson
      • person
      • organization
      • ipHost
      • posixGroup
  • An entry may have multiple ObjectClasses

46 of 75

LDAP

  • Example LDIF of an inetOrgPerson:

dn: cn=klindquist, dc=coldspark,dc=com
sn: lindquist
userPassword:: bXlwYXNzd29yZA==
ou: Engineering
carLicense: ACAR123
mail: klindquist@coldspark.com
objectClass: inetOrgPerson
uid: klindquist
homePhone: 555-111-2222
cn: klindquist
description: a person at Coldspark

  • LDIF stands for LDAP Data Interchange Format
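
In LDIF, a double colon after the attribute name (as in the userPassword line above) marks a base64-encoded value. The sketch below parses one LDIF line and decodes such values. It ignores line folding and other LDIF features, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class LdifValue {
    // Parses one "attr: value" or "attr:: base64value" line into {attr, value}.
    static String[] parse(String line) {
        int colon = line.indexOf(':');
        String attr = line.substring(0, colon);
        String rest = line.substring(colon + 1);
        if (rest.startsWith(":")) {
            // Double colon: the value is base64-encoded.
            byte[] raw = Base64.getDecoder().decode(rest.substring(1).trim());
            return new String[] { attr, new String(raw, StandardCharsets.UTF_8) };
        }
        return new String[] { attr, rest.trim() };
    }

    public static void main(String[] args) {
        String[] pair = parse("userPassword:: bXlwYXNzd29yZA==");
        System.out.println(pair[0] + " = " + pair[1]);
    }
}
```

Note that base64 is an encoding, not encryption: anyone who can read the LDIF can read the value.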

47 of 75

DKIM and DK

  • Sender anti-spoofing
  • Sending server signs with private key
  • Receiving server verifies the signature with the public key published in the sending domain’s DNS.
  • Does not establish reputation, but is a prerequisite for reputation services.
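
The sign-and-verify flow can be sketched with the standard java.security API. This shows only the underlying idea: real DKIM also canonicalizes selected headers and the body, and the verifier fetches the public key from DNS, none of which is modeled here. All names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

public class SignVerifySketch {
    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Sending side: sign the message with the private key.
    static byte[] sign(PrivateKey key, String message) {
        try {
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(key);
            signer.update(message.getBytes(StandardCharsets.UTF_8));
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Receiving side: check the signature with the sender's public key.
    static boolean verify(PublicKey key, String message, byte[] signature) {
        try {
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(key);
            verifier.update(message.getBytes(StandardCharsets.UTF_8));
            return verifier.verify(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair keys = newKeyPair();
        byte[] sig = sign(keys.getPrivate(), "Subject: hello");
        System.out.println(verify(keys.getPublic(), "Subject: hello", sig));
        System.out.println(verify(keys.getPublic(), "Subject: tampered", sig));
    }
}
```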

48 of 75

DKIM

49 of 75

SPF

  • Sender Policy Framework
    • Domains indicate which IPs are allowed to send mail on their behalf
    • Receiving servers can verify that the message came from an approved host.
    • Limitation: Forwarding breaks SPF
    • Example record:

fidelity.com. 14400 IN TXT "v=spf1 ip4:64.58.236.239 ip4:64.90.205.0/25 ip4:129.41.0.0/16 ip4:192.223.0.0/16 ip4:137.199.0.0/16 ip4:155.199.0.0/16 ip4:202.95.0.0/16 -all"
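
A receiving server's check against the ip4 mechanisms in a record like this one can be sketched as follows. This is a simplified illustration that only understands ip4 entries followed by "-all"; real SPF evaluation covers more mechanisms and qualifiers. The class and method names are hypothetical.

```java
public class SpfIp4Match {
    // Converts a dotted-quad IPv4 address to a 32-bit integer.
    static int toInt(String ip) {
        String[] p = ip.split("\\.");
        return (Integer.parseInt(p[0]) << 24) | (Integer.parseInt(p[1]) << 16)
             | (Integer.parseInt(p[2]) << 8) | Integer.parseInt(p[3]);
    }

    // True if ip falls inside a range such as "64.90.205.0/25" or equals a bare address.
    static boolean matches(String range, String ip) {
        String[] parts = range.split("/");
        int prefix = parts.length > 1 ? Integer.parseInt(parts[1]) : 32;
        int mask = prefix == 0 ? 0 : -1 << (32 - prefix);
        return (toInt(parts[0]) & mask) == (toInt(ip) & mask);
    }

    // Returns "pass" if any ip4 range matches, otherwise "fail" (the "-all" case).
    static String check(String[] ip4Ranges, String ip) {
        for (String range : ip4Ranges) {
            if (matches(range, ip)) {
                return "pass";
            }
        }
        return "fail";
    }

    public static void main(String[] args) {
        String[] ranges = { "64.58.236.239", "64.90.205.0/25", "129.41.0.0/16" };
        System.out.println(check(ranges, "64.90.205.100"));
        System.out.println(check(ranges, "10.1.2.3"));
    }
}
```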

50 of 75

SPF

51 of 75

Greylisting, tarpitting

  • Some small ISPs use greylisting and/or tarpitting to reduce spam
  • Greylisting entails temporarily rejecting mail, thus forcing the SMTP client to retry the message.
  • If a server is tarpitting, each response to an SMTP command is very slow, thus tying up a potential spammer’s MTA
  • Greylisting and tarpitting are frequently used together.
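
The greylisting decision can be sketched in a few lines. The sketch assumes the server keys on the (client IP, sender, recipient) triplet and uses a fixed minimum retry delay; both are common choices but are assumptions here, not something the slide specifies. All names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class Greylist {
    private final Map<String, Long> firstSeen = new HashMap<>();
    private final long minDelayMillis;

    Greylist(long minDelayMillis) {
        this.minDelayMillis = minDelayMillis;
    }

    // Returns an SMTP reply code: 451 (temporary failure, retry later) or 250 (accepted).
    int check(String clientIp, String sender, String recipient, long nowMillis) {
        String triplet = clientIp + "|" + sender + "|" + recipient;
        Long seen = firstSeen.putIfAbsent(triplet, nowMillis);
        if (seen == null || nowMillis - seen < minDelayMillis) {
            return 451; // first attempt, or retried too soon
        }
        return 250;     // a well-behaved MTA retried after the delay
    }

    public static void main(String[] args) {
        Greylist greylist = new Greylist(300_000); // hypothetical 5-minute delay
        System.out.println(greylist.check("192.0.2.1", "a@example.com", "b@example.org", 0));
        System.out.println(greylist.check("192.0.2.1", "a@example.com", "b@example.org", 300_000));
    }
}
```

A sending MTA that queues and retries gets through on a later attempt; software that never retries does not.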

52 of 75

Brief RFC Overview

  • Aspects of our products are developed and tested against the RFCs
  • Most RFCs have areas of ambiguity, where implementation is left up to interpretation.
  • Many applications do not follow their RFCs religiously, including ours. Compatibility is sometimes a higher priority. Not all MUSTs are created equal.
  • RFCs are a good reference to back up opinions on how something should operate.

53 of 75

Some Java Concepts

54 of 75

Some Java Concepts

  • Summary:
    • Why is this useful information?
    • Stack Traces
    • Java memory management
    • Memory and CPU profiling
    • Spring and Dependency Injection

55 of 75

Why is this useful information?

  • Our products are written in Java!
  • Some of the behavioral characteristics of our products are the result of the language in which they are written.
  • Troubleshooting

56 of 75

Stack Traces

  • Most of the time, unexpected errors will appear in the form of Java stack traces.
  • Exceptions that are expected may be handled, and not spit out as a full stack trace.
  • Sometimes, a concise error is logged at a higher log level, and the full stack trace at a lower level.
  • Most importantly, stack traces can tell you a lot about what problem occurred.
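
A small sketch of where a stack trace comes from and how to read it. The top frame is where the exception was thrown; each frame below it is the caller of the one above. The class and method names are hypothetical and only loosely modeled on the trace on the next slide.

```java
public class StackTraceDemo {
    static void load(String id) {
        if (id == null) {
            throw new IllegalArgumentException("id to load is required for loading");
        }
    }

    static void createSendEvent() {
        load(null); // the actual mistake: the caller passes no id
    }

    // Returns the method name of the frame where the exception was thrown.
    static String topFrameMethod(Throwable t) {
        return t.getStackTrace()[0].getMethodName();
    }

    public static void main(String[] args) {
        try {
            createSendEvent();
        } catch (IllegalArgumentException e) {
            System.out.println(e); // exception type and message
            for (StackTraceElement frame : e.getStackTrace()) {
                System.out.println("\tat " + frame);
            }
        }
    }
}
```

The throw happens in load, but the frame below it (createSendEvent) is where the bad argument came from. Reading a few frames down is often what identifies the real cause.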

57 of 75

Stack Traces

jvm 1 | java.lang.IllegalArgumentException: id to load is required for loading
jvm 1 | at org.hibernate.event.LoadEvent.<init>(LoadEvent.java:51)
jvm 1 | at org.hibernate.event.LoadEvent.<init>(LoadEvent.java:33)
jvm 1 | at org.hibernate.impl.SessionImpl.get(SessionImpl.java:812)
jvm 1 | at org.hibernate.impl.SessionImpl.get(SessionImpl.java:808)
jvm 1 | at org.springframework.orm.hibernate3.HibernateTemplate$1.doInHibernate(HibernateTemplate.java:470)
jvm 1 | at org.springframework.orm.hibernate3.HibernateTemplate.execute(HibernateTemplate.java:372)
jvm 1 | at org.springframework.orm.hibernate3.HibernateTemplate.get(HibernateTemplate.java:464)
jvm 1 | at org.springframework.orm.hibernate3.HibernateTemplate.get(HibernateTemplate.java:458)
jvm 1 | at com.coldspark.mailfusion.dao.GenericHibernateDAO.get(GenericHibernateDAO.java:112)
jvm 1 | at com.coldspark.mailfusion.dao.GenericHibernateDAO.get(GenericHibernateDAO.java:98)
jvm 1 | at com.coldspark.mailfusion.query.dao.StoredQueryDAOHibernate.getCurrentStoredQueryRevision(StoredQueryDAOHibernate.java:64)
jvm 1 | at com.coldspark.mailfusion.send.event.SendEventBuilder.createSendEventEntity(SendEventBuilder.java:239)
jvm 1 | at com.coldspark.mailfusion.send.event.SendEventBuilder.createSendEvent(SendEventBuilder.java:76)
jvm 1 | at com.coldspark.mailfusion.schedule.ScheduleServiceImpl.createSchedule(ScheduleServiceImpl.java:448)
jvm 1 | at com.coldspark.mailfusion.schedule.ScheduleServiceImpl.launch(ScheduleServiceImpl.java:175)
jvm 1 | at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

58 of 75

Null Pointer Exceptions

  • NullPointerExceptions are very common. They are generated when the code assumes something is present, when in fact it is not present.
  • Sometimes, the code should anticipate that the object may be null, if that is a sane state.
  • Other times, null is not a sane state for the object at all. This may be the result of a concurrency problem, or simply having steps out of order, etc.
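
The two cases can be sketched side by side: code that assumes a value is present, and code that treats absence as a legitimate state. The class, methods, and header names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class NullHandling {
    // Assumes the header is present; throws NullPointerException when it is not.
    static int unsafeLength(Map<String, String> headers, String name) {
        return headers.get(name).length();
    }

    // Treats a missing header as a sane state and handles it explicitly.
    static int safeLength(Map<String, String> headers, String name) {
        String value = headers.get(name);
        return value == null ? 0 : value.length();
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put("From", "klindquist@coldspark.com");

        System.out.println(safeLength(headers, "Subject")); // missing header, no exception
        try {
            unsafeLength(headers, "Subject");
        } catch (NullPointerException e) {
            System.out.println("NPE: the code assumed Subject was present");
        }
    }
}
```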

59 of 75

NPEs

04/16/2008 12:45:02 [DBUG] handleResponse completing with GREETING response.
04/16/2008 12:45:02 [DBUG] Exception thrown from FilterQueue.enqueue
java.lang.NullPointerException
        at coldspark.engine.api.FilterQueue.doEnqueue(FilterQueue.java:262)
        at coldspark.engine.api.FilterQueue.internalEnqueue(FilterQueue.java:352)
        at coldspark.engine.api.FilterQueue.enqueue(FilterQueue.java:177)
        at coldspark.engine.smtp.server.ServerConversation.handleResponse(ServerConversation.java:1612)
        at coldspark.engine.smtp.server.ServerConversation.access$100(ServerConversation.java:1089)
        at coldspark.engine.smtp.server.ServerConversation$1.run(ServerConversation.java:1537)
        at coldspark.engine.util.thread.MonitoredRunnable.run(MonitoredRunnable.java:94)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
        at java.lang.Thread.run(Thread.java:595)

60 of 75

What information does this stack trace tell us?

01/03/2008 12:58:01 [WARN] The data file reference could not be found on the file system for a disk message, while scanning for headers. for record: [0]
01/03/2008 12:58:01 [DBUG] The data file reference could not be found on the file system for a disk message, while scanning for headers. for record: [0]
java.lang.Exception: The data file reference could not be found on the file system for a disk message, while scanning for headers. for record: [0]
        at coldspark.engine.smtp.EmailMessage.getMessageHeaders(EmailMessage.java:5133)
        at coldspark.engine.smtp.EmailMessage.doLoadDiskMessage(EmailMessage.java:1678)
        at coldspark.engine.smtp.EmailMessage.loadDiskMessage(EmailMessage.java:1574)
        at coldspark.engine.smtp.client.Bucket.nextMessage(Bucket.java:425)
        at coldspark.engine.smtp.client.ClientConversation.handleHelloResponse(ClientConversation.java:2040)
        at coldspark.engine.smtp.client.ClientConversation.handleResponse(ClientConversation.java:1890)
        at coldspark.engine.smtp.client.ClientConversation.access$000(ClientConversation.java:1414)
        at coldspark.engine.smtp.client.ClientConversation$5.run(ClientConversation.java:1772)
        at coldspark.engine.util.thread.MonitoredRunnable.run(MonitoredRunnable.java:94)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
        at java.lang.Thread.run(Thread.java:595)
01/03/2008 12:58:01 [DBUG] Unable to obtain headers for message: Object #[4655498] sender: [klindquist@not.com] recipients: [[klindquist@otherdomain.com]]

61 of 75

How about this one?

java.net.SocketException: Broken pipe
        at java.net.SocketOutputStream.socketWrite0(Native Method)
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
        at com.mysql.jdbc.MysqlIO.send(MysqlIO.java:2692)
        at com.mysql.jdbc.MysqlIO.send(MysqlIO.java:2621)
        at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1552)
        at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1666)
        at com.mysql.jdbc.Connection.execSQL(Connection.java:2994)
        at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:936)
        at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1166)
        at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1082)
        at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1067)
        at coldspark.engine.results.logger.db.MailblastDbResultLogger.writeResult(MailblastDbResultLogger.java:371)
        at coldspark.engine.results.thread.ResultDAOHandlerThread.writeResultToLogger(ResultDAOHandlerThread.java:242)

62 of 75

Thread Traces and Deadlocks

  • Various tools can help you determine what individual threads are doing.
    • On Unix-like systems, kill -3 $PID sends SIGQUIT to the JVM, which prints thread traces to its STDOUT
    • YourKit or any Java profiler
    • Remote JMX
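
The JVM can also report deadlocks from inside the process through the standard java.lang.management API. The sketch below deliberately creates a two-thread deadlock by taking two locks in opposite order, then asks the JVM which threads are deadlocked. The class, method, and thread names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class DeadlockDemo {
    // Starts a daemon thread that locks 'first', waits for its peer, then tries 'second'.
    static void lockInOrder(String name, Object first, Object second, CountDownLatch bothHoldFirst) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothHoldFirst.countDown();
                try {
                    bothHoldFirst.await(); // wait until the peer holds its first lock
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    // never reached: the peer holds 'second' and wants 'first'
                }
            }
        }, name);
        t.setDaemon(true); // lets the JVM exit even though the thread stays stuck
        t.start();
    }

    // Creates the deadlock and returns how many threads the JVM reports as deadlocked.
    static int createAndDetect() {
        Object a = new Object();
        Object b = new Object();
        CountDownLatch latch = new CountDownLatch(2);
        lockInOrder("worker-1", a, b, latch);
        lockInOrder("worker-2", b, a, latch);

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 100; attempt++) {
            long[] ids = threads.findMonitorDeadlockedThreads();
            if (ids != null) {
                return ids.length;
            }
            try {
                Thread.sleep(50); // give the workers time to block
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return 0;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + createAndDetect());
    }
}
```

The usual fix for this pattern is to make every thread acquire the locks in the same order.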

63 of 75

Thread Traces and Deadlocks

Found one Java-level deadlock:
=============================
"Thread-104client work pool":
  waiting to lock monitor 0x0822b714 (object 0x668b29c0, a java.lang.Object),
  which is held by "Thread-58client socket pool"
"Thread-58client socket pool":
  waiting to lock monitor 0x0822b6d4 (object 0x668b2458, a coldspark.engine.smtp.client.ClientConversation),
  which is held by "Thread-104client work pool"

Java stack information for the threads listed above:
===================================================
"Thread-104client work pool":
        at coldspark.engine.socket.chr.NbCharChannel.close(NbCharChannel.java:153)
        - waiting to lock <0x668b29c0> (a java.lang.Object)
        at coldspark.engine.smtp.Conversation.close(Conversation.java:444)
        at coldspark.engine.smtp.client.ClientConversation.close(ClientConversation.java:2630)
        - locked <0x668b2458> (a coldspark.engine.smtp.client.ClientConversation)
        at coldspark.engine.smtp.client.ClientConversation.failBucket(ClientConversation.java:2713)
        at coldspark.engine.smtp.client.ClientConversation.access$200(ClientConversation.java:1410)
        at coldspark.engine.smtp.client.ClientConversation$5.run(ClientConversation.java:1719)
        at coldspark.engine.util.thread.MonitoredRunnable.run(MonitoredRunnable.java:94)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
        at java.lang.Thread.run(Thread.java:595)
"Thread-58client socket pool":
        at coldspark.engine.smtp.client.ClientConversation.closed(ClientConversation.java:3234)

64 of 75

Thread Traces and Deadlocks

  • In another case, the MB injector would stop processing send event queue items. Closer inspection revealed that certain threads were stuck in JDBC socket reads perpetually.

[DB-1000,CID=1,SE=se-1] [RUNNABLE]
java.net.SocketInputStream.socketRead0(native method)
java.net.SocketInputStream.read(SocketInputStream.java:129)
com.mysql.jdbc.util.ReadAheadInputStream.fill(ReadAheadInputStream.java:113)
com.mysql.jdbc.util.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(ReadAheadInputStream.java:160)
com.mysql.jdbc.util.ReadAheadInputStream.read(ReadAheadInputStream.java:188)
com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:1931)
com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2380)
com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2909)
com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1600)
com.mysql.jdbc.ServerPreparedStatement.serverExecute(ServerPreparedStatement.java:1129)
com.mysql.jdbc.ServerPreparedStatement.executeInternal(ServerPreparedStatement.java:681)
com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1368)
com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1283)
com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:1268)
com.mchange.v2.c3p0.impl.NewProxyPreparedStatement.executeUpdate(NewProxyPreparedStatement.java:105)
org.springframework.jdbc.core.JdbcTemplate$2.doInPreparedStatement(JdbcTemplate.java:744)
org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:537)
org.springframework.jdbc.core.JdbcTemplate.update(JdbcTemplate.java:738)
org.springframework.jdbc.core.JdbcTemplate.update(JdbcTemplate.java:796)
org.springframework.jdbc.core.JdbcTemplate.update(JdbcTemplate.java:804)
com.coldspark.injector.spi.DBHelper.setEventStatus(DBHelper.java:227)
com.coldspark.injector.spi.DBHelper.setEventStatus(DBHelper.java:183)
com.coldspark.injector.spi.DBProcessor.process(DBProcessor.java:265)
com.coldspark.injector.spi.AbstractProcessor.run(AbstractProcessor.java:118)
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
java.lang.Thread.run(Thread.java:595)

65 of 75

Sun JVM Memory Management

  • No explicit memory management (no malloc/free); objects are simply created with new
  • Garbage collection reclaims the memory used by objects that are no longer accessible
  • Memory leaks arise when unintended object references stay around
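
A minimal sketch of the kind of unintended reference that causes a leak: a long-lived collection that keeps every object ever added to it, so the garbage collector can never reclaim them. The class and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class LeakSketch {
    // Long-lived list: everything added here stays reachable for the life of this object.
    private final List<byte[]> processed = new ArrayList<>();

    void handleMessage(byte[] body) {
        // ... deliver the message ...
        processed.add(body); // unintended reference: never removed after delivery
    }

    int retainedMessages() {
        return processed.size();
    }

    public static void main(String[] args) {
        LeakSketch engine = new LeakSketch();
        for (int i = 0; i < 1000; i++) {
            engine.handleMessage(new byte[1024]);
        }
        // All 1000 message bodies are still reachable and cannot be collected.
        System.out.println(engine.retainedMessages());
    }
}
```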

66 of 75

Sun JVM Memory Management

  • Generational memory management for the heap
    • Objects are created in Eden
    • When Eden is full, live objects are copied to 1st survivor space, and dead objects are cleaned
    • When Eden is full again, GC copies live objects from Eden and 1st survivor to the 2nd survivor space
    • Objects that do not fit in the survivor space, or that have survived enough collections, move to the old generation space, becoming “tenured”
    • If enough memory cannot be cleared in this way, then a major garbage collection occurs.
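
The generations can be seen on a running JVM through the standard java.lang.management API. This sketch lists the memory pools the JVM exposes. Pool names vary with the JVM version and the garbage collector in use, so the output is not shown. The class and method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.ArrayList;
import java.util.List;

public class MemoryPools {
    // Returns one line per memory pool: its name and whether it is heap or non-heap.
    static List<String> poolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            names.add(pool.getName() + " (" + pool.getType() + ")");
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : poolNames()) {
            System.out.println(name);
        }
    }
}
```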

67 of 75

Sun JVM Memory

  • PermGen
    • Interned strings
    • Objects describing classes and methods
    • Not considered part of the heap space
  • Direct buffers
    • Live in native memory, outside of the JVM’s heap
    • Java NIO uses direct buffers
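
The difference between heap and direct buffers is visible in the NIO API itself. A small sketch, with hypothetical class and method names:

```java
import java.nio.ByteBuffer;

public class BufferKinds {
    // Backed by a byte[] on the Java heap.
    static ByteBuffer heap(int size) {
        return ByteBuffer.allocate(size);
    }

    // Backed by native memory outside the Java heap.
    static ByteBuffer direct(int size) {
        return ByteBuffer.allocateDirect(size);
    }

    public static void main(String[] args) {
        System.out.println(heap(1024).isDirect());   // false
        System.out.println(direct(1024).isDirect()); // true
    }
}
```

Because direct buffers live outside the heap, heavy use of them can grow the process's memory footprint without showing up in heap usage figures.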

68 of 75

Sun JVM Memory Management

69 of 75

Sun JVM Garbage Collection

  • Out-of-Memory
    • Throughput collector will throw OOM if it is spending too much time recovering too small a percentage of the heap
    • More generally, OOM is thrown if the JVM cannot allocate memory, and no more memory could be cleared by garbage collection.

70 of 75

Sun JVM Garbage Collection

  • Why should you care about something this esoteric?

71 of 75

Profiling

  • CPU profiling
  • Memory profiling
  • Understanding snapshots
  • Tools:
    • YourKit
    • JProfiler
    • JProbe

72 of 75

Spring and Dependency Injection

  • From Wikipedia:
    • “Conventionally, if an object needs to gain access to a particular service, the object takes responsibility to get hold of that service: either it holds a direct reference to the location of that service, or it goes to a known 'service locator' and requests that it be passed back a reference to an implementation of a specified type of service. By contrast, using dependency injection, the object simply provides a property that can hold a reference to that type of service; and when the object is created a reference to an implementation of that type of service will automatically be injected into that property - by an external mechanism.”
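
The pattern in the quote can be sketched in plain Java without Spring. Here main plays the role of the "external mechanism"; in Spring that role is filled by the container reading the XML configuration. All class and interface names are hypothetical.

```java
public class InjectionSketch {
    interface MessageStore {
        String name();
    }

    static class MySqlStore implements MessageStore {
        public String name() { return "mysql"; }
    }

    static class InMemoryStore implements MessageStore {
        public String name() { return "memory"; }
    }

    // The logger does not look up or construct its store; it only exposes a property.
    static class ResultLogger {
        private MessageStore store;

        public void setStore(MessageStore store) {
            this.store = store;
        }

        String describe() {
            return "logging to " + store.name();
        }
    }

    public static void main(String[] args) {
        ResultLogger logger = new ResultLogger();
        logger.setStore(new MySqlStore());    // the implementation is chosen from outside
        System.out.println(logger.describe());
        logger.setStore(new InMemoryStore()); // swapped without touching ResultLogger
        System.out.println(logger.describe());
    }
}
```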

73 of 75

Spring and Dependency Injection

  • Why is this interesting?
    • Deeply configurable
    • Change out components without code changes.
    • Many defects in MailFusion have been resolved with only a change to configuration files!

74 of 75

Spring and Dependency Injection

  • From MF’s databaseContext.xml:

<bean id="dataSource" class="com.mchange.v2.c3p0.ComboPooledDataSource" destroy-method="close">
    <property name="user" value="${jdbc.username}" />
    <property name="password" value="${jdbc.password}" />
    <property name="jdbcUrl" value="${jdbc.url}" />
    <property name="driverClass" value="${jdbc.driverClass}"/>
    <!-- configuration pool via c3p0-->
    <property name="acquireIncrement" value="10" />
    <property name="acquireRetryAttempts" value="5" />
    <property name="acquireRetryDelay" value="1000" />
    <property name="autoCommitOnClose" value="false" />
    <property name="checkoutTimeout" value="5000" />
    <property name="idleConnectionTestPeriod" value="300" /> <!-- seconds -->
    <property name="maxPoolSize" value="100" />
    <property name="maxStatementsPerConnection" value="10" />
    <property name="minPoolSize" value="3" />
    <property name="unreturnedConnectionTimeout" value="900" /> <!-- Seconds -->
</bean>

  • From applicationContext.xml:

<property name="dataSource" ref="dataSource" />

75 of 75

Java Stuff Summary

  • A little knowledge of Java will go a long way toward optimal troubleshooting efforts.
  • Don’t be afraid to interpret stack traces. They contain useful and easy to understand information.
  • Our use of the Spring Framework supports greater user configurability than ever before, so prepare to be experts!