Show postgres lock

You’re staring at a query that won’t finish, or a deploy that hangs on a migration, or a UI request that just sits there. Postgres is almost certainly waiting on a lock that another transaction holds. The classic query for finding the culprit comes from the PostgreSQL wiki — it joins pg_locks against itself and against pg_stat_activity to show every blocked session and the session that’s blocking it:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
SELECT blocked_locks.pid     AS blocked_pid,
       blocked_activity.usename  AS blocked_user,
       blocking_locks.pid    AS blocking_pid,
       blocking_activity.usename AS blocking_user,
       blocked_activity.query    AS blocked_statement,
       blocking_activity.query   AS current_statement_in_blocking_process
FROM pg_catalog.pg_locks         blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity
  ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks         blocking_locks
  ON blocking_locks.locktype       = blocked_locks.locktype
 AND blocking_locks.database       IS NOT DISTINCT FROM blocked_locks.database
 AND blocking_locks.relation       IS NOT DISTINCT FROM blocked_locks.relation
 AND blocking_locks.page           IS NOT DISTINCT FROM blocked_locks.page
 AND blocking_locks.tuple          IS NOT DISTINCT FROM blocked_locks.tuple
 AND blocking_locks.virtualxid     IS NOT DISTINCT FROM blocked_locks.virtualxid
 AND blocking_locks.transactionid  IS NOT DISTINCT FROM blocked_locks.transactionid
 AND blocking_locks.classid        IS NOT DISTINCT FROM blocked_locks.classid
 AND blocking_locks.objid          IS NOT DISTINCT FROM blocked_locks.objid
 AND blocking_locks.objsubid       IS NOT DISTINCT FROM blocked_locks.objsubid
 AND blocking_locks.pid           != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity
  ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;

A few useful additions.

The modern shortcut: pg_blocking_pids. Since PostgreSQL 9.6 there’s a built-in function that does the same join in a single line — pg_blocking_pids(pid) returns an array of PIDs that are blocking the given PID. The whole “who’s blocking whom” query becomes:

1
2
3
4
5
6
7
8
SELECT a.pid                     AS blocked_pid,
       a.usename                 AS blocked_user,
       a.query                   AS blocked_statement,
       pg_blocking_pids(a.pid)   AS blocking_pids,
       a.wait_event_type,
       a.wait_event
FROM pg_stat_activity a
WHERE cardinality(pg_blocking_pids(a.pid)) > 0;

Shorter, easier to remember, and it handles fast-path locks correctly (the wiki query above can occasionally miss those). Use this on PG 9.6+ unless you specifically need the lock-type detail the wiki query produces.

How to read the result. blocked_pid is the session that’s stuck. blocking_pid is the one holding the lock. The catch: current_statement_in_blocking_process is whatever the blocker is running right now — often idle in transaction or some unrelated follow-up query, not the statement that originally took the lock. So if you see SELECT 1 in the blocker’s column, don’t be confused — the real cause is that this transaction has been holding locks for a while; the displayed query is just whatever it last ran. Look at state and xact_start in pg_stat_activity to see how long the transaction has been open.

Once you’ve found the blocker. Two functions, escalating in force:

1
2
3
4
5
-- Polite: ask the session to cancel its current query (transaction stays open)
SELECT pg_cancel_backend(12345);

-- Forceful: kill the entire backend connection (transaction rolls back)
SELECT pg_terminate_backend(12345);

pg_cancel_backend sends a SIGINT-like signal — the running statement is interrupted but the connection survives. pg_terminate_backend drops the whole connection. Try cancel first; reach for terminate when the session is wedged or holding locks while idle. Both require either superuser or membership in the pg_signal_backend role (PG 13+).

Don’t just kill — find out why. Killing the blocker stops the immediate pain, but if the same query keeps blocking everyone, the fix is upstream — a missing index, a transaction that stays open across a slow external call, an unbatched bulk update, or schema migrations during traffic. The query above shows you which session; the actual investigation usually lives in pg_stat_activity‘s state, xact_start, and application_name columns. 🔎

Posted in Database, PostgreSQL | Comments Off on Show postgres lock

Apache Lucene: The Search Engine Hiding Inside Half the Internet

If you’ve ever used Elasticsearch, Solr, or even some features in big platforms like Twitter or LinkedIn, chances are you’ve been touching Apache Lucene without knowing it. It’s the quiet workhorse — a Java library that does one thing extraordinarily well: full-text search. ☕

What Is Lucene?

Lucene is not a database, not a server, not a product you install. It’s a library — a JAR you drop into your Java application to add search capability. It handles indexing documents, parsing queries, scoring results by relevance, and giving you back ranked hits. Everything else (storage, networking, clustering) is left to you, which is exactly why projects like Elasticsearch and Solr wrap it: they add the operational layer on top of Lucene’s core search engine.

How Is It Different From Regular Search?

When you write SELECT * FROM articles WHERE body LIKE ‘%lucene%’, the database scans every row, character by character. It works, but it’s slow on millions of rows, and it can’t tell you which match is most relevant. A LIKE query doesn’t know that “running” and “runs” are related, or that an article mentioning “lucene” 12 times is probably more relevant than one that mentions it once.

Lucene flips the problem around with an inverted index. Instead of storing documents and scanning them, it stores a map of terms → documents containing those terms. Searching for “lucene” becomes a hash lookup, not a scan. On top of that, it:

  • Tokenizes and analyzes text — splits on whitespace, lowercases, strips punctuation, applies stemming (so “running” → “run”)
  • Scores by relevance using TF-IDF (or BM25 in newer versions) — documents with rarer matching terms rank higher
  • Supports fuzzy, wildcard, phrase, and boolean queries out of the box
  • Handles millions of documents with sub-millisecond query times

The Value Proposition

If your application has any user-facing search — product catalogs, document libraries, support tickets, log analysis, code search — rolling your own with SQL LIKE will eventually break. Lucene gives you Google-quality relevance ranking, fast queries on huge corpora, and a mature ecosystem, all from a single JAR. It’s the reason Elasticsearch became the de facto search backend for so many companies: Lucene under the hood, REST API on top. 💡

A Minimal Java Example

Here’s the classic “hello world” — index three documents in memory and search them:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class LuceneHello {
    public static void main(String[] args) throws Exception {
        Directory dir = new RAMDirectory();
        StandardAnalyzer analyzer = new StandardAnalyzer();

        // --- Index three documents ---
        IndexWriterConfig cfg = new IndexWriterConfig(analyzer);
        IndexWriter writer = new IndexWriter(dir, cfg);

        addDoc(writer, "Lucene is a Java full-text search library.");
        addDoc(writer, "Elasticsearch is built on top of Lucene.");
        addDoc(writer, "PostgreSQL has full-text search too, but differently.");
        writer.close();

        // --- Search ---
        DirectoryReader reader = DirectoryReader.open(dir);
        IndexSearcher searcher = new IndexSearcher(reader);
        Query query = new QueryParser("body", analyzer).parse("lucene");
        TopDocs hits = searcher.search(query, 10);

        System.out.println("Found " + hits.totalHits + " matches:");
        for (ScoreDoc sd : hits.scoreDocs) {
            Document d = searcher.doc(sd.doc);
            System.out.printf("  score=%.3f  %s%n", sd.score, d.get("body"));
        }
        reader.close();
    }

    private static void addDoc(IndexWriter w, String text) throws Exception {
        Document doc = new Document();
        doc.add(new TextField("body", text, Field.Store.YES));
        w.addDocument(doc);
    }
}

Run it and you’ll see two hits, ranked — the Lucene-focused sentence scores higher than the Elasticsearch one, because “lucene” appears more centrally and the document is shorter (so the term carries more weight).

A Fuzzier Query

Lucene’s query parser supports a tiny DSL. A DSL — short for Domain-Specific Language — is a small, purpose-built mini-language designed to do one thing well, as opposed to a general-purpose language like Java or Python that can do anything. SQL is a DSL for querying data, regex is a DSL for pattern matching, CSS selectors are a DSL for picking DOM elements. Lucene’s query syntax is a DSL for expressing search intent. (Not to be confused with the other DSL — Digital Subscriber Line — the telecom tech for internet over copper phone lines. Same acronym, completely unrelated worlds. ☎️)

The ~ operator gives you fuzzy matching (edit distance), and you can boost terms with ^:

1
2
3
4
5
6
7
8
9
10
11
// Matches "lucene", "lucine", "lucenne" — anything within edit distance 2
Query fuzzy = new QueryParser("body", analyzer).parse("lucenne~2");

// Boost "java" 3x, so docs mentioning java rank higher
Query boosted = new QueryParser("body", analyzer).parse("search java^3");

// Field-scoped phrase search — another bit of the DSL
Query phrase = new QueryParser("body", analyzer).parse("title:"full-text search"");

// Boolean combination
Query bool = new QueryParser("body", analyzer).parse("lucene AND java NOT solr");

That whole little grammar — the ~, the ^, the field:value prefix, the AND/OR/NOT keywords — is what makes it a DSL. You’re not writing Java to build queries; you’re writing a string in Lucene’s query language, and the QueryParser compiles it into Query objects for you.

That’s the whole pitch. If you’ve been getting by with SQL LIKE and your users are starting to complain that search “doesn’t find anything,” Lucene (or Elasticsearch on top of it) is almost certainly the next step. 🔍

Posted in java | Tagged , , | Comments Off on Apache Lucene: The Search Engine Hiding Inside Half the Internet

Check all table size in Postgres

You want a quick “which tables are eating my disk?” overview — usually before a vacuum, an archive purge, or a capacity-planning conversation. Here’s a one-shot query that joins information_schema.tables against pg_class to show every public table with its estimated row count and size:

1
2
3
4
5
6
7
8
SELECT ist.table_name,
       reltuples AS "entries",
       pg_size_pretty(relpages::BIGINT * 8 * 1024) AS SIZE
FROM information_schema.tables ist
INNER JOIN pg_class c ON (ist.table_name = c.relname)
WHERE ist.table_catalog = current_database()
  AND ist.table_schema  = 'public'
ORDER BY relpages DESC, ist.table_name ASC;

The columns: reltuples is the planner’s row-count estimate (updated by ANALYZE / autovacuum), and relpages * 8 KB is the table’s heap size — Postgres pages are 8 KB by default. Tables come out biggest-first.


A few useful additions.

relpages is heap-only — there’s a built-in for the full picture. The query above only counts the main table heap. It does not include indexes, TOAST tables (where wide values overflow), or the FSM/VM bookkeeping. For “how big is this table really, on disk?” use pg_total_relation_size:

1
2
3
4
5
6
7
8
9
10
11
SELECT n.nspname AS schema,
       c.relname AS TABLE,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size,
       pg_size_pretty(pg_relation_size(c.oid))       AS heap_size,
       pg_size_pretty(pg_indexes_size(c.oid))        AS indexes_size,
       c.reltuples::BIGINT                            AS estimated_rows
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(c.oid) DESC;

Three sizes side by side:

  • pg_relation_size — the main table heap only (what the original query approximates)
  • pg_indexes_size — every index on the table, summed
  • pg_total_relation_size — heap + indexes + TOAST + everything else attached to the relation

The gap between heap_size and total_size is often surprisingly large — a write-heavy table with several indexes can carry 3–5x its heap size in indexes. That’s where you start looking for indexes you don’t actually use (pg_stat_user_indexes with idx_scan = 0).

reltuples is an estimate, not a count. The number can be stale, and on a brand-new table it may even be -1 until ANALYZE runs. If you need the true row count, use SELECT count(*) FROM mytable — but that’s an exact scan, so reach for it only when you really need exact, on smaller tables. For a quick directional answer, reltuples is fine.

Skip the SQL entirely with \dt+ in psql. Inside psql, the meta-command \dt+ shows every table in the current schema along with its size:

1
psql -d mydb -c "\dt+"

Quick, readable, no joins to write. \dt+ schema.* for a specific schema; \di+ for indexes. 📐

Posted in Database, PostgreSQL | Comments Off on Check all table size in Postgres

Synchronize a postgres table through bash and csv

Please note that the csv export process does not escape commas.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
#!/bin/bash
DIR='/root/sql_dump'
mkdir -p "$DIR"
cd "$DIR"


SIZE=100000
#START=611244350
START=0
END=$((START + SIZE))
STOP=189097000
TABLNAME="schema.tablename"
while [[ $START -lt $STOP ]] && [[ $END -le $STOP ]]; do
    export PGPASSWORD="PASSWORD#1"
   
    echo "Read from SOURCE start $START end $END"
    FILE="${DIR}/dump${START}.csv"
    SQL="SELECT * FROM  ${TABLNAME} where id > $START and id <= $END"
    psql -U postgres -h SOURCEHOST -d DB1 -t -A -F"," -c "$SQL" > "$FILE"
   
    export PGPASSWORD="PASSWORD#1"
    SQL="DELETE FROM ${TABLNAME} where id > $START and id <= $END"
    psql -U postgres -h localhost -d DB2 -t -A -F"," -c "$SQL"
    psql -U postgres -h localhost -d DB2 -c "\copy ${TABLNAME} FROM '${FILE}' DELIMITER ',' CSV"
    echo "Injected to DESTINATION"
   
    START=$END
    END=$((START + SIZE))
    if [ $END -gt $STOP ]; then
        END=$STOP  
    fi
   
    rm $FILE

done
Posted in Database, PostgreSQL | Comments Off on Synchronize a postgres table through bash and csv

Export and Import Postgres query to CSV


# ===========================================================
# Export to CSV
# ===========================================================
# export PGPASSWORD="YOURPASSWORD"
# psql -U YOURUSERNAME -h YOURHOSTNAME -d YOURDBNAME-t -A -F"," -c "select * from YOURTABLENAME limit 2" > output.csv

# ===========================================================
# Import
# ===========================================================
# export PGPASSWORD="YOURPASSWORD"
# psql -U YOURUSERNAME -h YOURHOSTNAME -d YOURDBNAME -c "\copy YOURTABLENAME FROM '/tmp/dump.csv' DELIMITER ',' CSV"

Posted in Database, PostgreSQL | Comments Off on Export and Import Postgres query to CSV

Assorted postgres queries

Get table sizes:


SELECT
relname AS objectname,
relkind AS objecttype,
reltuples AS "#entries",
pg_size_pretty(relpages::bigint*8*1024) AS size
FROM pg_class
WHERE relpages >= 8

ORDER BY relpages DESC;

Posted in Database, PostgreSQL | Comments Off on Assorted postgres queries

VirtualBox – Imporving usability on guest OS

GUEST OS: Centos 6.7 final
HOST OS: Windows 7
Virtualbox version: 5.0.4

1
yum install gcc kernel-devel kernel-headers dkms make bzip2 perl
Posted in Linux, Operating System | Comments Off on VirtualBox – Imporving usability on guest OS

RedHat or CentOS 6 iptables adding an open port

iptables –line -vnL
iptables -I INPUT 5 -p tcp –dport 80 -m state –state NEW,ESTABLISHED -j ACCEPT
service iptables save

Posted in Linux | Comments Off on RedHat or CentOS 6 iptables adding an open port

Postgres SSD optimization

If you’re running your database on an SSD instead of a spinning disk, you might want to optimize postgres table space cost:

1
2
3
4
5
6
7
8
9
10
-- Change the tablespace cost
ALTER TABLESPACE pg_default SET ( seq_page_cost = 20,  random_page_cost = 1 );
-- Verifiy the change
SELECT * FROM pg_tablespace;


-- Undo the tablespace cost
ALTER TABLESPACE pg_default RESET  ( seq_page_cost,  random_page_cost);
-- Verifiy the change
SELECT * FROM pg_tablespace;
Posted in Database, PostgreSQL | Comments Off on Postgres SSD optimization

Ant Junit debugging

Sometime we want to debug why ant build failed when executing a certain JUnit
Make sure your ant junit task look like the following

1
2
3
4
<junit printsummary="withOutAndErr" haltonfailure="yes">
:
:
</junit>

and not like

1
2
3
4
<junit printsummary="yes" haltonfailure="yes">
:
:
</junit>
Posted in java | Comments Off on Ant Junit debugging