Ronald Pringadi's Technical Notes | Constantly Learning

Show postgres lock

Posted on July 21, 2016 by ronaldpringadi

You’re staring at a query that won’t finish, or a deploy that hangs on a migration, or a UI request that just sits there. Postgres is almost certainly waiting on a lock that another transaction holds. The classic query for finding the culprit comes from the PostgreSQL wiki — it joins pg_locks against itself and against pg_stat_activity to show every blocked session and the session that’s blocking it:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24

SELECT blocked_locks.pid AS blocked_pid,
blocked_activity.usename AS blocked_user,
blocking_locks.pid AS blocking_pid,
blocking_activity.usename AS blocking_user,
blocked_activity.query AS blocked_statement,
blocking_activity.query AS current_statement_in_blocking_process
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity
ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
ON blocking_locks.locktype = blocked_locks.locktype
AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity
ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;

A few useful additions.

The modern shortcut: pg_blocking_pids. Since PostgreSQL 9.6 there’s a built-in function that does the same join in a single line — pg_blocking_pids(pid) returns an array of PIDs that are blocking the given PID. The whole “who’s blocking whom” query becomes:

1
2
3
4
5
6
7
8

SELECT a.pid AS blocked_pid,
a.usename AS blocked_user,
a.query AS blocked_statement,
pg_blocking_pids(a.pid) AS blocking_pids,
a.wait_event_type,
a.wait_event
FROM pg_stat_activity a
WHERE cardinality(pg_blocking_pids(a.pid)) > 0;

Shorter, easier to remember, and it handles fast-path locks correctly (the wiki query above can occasionally miss those). Use this on PG 9.6+ unless you specifically need the lock-type detail the wiki query produces.

How to read the result. blocked_pid is the session that’s stuck. blocking_pid is the one holding the lock. The catch: current_statement_in_blocking_process is whatever the blocker is running right now — often idle in transaction or some unrelated follow-up query, not the statement that originally took the lock. So if you see SELECT 1 in the blocker’s column, don’t be confused — the real cause is that this transaction has been holding locks for a while; the displayed query is just whatever it last ran. Look at state and xact_start in pg_stat_activity to see how long the transaction has been open.

Once you’ve found the blocker. Two functions, escalating in force:

1
2
3
4
5

-- Polite: ask the session to cancel its current query (transaction stays open)
SELECT pg_cancel_backend(12345);

-- Forceful: kill the entire backend connection (transaction rolls back)
SELECT pg_terminate_backend(12345);

pg_cancel_backend sends a SIGINT-like signal — the running statement is interrupted but the connection survives. pg_terminate_backend drops the whole connection. Try cancel first; reach for terminate when the session is wedged or holding locks while idle. Both require either superuser or membership in the pg_signal_backend role (PG 13+).

Don’t just kill — find out why. Killing the blocker stops the immediate pain, but if the same query keeps blocking everyone, the fix is upstream — a missing index, a transaction that stays open across a slow external call, an unbatched bulk update, or schema migrations during traffic. The query above shows you which session; the actual investigation usually lives in pg_stat_activity‘s state, xact_start, and application_name columns. 🔎

Posted in Database, PostgreSQL | Comments Off

Apache Lucene: The Search Engine Hiding Inside Half the Internet

Posted on July 21, 2016 by ronaldpringadi

If you’ve ever used Elasticsearch, Solr, or even some features in big platforms like Twitter or LinkedIn, chances are you’ve been touching Apache Lucene without knowing it. It’s the quiet workhorse — a Java library that does one thing extraordinarily well: full-text search. ☕

What Is Lucene?

Lucene is not a database, not a server, not a product you install. It’s a library — a JAR you drop into your Java application to add search capability. It handles indexing documents, parsing queries, scoring results by relevance, and giving you back ranked hits. Everything else (storage, networking, clustering) is left to you, which is exactly why projects like Elasticsearch and Solr wrap it: they add the operational layer on top of Lucene’s core search engine.

How Is It Different From Regular Search?

When you write SELECT * FROM articles WHERE body LIKE ‘%lucene%’, the database scans every row, character by character. It works, but it’s slow on millions of rows, and it can’t tell you which match is most relevant. A LIKE query doesn’t know that “running” and “runs” are related, or that an article mentioning “lucene” 12 times is probably more relevant than one that mentions it once.

Lucene flips the problem around with an inverted index. Instead of storing documents and scanning them, it stores a map of terms → documents containing those terms. Searching for “lucene” becomes a hash lookup, not a scan. On top of that, it:

Tokenizes and analyzes text — splits on whitespace, lowercases, strips punctuation, applies stemming (so “running” → “run”)
Scores by relevance using TF-IDF (or BM25 in newer versions) — documents with rarer matching terms rank higher
Supports fuzzy, wildcard, phrase, and boolean queries out of the box
Handles millions of documents with sub-millisecond query times

The Value Proposition

If your application has any user-facing search — product catalogs, document libraries, support tickets, log analysis, code search — rolling your own with SQL LIKE will eventually break. Lucene gives you Google-quality relevance ranking, fast queries on huge corpora, and a mature ecosystem, all from a single JAR. It’s the reason Elasticsearch became the de facto search backend for so many companies: Lucene under the hood, REST API on top. 💡

A Minimal Java Example

Here’s the classic “hello world” — index three documents in memory and search them:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class LuceneHello {
public static void main(String[] args) throws Exception {
Directory dir = new RAMDirectory();
StandardAnalyzer analyzer = new StandardAnalyzer();

// --- Index three documents ---
IndexWriterConfig cfg = new IndexWriterConfig(analyzer);
IndexWriter writer = new IndexWriter(dir, cfg);

addDoc(writer, "Lucene is a Java full-text search library.");
addDoc(writer, "Elasticsearch is built on top of Lucene.");
addDoc(writer, "PostgreSQL has full-text search too, but differently.");
writer.close();

// --- Search ---
DirectoryReader reader = DirectoryReader.open(dir);
IndexSearcher searcher = new IndexSearcher(reader);
Query query = new QueryParser("body", analyzer).parse("lucene");
TopDocs hits = searcher.search(query, 10);

System.out.println("Found " + hits.totalHits + " matches:");
for (ScoreDoc sd : hits.scoreDocs) {
Document d = searcher.doc(sd.doc);
System.out.printf(" score=%.3f %s%n", sd.score, d.get("body"));
}
reader.close();
}

private static void addDoc(IndexWriter w, String text) throws Exception {
Document doc = new Document();
doc.add(new TextField("body", text, Field.Store.YES));
w.addDocument(doc);
}
}

Run it and you’ll see two hits, ranked — the Lucene-focused sentence scores higher than the Elasticsearch one, because “lucene” appears more centrally and the document is shorter (so the term carries more weight).

A Fuzzier Query

Lucene’s query parser supports a tiny DSL. A DSL — short for Domain-Specific Language — is a small, purpose-built mini-language designed to do one thing well, as opposed to a general-purpose language like Java or Python that can do anything. SQL is a DSL for querying data, regex is a DSL for pattern matching, CSS selectors are a DSL for picking DOM elements. Lucene’s query syntax is a DSL for expressing search intent. (Not to be confused with the other DSL — Digital Subscriber Line — the telecom tech for internet over copper phone lines. Same acronym, completely unrelated worlds. ☎️)

The ~ operator gives you fuzzy matching (edit distance), and you can boost terms with ^:

1
2
3
4
5
6
7
8
9
10
11

// Matches "lucene", "lucine", "lucenne" — anything within edit distance 2
Query fuzzy = new QueryParser("body", analyzer).parse("lucenne~2");

// Boost "java" 3x, so docs mentioning java rank higher
Query boosted = new QueryParser("body", analyzer).parse("search java^3");

// Field-scoped phrase search — another bit of the DSL
Query phrase = new QueryParser("body", analyzer).parse("title:"full-text search"");

// Boolean combination
Query bool = new QueryParser("body", analyzer).parse("lucene AND java NOT solr");

That whole little grammar — the ~, the ^, the field:value prefix, the AND/OR/NOT keywords — is what makes it a DSL. You’re not writing Java to build queries; you’re writing a string in Lucene’s query language, and the QueryParser compiles it into Query objects for you.

That’s the whole pitch. If you’ve been getting by with SQL LIKE and your users are starting to complain that search “doesn’t find anything,” Lucene (or Elasticsearch on top of it) is almost certainly the next step. 🔍

Posted in java | Tagged Java, lucene, search | Comments Off

Check all table size in Postgres

Posted on July 12, 2016 by ronaldpringadi

You want a quick “which tables are eating my disk?” overview — usually before a vacuum, an archive purge, or a capacity-planning conversation. Here’s a one-shot query that joins information_schema.tables against pg_class to show every public table with its estimated row count and size:

1
2
3
4
5
6
7
8

SELECT ist.table_name,
reltuples AS "entries",
pg_size_pretty(relpages::BIGINT * 8 * 1024) AS SIZE
FROM information_schema.tables ist
INNER JOIN pg_class c ON (ist.table_name = c.relname)
WHERE ist.table_catalog = current_database()
AND ist.table_schema = 'public'
ORDER BY relpages DESC, ist.table_name ASC;

The columns: reltuples is the planner’s row-count estimate (updated by ANALYZE / autovacuum), and relpages * 8 KB is the table’s heap size — Postgres pages are 8 KB by default. Tables come out biggest-first.

A few useful additions.

relpages is heap-only — there’s a built-in for the full picture. The query above only counts the main table heap. It does not include indexes, TOAST tables (where wide values overflow), or the FSM/VM bookkeeping. For “how big is this table really, on disk?” use pg_total_relation_size:

1
2
3
4
5
6
7
8
9
10
11

SELECT n.nspname AS schema,
c.relname AS TABLE,
pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size,
pg_size_pretty(pg_relation_size(c.oid)) AS heap_size,
pg_size_pretty(pg_indexes_size(c.oid)) AS indexes_size,
c.reltuples::BIGINT AS estimated_rows
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(c.oid) DESC;

Three sizes side by side:

pg_relation_size — the main table heap only (what the original query approximates)
pg_indexes_size — every index on the table, summed
pg_total_relation_size — heap + indexes + TOAST + everything else attached to the relation

The gap between heap_size and total_size is often surprisingly large — a write-heavy table with several indexes can carry 3–5x its heap size in indexes. That’s where you start looking for indexes you don’t actually use (pg_stat_user_indexes with idx_scan = 0).

reltuples is an estimate, not a count. The number can be stale, and on a brand-new table it may even be -1 until ANALYZE runs. If you need the true row count, use SELECT count(*) FROM mytable — but that’s an exact scan, so reach for it only when you really need exact, on smaller tables. For a quick directional answer, reltuples is fine.

Skip the SQL entirely with \dt+ in psql. Inside psql, the meta-command \dt+ shows every table in the current schema along with its size:

1	psql -d mydb -c "\dt+"

Quick, readable, no joins to write. \dt+ schema.* for a specific schema; \di+ for indexes. 📐

Posted in Database, PostgreSQL | Comments Off

Synchronize a postgres table through bash and csv

Posted on June 29, 2016 by ronaldpringadi

Please note that the csv export process does not escape commas.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35

#!/bin/bash
DIR='/root/sql_dump'
mkdir -p "$DIR"
cd "$DIR"

SIZE=100000
#START=611244350
START=0
END=$((START + SIZE))
STOP=189097000
TABLNAME="schema.tablename"
while [[ $START -lt $STOP ]] && [[ $END -le $STOP ]]; do
export PGPASSWORD="PASSWORD#1"

echo "Read from SOURCE start $START end $END"
FILE="${DIR}/dump${START}.csv"
SQL="SELECT * FROM ${TABLNAME} where id > $START and id <= $END"
psql -U postgres -h SOURCEHOST -d DB1 -t -A -F"," -c "$SQL" > "$FILE"

export PGPASSWORD="PASSWORD#1"
SQL="DELETE FROM ${TABLNAME} where id > $START and id <= $END"
psql -U postgres -h localhost -d DB2 -t -A -F"," -c "$SQL"
psql -U postgres -h localhost -d DB2 -c "\copy ${TABLNAME} FROM '${FILE}' DELIMITER ',' CSV"
echo "Injected to DESTINATION"

START=$END
END=$((START + SIZE))
if [ $END -gt $STOP ]; then
END=$STOP
fi

rm $FILE

done

Posted in Database, PostgreSQL | Comments Off

Export and Import Postgres query to CSV

Posted on June 24, 2016 by ronaldpringadi

# =========================================================== # Export to CSV # =========================================================== # export PGPASSWORD="YOURPASSWORD" # psql -U YOURUSERNAME -h YOURHOSTNAME -d YOURDBNAME-t -A -F"," -c "select * from YOURTABLENAME limit 2" > output.csv


# ===========================================================

# Import

# ===========================================================

# export PGPASSWORD="YOURPASSWORD"

# psql -U YOURUSERNAME -h YOURHOSTNAME -d YOURDBNAME -c "\copy YOURTABLENAME FROM '/tmp/dump.csv' DELIMITER ',' CSV"


	
			
									
						Posted in Database, PostgreSQL					
					|
												Comments Off on Export and Import Postgres query to CSV



		
	


			
			Assorted postgres queries

			
				Posted on June 23, 2016 by ronaldpringadi			


				
				Get table sizes: 


SELECT

   relname AS objectname,

   relkind AS objecttype,

   reltuples AS "#entries",

	pg_size_pretty(relpages::bigint*8*1024) AS size

   FROM pg_class

   WHERE relpages >= 8

   ORDER BY relpages DESC;


							

	
			
									
						Posted in Database, PostgreSQL					
					|
												Comments Off on Assorted postgres queries
							

		


		
	


			
			VirtualBox – Imporving usability on guest OS

			
				Posted on May 17, 2016 by ronaldpringadi			


				
				GUEST OS: Centos 6.7 final

HOST OS: Windows 7

Virtualbox version: 5.0.4
1
yum install gcc kernel-devel kernel-headers dkms make bzip2 perl
							

	
			
									
						Posted in Linux, Operating System					
					|
												Comments Off on VirtualBox – Imporving usability on guest OS
							

		


		
	


			
			RedHat or CentOS 6 iptables adding an open port

			
				Posted on May 16, 2016 by ronaldpringadi			


				
				iptables –line -vnL

iptables -I INPUT 5 -p tcp –dport 80 -m state –state NEW,ESTABLISHED -j ACCEPT

service iptables save
							

	
			
									
						Posted in Linux					
					|
												Comments Off on RedHat or CentOS 6 iptables adding an open port
							

		


		
	


			
			Postgres SSD optimization

			
				Posted on April 13, 2016 by ronaldpringadi			


				
				If you’re running your database on an SSD instead of a spinning disk, you might want to optimize postgres table space cost:
1
2
3
4
5
6
7
8
9
10
-- Change the tablespace cost

ALTER TABLESPACE pg_default SET ( seq_page_cost = 20,  random_page_cost = 1 );

-- Verifiy the change

SELECT * FROM pg_tablespace;





-- Undo the tablespace cost

ALTER TABLESPACE pg_default RESET  ( seq_page_cost,  random_page_cost);

-- Verifiy the change

SELECT * FROM pg_tablespace;
							

	
			
									
						Posted in Database, PostgreSQL					
					|
												Comments Off on Postgres SSD optimization
							

		


		
	


			
			Ant Junit debugging

			
				Posted on April 13, 2016 by ronaldpringadi			


				
				Sometime we want to debug why ant build failed when executing a certain JUnit

Make sure your ant junit task look like the following
1
2
3
4
<junit printsummary="withOutAndErr" haltonfailure="yes">

:

:

</junit>
and not like
1
2
3
4
<junit printsummary="yes" haltonfailure="yes">

:

:

</junit>
							

	
			
									
						Posted in java					
					|
												Comments Off on Ant Junit debugging
							

		


		
	

				
					← Older posts
					Newer posts →




		
			


			
				
				
					Search for:
					
					
				
			
			

			
				Archives
				
						May 2026
	April 2026
	November 2025
	December 2024
	November 2024
	June 2024
	September 2023
	March 2023
	August 2022
	April 2022
	September 2021
	September 2020
	March 2019
	March 2018
	June 2017
	May 2017
	November 2016
	September 2016
	July 2016
	June 2016
	May 2016
	April 2016
	March 2016
	February 2016
	January 2016
	December 2015
	November 2015
	October 2015
	September 2015
	August 2015
	July 2015
	June 2015
	April 2015
	December 2014
	October 2014
	September 2014
	May 2014
	April 2014
	March 2014
	January 2014
	November 2013
	October 2013
	October 2012
	September 2012
	August 2012
	July 2012
	June 2012
	March 2012
	February 2012
	January 2012
	December 2011
	November 2011
	October 2011
	September 2011
	August 2011
				
			

			
				Meta
				
					Register
					Log in
									
			

					
		



		
			
				Categories
			
					Analytics (1)

	Android (1)

	AWS (1)

	C# (1)

	Database (20)

	MySQL (3)

	PostgreSQL (16)



	DevOps (3)

	Hardware (1)

	Benchmark (1)



	Investment (3)

	Stock Market (3)



	java (12)

	Laravel (7)

	LDAP and Active Directory (1)

	Moodle (3)

	Operating System (56)

	DOS (2)

	Linux (51)
	
	Bash (10)

	TCL/Expect (1)

	

	Ubuntu (11)

	Version Control (2)

	Windows 7 (2)



	Python (3)

	Uncategorized (1)

	Visual Basic (1)

	Web Development (24)

	javascript (6)
	
	Ext Js (1)

	

	PHP (15)